From 5d01094479d0d85e0365e246f1895a7d6f01cf1c Mon Sep 17 00:00:00 2001
From: Ianj332 <jiangjs03@gmail.com>
Date: Fri, 23 Jan 2026 19:40:43 -0600
Subject: [PATCH] feat: add nosql-expert skill for distributed database
 patterns

---
 skills/nosql-expert/SKILL.md | 118 +++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 skills/nosql-expert/SKILL.md

diff --git a/skills/nosql-expert/SKILL.md b/skills/nosql-expert/SKILL.md
new file mode 100644
index 00000000..a83e43d1
--- /dev/null
+++ b/skills/nosql-expert/SKILL.md
@@ -0,0 +1,118 @@
+---
+name: nosql-expert
+description: "Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems."
+---
+
+# NoSQL Expert Patterns (Cassandra & DynamoDB)
+
+## Overview
+
+This skill provides professional mental models and design patterns for **distributed wide-column and key-value stores** (specifically Apache Cassandra and Amazon DynamoDB).
+
+Unlike SQL (where you model data entities), or document stores (like MongoDB), these distributed systems require you to **model your queries first**.
+
+## When to Use
+
+- **Designing for Scale**: Moving beyond simple single-node databases to distributed clusters.
+- **Technology Selection**: Evaluating or using **Cassandra**, **ScyllaDB**, or **DynamoDB**.
+- **Performance Tuning**: Troubleshooting "hot partitions" or high latency in existing NoSQL systems.
+- **Microservices**: Implementing "database-per-service" patterns where highly optimized reads are required.
+
+## Relation to Existing Skills
+
+While this repo contains product-specific skills (e.g., `firebase`, `upstash-qstash`), this **NoSQL Expert** skill focuses on the **universal architectural patterns** for high-scale distributed databases.
+
+- Use `firebase` or `upstash-qstash` for implementation details of those specific tools.
+- Use **this skill** fo **Data Modeling** and **System Design** (Partitioning, Sharding, Consistency) *before* you write code. This addresses the "SQL to NoSQL" mental shift.
+
+## The Mental Shift: SQL vs. Distributed NoSQL
+
+| Feature | SQL (Relational) | Distributed NoSQL (Cassandra/DynamoDB) |
+| :--- | :--- | :--- |
+| **Data modeling** | Model Entities + Relationships | Model **Queries** (Access Patterns) |
+| **Joins** | CPU-intensive, at read time | **Pre-computed** (Denormalized) at write time |
+| **Storage cost** | Expensive (minimize duplication) | Cheap (duplicate data for read speed) |
+| **Consistency** | ACID (Strong) | **BASE (Eventual)** / Tunable |
+| **Scalability** | Vertical (Bigger machine) | **Horizontal** (More nodes/shards) |
+
+> **The Golden Rule:** In SQL, you design the data model to answer *any* query. In NoSQL, you design the data model to answer *specific* queries efficiently.
+
+## Core Design Patterns
+
+### 1. Query-First Modeling (Access Patterns)
+
+You typically cannot "add a query later" without migration or creating a new table/index.
+
+**Process:**
+1.  **List all Entities** (User, Order, Product).
+2.  **List all Access Patterns** ("Get User by Email", "Get Orders by User sorted by Date").
+3.  **Design Table(s)** specifically to serve those patterns with a single lookup.
+
+### 2. The Partition Key is King
+
+Data is distributed across physical nodes based on the **Partition Key (PK)**.
+-   **Goal:** Even distribution of data and traffic.
+-   **Anti-Pattern:** Using a low-cardinality PK (e.g., `status="active"` or `gender="m"`) creates **Hot Partitions**, limiting throughput to a single node's capacity.
+-   **Best Practice:** Use high-cardinality keys (User IDs, Device IDs, Composite Keys).
+
+### 3. Clustering / Sort Keys
+
+Within a partition, data is sorted on disk by the **Clustering Key (Cassandra)** or **Sort Key (DynamoDB)**.
+-   This allows for efficient **Range Queries** (e.g., `WHERE user_id=X AND date > Y`).
+-   It effectively pre-sorts your data for specific retrieval requirements.
+
+### 4. Single-Table Design (Adjacency Lists)
+
+*Primary use: DynamoDB (but concepts apply elsewhere)*
+
+Storing multiple entity types in one table to enable pre-joined reads.
+
+| PK (Partition) | SK (Sort) | Data Fields... |
+| :--- | :--- | :--- |
+| `USER#123` | `PROFILE` | `{ name: "Ian", email: "..." }` |
+| `USER#123` | `ORDER#998` | `{ total: 50.00, status: "shipped" }` |
+| `USER#123` | `ORDER#999` | `{ total: 12.00, status: "pending" }` |
+
+-   **Query:** `PK="USER#123"`
+-   **Result:** Fetches User Profile AND all Orders in **one network request**.
+
+### 5. Denormalization & Duplication
+
+Don't be afraid to store the same data in multiple tables to serve different query patterns.
+-   **Table A:** `users_by_id` (PK: uuid)
+-   **Table B:** `users_by_email` (PK: email)
+
+*Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).*
+
+## Specific Guidance
+
+### Apache Cassandra / ScyllaDB
+
+-   **Primary Key Structure:** `((Partition Key), Clustering Columns)`
+-   **No Joins, No Aggregates:** Do not try to `JOIN` or `GROUP BY`. Pre-calculate aggregates in a separate counter table.
+-   **Avoid `ALLOW FILTERING`:** If you see this in production, your data model is wrong. It implies a full cluster scan.
+-   **Writes are Cheap:** Inserts and Updates are just appends to the LSM tree. Don't worry about write volume as much as read efficiency.
+-   **Tombstones:** Deletes are expensive markers. Avoid high-velocity delete patterns (like queues) in standard tables.
+
+### AWS DynamoDB
+
+-   **GSI (Global Secondary Index):** Use GSIs to create alternative views of your data (e.g., "Search Orders by Date" instead of by User).
+    -   *Note:* GSIs are eventually consistent.
+-   **LSI (Local Secondary Index):** Sorts data differently *within* the same partition. Must be created at table creation time.
+-   **WCU / RCU:** Understand capacity modes. Single-table design helps optimize consumed capacity units.
+-   **TTL:** Use Time-To-Live attributes to automatically expire old data (free delete) without creating tombstones.
+
+## Expert Checklist
+
+Before finalizing your NoSQL schema:
+
+-   [ ] **Access Pattern Coverage:** Does every query pattern map to a specific table or index?
+-   [ ] **Cardinality Check:** Does the Partition Key have enough unique values to spread traffic evenly?
+-   [ ] **Split Partition Risk:** For any single partition (e.g., a single user's orders), will it grow indefinitely? (If > 10GB, you need to "shard" the partition, e.g., `USER#123#2024-01`).
+-   [ ] **Consistency Requirement:** Can the application tolerate eventual consistency for this read pattern?
+
+## Common Anti-Patterns
+
+❌ **Scatter-Gather:** Querying *all* partitions to find one item (Scan).
+❌ **Hot Keys:** Putting all "Monday" data into one partition.
+❌ **Relational Modeling:** Creating `Author` and `Book` tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books).