refactor: split 21 over-500-line skills into SKILL.md + references (#296)

This commit is contained in:
Alireza Rezvani
2026-03-08 10:14:30 +01:00
committed by GitHub
parent e7081583fb
commit fea994eb42
50 changed files with 7133 additions and 6511 deletions

View File

@@ -369,204 +369,7 @@ Status page: {link}
- **{Pitfall}:** {description and how to avoid}
## Reference Information
- **Architecture Diagram:** {link}
- **Monitoring Dashboard:** {link}
- **Related Runbooks:** {links to dependent service runbooks}
```
### Post-Incident Review (PIR) Framework
#### PIR Timeline and Ownership
**Timeline:**
- **24 hours:** Initial PIR draft completed by Incident Commander
- **3 business days:** Final PIR published with all stakeholder input
- **1 week:** Action items assigned with owners and due dates
- **4 weeks:** Follow-up review on action item progress
**Roles:**
- **PIR Owner:** Incident Commander (can delegate writing but owns completion)
- **Technical Contributors:** All engineers involved in response
- **Review Committee:** Engineering leadership, affected product teams
- **Action Item Owners:** Assigned based on expertise and capacity
#### Root Cause Analysis Frameworks
#### 1. Five Whys Method
The Five Whys technique involves asking "why" repeatedly to drill down to root causes:
**Example Application:**
- **Problem:** Database became unresponsive during peak traffic
- **Why 1:** Why did the database become unresponsive? → Connection pool was exhausted
- **Why 2:** Why was the connection pool exhausted? → Application was creating more connections than usual
- **Why 3:** Why was the application creating more connections? → New feature wasn't using connection pooling properly
- **Why 4:** Why wasn't the feature using connection pooling properly? → Code review missed this pattern
- **Why 5:** Why did code review miss this? → No automated checks for connection pooling patterns
**Best Practices:**
- Ask "why" at least 3 times, often need 5+ iterations
- Focus on process failures, not individual blame
- Each "why" should point to a actionable system improvement
- Consider multiple root cause paths, not just one linear chain
#### 2. Fishbone (Ishikawa) Diagram
Systematic analysis across multiple categories of potential causes:
**Categories:**
- **People:** Training, experience, communication, handoffs
- **Process:** Procedures, change management, review processes
- **Technology:** Architecture, tooling, monitoring, automation
- **Environment:** Infrastructure, dependencies, external factors
**Application Method:**
1. State the problem clearly at the "head" of the fishbone
2. For each category, brainstorm potential contributing factors
3. For each factor, ask what caused that factor (sub-causes)
4. Identify the factors most likely to be root causes
5. Validate root causes with evidence from the incident
#### 3. Timeline Analysis
Reconstruct the incident chronologically to identify decision points and missed opportunities:
**Timeline Elements:**
- **Detection:** When was the issue first observable? When was it first detected?
- **Notification:** How quickly were the right people informed?
- **Response:** What actions were taken and how effective were they?
- **Communication:** When were stakeholders updated?
- **Resolution:** What finally resolved the issue?
**Analysis Questions:**
- Where were there delays and what caused them?
- What decisions would we make differently with perfect information?
- Where did communication break down?
- What automation could have detected/resolved faster?
### Escalation Paths
#### Technical Escalation
**Level 1:** On-call engineer
- **Responsibility:** Initial response and common issue resolution
- **Escalation Trigger:** Issue not resolved within SLA timeframe
- **Timeframe:** 15 minutes (SEV1), 30 minutes (SEV2)
**Level 2:** Senior engineer/Team lead
- **Responsibility:** Complex technical issues requiring deeper expertise
- **Escalation Trigger:** Level 1 requests help or timeout occurs
- **Timeframe:** 30 minutes (SEV1), 1 hour (SEV2)
**Level 3:** Engineering Manager/Staff Engineer
- **Responsibility:** Cross-team coordination and architectural decisions
- **Escalation Trigger:** Issue spans multiple systems or teams
- **Timeframe:** 45 minutes (SEV1), 2 hours (SEV2)
**Level 4:** Director of Engineering/CTO
- **Responsibility:** Resource allocation and business impact decisions
- **Escalation Trigger:** Extended outage or significant business impact
- **Timeframe:** 1 hour (SEV1), 4 hours (SEV2)
#### Business Escalation
**Customer Impact Assessment:**
- **High:** Revenue loss, SLA breaches, customer churn risk
- **Medium:** User experience degradation, support ticket volume
- **Low:** Internal tools, development impact only
**Escalation Matrix:**
| Severity | Duration | Business Escalation |
|----------|----------|-------------------|
| SEV1 | Immediate | VP Engineering |
| SEV1 | 30 minutes | CTO + Customer Success VP |
| SEV1 | 1 hour | CEO + Full Executive Team |
| SEV2 | 2 hours | VP Engineering |
| SEV2 | 4 hours | CTO |
| SEV3 | 1 business day | Engineering Manager |
### Status Page Management
#### Update Principles
1. **Transparency:** Provide factual information without speculation
2. **Timeliness:** Update within committed timeframes
3. **Clarity:** Use customer-friendly language and avoid technical jargon
4. **Completeness:** Include impact scope, status, and next update time
#### Status Categories
- **Operational:** All systems functioning normally
- **Degraded Performance:** Some users may experience slowness
- **Partial Outage:** Subset of features unavailable
- **Major Outage:** Service unavailable for most/all users
- **Under Maintenance:** Planned maintenance window
#### Update Template
```
{Timestamp} - {Status Category}
{Brief description of current state}
Impact: {who is affected and how}
Cause: {root cause if known, "under investigation" if not}
Resolution: {what's being done to fix it}
Next update: {specific time}
We apologize for any inconvenience this may cause.
```
### Action Item Framework
#### Action Item Categories
1. **Immediate Fixes**
- Critical bugs discovered during incident
- Security vulnerabilities exposed
- Data integrity issues
2. **Process Improvements**
- Communication gaps
- Escalation procedure updates
- Runbook additions/updates
3. **Technical Debt**
- Architecture improvements
- Monitoring enhancements
- Automation opportunities
4. **Organizational Changes**
- Team structure adjustments
- Training requirements
- Tool/platform investments
#### Action Item Template
```
**Title:** {Concise description of the action}
**Priority:** {Critical/High/Medium/Low}
**Category:** {Fix/Process/Technical/Organizational}
**Owner:** {Assigned person}
**Due Date:** {Specific date}
**Success Criteria:** {How will we know this is complete}
**Dependencies:** {What needs to happen first}
**Related PIRs:** {Links to other incidents this addresses}
**Description:**
{Detailed description of what needs to be done and why}
**Implementation Plan:**
1. {Step 1}
2. {Step 2}
3. {Validation step}
**Progress Updates:**
- {Date}: {Progress update}
- {Date}: {Progress update}
```
→ See references/reference-information.md for details
## Usage Examples
@@ -670,4 +473,4 @@ The Incident Commander skill provides a comprehensive framework for managing inc
The key to successful incident management is preparation, practice, and continuous learning. Use this framework as a starting point, but adapt it to your organization's specific needs, culture, and technical environment.
Remember: The goal isn't to prevent all incidents (which is impossible), but to detect them quickly, respond effectively, communicate clearly, and learn continuously.

View File

@@ -0,0 +1,201 @@
# incident-commander reference
## Reference Information
- **Architecture Diagram:** {link}
- **Monitoring Dashboard:** {link}
- **Related Runbooks:** {links to dependent service runbooks}
```
### Post-Incident Review (PIR) Framework
#### PIR Timeline and Ownership
**Timeline:**
- **24 hours:** Initial PIR draft completed by Incident Commander
- **3 business days:** Final PIR published with all stakeholder input
- **1 week:** Action items assigned with owners and due dates
- **4 weeks:** Follow-up review on action item progress
**Roles:**
- **PIR Owner:** Incident Commander (can delegate writing but owns completion)
- **Technical Contributors:** All engineers involved in response
- **Review Committee:** Engineering leadership, affected product teams
- **Action Item Owners:** Assigned based on expertise and capacity
#### Root Cause Analysis Frameworks
#### 1. Five Whys Method
The Five Whys technique involves asking "why" repeatedly to drill down to root causes:
**Example Application:**
- **Problem:** Database became unresponsive during peak traffic
- **Why 1:** Why did the database become unresponsive? → Connection pool was exhausted
- **Why 2:** Why was the connection pool exhausted? → Application was creating more connections than usual
- **Why 3:** Why was the application creating more connections? → New feature wasn't using connection pooling properly
- **Why 4:** Why wasn't the feature using connection pooling properly? → Code review missed this pattern
- **Why 5:** Why did code review miss this? → No automated checks for connection pooling patterns
**Best Practices:**
- Ask "why" at least 3 times, often need 5+ iterations
- Focus on process failures, not individual blame
- Each "why" should point to a actionable system improvement
- Consider multiple root cause paths, not just one linear chain
#### 2. Fishbone (Ishikawa) Diagram
Systematic analysis across multiple categories of potential causes:
**Categories:**
- **People:** Training, experience, communication, handoffs
- **Process:** Procedures, change management, review processes
- **Technology:** Architecture, tooling, monitoring, automation
- **Environment:** Infrastructure, dependencies, external factors
**Application Method:**
1. State the problem clearly at the "head" of the fishbone
2. For each category, brainstorm potential contributing factors
3. For each factor, ask what caused that factor (sub-causes)
4. Identify the factors most likely to be root causes
5. Validate root causes with evidence from the incident
#### 3. Timeline Analysis
Reconstruct the incident chronologically to identify decision points and missed opportunities:
**Timeline Elements:**
- **Detection:** When was the issue first observable? When was it first detected?
- **Notification:** How quickly were the right people informed?
- **Response:** What actions were taken and how effective were they?
- **Communication:** When were stakeholders updated?
- **Resolution:** What finally resolved the issue?
**Analysis Questions:**
- Where were there delays and what caused them?
- What decisions would we make differently with perfect information?
- Where did communication break down?
- What automation could have detected/resolved faster?
### Escalation Paths
#### Technical Escalation
**Level 1:** On-call engineer
- **Responsibility:** Initial response and common issue resolution
- **Escalation Trigger:** Issue not resolved within SLA timeframe
- **Timeframe:** 15 minutes (SEV1), 30 minutes (SEV2)
**Level 2:** Senior engineer/Team lead
- **Responsibility:** Complex technical issues requiring deeper expertise
- **Escalation Trigger:** Level 1 requests help or timeout occurs
- **Timeframe:** 30 minutes (SEV1), 1 hour (SEV2)
**Level 3:** Engineering Manager/Staff Engineer
- **Responsibility:** Cross-team coordination and architectural decisions
- **Escalation Trigger:** Issue spans multiple systems or teams
- **Timeframe:** 45 minutes (SEV1), 2 hours (SEV2)
**Level 4:** Director of Engineering/CTO
- **Responsibility:** Resource allocation and business impact decisions
- **Escalation Trigger:** Extended outage or significant business impact
- **Timeframe:** 1 hour (SEV1), 4 hours (SEV2)
#### Business Escalation
**Customer Impact Assessment:**
- **High:** Revenue loss, SLA breaches, customer churn risk
- **Medium:** User experience degradation, support ticket volume
- **Low:** Internal tools, development impact only
**Escalation Matrix** (a code sketch of this matrix follows the table):
| Severity | Duration | Business Escalation |
|----------|----------|-------------------|
| SEV1 | Immediate | VP Engineering |
| SEV1 | 30 minutes | CTO + Customer Success VP |
| SEV1 | 1 hour | CEO + Full Executive Team |
| SEV2 | 2 hours | VP Engineering |
| SEV2 | 4 hours | CTO |
| SEV3 | 1 business day | Engineering Manager |
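If paging automation drives these notifications, the matrix can be encoded directly. Below is a minimal, illustrative Python sketch; the thresholds mirror the table above, and the structure and function name are assumptions rather than existing tooling.
```python
from datetime import timedelta

# Escalation thresholds per severity, mirroring the matrix above.
# Each entry: (elapsed time since incident start, who to notify).
ESCALATION_MATRIX = {
    "SEV1": [
        (timedelta(0), "VP Engineering"),
        (timedelta(minutes=30), "CTO + Customer Success VP"),
        (timedelta(hours=1), "CEO + Full Executive Team"),
    ],
    "SEV2": [
        (timedelta(hours=2), "VP Engineering"),
        (timedelta(hours=4), "CTO"),
    ],
    "SEV3": [
        (timedelta(days=1), "Engineering Manager"),  # "1 business day" approximated as 24h
    ],
}

def business_escalations(severity: str, elapsed: timedelta) -> list[str]:
    """Return everyone who should already have been notified after `elapsed` time."""
    return [who for threshold, who in ESCALATION_MATRIX.get(severity, []) if elapsed >= threshold]

# Example: a SEV1 running for 45 minutes has crossed the first two thresholds.
print(business_escalations("SEV1", timedelta(minutes=45)))
# ['VP Engineering', 'CTO + Customer Success VP']
```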
### Status Page Management
#### Update Principles
1. **Transparency:** Provide factual information without speculation
2. **Timeliness:** Update within committed timeframes
3. **Clarity:** Use customer-friendly language and avoid technical jargon
4. **Completeness:** Include impact scope, status, and next update time
#### Status Categories
- **Operational:** All systems functioning normally
- **Degraded Performance:** Some users may experience slowness
- **Partial Outage:** Subset of features unavailable
- **Major Outage:** Service unavailable for most/all users
- **Under Maintenance:** Planned maintenance window
#### Update Template
```
{Timestamp} - {Status Category}
{Brief description of current state}
Impact: {who is affected and how}
Cause: {root cause if known, "under investigation" if not}
Resolution: {what's being done to fix it}
Next update: {specific time}
We apologize for any inconvenience this may cause.
```
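Teams that post updates programmatically can fill the template from a small helper. The sketch below is illustrative only and not tied to any particular status-page API; the field names follow the template above.
```python
from datetime import datetime, timedelta, timezone

def format_status_update(status: str, description: str, impact: str,
                         cause: str, resolution: str,
                         next_update_in_minutes: int = 30) -> str:
    """Render a status-page update following the template above."""
    now = datetime.now(timezone.utc)
    next_update = now + timedelta(minutes=next_update_in_minutes)
    return (
        f"{now:%Y-%m-%d %H:%M UTC} - {status}\n"
        f"{description}\n"
        f"Impact: {impact}\n"
        f"Cause: {cause}\n"
        f"Resolution: {resolution}\n"
        f"Next update: {next_update:%H:%M UTC}\n"
        "We apologize for any inconvenience this may cause."
    )

print(format_status_update(
    status="Partial Outage",
    description="Checkout is failing for a subset of users.",
    impact="Approximately 10% of checkout attempts are affected.",
    cause="Under investigation.",
    resolution="We are rolling back the most recent deployment.",
))
```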
### Action Item Framework
#### Action Item Categories
1. **Immediate Fixes**
- Critical bugs discovered during incident
- Security vulnerabilities exposed
- Data integrity issues
2. **Process Improvements**
- Communication gaps
- Escalation procedure updates
- Runbook additions/updates
3. **Technical Debt**
- Architecture improvements
- Monitoring enhancements
- Automation opportunities
4. **Organizational Changes**
- Team structure adjustments
- Training requirements
- Tool/platform investments
#### Action Item Template
```
**Title:** {Concise description of the action}
**Priority:** {Critical/High/Medium/Low}
**Category:** {Fix/Process/Technical/Organizational}
**Owner:** {Assigned person}
**Due Date:** {Specific date}
**Success Criteria:** {How will we know this is complete}
**Dependencies:** {What needs to happen first}
**Related PIRs:** {Links to other incidents this addresses}
**Description:**
{Detailed description of what needs to be done and why}
**Implementation Plan:**
1. {Step 1}
2. {Step 2}
3. {Validation step}
**Progress Updates:**
- {Date}: {Progress update}
- {Date}: {Progress update}
```

View File

@@ -9,18 +9,5 @@
"homepage": "https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/playwright-pro",
"repository": "https://github.com/alirezarezvani/claude-skills",
"license": "MIT",
"keywords": [
"playwright",
"testing",
"e2e",
"qa",
"browserstack",
"testrail",
"test-automation",
"cross-browser",
"migration",
"cypress",
"selenium"
],
"skills": "./skills"
"skills": "./"
}

View File

@@ -419,99 +419,7 @@ python scripts/dataset_pipeline_builder.py data/final/ \
| Positional encoding | Implicit | Explicit |
## Reference Documentation
### 1. Computer Vision Architectures
See `references/computer_vision_architectures.md` for:
- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
- Vision Transformer variants (ViT, DeiT, Swin)
- Detection heads (anchor-based vs anchor-free)
- Feature Pyramid Networks (FPN, BiFPN, PANet)
- Neck architectures for multi-scale detection
### 2. Object Detection Optimization
See `references/object_detection_optimization.md` for:
- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
- Anchor optimization and anchor-free alternatives
- Loss function design (focal loss, GIoU, CIoU, DIoU)
- Training strategies (warmup, cosine annealing, EMA)
- Data augmentation for detection (mosaic, mixup, copy-paste)
### 3. Production Vision Systems
See `references/production_vision_systems.md` for:
- ONNX export and optimization
- TensorRT deployment pipeline
- Batch inference optimization
- Edge device deployment (Jetson, Intel NCS)
- Model serving with Triton
- Video processing pipelines
## Common Commands
### Ultralytics YOLO
```bash
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640
# Validation
yolo detect val model=best.pt data=coco.yaml
# Inference
yolo detect predict model=best.pt source=images/ save=True
# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True
```
### Detectron2
```bash
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
--num-gpus 1 OUTPUT_DIR ./output
# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
MODEL.WEIGHTS output/model_final.pth
# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
--input images/*.jpg --output results/ \
--opts MODEL.WEIGHTS output/model_final.pth
```
### MMDetection
```bash
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox
# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
```
### Model Optimization
```bash
# ONNX export and simplify
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx
# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096
# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
```
→ See references/reference-docs-and-commands.md for details
## Performance Targets

View File

@@ -0,0 +1,96 @@
# senior-computer-vision reference
## Reference Documentation
### 1. Computer Vision Architectures
See `references/computer_vision_architectures.md` for:
- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
- Vision Transformer variants (ViT, DeiT, Swin)
- Detection heads (anchor-based vs anchor-free)
- Feature Pyramid Networks (FPN, BiFPN, PANet)
- Neck architectures for multi-scale detection
### 2. Object Detection Optimization
See `references/object_detection_optimization.md` for:
- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS); see the sketch after this list
- Anchor optimization and anchor-free alternatives
- Loss function design (focal loss, GIoU, CIoU, DIoU)
- Training strategies (warmup, cosine annealing, EMA)
- Data augmentation for detection (mosaic, mixup, copy-paste)
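As a quick illustration of the NMS item above, here is a minimal greedy NMS sketch in NumPy. The (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for the example.
```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring boxes, drop overlapping ones. Boxes are (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the best remaining box against all others still in play
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # keep only low-overlap candidates
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first and is suppressed
```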
### 3. Production Vision Systems
See `references/production_vision_systems.md` for:
- ONNX export and optimization; see the inference sketch after this list
- TensorRT deployment pipeline
- Batch inference optimization
- Edge device deployment (Jetson, Intel NCS)
- Model serving with Triton
- Video processing pipelines
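To ground the ONNX and serving items, the sketch below shows minimal ONNX Runtime inference. The model path, input shape, and provider list are placeholders; the input name is read from the session rather than assumed.
```python
import numpy as np
import onnxruntime as ort

# Load an exported detector; fall back to CPU if no GPU provider is available.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name                   # e.g. "images" for many YOLO exports
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)   # NCHW, matching the export commands above

outputs = session.run(None, {input_name: dummy})            # list of output arrays
print([o.shape for o in outputs])
```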
## Common Commands
### Ultralytics YOLO
```bash
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640
# Validation
yolo detect val model=best.pt data=coco.yaml
# Inference
yolo detect predict model=best.pt source=images/ save=True
# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True
```
### Detectron2
```bash
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
--num-gpus 1 OUTPUT_DIR ./output
# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
MODEL.WEIGHTS output/model_final.pth
# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
--input images/*.jpg --output results/ \
--opts MODEL.WEIGHTS output/model_final.pth
```
### MMDetection
```bash
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox
# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
```
### Model Optimization
```bash
# ONNX export and simplify
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx
# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096
# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
```
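The one-line export above glosses over checkpoint handling. A slightly more explicit sketch is shown below, assuming `model.pt` holds either a full `nn.Module` or a checkpoint dict with a `'model'` entry (that key is an assumption):
```python
import torch

ckpt = torch.load("model.pt", map_location="cpu")
# Checkpoints are often dicts; unwrap the module if needed (the 'model' key is an assumption).
model = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
model = model.float().eval()

dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=17,
    input_names=["images"],
    output_names=["output"],
    dynamic_axes={"images": {0: "batch"}, "output": {0: "batch"}},
)
```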

View File

@@ -86,627 +86,7 @@ python scripts/etl_performance_optimizer.py analyze \
---
## Workflows
### Workflow 1: Building a Batch ETL Pipeline
**Scenario:** Extract data from PostgreSQL, transform with dbt, load to Snowflake.
#### Step 1: Define Source Schema
```sql
-- Document source tables
SELECT
table_name,
column_name,
data_type,
is_nullable
FROM information_schema.columns
WHERE table_schema = 'source_schema'
ORDER BY table_name, ordinal_position;
```
#### Step 2: Generate Extraction Config
```bash
python scripts/pipeline_orchestrator.py generate \
--type airflow \
--source postgres \
--tables orders,customers,products \
--mode incremental \
--watermark updated_at \
--output dags/extract_source.py
```
#### Step 3: Create dbt Models
```sql
-- models/staging/stg_orders.sql
WITH source AS (
SELECT * FROM {{ source('postgres', 'orders') }}
),
renamed AS (
SELECT
order_id,
customer_id,
order_date,
total_amount,
status,
_extracted_at
FROM source
WHERE order_date >= DATEADD(day, -3, CURRENT_DATE)
)
SELECT * FROM renamed
```
```sql
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
cluster_by=['order_date']
)
}}
SELECT
o.order_id,
o.customer_id,
c.customer_segment,
o.order_date,
o.total_amount,
    o.status,
    o._extracted_at
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c
ON o.customer_id = c.customer_id
{% if is_incremental() %}
WHERE o._extracted_at > (SELECT MAX(_extracted_at) FROM {{ this }})
{% endif %}
```
#### Step 4: Configure Data Quality Tests
```yaml
# models/marts/schema.yml
version: 2
models:
- name: "fct-orders"
description: "Order fact table"
columns:
- name: "order-id"
tests:
- unique
- not_null
- name: "total-amount"
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
- name: "order-date"
tests:
- not_null
- dbt_utils.recency:
datepart: day
field: order_date
interval: 1
```
#### Step 5: Create Airflow DAG
```python
# dags/daily_etl.py
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
from datetime import timedelta
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'email_on_failure': True,
'email': ['data-alerts@company.com'],
'retries': 2,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'daily_etl_pipeline',
default_args=default_args,
description='Daily ETL from PostgreSQL to Snowflake',
schedule_interval='0 5 * * *',
start_date=days_ago(1),
catchup=False,
tags=['etl', 'daily'],
) as dag:
extract = BashOperator(
task_id='extract_source_data',
bash_command='python /opt/airflow/scripts/extract.py --date {{ ds }}',
)
transform = BashOperator(
task_id='run_dbt_models',
bash_command='cd /opt/airflow/dbt && dbt run --select marts.*',
)
test = BashOperator(
task_id='run_dbt_tests',
bash_command='cd /opt/airflow/dbt && dbt test --select marts.*',
)
notify = BashOperator(
task_id='send_notification',
bash_command='python /opt/airflow/scripts/notify.py --status success',
trigger_rule='all_success',
)
extract >> transform >> test >> notify
```
#### Step 6: Validate Pipeline
```bash
# Test locally
dbt run --select stg_orders fct_orders
dbt test --select fct_orders
# Validate data quality
python scripts/data_quality_validator.py validate \
--table fct_orders \
--checks all \
--output reports/quality_report.json
```
---
### Workflow 2: Implementing Real-Time Streaming
**Scenario:** Stream events from Kafka, process with Flink/Spark Streaming, sink to data lake.
#### Step 1: Define Event Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "UserEvent",
"type": "object",
"required": ["event_id", "user_id", "event_type", "timestamp"],
"properties": {
"event_id": {"type": "string", "format": "uuid"},
"user_id": {"type": "string"},
"event_type": {"type": "string", "enum": ["page_view", "click", "purchase"]},
"timestamp": {"type": "string", "format": "date-time"},
"properties": {"type": "object"}
}
}
```
#### Step 2: Create Kafka Topic
```bash
# Create topic with appropriate partitions
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic user-events \
--partitions 12 \
--replication-factor 3 \
--config retention.ms=604800000 \
--config cleanup.policy=delete
# Verify topic
kafka-topics.sh --describe \
--bootstrap-server localhost:9092 \
--topic user-events
```
#### Step 3: Implement Spark Streaming Job
```python
# streaming/user_events_processor.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import (
    from_json, col, window, count, avg,
    approx_count_distinct, to_timestamp, current_timestamp
)
from pyspark.sql.types import (
StructType, StructField, StringType,
TimestampType, MapType
)
# Initialize Spark
spark = SparkSession.builder \
.appName("UserEventsProcessor") \
.config("spark.sql.streaming.checkpointLocation", "/checkpoints/user-events") \
.config("spark.sql.shuffle.partitions", "12") \
.getOrCreate()
# Define schema
event_schema = StructType([
StructField("event_id", StringType(), False),
StructField("user_id", StringType(), False),
StructField("event_type", StringType(), False),
StructField("timestamp", StringType(), False),
StructField("properties", MapType(StringType(), StringType()), True)
])
# Read from Kafka
events_df = spark.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe", "user-events") \
.option("startingOffsets", "latest") \
.option("failOnDataLoss", "false") \
.load()
# Parse JSON
parsed_df = events_df \
.select(from_json(col("value").cast("string"), event_schema).alias("data")) \
.select("data.*") \
.withColumn("event_timestamp", to_timestamp(col("timestamp")))
# Windowed aggregation
aggregated_df = parsed_df \
.withWatermark("event_timestamp", "10 minutes") \
.groupBy(
window(col("event_timestamp"), "5 minutes"),
col("event_type")
) \
.agg(
count("*").alias("event_count"),
approx_count_distinct("user_id").alias("unique_users")
)
# Write to Delta Lake
query = aggregated_df.writeStream \
.format("delta") \
.outputMode("append") \
.option("checkpointLocation", "/checkpoints/user-events-aggregated") \
.option("path", "/data/lake/user_events_aggregated") \
.trigger(processingTime="1 minute") \
.start()
query.awaitTermination()
```
#### Step 4: Handle Late Data and Errors
```python
# Dead letter queue for failed records
import logging

from pyspark.sql.functions import current_timestamp, lit

logger = logging.getLogger(__name__)
def process_with_error_handling(batch_df, batch_id):
try:
# Attempt processing
valid_df = batch_df.filter(col("event_id").isNotNull())
invalid_df = batch_df.filter(col("event_id").isNull())
# Write valid records
valid_df.write \
.format("delta") \
.mode("append") \
.save("/data/lake/user_events")
# Write invalid to DLQ
if invalid_df.count() > 0:
invalid_df \
.withColumn("error_timestamp", current_timestamp()) \
.withColumn("error_reason", lit("missing_event_id")) \
.write \
.format("delta") \
.mode("append") \
.save("/data/lake/dlq/user_events")
except Exception as e:
# Log error, alert, continue
logger.error(f"Batch {batch_id} failed: {e}")
raise
# Use foreachBatch for custom processing
query = parsed_df.writeStream \
.foreachBatch(process_with_error_handling) \
.option("checkpointLocation", "/checkpoints/user-events") \
.start()
```
#### Step 5: Monitor Stream Health
```python
# monitoring/stream_metrics.py
from prometheus_client import Gauge, Counter, start_http_server
# Define metrics
RECORDS_PROCESSED = Counter(
'stream_records_processed_total',
'Total records processed',
['stream_name', 'status']
)
PROCESSING_LAG = Gauge(
'stream_processing_lag_seconds',
'Current processing lag',
['stream_name']
)
BATCH_DURATION = Gauge(
'stream_batch_duration_seconds',
'Last batch processing duration',
['stream_name']
)
def emit_metrics(query):
"""Emit Prometheus metrics from streaming query."""
progress = query.lastProgress
if progress:
RECORDS_PROCESSED.labels(
stream_name='user-events',
status='success'
).inc(progress['numInputRows'])
        # Emit the last batch duration (durationMs is part of the streaming progress report)
        duration_ms = progress.get('durationMs', {}).get('triggerExecution')
        if duration_ms is not None:
            BATCH_DURATION.labels(stream_name='user-events').set(duration_ms / 1000.0)
        if progress['sources']:
            for source in progress['sources']:
                end_offset = source.get('endOffset', {})
                # True consumer lag needs the broker's latest offsets for comparison
                # (e.g. via a Kafka admin client); set PROCESSING_LAG from that delta here.
```
---
### Workflow 3: Data Quality Framework Setup
**Scenario:** Implement comprehensive data quality monitoring with Great Expectations.
#### Step 1: Initialize Great Expectations
```bash
# Install and initialize
pip install great_expectations
great_expectations init
# Connect to data source
great_expectations datasource new
```
#### Step 2: Create Expectation Suite
```python
# expectations/orders_suite.py
import great_expectations as gx
context = gx.get_context()
# Create expectation suite
suite = context.add_expectation_suite("orders_quality_suite")
# Add expectations
validator = context.get_validator(
batch_request={
"datasource_name": "warehouse",
"data_asset_name": "orders",
},
expectation_suite_name="orders_quality_suite"
)
# Schema expectations
validator.expect_table_columns_to_match_ordered_list(
column_list=[
"order_id", "customer_id", "order_date",
"total_amount", "status", "created_at"
]
)
# Completeness expectations
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_not_be_null("order_date")
# Uniqueness expectations
validator.expect_column_values_to_be_unique("order_id")
# Range expectations
validator.expect_column_values_to_be_between(
"total_amount",
min_value=0,
max_value=1000000
)
# Categorical expectations
validator.expect_column_values_to_be_in_set(
"status",
["pending", "confirmed", "shipped", "delivered", "cancelled"]
)
# Freshness expectation
validator.expect_column_max_to_be_between(
"order_date",
min_value={"$PARAMETER": "now - timedelta(days=1)"},
max_value={"$PARAMETER": "now"}
)
# Referential integrity
validator.expect_column_values_to_be_in_set(
"customer_id",
value_set={"$PARAMETER": "valid_customer_ids"}
)
validator.save_expectation_suite(discard_failed_expectations=False)
```
#### Step 3: Create Data Quality Checks with dbt
```yaml
# models/marts/schema.yml
version: 2
models:
- name: "fct-orders"
description: "Order fact table with data quality checks"
tests:
# Row count check
- dbt_utils.equal_rowcount:
compare_model: ref('stg_orders')
# Freshness check
- dbt_utils.recency:
datepart: hour
field: created_at
interval: 24
columns:
- name: "order-id"
description: "Unique order identifier"
tests:
- unique
- not_null
- relationships:
to: ref('dim_orders')
field: order_id
- name: "total-amount"
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
inclusive: true
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
row_condition: "status != 'cancelled'"
- name: "customer-id"
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
severity: warn
```
#### Step 4: Implement Data Contracts
```yaml
# contracts/orders_contract.yaml
contract:
name: "orders-data-contract"
version: "1.0.0"
owner: data-team@company.com
schema:
type: object
properties:
order_id:
type: string
format: uuid
description: "Unique order identifier"
customer_id:
type: string
not_null: true
order_date:
type: date
not_null: true
total_amount:
type: decimal
precision: 10
scale: 2
minimum: 0
status:
type: string
enum: ["pending", "confirmed", "shipped", "delivered", "cancelled"]
sla:
freshness:
max_delay_hours: 1
completeness:
min_percentage: 99.9
accuracy:
duplicate_tolerance: 0.01
consumers:
- name: "analytics-team"
usage: "Daily reporting dashboards"
- name: "ml-team"
usage: "Churn prediction model"
```
#### Step 5: Set Up Quality Monitoring Dashboard
```python
# monitoring/quality_dashboard.py
from datetime import datetime, timedelta
import pandas as pd
def generate_quality_report(connection, table_name: str) -> dict:
"""Generate comprehensive data quality report."""
report = {
"table": table_name,
"timestamp": datetime.now().isoformat(),
"checks": {}
}
# Row count check
row_count = connection.execute(
f"SELECT COUNT(*) FROM {table_name}"
).fetchone()[0]
report["checks"]["row_count"] = {
"value": row_count,
"status": "pass" if row_count > 0 else "fail"
}
# Freshness check
max_date = connection.execute(
f"SELECT MAX(created_at) FROM {table_name}"
).fetchone()[0]
hours_old = (datetime.now() - max_date).total_seconds() / 3600
report["checks"]["freshness"] = {
"max_timestamp": max_date.isoformat(),
"hours_old": round(hours_old, 2),
"status": "pass" if hours_old < 24 else "fail"
}
# Null rate check
null_query = f"""
SELECT
SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) as null_order_id,
SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) as null_customer_id,
COUNT(*) as total
FROM {table_name}
"""
null_result = connection.execute(null_query).fetchone()
report["checks"]["null_rates"] = {
"order_id": null_result[0] / null_result[2] if null_result[2] > 0 else 0,
"customer_id": null_result[1] / null_result[2] if null_result[2] > 0 else 0,
"status": "pass" if null_result[0] == 0 and null_result[1] == 0 else "fail"
}
# Duplicate check
dup_query = f"""
SELECT COUNT(*) - COUNT(DISTINCT order_id) as duplicates
FROM {table_name}
"""
duplicates = connection.execute(dup_query).fetchone()[0]
report["checks"]["duplicates"] = {
"count": duplicates,
"status": "pass" if duplicates == 0 else "fail"
}
# Overall status
all_passed = all(
check["status"] == "pass"
for check in report["checks"].values()
)
report["overall_status"] = "pass" if all_passed else "fail"
return report
```
---
→ See references/workflows.md for details
## Architecture Decision Framework
@@ -810,183 +190,5 @@ See `references/dataops_best_practices.md` for:
---
## Troubleshooting
→ See references/troubleshooting.md for details
### Pipeline Failures
**Symptom:** Airflow DAG fails with timeout
```
Task exceeded max execution time
```
**Solution:**
1. Check resource allocation
2. Profile slow operations
3. Add incremental processing
```python
# Increase timeout
default_args = {
'execution_timeout': timedelta(hours=2),
}
# Or use incremental loads
WHERE updated_at > '{{ prev_ds }}'
```
---
**Symptom:** Spark job OOM
```
java.lang.OutOfMemoryError: Java heap space
```
**Solution:**
1. Increase executor memory
2. Reduce partition size
3. Use disk spill
```python
spark.conf.set("spark.executor.memory", "8g")
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.memory.fraction", "0.8")
```
---
**Symptom:** Kafka consumer lag increasing
```
Consumer lag: 1000000 messages
```
**Solution:**
1. Increase consumer parallelism
2. Optimize processing logic
3. Scale consumer group
```bash
# Add more partitions
kafka-topics.sh --alter \
--bootstrap-server localhost:9092 \
--topic user-events \
--partitions 24
```
---
### Data Quality Issues
**Symptom:** Duplicate records appearing
```
Expected unique, found 150 duplicates
```
**Solution:**
1. Add deduplication logic
2. Use merge/upsert operations
```sql
-- dbt incremental with dedup
{{
config(
materialized='incremental',
unique_key='order_id'
)
}}
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY order_id
ORDER BY updated_at DESC
) as rn
FROM {{ source('raw', 'orders') }}
) WHERE rn = 1
```
---
**Symptom:** Stale data in tables
```
Last update: 3 days ago
```
**Solution:**
1. Check upstream pipeline status
2. Verify source availability
3. Add freshness monitoring
```yaml
# dbt freshness check
sources:
- name: "raw"
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _loaded_at
```
---
**Symptom:** Schema drift detected
```
Column 'new_field' not in expected schema
```
**Solution:**
1. Update data contract
2. Modify transformations
3. Communicate with producers
```python
# Handle schema evolution
df = spark.read.format("delta") \
.option("mergeSchema", "true") \
.load("/data/orders")
```
---
### Performance Issues
**Symptom:** Query takes hours
```
Query runtime: 4 hours (expected: 30 minutes)
```
**Solution:**
1. Check query plan
2. Add proper partitioning
3. Optimize joins
```sql
-- Before: Full table scan
SELECT * FROM orders WHERE order_date = '2024-01-15';
-- After: Partition pruning
-- Table partitioned by order_date
SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Add clustering for frequent filters
ALTER TABLE orders CLUSTER BY (customer_id);
```
---
**Symptom:** dbt model takes too long
```
Model fct_orders completed in 45 minutes
```
**Solution:**
1. Use incremental materialization
2. Reduce upstream dependencies
3. Pre-aggregate where possible
```sql
-- Convert to incremental
{{
config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns'
)
}}
SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE _loaded_at > (SELECT MAX(_loaded_at) FROM {{ this }})
{% endif %}
```

View File

@@ -0,0 +1,183 @@
# senior-data-engineer reference
## Troubleshooting
### Pipeline Failures
**Symptom:** Airflow DAG fails with timeout
```
Task exceeded max execution time
```
**Solution:**
1. Check resource allocation
2. Profile slow operations
3. Add incremental processing
```python
# Increase timeout
default_args = {
'execution_timeout': timedelta(hours=2),
}
# Or use incremental loads
WHERE updated_at > '{{ prev_ds }}'
```
---
**Symptom:** Spark job OOM
```
java.lang.OutOfMemoryError: Java heap space
```
**Solution:**
1. Increase executor memory
2. Reduce partition size
3. Use disk spill
```python
spark.conf.set("spark.executor.memory", "8g")
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.memory.fraction", "0.8")
```
---
**Symptom:** Kafka consumer lag increasing
```
Consumer lag: 1000000 messages
```
**Solution:**
1. Increase consumer parallelism
2. Optimize processing logic
3. Scale consumer group
```bash
# Add more partitions
kafka-topics.sh --alter \
--bootstrap-server localhost:9092 \
--topic user-events \
--partitions 24
```
---
### Data Quality Issues
**Symptom:** Duplicate records appearing
```
Expected unique, found 150 duplicates
```
**Solution:**
1. Add deduplication logic
2. Use merge/upsert operations
```sql
-- dbt incremental with dedup
{{
config(
materialized='incremental',
unique_key='order_id'
)
}}
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY order_id
ORDER BY updated_at DESC
) as rn
FROM {{ source('raw', 'orders') }}
) WHERE rn = 1
```
---
**Symptom:** Stale data in tables
```
Last update: 3 days ago
```
**Solution:**
1. Check upstream pipeline status
2. Verify source availability
3. Add freshness monitoring
```yaml
# dbt freshness check
sources:
- name: "raw"
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _loaded_at
```
---
**Symptom:** Schema drift detected
```
Column 'new_field' not in expected schema
```
**Solution:**
1. Update data contract
2. Modify transformations
3. Communicate with producers
```python
# Handle schema evolution
df = spark.read.format("delta") \
.option("mergeSchema", "true") \
.load("/data/orders")
```
---
### Performance Issues
**Symptom:** Query takes hours
```
Query runtime: 4 hours (expected: 30 minutes)
```
**Solution:**
1. Check query plan
2. Add proper partitioning
3. Optimize joins
```sql
-- Before: Full table scan
SELECT * FROM orders WHERE order_date = '2024-01-15';
-- After: Partition pruning
-- Table partitioned by order_date
SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Add clustering for frequent filters
ALTER TABLE orders CLUSTER BY (customer_id);
```
---
**Symptom:** dbt model takes too long
```
Model fct_orders completed in 45 minutes
```
**Solution:**
1. Use incremental materialization
2. Reduce upstream dependencies
3. Pre-aggregate where possible
```sql
-- Convert to incremental
{{
config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns'
)
}}
SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE _loaded_at > (SELECT MAX(_loaded_at) FROM {{ this }})
{% endif %}
```

View File

@@ -0,0 +1,624 @@
# senior-data-engineer reference
## Workflows
### Workflow 1: Building a Batch ETL Pipeline
**Scenario:** Extract data from PostgreSQL, transform with dbt, load to Snowflake.
#### Step 1: Define Source Schema
```sql
-- Document source tables
SELECT
table_name,
column_name,
data_type,
is_nullable
FROM information_schema.columns
WHERE table_schema = 'source_schema'
ORDER BY table_name, ordinal_position;
```
#### Step 2: Generate Extraction Config
```bash
python scripts/pipeline_orchestrator.py generate \
--type airflow \
--source postgres \
--tables orders,customers,products \
--mode incremental \
--watermark updated_at \
--output dags/extract_source.py
```
#### Step 3: Create dbt Models
```sql
-- models/staging/stg_orders.sql
WITH source AS (
SELECT * FROM {{ source('postgres', 'orders') }}
),
renamed AS (
SELECT
order_id,
customer_id,
order_date,
total_amount,
status,
_extracted_at
FROM source
WHERE order_date >= DATEADD(day, -3, CURRENT_DATE)
)
SELECT * FROM renamed
```
```sql
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
cluster_by=['order_date']
)
}}
SELECT
o.order_id,
o.customer_id,
c.customer_segment,
o.order_date,
o.total_amount,
    o.status,
    o._extracted_at
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c
ON o.customer_id = c.customer_id
{% if is_incremental() %}
WHERE o._extracted_at > (SELECT MAX(_extracted_at) FROM {{ this }})
{% endif %}
```
#### Step 4: Configure Data Quality Tests
```yaml
# models/marts/schema.yml
version: 2
models:
- name: "fct-orders"
description: "Order fact table"
columns:
- name: "order-id"
tests:
- unique
- not_null
- name: "total-amount"
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
- name: "order-date"
tests:
- not_null
- dbt_utils.recency:
datepart: day
field: order_date
interval: 1
```
#### Step 5: Create Airflow DAG
```python
# dags/daily_etl.py
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
from datetime import timedelta
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'email_on_failure': True,
'email': ['data-alerts@company.com'],
'retries': 2,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'daily_etl_pipeline',
default_args=default_args,
description='Daily ETL from PostgreSQL to Snowflake',
schedule_interval='0 5 * * *',
start_date=days_ago(1),
catchup=False,
tags=['etl', 'daily'],
) as dag:
extract = BashOperator(
task_id='extract_source_data',
bash_command='python /opt/airflow/scripts/extract.py --date {{ ds }}',
)
transform = BashOperator(
task_id='run_dbt_models',
bash_command='cd /opt/airflow/dbt && dbt run --select marts.*',
)
test = BashOperator(
task_id='run_dbt_tests',
bash_command='cd /opt/airflow/dbt && dbt test --select marts.*',
)
notify = BashOperator(
task_id='send_notification',
bash_command='python /opt/airflow/scripts/notify.py --status success',
trigger_rule='all_success',
)
extract >> transform >> test >> notify
```
#### Step 6: Validate Pipeline
```bash
# Test locally
dbt run --select stg_orders fct_orders
dbt test --select fct_orders
# Validate data quality
python scripts/data_quality_validator.py validate \
--table fct_orders \
--checks all \
--output reports/quality_report.json
```
---
### Workflow 2: Implementing Real-Time Streaming
**Scenario:** Stream events from Kafka, process with Flink/Spark Streaming, sink to data lake.
#### Step 1: Define Event Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "UserEvent",
"type": "object",
"required": ["event_id", "user_id", "event_type", "timestamp"],
"properties": {
"event_id": {"type": "string", "format": "uuid"},
"user_id": {"type": "string"},
"event_type": {"type": "string", "enum": ["page_view", "click", "purchase"]},
"timestamp": {"type": "string", "format": "date-time"},
"properties": {"type": "object"}
}
}
```
#### Step 2: Create Kafka Topic
```bash
# Create topic with appropriate partitions
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic user-events \
--partitions 12 \
--replication-factor 3 \
--config retention.ms=604800000 \
--config cleanup.policy=delete
# Verify topic
kafka-topics.sh --describe \
--bootstrap-server localhost:9092 \
--topic user-events
```
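To exercise the topic, a minimal producer sketch using the `kafka-python` client is shown below; the broker address and topic match the commands above, and the serialization choices are illustrative:
```python
import json
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    key_serializer=lambda k: k.encode("utf-8"),
    acks="all",  # wait for all in-sync replicas, matching replication-factor 3
)

event = {
    "event_id": str(uuid.uuid4()),
    "user_id": "user-123",
    "event_type": "page_view",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "properties": {"page": "/home"},
}

# Keying by user_id keeps each user's events ordered within a partition.
producer.send("user-events", key=event["user_id"], value=event)
producer.flush()
```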
#### Step 3: Implement Spark Streaming Job
```python
# streaming/user_events_processor.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import (
    from_json, col, window, count, avg,
    approx_count_distinct, to_timestamp, current_timestamp
)
from pyspark.sql.types import (
StructType, StructField, StringType,
TimestampType, MapType
)
# Initialize Spark
spark = SparkSession.builder \
.appName("UserEventsProcessor") \
.config("spark.sql.streaming.checkpointLocation", "/checkpoints/user-events") \
.config("spark.sql.shuffle.partitions", "12") \
.getOrCreate()
# Define schema
event_schema = StructType([
StructField("event_id", StringType(), False),
StructField("user_id", StringType(), False),
StructField("event_type", StringType(), False),
StructField("timestamp", StringType(), False),
StructField("properties", MapType(StringType(), StringType()), True)
])
# Read from Kafka
events_df = spark.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe", "user-events") \
.option("startingOffsets", "latest") \
.option("failOnDataLoss", "false") \
.load()
# Parse JSON
parsed_df = events_df \
.select(from_json(col("value").cast("string"), event_schema).alias("data")) \
.select("data.*") \
.withColumn("event_timestamp", to_timestamp(col("timestamp")))
# Windowed aggregation
aggregated_df = parsed_df \
.withWatermark("event_timestamp", "10 minutes") \
.groupBy(
window(col("event_timestamp"), "5 minutes"),
col("event_type")
) \
.agg(
count("*").alias("event_count"),
approx_count_distinct("user_id").alias("unique_users")
)
# Write to Delta Lake
query = aggregated_df.writeStream \
.format("delta") \
.outputMode("append") \
.option("checkpointLocation", "/checkpoints/user-events-aggregated") \
.option("path", "/data/lake/user_events_aggregated") \
.trigger(processingTime="1 minute") \
.start()
query.awaitTermination()
```
#### Step 4: Handle Late Data and Errors
```python
# Dead letter queue for failed records
import logging

from pyspark.sql.functions import current_timestamp, lit

logger = logging.getLogger(__name__)
def process_with_error_handling(batch_df, batch_id):
try:
# Attempt processing
valid_df = batch_df.filter(col("event_id").isNotNull())
invalid_df = batch_df.filter(col("event_id").isNull())
# Write valid records
valid_df.write \
.format("delta") \
.mode("append") \
.save("/data/lake/user_events")
# Write invalid to DLQ
if invalid_df.count() > 0:
invalid_df \
.withColumn("error_timestamp", current_timestamp()) \
.withColumn("error_reason", lit("missing_event_id")) \
.write \
.format("delta") \
.mode("append") \
.save("/data/lake/dlq/user_events")
except Exception as e:
# Log error, alert, continue
logger.error(f"Batch {batch_id} failed: {e}")
raise
# Use foreachBatch for custom processing
query = parsed_df.writeStream \
.foreachBatch(process_with_error_handling) \
.option("checkpointLocation", "/checkpoints/user-events") \
.start()
```
#### Step 5: Monitor Stream Health
```python
# monitoring/stream_metrics.py
from prometheus_client import Gauge, Counter, start_http_server
# Define metrics
RECORDS_PROCESSED = Counter(
'stream_records_processed_total',
'Total records processed',
['stream_name', 'status']
)
PROCESSING_LAG = Gauge(
'stream_processing_lag_seconds',
'Current processing lag',
['stream_name']
)
BATCH_DURATION = Gauge(
'stream_batch_duration_seconds',
'Last batch processing duration',
['stream_name']
)
def emit_metrics(query):
"""Emit Prometheus metrics from streaming query."""
progress = query.lastProgress
if progress:
RECORDS_PROCESSED.labels(
stream_name='user-events',
status='success'
).inc(progress['numInputRows'])
        # Emit the last batch duration (durationMs is part of the streaming progress report)
        duration_ms = progress.get('durationMs', {}).get('triggerExecution')
        if duration_ms is not None:
            BATCH_DURATION.labels(stream_name='user-events').set(duration_ms / 1000.0)
        if progress['sources']:
            for source in progress['sources']:
                end_offset = source.get('endOffset', {})
                # True consumer lag needs the broker's latest offsets for comparison
                # (e.g. via a Kafka admin client); set PROCESSING_LAG from that delta here.
```
---
### Workflow 3: Data Quality Framework Setup
**Scenario:** Implement comprehensive data quality monitoring with Great Expectations.
#### Step 1: Initialize Great Expectations
```bash
# Install and initialize
pip install great_expectations
great_expectations init
# Connect to data source
great_expectations datasource new
```
#### Step 2: Create Expectation Suite
```python
# expectations/orders_suite.py
import great_expectations as gx
context = gx.get_context()
# Create expectation suite
suite = context.add_expectation_suite("orders_quality_suite")
# Add expectations
validator = context.get_validator(
batch_request={
"datasource_name": "warehouse",
"data_asset_name": "orders",
},
expectation_suite_name="orders_quality_suite"
)
# Schema expectations
validator.expect_table_columns_to_match_ordered_list(
column_list=[
"order_id", "customer_id", "order_date",
"total_amount", "status", "created_at"
]
)
# Completeness expectations
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_not_be_null("order_date")
# Uniqueness expectations
validator.expect_column_values_to_be_unique("order_id")
# Range expectations
validator.expect_column_values_to_be_between(
"total_amount",
min_value=0,
max_value=1000000
)
# Categorical expectations
validator.expect_column_values_to_be_in_set(
"status",
["pending", "confirmed", "shipped", "delivered", "cancelled"]
)
# Freshness expectation
validator.expect_column_max_to_be_between(
"order_date",
min_value={"$PARAMETER": "now - timedelta(days=1)"},
max_value={"$PARAMETER": "now"}
)
# Referential integrity
validator.expect_column_values_to_be_in_set(
"customer_id",
value_set={"$PARAMETER": "valid_customer_ids"}
)
validator.save_expectation_suite(discard_failed_expectations=False)
```
#### Step 3: Create Data Quality Checks with dbt
```yaml
# models/marts/schema.yml
version: 2
models:
- name: "fct-orders"
description: "Order fact table with data quality checks"
tests:
# Row count check
- dbt_utils.equal_rowcount:
compare_model: ref('stg_orders')
# Freshness check
- dbt_utils.recency:
datepart: hour
field: created_at
interval: 24
columns:
- name: "order-id"
description: "Unique order identifier"
tests:
- unique
- not_null
- relationships:
to: ref('dim_orders')
field: order_id
- name: "total-amount"
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
inclusive: true
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
row_condition: "status != 'cancelled'"
- name: "customer-id"
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
severity: warn
```
#### Step 4: Implement Data Contracts
```yaml
# contracts/orders_contract.yaml
contract:
name: "orders-data-contract"
version: "1.0.0"
owner: data-team@company.com
schema:
type: object
properties:
order_id:
type: string
format: uuid
description: "Unique order identifier"
customer_id:
type: string
not_null: true
order_date:
type: date
not_null: true
total_amount:
type: decimal
precision: 10
scale: 2
minimum: 0
status:
type: string
enum: ["pending", "confirmed", "shipped", "delivered", "cancelled"]
sla:
freshness:
max_delay_hours: 1
completeness:
min_percentage: 99.9
accuracy:
duplicate_tolerance: 0.01
consumers:
- name: "analytics-team"
usage: "Daily reporting dashboards"
- name: "ml-team"
usage: "Churn prediction model"
```
#### Step 5: Set Up Quality Monitoring Dashboard
```python
# monitoring/quality_dashboard.py
from datetime import datetime, timedelta
import pandas as pd
def generate_quality_report(connection, table_name: str) -> dict:
"""Generate comprehensive data quality report."""
report = {
"table": table_name,
"timestamp": datetime.now().isoformat(),
"checks": {}
}
# Row count check
row_count = connection.execute(
f"SELECT COUNT(*) FROM {table_name}"
).fetchone()[0]
report["checks"]["row_count"] = {
"value": row_count,
"status": "pass" if row_count > 0 else "fail"
}
# Freshness check
max_date = connection.execute(
f"SELECT MAX(created_at) FROM {table_name}"
).fetchone()[0]
hours_old = (datetime.now() - max_date).total_seconds() / 3600
report["checks"]["freshness"] = {
"max_timestamp": max_date.isoformat(),
"hours_old": round(hours_old, 2),
"status": "pass" if hours_old < 24 else "fail"
}
# Null rate check
null_query = f"""
SELECT
SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) as null_order_id,
SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) as null_customer_id,
COUNT(*) as total
FROM {table_name}
"""
null_result = connection.execute(null_query).fetchone()
report["checks"]["null_rates"] = {
"order_id": null_result[0] / null_result[2] if null_result[2] > 0 else 0,
"customer_id": null_result[1] / null_result[2] if null_result[2] > 0 else 0,
"status": "pass" if null_result[0] == 0 and null_result[1] == 0 else "fail"
}
# Duplicate check
dup_query = f"""
SELECT COUNT(*) - COUNT(DISTINCT order_id) as duplicates
FROM {table_name}
"""
duplicates = connection.execute(dup_query).fetchone()[0]
report["checks"]["duplicates"] = {
"count": duplicates,
"status": "pass" if duplicates == 0 else "fail"
}
# Overall status
all_passed = all(
check["status"] == "pass"
for check in report["checks"].values()
)
report["overall_status"] = "pass" if all_passed else "fail"
return report
```
---