fix(skill): rewrite senior-data-engineer with comprehensive data engineering content (#53) (#100)
Complete overhaul of the senior-data-engineer skill (previously Grade F: 43/100):
SKILL.md (~550 lines):
- Table of contents and trigger phrases
- Three actionable workflows: Batch ETL Pipeline, Real-Time Streaming,
  and Data Quality Framework
- Architecture decision framework (batch vs. stream, Lambda vs. Kappa;
  see the sketch after this list)
- Tech stack overview with a decision matrix
- Troubleshooting section covering common issues and their solutions
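
To make the decision framework concrete, a minimal sketch; the function,
profile fields, and thresholds below are invented for illustration and
are not lifted from SKILL.md:

```python
# Hypothetical illustration of the batch-vs-stream decision framework;
# all names and thresholds here are invented for this sketch.
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    max_latency_seconds: float    # how stale results are allowed to be
    events_per_second: float      # sustained input rate
    needs_event_time_joins: bool  # e.g., sessionization across streams

def recommend_architecture(w: WorkloadProfile) -> str:
    """Return a coarse recommendation: 'batch', 'streaming', or 'lambda'."""
    if w.max_latency_seconds >= 3600:
        return "batch"      # hourly-or-slower SLAs rarely justify streaming
    if w.needs_event_time_joins or w.events_per_second > 1_000:
        return "streaming"  # Kappa-style: a single streaming path
    return "lambda"         # mixed SLAs: batch layer plus speed layer

print(recommend_architecture(WorkloadProfile(60, 5_000, True)))  # streaming
```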
Reference Files (all rewritten from 81-line boilerplate):
- data_pipeline_architecture.md (~700 lines): Lambda/Kappa architectures,
  batch processing with Spark, stream processing with Kafka/Flink,
  exactly-once semantics (first sketch after this list), error-handling
  strategies, and orchestration patterns
- data_modeling_patterns.md (~650 lines): Dimensional modeling
  (Star/Snowflake/OBT), SCD Types 0-6 with SQL implementations (Type 2
  sketched below), Data Vault (Hub/Satellite/Link), dbt best practices,
  and partitioning/clustering strategies
- dataops_best_practices.md (~750 lines): data testing (Great
  Expectations, dbt), data contracts with YAML definitions (toy contract
  below), CI/CD pipelines, observability with OpenLineage,
  incident-response runbooks, and cost optimization
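
One common route to the exactly-once semantics the architecture file
covers is at-least-once delivery plus an idempotent sink. A minimal
consumer-side sketch of that pattern, assuming confluent-kafka; the
topic, group id, and upsert_event() are placeholders, not the file's
actual example:

```python
# Effectively-once processing: idempotent write first, offset commit
# second, so a crash between the two only causes a harmless replay.
# Topic, group id, and upsert_event() are hypothetical placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "etl-example",
    "enable.auto.commit": False,  # never commit ahead of the write
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])

def upsert_event(payload: bytes) -> None:
    """Placeholder: write keyed on an event id so replays are no-ops."""

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        upsert_event(msg.value())                         # idempotent write
        consumer.commit(message=msg, asynchronous=False)  # then the offset
finally:
    consumer.close()
```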
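
The SCD Type 2 pattern boils down to "expire the current row, then
insert the new version". A sketch as SQL strings (Postgres-style
UPDATE ... FROM; every table and column name here is hypothetical, not
taken from the reference file):

```python
# Two-step SCD Type 2 load; run EXPIRE before INSERT so changed entities
# have no current row left and fall through to the insert.
EXPIRE_CHANGED_ROWS = """
UPDATE dim_customer AS d
SET    valid_to = s.updated_at, is_current = FALSE
FROM   stg_customer AS s
WHERE  d.customer_id = s.customer_id
  AND  d.is_current
  AND  d.row_hash <> s.row_hash;  -- attribute change detected
"""

INSERT_NEW_VERSIONS = """
INSERT INTO dim_customer
    (customer_id, name, tier, row_hash, valid_from, valid_to, is_current)
SELECT s.customer_id, s.name, s.tier, s.row_hash,
       s.updated_at, NULL, TRUE
FROM   stg_customer AS s
LEFT JOIN dim_customer AS d
       ON d.customer_id = s.customer_id AND d.is_current
WHERE  d.customer_id IS NULL;  -- new entity, or just expired above
"""
```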
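
And for the data-contract idea, a toy YAML contract plus validator; this
format is invented for illustration and is not the contract schema the
file standardizes on (assumes PyYAML is installed):

```python
# Toy data contract: required columns, types, and nullability.
import yaml

CONTRACT = yaml.safe_load("""
dataset: orders
columns:
  - {name: order_id, type: int,   nullable: false}
  - {name: amount,   type: float, nullable: false}
  - {name: coupon,   type: str,   nullable: true}
""")

def validate_row(row: dict) -> list[str]:
    """Return the list of contract violations for one record."""
    errors = []
    types = {"int": int, "float": float, "str": str}
    for col in CONTRACT["columns"]:
        value = row.get(col["name"])
        if value is None:
            if not col["nullable"]:
                errors.append(f"{col['name']}: null not allowed")
        elif not isinstance(value, types[col["type"]]):
            errors.append(f"{col['name']}: expected {col['type']}")
    return errors

print(validate_row({"order_id": 1, "amount": None}))
# -> ['amount: null not allowed']
```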
Python Scripts (all rewritten from 101-line placeholders):
- pipeline_orchestrator.py (~600 lines): generates Airflow DAGs, Prefect
  flows, and Dagster jobs with configurable ETL patterns (the shape of a
  generated DAG is sketched after this list)
- data_quality_validator.py (~1640 lines): schema validation, data
  profiling, Great Expectations suite generation, data contract
  validation, and anomaly detection (row-count example below)
- etl_performance_optimizer.py (~1680 lines): SQL query analysis, Spark
  job optimization, partition strategy recommendations, and cost
  estimation for BigQuery/Snowflake/Redshift/Databricks (cost sketch
  below)
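
For reference, the rough shape of the Airflow output
pipeline_orchestrator.py emits, as a self-contained DAG (assumes Airflow
2.4+, where schedule replaces schedule_interval; the dag_id and task
callables are placeholders):

```python
# Minimal linear ETL DAG in the stock Airflow 2.x style.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...
def transform(): ...
def load(): ...

with DAG(
    dag_id="example_batch_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # don't backfill runs since start_date
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load  # classic E -> T -> L chain
```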
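
The validator's anomaly detection can be pictured as a z-score check on
daily row counts; the 3-sigma default below is a common convention, not
necessarily what the script ships with:

```python
# Flag today's row count if it sits more than z stdevs from history.
import statistics

def is_row_count_anomalous(history: list[int], today: int,
                           z: float = 3.0) -> bool:
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean  # flat history: any change is anomalous
    return abs(today - mean) / stdev > z

counts = [10_120, 9_980, 10_340, 10_050, 10_210, 9_940, 10_100]
print(is_row_count_anomalous(counts, 4_300))  # True: likely partial load
```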
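
And the optimizer's cost estimation, reduced to its core: on-demand
warehouses bill by bytes scanned, so partition pruning is directly a
cost lever. The $/TiB rate below is illustrative only; real pricing
varies by vendor and region:

```python
# Scan-cost estimate for on-demand pricing (BigQuery-style billing).
PRICE_PER_TIB_SCANNED = 6.25  # assumption: illustrative rate, not a quote

def estimate_query_cost(bytes_scanned: int) -> float:
    """Dollar cost of a query that scans `bytes_scanned` bytes."""
    return round(bytes_scanned / 2**40 * PRICE_PER_TIB_SCANNED, 4)

# Pruning a 5 TiB full scan down to 200 GiB of matching partitions:
print(estimate_query_cost(5 * 2**40))    # 31.25
print(estimate_query_cost(200 * 2**30))  # 1.2207
```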
Resolves #53
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>