Files
claude-skills-reference/engineering-team/senior-data-engineer/references
Alireza Rezvani 63335af90f fix(skill): rewrite senior-data-engineer with comprehensive data engineering content (#53) (#100)
Complete overhaul of senior-data-engineer skill (previously Grade F: 43/100):

SKILL.md (~550 lines):
- Added table of contents and trigger phrases
- 3 actionable workflows: Batch ETL Pipeline, Real-Time Streaming, Data Quality Framework
- Architecture decision framework (Batch vs Stream, Lambda vs Kappa)
- Tech stack overview with decision matrix
- Troubleshooting section with common issues and solutions

Reference Files (all rewritten from 81-line boilerplate):
- data_pipeline_architecture.md (~700 lines): Lambda/Kappa architectures,
  batch processing with Spark, stream processing with Kafka/Flink,
  exactly-once semantics, error handling strategies, orchestration patterns
- data_modeling_patterns.md (~650 lines): Dimensional modeling (Star/Snowflake/OBT),
  SCD Types 0-6 with SQL implementations, Data Vault (Hub/Satellite/Link),
  dbt best practices, partitioning and clustering strategies
- dataops_best_practices.md (~750 lines): Data testing (Great Expectations, dbt),
  data contracts with YAML definitions, CI/CD pipelines, observability
  with OpenLineage, incident response runbooks, cost optimization

Python Scripts (all rewritten from 101-line placeholders):
- pipeline_orchestrator.py (~600 lines): Generates Airflow DAGs, Prefect flows,
  and Dagster jobs with configurable ETL patterns
- data_quality_validator.py (~1640 lines): Schema validation, data profiling,
  Great Expectations suite generation, data contract validation, anomaly detection
- etl_performance_optimizer.py (~1680 lines): SQL query analysis, Spark job
  optimization, partition strategy recommendations, cost estimation for
  BigQuery/Snowflake/Redshift/Databricks

Resolves #53

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 08:12:42 +01:00
..