---
title: "Senior Data Scientist"
description: "Senior Data Scientist - Claude Code skill from the Engineering - Core domain."
---

# Senior Data Scientist

**Domain:** Engineering - Core | **Skill:** `senior-data-scientist` | **Source:** [`engineering-team/senior-data-scientist/SKILL.md`](https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/senior-data-scientist/SKILL.md)

---


Senior data scientist skill for production-grade AI, ML, and data systems.


## Quick Start

### Main Capabilities

```bash
# Core Tool 1
python scripts/experiment_designer.py --input data/ --output results/

# Core Tool 2
python scripts/feature_engineering_pipeline.py --target project/ --analyze

# Core Tool 3
python scripts/model_evaluation_suite.py --config config.yaml --deploy
```


## Core Expertise

This skill covers:

- Advanced production patterns and architectures
- Scalable system design and implementation
- Performance optimization at scale
- MLOps and DataOps best practices
- Real-time processing and inference
- Distributed computing frameworks
- Model deployment and monitoring
- Security and compliance
- Cost optimization
- Team leadership and mentoring


## Tech Stack

- **Languages:** Python, SQL, R, Scala, Go
- **ML Frameworks:** PyTorch, TensorFlow, Scikit-learn, XGBoost
- **Data Tools:** Spark, Airflow, dbt, Kafka, Databricks
- **LLM Frameworks:** LangChain, LlamaIndex, DSPy
- **Deployment:** Docker, Kubernetes, AWS/GCP/Azure
- **Monitoring:** MLflow, Weights & Biases, Prometheus
- **Databases:** PostgreSQL, BigQuery, Snowflake, Pinecone


## Reference Documentation

### 1. Statistical Methods Advanced

Comprehensive guide available in `references/statistical_methods_advanced.md` covering:

- Advanced patterns and best practices
- Production implementation strategies
- Performance optimization techniques
- Scalability considerations
- Security and compliance
- Real-world case studies

### 2. Experiment Design Frameworks

Complete workflow documentation in `references/experiment_design_frameworks.md` including:

- Step-by-step processes
- Architecture design patterns
- Tool integration guides
- Performance tuning strategies
- Troubleshooting procedures

### 3. Feature Engineering Patterns

Technical reference guide in `references/feature_engineering_patterns.md` with:

- System design principles
- Implementation examples
- Configuration best practices
- Deployment strategies
- Monitoring and observability


## Production Patterns

### Pattern 1: Scalable Data Processing

Enterprise-scale data processing with distributed computing:

- Horizontal scaling architecture
- Fault-tolerant design
- Real-time and batch processing
- Data quality validation
- Performance monitoring
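
As an illustration of the data quality validation item above, here is a minimal sketch in plain Python. The `Rule` class, `validate_batch` function, and the `user_id`/`amount` fields are hypothetical names chosen for this example; they are not part of the skill's scripts. In a distributed job the same per-record check would typically run inside each worker.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """A named check applied to one field of a record (hypothetical helper)."""
    field: str
    description: str
    check: Callable[[Any], bool]


def validate_batch(
    records: list[dict[str, Any]], rules: list[Rule]
) -> tuple[list[dict[str, Any]], list[tuple[int, str]]]:
    """Split a batch into valid records and (row index, reason) failures."""
    valid: list[dict[str, Any]] = []
    failures: list[tuple[int, str]] = []
    for index, record in enumerate(records):
        # A rule fails when the field is missing or its check returns False.
        reasons = [
            f"{rule.field}: {rule.description}"
            for rule in rules
            if rule.field not in record or not rule.check(record[rule.field])
        ]
        if reasons:
            failures.extend((index, reason) for reason in reasons)
        else:
            valid.append(record)
    return valid, failures


# Example rules and data (assumed schema, for illustration only).
RULES = [
    Rule("user_id", "must be a non-empty string",
         lambda v: isinstance(v, str) and v != ""),
    Rule("amount", "must be a non-negative number",
         lambda v: isinstance(v, (int, float)) and v >= 0),
]

batch = [
    {"user_id": "u1", "amount": 12.5},
    {"user_id": "", "amount": 3},
    {"user_id": "u3", "amount": -1},
    {"amount": 4},
]
valid, failures = validate_batch(batch, RULES)
```

Returning the failures alongside the valid rows, instead of raising on the first bad record, lets a pipeline quarantine rejects and keep processing the rest of the batch.
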

### Pattern 2: ML Model Deployment

Production ML system with high availability:

- Model serving with low latency
- A/B testing infrastructure
- Feature store integration
- Model monitoring and drift detection
- Automated retraining pipelines
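
One common way to approach the drift detection item above is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against what is seen in production. The sketch below is one possible implementation under stated assumptions, not the method used by this skill's scripts: the function name, the equal-width binning, and the `eps` floor for empty bins are all choices made for this example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a production sample.

    Bins are equal-width over the range of `expected` (an assumption of
    this sketch); values outside that range fall into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample must contain more than one value")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp to edge bins
        # Floor each share at eps so the logarithm below stays defined.
        return [max(count / len(values), eps) for count in counts]

    expected_share = proportions(expected)
    actual_share = proportions(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_share, actual_share)
    )


# Illustrative data: production values shifted toward the upper half.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A frequently cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; those thresholds are conventions, and a retraining trigger should be tuned per feature.
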

### Pattern 3: Real-Time Inference

High-throughput inference system:

- Batching and caching strategies
- Load balancing
- Auto-scaling
- Latency optimization
- Cost optimization
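
The batching and caching item above can be sketched as a small wrapper around a batch prediction function. This is a simplified, synchronous illustration: `CachedBatchPredictor`, `fake_model`, and the parameter names are hypothetical, and a real serving stack would usually batch across concurrent requests, which this example does not attempt.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Sequence


class CachedBatchPredictor:
    """Wraps a batch model call with an LRU cache and fixed-size batches."""

    def __init__(
        self,
        predict_batch: Callable[[list], list],
        max_batch_size: int = 32,
        cache_size: int = 1024,
    ) -> None:
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._cache_size = cache_size
        self._cache: OrderedDict = OrderedDict()

    def predict(self, inputs: Sequence[Hashable]) -> list[Any]:
        resolved: dict = {}
        misses: list = []
        # Deduplicate while keeping order; only cache misses reach the model.
        for item in dict.fromkeys(inputs):
            if item in self._cache:
                self._cache.move_to_end(item)  # mark as recently used
                resolved[item] = self._cache[item]
            else:
                misses.append(item)
        # Send misses to the model in chunks of at most max_batch_size.
        for start in range(0, len(misses), self._max_batch_size):
            chunk = misses[start:start + self._max_batch_size]
            for key, value in zip(chunk, self._predict_batch(chunk)):
                resolved[key] = value
                self._cache[key] = value
                if len(self._cache) > self._cache_size:
                    self._cache.popitem(last=False)  # evict least recent
        return [resolved[item] for item in inputs]


# Stand-in for a real model: doubles each input and records every call.
calls: list[list[int]] = []


def fake_model(batch: list[int]) -> list[int]:
    calls.append(list(batch))
    return [x * 2 for x in batch]


predictor = CachedBatchPredictor(fake_model, max_batch_size=2)
first = predictor.predict([1, 2, 3, 1])
second = predictor.predict([2, 3, 4])
```

In the second call only the input `4` reaches the model, because `2` and `3` are served from the cache. Caching like this only makes sense when the model is deterministic for a given input and model version.
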

## Best Practices

### Development

- Test-driven development
- Code reviews and pair programming
- Documentation as code
- Version control everything
- Continuous integration

### Production

- Monitor everything critical
- Automate deployments
- Feature flags for releases
- Canary deployments
- Comprehensive logging
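
Feature flags and canary deployments both need a stable way to decide which users see a change. A minimal sketch, assuming a hash-based percentage rollout; the `in_rollout` function and the flag name are hypothetical and not tied to any particular feature flag service:

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    The decision is deterministic: the same flag and user always map to
    the same bucket, so raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < percent / 100.0


# Example: a canary at 10% of a hypothetical user base.
users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(in_rollout("new-model", u, 10) for u in users) / len(users)
```

Including the flag name in the hash keeps rollouts for different flags independent, so the same users are not always the first to receive every change.
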

### Team Leadership

- Mentor junior engineers
- Drive technical decisions
- Establish coding standards
- Foster learning culture
- Cross-functional collaboration

## Performance Targets

**Latency:**

- P50: < 50 ms
- P95: < 100 ms
- P99: < 200 ms
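
To check measured latencies against the targets above, one option is a nearest-rank percentile over a window of samples. The function names and the sample data below are illustrative assumptions; production systems usually read these percentiles from a metrics backend instead of raw lists.

```python
# Targets from the list above: percentile -> upper bound in milliseconds.
LATENCY_TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_latency(
    samples_ms: list[float],
    targets: dict[int, float] = LATENCY_TARGETS_MS,
) -> dict[str, tuple[float, bool]]:
    """Map each percentile label to (observed value, target met)."""
    report = {}
    for pct, limit in targets.items():
        observed = percentile(samples_ms, pct)
        report[f"P{pct}"] = (observed, observed < limit)
    return report


# Illustrative window of 100 request latencies in milliseconds.
samples = [20] * 90 + [80] * 8 + [150, 400]
report = check_latency(samples)
```

For this sample window all three targets are met even though one request took 400 ms, which shows why a P99 target alone does not bound the worst case.
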

**Throughput:**

- Requests/second: > 1000
- Concurrent users: > 10,000

**Availability:**

- Uptime: 99.9%
- Error rate: < 0.1%
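
The availability figures translate into a concrete downtime budget. The short sketch below works through that arithmetic; the 30-day period and the request counts are assumptions chosen for the example.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an uptime target over a period."""
    return (1 - uptime_percent / 100) * period_days * MINUTES_PER_DAY


def error_rate_percent(failed_requests: int, total_requests: int) -> float:
    """Share of failed requests, as a percentage."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * failed_requests / total_requests


# 99.9% uptime over 30 days leaves roughly 43.2 minutes of downtime.
budget = downtime_budget_minutes(99.9)

# Hypothetical traffic: 7 failures in 10,000 requests is a 0.07% error rate.
rate = error_rate_percent(failed_requests=7, total_requests=10_000)
within_target = rate < 0.1
```
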

## Security & Compliance

- Authentication & authorization
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Regular security audits
- Vulnerability management
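
For the PII handling item above, a common building block is keyed hashing of identifiers so that records can still be joined without storing the raw value. The sketch below uses Python's standard `hmac` and `hashlib` modules; the function names and the example key are hypothetical, and in practice the key would come from a secrets manager. Note that this is pseudonymization, which is weaker than anonymization: anyone holding the key can re-link the values.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex string).

    Normalizing first means 'Alice@Example.com ' and 'alice@example.com'
    map to the same pseudonym.
    """
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain for display or logs."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return f"{local[:1]}***@{domain}"


# Example only: a real key must never be hard-coded in source.
EXAMPLE_KEY = b"replace-with-a-managed-secret"

token = pseudonymize("Alice@Example.com ", EXAMPLE_KEY)
masked = mask_email("alice@example.com")
```

Using an HMAC instead of a plain hash matters because unkeyed hashes of low-entropy values such as email addresses can be reversed by hashing a list of candidates.
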

## Common Commands

```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
```


## Resources

- Advanced Patterns: `references/statistical_methods_advanced.md`
- Implementation Guide: `references/experiment_design_frameworks.md`
- Technical Reference: `references/feature_engineering_patterns.md`
- Automation Scripts: `scripts/` directory


## Senior-Level Responsibilities

As a senior data scientist:

1. **Technical Leadership**
    - Drive architectural decisions
    - Mentor team members
    - Establish best practices
    - Ensure code quality

2. **Strategic Thinking**
    - Align with business goals
    - Evaluate trade-offs
    - Plan for scale
    - Manage technical debt

3. **Collaboration**
    - Work across teams
    - Communicate effectively
    - Build consensus
    - Share knowledge

4. **Innovation**
    - Stay current with research
    - Experiment with new approaches
    - Contribute to community
    - Drive continuous improvement

5. **Production Excellence**
    - Ensure high availability
    - Monitor proactively
    - Optimize performance
    - Respond to incidents