feat(senior-architect): Complete skill overhaul per Issue #48 (#88)

Addresses SkillzWave feedback and Anthropic best practices:

SKILL.md (343 lines):
- Third-person description with trigger phrases
- Added Table of Contents for navigation
- Concrete tool descriptions with usage examples
- Decision workflows: Database, Architecture Pattern, Monolith vs Microservices
- Removed marketing fluff, added actionable content

References (rewritten with real content):
- architecture_patterns.md: 9 patterns with trade-offs, code examples
  (Monolith, Modular Monolith, Microservices, Event-Driven, CQRS,
  Event Sourcing, Hexagonal, Clean Architecture, API Gateway)
- system_design_workflows.md: 6 step-by-step workflows
  (System Design Interview, Capacity Planning, API Design,
  Database Schema, Scalability Assessment, Migration Planning)
- tech_decision_guide.md: 7 decision frameworks with matrices
  (Database, Cache, Message Queue, Auth, Frontend, Cloud, API)

Scripts (fully functional, standard library only):
- architecture_diagram_generator.py: Mermaid + PlantUML + ASCII output
  Scans project structure, detects components, relationships
- dependency_analyzer.py: npm/pip/go/cargo support
  Circular dependency detection, coupling score calculation
- project_architect.py: Pattern detection (7 patterns)
  Layer violation detection, code quality metrics

All scripts tested and working.

Closes #48

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Author: Alireza Rezvani
Date: 2026-01-26 10:29:14 +01:00
Committed by: GitHub
Parent: 2738f252b2
Commit: 94224f2201
7 changed files with 3531 additions and 606 deletions


@@ -1,209 +1,343 @@
---
name: senior-architect
description: This skill should be used when the user asks to "design system architecture", "evaluate microservices vs monolith", "create architecture diagrams", "analyze dependencies", "choose a database", "plan for scalability", "make technical decisions", or "review system design". Use for architecture decision records (ADRs), tech stack evaluation, system design reviews, dependency analysis, and generating architecture diagrams in Mermaid, PlantUML, or ASCII format.
---
# Senior Architect
Architecture design and analysis tools for making informed technical decisions.
## Table of Contents
- [Quick Start](#quick-start)
- [Tools Overview](#tools-overview)
- [Architecture Diagram Generator](#1-architecture-diagram-generator)
- [Dependency Analyzer](#2-dependency-analyzer)
- [Project Architect](#3-project-architect)
- [Decision Workflows](#decision-workflows)
- [Database Selection](#database-selection-workflow)
- [Architecture Pattern Selection](#architecture-pattern-selection-workflow)
- [Monolith vs Microservices](#monolith-vs-microservices-decision)
- [Reference Documentation](#reference-documentation)
- [Tech Stack Coverage](#tech-stack-coverage)
- [Common Commands](#common-commands)
---
## Quick Start
### Main Capabilities
This skill provides three core capabilities through automated scripts:
```bash
# Generate architecture diagram from project
python scripts/architecture_diagram_generator.py ./my-project --format mermaid
# Analyze dependencies for issues
python scripts/dependency_analyzer.py ./my-project --output json
# Get architecture assessment
python scripts/project_architect.py ./my-project --verbose
```
---
## Tools Overview
### 1. Architecture Diagram Generator
Generates architecture diagrams from project structure in multiple formats.
**Solves:** "I need to visualize my system architecture for documentation or team discussion"
**Input:** Project directory path
**Output:** Diagram code (Mermaid, PlantUML, or ASCII)
**Supported diagram types:**
- `component` - Shows modules and their relationships
- `layer` - Shows architectural layers (presentation, business, data)
- `deployment` - Shows deployment topology
**Usage:**
```bash
python scripts/architecture_diagram_generator.py <project-path> [options]
# Mermaid format (default)
python scripts/architecture_diagram_generator.py ./project --format mermaid --type component
# PlantUML format
python scripts/architecture_diagram_generator.py ./project --format plantuml --type layer
# ASCII format (terminal-friendly)
python scripts/architecture_diagram_generator.py ./project --format ascii
# Save to file
python scripts/architecture_diagram_generator.py ./project -o architecture.md
```
**Example output (Mermaid):**
```mermaid
graph TD
A[API Gateway] --> B[Auth Service]
A --> C[User Service]
B --> D[(PostgreSQL)]
C --> D
```
---
### 2. Dependency Analyzer
Analyzes project dependencies for coupling, circular dependencies, and outdated packages.
**Solves:** "I need to understand my dependency tree and identify potential issues"
**Input:** Project directory path
**Output:** Analysis report (JSON or human-readable)
**Analyzes:**
- Dependency tree (direct and transitive)
- Circular dependencies between modules
- Coupling score (0-100)
- Outdated packages
**Supported package managers:**
- npm/yarn (`package.json`)
- Python (`requirements.txt`, `pyproject.toml`)
- Go (`go.mod`)
- Rust (`Cargo.toml`)
**Usage:**
```bash
# Human-readable report
python scripts/dependency_analyzer.py ./project
# JSON output for CI/CD integration
python scripts/dependency_analyzer.py ./project --output json
# Check only for circular dependencies
python scripts/dependency_analyzer.py ./project --check circular
# Verbose mode with recommendations
python scripts/dependency_analyzer.py ./project --verbose
```
**Example output:**
```
Dependency Analysis Report
==========================
Total dependencies: 47 (32 direct, 15 transitive)
Coupling score: 72/100 (moderate)
Issues found:
- CIRCULAR: auth → user → permissions → auth
- OUTDATED: lodash 4.17.15 → 4.17.21 (security)
Recommendations:
1. Extract shared interface to break circular dependency
2. Update lodash to fix CVE-2020-8203
```
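The circular-dependency check can be sketched as a depth-first search over the module graph. This is an illustrative simplification, not the actual `dependency_analyzer.py` implementation; the module names mirror the report above.

```python
# Minimal sketch of cycle detection over a module dependency graph.
# Not the real dependency_analyzer.py logic; module names are illustrative.
def find_cycle(graph):
    """Return the first dependency cycle found as a list of modules, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {node: WHITE for node in graph}

    def dfs(node, path):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:          # back edge: cycle found
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = dfs(dep, path)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

modules = {
    "auth": ["user"],
    "user": ["permissions"],
    "permissions": ["auth"],   # closes the cycle from the report above
    "billing": ["user"],
}
print(" → ".join(find_cycle(modules)))  # auth → user → permissions → auth
```

Breaking such a cycle usually means extracting the shared interface into its own module, as the report's first recommendation suggests.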
---
### 3. Project Architect
Analyzes project structure and detects architectural patterns, code smells, and improvement opportunities.
**Solves:** "I want to understand the current architecture and identify areas for improvement"
**Input:** Project directory path
**Output:** Architecture assessment report
**Detects:**
- Architectural patterns (MVC, layered, hexagonal, microservices indicators)
- Code organization issues (god classes, mixed concerns)
- Layer violations
- Missing architectural components
**Usage:**
```bash
# Full assessment
python scripts/project_architect.py ./project
# Verbose with detailed recommendations
python scripts/project_architect.py ./project --verbose
# JSON output
python scripts/project_architect.py ./project --output json
# Check specific aspect
python scripts/project_architect.py ./project --check layers
```
**Example output:**
```
Architecture Assessment
=======================
Detected pattern: Layered Architecture (confidence: 85%)
Structure analysis:
✓ controllers/ - Presentation layer detected
✓ services/ - Business logic layer detected
✓ repositories/ - Data access layer detected
⚠ models/ - Mixed domain and DTOs
Issues:
- LARGE FILE: UserService.ts (1,847 lines) - consider splitting
- MIXED CONCERNS: PaymentController contains business logic
Recommendations:
1. Split UserService into focused services
2. Move business logic from controllers to services
3. Separate domain models from DTOs
```
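The pattern-detection step can be illustrated with a naive directory-name heuristic. This is a sketch only: the marker names and the confidence formula are assumptions, not the real `project_architect.py` logic.

```python
# Simplified sketch: infer a layered architecture from directory naming.
# Marker names and the confidence formula are illustrative assumptions.
LAYER_MARKERS = {
    "controllers": "presentation",
    "services": "business",
    "repositories": "data access",
}

def detect_layered(dirs):
    """Return (detected layers, confidence 0-100) for a directory listing."""
    found = {name: layer for name, layer in LAYER_MARKERS.items() if name in dirs}
    confidence = int(100 * len(found) / len(LAYER_MARKERS))
    return found, confidence

found, confidence = detect_layered(["controllers", "services", "repositories", "models"])
print(confidence)  # 100
```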
---
## Decision Workflows
### Database Selection Workflow
Use when choosing a database for a new project or migrating existing data.
**Step 1: Identify data characteristics**
| Characteristic | Points to SQL | Points to NoSQL |
|----------------|---------------|-----------------|
| Structured with relationships | ✓ | |
| ACID transactions required | ✓ | |
| Flexible/evolving schema | | ✓ |
| Document-oriented data | | ✓ |
| Time-series data | | ✓ (specialized) |
**Step 2: Evaluate scale requirements**
- <1M records, single region → PostgreSQL or MySQL
- 1M-100M records, read-heavy → PostgreSQL with read replicas
- >100M records, global distribution → CockroachDB, Spanner, or DynamoDB
- High write throughput (>10K/sec) → Cassandra or ScyllaDB
**Step 3: Check consistency requirements**
- Strong consistency required → SQL or CockroachDB
- Eventual consistency acceptable → DynamoDB, Cassandra, MongoDB
**Step 4: Document decision**
Create an ADR (Architecture Decision Record) with:
- Context and requirements
- Options considered
- Decision and rationale
- Trade-offs accepted
**Quick reference:**
```
PostgreSQL → Default choice for most applications
MongoDB → Document store, flexible schema
Redis → Caching, sessions, real-time features
DynamoDB → Serverless, auto-scaling, AWS-native
TimescaleDB → Time-series data with SQL interface
```
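The three steps can be folded into a rough rule-of-thumb helper. The thresholds come straight from the workflow above; the function and its signature are illustrative and no substitute for a written ADR.

```python
# Rough encoding of the selection steps above as a rule table.
# Thresholds mirror the workflow text; treat the picks as starting points.
def suggest_database(records, write_qps, needs_acid, global_dist):
    if write_qps > 10_000:                          # high write throughput
        return "Cassandra or ScyllaDB"
    if records > 100_000_000 and global_dist:       # huge + global
        return "CockroachDB, Spanner, or DynamoDB"
    if needs_acid or records < 1_000_000:           # default choice
        return "PostgreSQL"
    return "PostgreSQL with read replicas"          # 1M-100M, read-heavy

print(suggest_database(records=500_000, write_qps=200,
                       needs_acid=True, global_dist=False))  # PostgreSQL
```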
---
### Architecture Pattern Selection Workflow
Use when designing a new system or refactoring existing architecture.
**Step 1: Assess team and project size**
| Team Size | Recommended Starting Point |
|-----------|---------------------------|
| 1-3 developers | Modular monolith |
| 4-10 developers | Modular monolith or service-oriented |
| 10+ developers | Consider microservices |
**Step 2: Evaluate deployment requirements**
- Single deployment unit acceptable → Monolith
- Independent scaling needed → Microservices
- Mixed (some services scale differently) → Hybrid
**Step 3: Consider data boundaries**
- Shared database acceptable → Monolith or modular monolith
- Strict data isolation required → Microservices with separate DBs
- Event-driven communication fits → Event-sourcing/CQRS
**Step 4: Match pattern to requirements**
| Requirement | Recommended Pattern |
|-------------|-------------------|
| Rapid MVP development | Modular Monolith |
| Independent team deployment | Microservices |
| Complex domain logic | Domain-Driven Design |
| High read/write ratio difference | CQRS |
| Audit trail required | Event Sourcing |
| Third-party integrations | Hexagonal/Ports & Adapters |
See `references/architecture_patterns.md` for detailed pattern descriptions.
---
### Monolith vs Microservices Decision
**Choose Monolith when:**
- [ ] Team is small (<10 developers)
- [ ] Domain boundaries are unclear
- [ ] Rapid iteration is priority
- [ ] Operational complexity must be minimized
- [ ] Shared database is acceptable
**Choose Microservices when:**
- [ ] Teams can own services end-to-end
- [ ] Independent deployment is critical
- [ ] Different scaling requirements per component
- [ ] Technology diversity is needed
- [ ] Domain boundaries are well understood
**Hybrid approach:**
Start with a modular monolith. Extract services only when:
1. A module has significantly different scaling needs
2. A team needs independent deployment
3. Technology constraints require separation
---
## Reference Documentation
Load these files for detailed information:
| File | Contains | Load when user asks about |
|------|----------|--------------------------|
| `references/architecture_patterns.md` | 9 architecture patterns with trade-offs, code examples, and when to use | "which pattern?", "microservices vs monolith", "event-driven", "CQRS" |
| `references/system_design_workflows.md` | 6 step-by-step workflows for system design tasks | "how to design?", "capacity planning", "API design", "migration" |
| `references/tech_decision_guide.md` | Decision matrices for technology choices | "which database?", "which framework?", "which cloud?", "which cache?" |
---
## Tech Stack Coverage
**Languages:** TypeScript, JavaScript, Python, Go, Swift, Kotlin, Rust
**Frontend:** React, Next.js, Vue, Angular, React Native, Flutter
**Backend:** Node.js, Express, FastAPI, Go, GraphQL, REST
**Databases:** PostgreSQL, MySQL, MongoDB, Redis, DynamoDB, Cassandra
**Infrastructure:** Docker, Kubernetes, Terraform, AWS, GCP, Azure
**CI/CD:** GitHub Actions, GitLab CI, CircleCI, Jenkins
---
## Common Commands
```bash
# Architecture visualization
python scripts/architecture_diagram_generator.py . --format mermaid
python scripts/architecture_diagram_generator.py . --format plantuml
python scripts/architecture_diagram_generator.py . --format ascii
# Dependency analysis
python scripts/dependency_analyzer.py . --verbose
python scripts/dependency_analyzer.py . --check circular
python scripts/dependency_analyzer.py . --output json
# Architecture assessment
python scripts/project_architect.py . --verbose
python scripts/project_architect.py . --check layers
python scripts/project_architect.py . --output json
```
---
## Getting Help
1. Run any script with `--help` for usage information
2. Check reference documentation for detailed patterns and workflows
3. Use `--verbose` flag for detailed explanations and recommendations


@@ -1,103 +1,470 @@
# Architecture Patterns
## Overview
Detailed guide to software architecture patterns with trade-offs and implementation guidance.
## Patterns Index
1. [Monolithic Architecture](#1-monolithic-architecture)
2. [Modular Monolith](#2-modular-monolith)
3. [Microservices Architecture](#3-microservices-architecture)
4. [Event-Driven Architecture](#4-event-driven-architecture)
5. [CQRS (Command Query Responsibility Segregation)](#5-cqrs)
6. [Event Sourcing](#6-event-sourcing)
7. [Hexagonal Architecture (Ports & Adapters)](#7-hexagonal-architecture)
8. [Clean Architecture](#8-clean-architecture)
9. [API Gateway Pattern](#9-api-gateway-pattern)
---
## 1. Monolithic Architecture
**Problem it solves:** Need to build and deploy a complete application as a single unit with minimal operational complexity.
**When to use:**
- Small team (1-5 developers)
- MVP or early-stage product
- Simple domain with clear boundaries
- Deployment simplicity is priority
**When NOT to use:**
- Multiple teams need independent deployment
- Parts of system have vastly different scaling needs
- Technology diversity is required
**Trade-offs:**
| Pros | Cons |
|------|------|
| Simple deployment | Scaling is all-or-nothing |
| Easy debugging | Large codebase becomes unwieldy |
| No network latency between components | Single point of failure |
| Simple testing | Technology lock-in |
**Structure example:**
```
monolith/
├── src/
│   ├── controllers/    # HTTP handlers
│   ├── services/       # Business logic
│   ├── repositories/   # Data access
│   ├── models/         # Domain entities
│   └── utils/          # Shared utilities
├── tests/
└── package.json
```
---
## 2. Modular Monolith
**Problem it solves:** Need monolith simplicity but with clear boundaries that enable future extraction to services.
**When to use:**
- Medium team (5-15 developers)
- Domain boundaries are becoming clearer
- Want option to extract services later
- Need better code organization than traditional monolith
**When NOT to use:**
- Already need independent deployment
- Teams can't coordinate releases
**Trade-offs:**
| Pros | Cons |
|------|------|
| Clear module boundaries | Still single deployment |
| Easier to extract services later | Requires discipline to maintain boundaries |
| Single database simplifies transactions | Can drift back to coupled monolith |
| Team ownership of modules | |
**Structure example:**
```
modular-monolith/
├── modules/
│   ├── users/
│   │   ├── api/          # Public interface
│   │   ├── internal/     # Implementation
│   │   └── index.ts      # Module exports
│   ├── orders/
│   │   ├── api/
│   │   ├── internal/
│   │   └── index.ts
│   └── payments/
├── shared/               # Cross-cutting concerns
└── main.ts
```
**Key rule:** Modules communicate only through their public API, never by importing internal files.
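The key rule can be illustrated in a few lines of Python. All names here are invented for the example; the underscore convention stands in for the `internal/` directory.

```python
# Illustration of the public-API rule with plain Python objects.
# _UserStore is "internal/": other modules must never touch it directly.
class _UserStore:
    def __init__(self):
        self._users = {}

    def add(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


class UsersModule:
    """The module's public API surface (the 'api/' directory)."""

    def __init__(self):
        self._store = _UserStore()

    def register(self, user_id, name):
        self._store.add(user_id, name)

    def display_name(self, user_id):
        return self._store.get(user_id) or "<unknown>"


# An orders module would depend only on UsersModule, never on _UserStore,
# which keeps the store swappable when the module is later extracted.
users = UsersModule()
users.register("u1", "Ada")
print(users.display_name("u1"))  # Ada
```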
---
## 3. Microservices Architecture
**Problem it solves:** Need independent deployment, scaling, and technology choices for different parts of the system.
**When to use:**
- Large team (15+ developers) organized around business capabilities
- Different parts need different scaling
- Independent deployment is critical
- Technology diversity is beneficial
**When NOT to use:**
- Small team that can't handle operational complexity
- Domain boundaries are unclear
- Distributed transactions are common requirement
- Network latency is unacceptable
**Trade-offs:**
| Pros | Cons |
|------|------|
| Independent deployment | Network complexity |
| Independent scaling | Distributed system challenges |
| Technology flexibility | Operational overhead |
| Team autonomy | Data consistency challenges |
| Fault isolation | Testing complexity |
**Structure example:**
```
microservices/
├── services/
│   ├── user-service/
│   │   ├── src/
│   │   ├── Dockerfile
│   │   └── package.json
│   ├── order-service/
│   └── payment-service/
├── api-gateway/
├── infrastructure/
│   ├── kubernetes/
│   └── terraform/
└── docker-compose.yml
```
**Communication patterns:**
- Synchronous: REST, gRPC
- Asynchronous: Message queues (RabbitMQ, Kafka)
---
## 4. Event-Driven Architecture
**Problem it solves:** Need loose coupling between components that react to business events asynchronously.
**When to use:**
- Components need loose coupling
- Audit trail of all changes is valuable
- Real-time reactions to events
- Multiple consumers for same events
**When NOT to use:**
- Simple CRUD operations
- Synchronous responses required
- Team unfamiliar with async patterns
- Debugging simplicity is priority
**Trade-offs:**
| Pros | Cons |
|------|------|
| Loose coupling | Eventual consistency |
| Scalability | Debugging complexity |
| Audit trail built-in | Message ordering challenges |
| Easy to add new consumers | Infrastructure complexity |
**Event structure example:**
```typescript
interface DomainEvent {
  eventId: string;
  eventType: string;
  aggregateId: string;
  timestamp: Date;
  payload: Record<string, unknown>;
  metadata: {
    correlationId: string;
    causationId: string;
  };
}

// Example event
const orderCreated: DomainEvent = {
  eventId: "evt-123",
  eventType: "OrderCreated",
  aggregateId: "order-456",
  timestamp: new Date(),
  payload: {
    customerId: "cust-789",
    items: [...],
    total: 99.99
  },
  metadata: {
    correlationId: "req-001",
    causationId: "cmd-create-order"
  }
};
```
---
## 5. CQRS
**Problem it solves:** Read and write workloads have different requirements and need to be optimized separately.
**When to use:**
- Read/write ratio is heavily skewed (10:1 or more)
- Read and write models differ significantly
- Complex queries that don't map to write model
- Different scaling needs for reads vs writes
**When NOT to use:**
- Simple CRUD with balanced reads/writes
- Read and write models are nearly identical
- Team unfamiliar with pattern
- Added complexity isn't justified
**Trade-offs:**
| Pros | Cons |
|------|------|
| Optimized read models | Eventual consistency between models |
| Independent scaling | Complexity |
| Simplified queries | Synchronization logic |
| Better performance | More code to maintain |
**Structure example:**
```typescript
// Write side (Commands)
interface CreateOrderCommand {
  customerId: string;
  items: OrderItem[];
}

class OrderCommandHandler {
  async handle(cmd: CreateOrderCommand): Promise<void> {
    const order = Order.create(cmd);
    await this.repository.save(order);
    await this.eventBus.publish(order.events);
  }
}

// Read side (Queries)
interface OrderSummaryQuery {
  customerId: string;
  dateRange: DateRange;
}

class OrderQueryHandler {
  async handle(query: OrderSummaryQuery): Promise<OrderSummary[]> {
    // Query optimized read model (denormalized)
    return this.readDb.query(`
      SELECT * FROM order_summaries
      WHERE customer_id = ? AND created_at BETWEEN ? AND ?
    `, [query.customerId, query.dateRange.start, query.dateRange.end]);
  }
}
```
---
## 6. Event Sourcing
**Problem it solves:** Need complete audit trail and ability to reconstruct state at any point in time.
**When to use:**
- Audit trail is regulatory requirement
- Need to answer "how did we get here?"
- Complex domain with undo/redo requirements
- Debugging production issues requires history
**When NOT to use:**
- Simple CRUD applications
- No audit requirements
- Team unfamiliar with pattern
- Reporting on current state is primary need
**Trade-offs:**
| Pros | Cons |
|------|------|
| Complete audit trail | Storage grows indefinitely |
| Time-travel debugging | Query complexity |
| Natural fit for event-driven | Learning curve |
| Enables CQRS | Eventual consistency |
**Implementation example:**
```typescript
// Events
type OrderEvent =
  | { type: 'OrderCreated'; customerId: string; items: Item[] }
  | { type: 'ItemAdded'; itemId: string; quantity: number }
  | { type: 'OrderShipped'; trackingNumber: string };

// Aggregate rebuilt from events
class Order {
  private state: OrderState;

  static fromEvents(events: OrderEvent[]): Order {
    const order = new Order();
    events.forEach(event => order.apply(event));
    return order;
  }

  private apply(event: OrderEvent): void {
    switch (event.type) {
      case 'OrderCreated':
        this.state = { status: 'created', items: event.items };
        break;
      case 'ItemAdded':
        this.state.items.push({ id: event.itemId, qty: event.quantity });
        break;
      case 'OrderShipped':
        this.state.status = 'shipped';
        this.state.trackingNumber = event.trackingNumber;
        break;
    }
  }
}
```
---
## 7. Hexagonal Architecture
**Problem it solves:** Need to isolate business logic from external concerns (databases, APIs, UI) for testability and flexibility.
**When to use:**
- Business logic is complex and valuable
- Multiple interfaces to same domain (API, CLI, events)
- Testability is priority
- External systems may change
**When NOT to use:**
- Simple CRUD with no business logic
- Single interface to domain
- Overhead isn't justified
**Trade-offs:**
| Pros | Cons |
|------|------|
| Business logic isolation | More abstractions |
| Highly testable | Initial setup overhead |
| External systems are swappable | Can be over-engineered |
| Clear boundaries | Learning curve |
**Structure example:**
```
hexagonal/
├── domain/                  # Business logic (no external deps)
│   ├── entities/
│   ├── services/
│   └── ports/               # Interfaces (what domain needs)
│       ├── OrderRepository.ts
│       └── PaymentGateway.ts
├── adapters/                # Implementations
│   ├── persistence/         # Database adapters
│   │   └── PostgresOrderRepository.ts
│   ├── payment/             # External service adapters
│   │   └── StripePaymentGateway.ts
│   └── api/                 # HTTP adapters
│       └── OrderController.ts
└── config/                  # Wiring it all together
```
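A minimal port-and-adapter pair can make the structure concrete. The class names echo the tree above, but this is a Python sketch with an in-memory adapter, not project source.

```python
# Sketch of a port (Protocol), an adapter, and domain logic that depends
# only on the port. Names and signatures are illustrative.
from typing import Protocol

class OrderRepository(Protocol):
    """Port: what the domain needs, expressed as an interface."""
    def save(self, order_id: str, total: float) -> None: ...
    def total_of(self, order_id: str) -> float: ...

class InMemoryOrderRepository:
    """Adapter: one interchangeable implementation (e.g. for tests)."""
    def __init__(self):
        self._orders = {}

    def save(self, order_id, total):
        self._orders[order_id] = total

    def total_of(self, order_id):
        return self._orders[order_id]

class CheckoutService:
    """Domain logic: knows the port, never a concrete database."""
    def __init__(self, repo: OrderRepository):
        self._repo = repo

    def checkout(self, order_id, items):
        total = sum(price for _, price in items)
        self._repo.save(order_id, total)
        return total

svc = CheckoutService(InMemoryOrderRepository())
print(svc.checkout("order-1", [("book", 12.5), ("pen", 2.5)]))  # 15.0
```

Swapping in a hypothetical `PostgresOrderRepository` would require no change to `CheckoutService`, which is the point of the pattern.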
---
## 8. Clean Architecture
**Problem it solves:** Need clear dependency rules where business logic doesn't depend on frameworks or external systems.
**When to use:**
- Long-lived applications that will outlive frameworks
- Business logic is the core value
- Team discipline to maintain boundaries
- Multiple delivery mechanisms (web, mobile, CLI)
**When NOT to use:**
- Short-lived projects
- Framework-centric applications
- Simple CRUD operations
**Trade-offs:**
| Pros | Cons |
|------|------|
| Framework independence | More code |
| Testable business logic | Can feel over-engineered |
| Clear dependency direction | Learning curve |
| Flexible delivery mechanisms | Initial setup cost |
**Dependency rule:** Dependencies point inward. Inner circles know nothing about outer circles.
```
┌─────────────────────────────────────────┐
│          Frameworks & Drivers           │
│   ┌─────────────────────────────────┐   │
│   │       Interface Adapters        │   │
│   │   ┌─────────────────────────┐   │   │
│   │   │    Application Layer    │   │   │
│   │   │   ┌─────────────────┐   │   │   │
│   │   │   │    Entities     │   │   │   │
│   │   │   │ (Domain Logic)  │   │   │   │
│   │   │   └─────────────────┘   │   │   │
│   │   └─────────────────────────┘   │   │
│   └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
```
---
## 9. API Gateway Pattern
**Problem it solves:** Need single entry point for clients that routes to multiple backend services.
**When to use:**
- Multiple backend services
- Cross-cutting concerns (auth, rate limiting, logging)
- Different clients need different APIs
- Service aggregation needed
**When NOT to use:**
- Single backend service
- Simplicity is priority
- Team can't maintain gateway
**Trade-offs:**
| Pros | Cons |
|------|------|
| Single entry point | Single point of failure |
| Cross-cutting concerns centralized | Additional latency |
| Backend service abstraction | Complexity |
| Client-specific APIs | Can become bottleneck |
**Responsibilities:**
```
┌─────────────────────────────────────┐
│             API Gateway             │
├─────────────────────────────────────┤
│ • Authentication/Authorization      │
│ • Rate limiting                     │
│ • Request/Response transformation   │
│ • Load balancing                    │
│ • Circuit breaking                  │
│ • Caching                           │
│ • Logging/Monitoring                │
└─────────────────────────────────────┘
         │           │           │
         ▼           ▼           ▼
      ┌─────┐     ┌─────┐     ┌─────┐
      │Svc A│     │Svc B│     │Svc C│
      └─────┘     └─────┘     └─────┘
```
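A toy illustration of the gateway's core mechanics: prefix routing plus one centralized cross-cutting concern (a request counter standing in for rate limiting). Routes and service names are invented.

```python
# Toy gateway sketch: prefix routing plus one cross-cutting concern.
# A real gateway (Kong, Envoy, AWS API Gateway) does far more.
class Gateway:
    def __init__(self):
        self._routes = {}
        self.request_count = 0   # stand-in for rate limiting / metrics

    def register(self, prefix, handler):
        self._routes[prefix] = handler

    def handle(self, path):
        self.request_count += 1              # centralized concern
        for prefix, handler in self._routes.items():
            if path.startswith(prefix):
                return handler(path)         # forward to backend service
        return "404"

gw = Gateway()
gw.register("/users", lambda p: f"user-service:{p}")
gw.register("/orders", lambda p: f"order-service:{p}")
print(gw.handle("/users/42"))  # user-service:/users/42
```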
---
## Pattern Selection Quick Reference
| If you need... | Consider... |
|----------------|-------------|
| Simplicity, small team | Monolith |
| Clear boundaries, future flexibility | Modular Monolith |
| Independent deployment/scaling | Microservices |
| Loose coupling, async processing | Event-Driven |
| Separate read/write optimization | CQRS |
| Complete audit trail | Event Sourcing |
| Testable, swappable externals | Hexagonal |
| Framework independence | Clean Architecture |
| Single entry point, multiple services | API Gateway |


@@ -1,103 +1,536 @@
# System Design Workflows
## Overview
Step-by-step workflows for common system design tasks.
## Workflows Index
1. [System Design Interview Approach](#1-system-design-interview-approach)
2. [Capacity Planning Workflow](#2-capacity-planning-workflow)
3. [API Design Workflow](#3-api-design-workflow)
4. [Database Schema Design](#4-database-schema-design-workflow)
5. [Scalability Assessment](#5-scalability-assessment-workflow)
6. [Migration Planning](#6-migration-planning-workflow)
---
## 1. System Design Interview Approach
Use when designing a system from scratch or explaining architecture decisions.
### Step 1: Clarify Requirements (3-5 minutes)
**Functional requirements:**
- What are the core features?
- Who are the users?
- What actions can users take?
**Non-functional requirements:**
- Expected scale (users, requests/sec, data size)
- Latency requirements
- Availability requirements (99.9%? 99.99%?)
- Consistency requirements (strong? eventual?)
**Example questions to ask:**
```
- How many users? Daily active users?
- Read/write ratio?
- Data retention period?
- Geographic distribution?
- Peak vs average load?
```
### Step 2: Estimate Scale (2-3 minutes)
**Calculate key metrics:**
```
Users: 10M monthly active users
DAU: 1M daily active users
Requests: 100 req/user/day = 100M req/day
= 1,200 req/sec (avg)
= 3,600 req/sec (peak, 3x)
Storage: 1KB/request × 100M = 100GB/day
= 36TB/year
Bandwidth: 100GB/day = 1.2 MB/sec (avg)
```
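These back-of-envelope numbers are easy to script so they can be re-run as assumptions change; a minimal sketch mirroring the arithmetic above (function and field names are illustrative):

```python
def estimate_scale(dau, req_per_user_day, bytes_per_req, peak_factor=3):
    """Back-of-envelope scale estimate from daily active users."""
    req_per_day = dau * req_per_user_day
    avg_rps = req_per_day / 86_400          # seconds per day
    peak_rps = avg_rps * peak_factor
    return {
        "req_per_day": req_per_day,
        "avg_rps": round(avg_rps),
        "peak_rps": round(peak_rps),
        "storage_gb_per_day": req_per_day * bytes_per_req / 1e9,
    }

# 1M DAU, 100 requests/user/day, 1KB per request
print(estimate_scale(1_000_000, 100, 1_000))
```

Note the exact averages (about 1,157 req/sec here) round to the ballpark figures used above; precision beyond that is rarely meaningful at this stage.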
### Step 3: Design High-Level Architecture (5-10 minutes)
**Start with basic components:**
```
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Client │────▶│ API │────▶│ Database │
└──────────┘ └──────────┘ └──────────┘
```
**Add components as needed:**
- Load balancer for traffic distribution
- Cache for read-heavy workloads
- CDN for static content
- Message queue for async processing
- Search index for complex queries
### Step 4: Deep Dive into Components (10-15 minutes)
**For each major component, discuss:**
- Why this technology choice?
- How does it handle failures?
- How does it scale?
- What are the trade-offs?
### Step 5: Address Bottlenecks (5 minutes)
**Common bottlenecks:**
- Database read/write capacity
- Network bandwidth
- Single points of failure
- Hot spots in data distribution
**Solutions:**
- Caching (Redis, Memcached)
- Database sharding
- Read replicas
- CDN for static content
- Async processing for non-critical paths
---
## 2. Capacity Planning Workflow
Use when estimating infrastructure requirements for a new system or feature.
### Step 1: Gather Requirements
| Metric | Current | 6 months | 1 year |
|--------|---------|----------|--------|
| Monthly active users | | | |
| Peak concurrent users | | | |
| Requests per second | | | |
| Data storage (GB) | | | |
| Bandwidth (Mbps) | | | |
### Step 2: Calculate Compute Requirements
**Web/API servers:**
```
Peak RPS: 3,600
Requests per server: 500 (conservative)
Servers needed: 3,600 / 500 = 8 servers
With redundancy (N+2): 10 servers
```
**CPU estimation:**
```
Per request: 50ms CPU time
Peak RPS: 3,600
CPU cores: 3,600 × 0.05 = 180 cores
With headroom (70% target utilization):
180 / 0.7 = 257 cores
≈ 33 servers × 8 cores
```
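The server and core counts above can be captured as two small helpers; a sketch under the same assumptions (N+2 redundancy, 70% target utilization; names are illustrative):

```python
import math

def servers_needed(peak_rps, rps_per_server, redundancy=2):
    """Web/API server count: raw capacity plus N+2 style redundancy."""
    base = math.ceil(peak_rps / rps_per_server)
    return base, base + redundancy

def cpu_cores_needed(peak_rps, cpu_seconds_per_req, target_utilization=0.7):
    """Cores consumed at peak, padded to a target utilization ceiling."""
    raw = peak_rps * cpu_seconds_per_req
    return round(raw / target_utilization)

print(servers_needed(3_600, 500))
print(cpu_cores_needed(3_600, 0.05))
```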
### Step 3: Calculate Storage Requirements
**Database storage:**
```
Records per day: 100,000
Record size: 2KB
Daily growth: 200MB
With indexes (2x): 400MB/day
Retention (1 year): 146GB
With replication (3x): 438GB
```
**File storage:**
```
Files per day: 10,000
Average file size: 500KB
Daily growth: 5GB
Retention (1 year): 1.8TB
```
### Step 4: Calculate Network Requirements
**Bandwidth:**
```
Response size: 10KB average
Peak RPS: 3,600
Outbound: 3,600 × 10KB = 36MB/s = 288 Mbps
With headroom (50%): 432 Mbps ≈ 500 Mbps connection
```
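The bandwidth calculation follows the same shape; a sketch of the conversion from response size and peak rate to a provisioned link speed (headroom fraction is the 50% used above):

```python
def bandwidth_mbps(peak_rps, response_kb, headroom=0.5):
    """Outbound bandwidth at peak, with a safety headroom fraction."""
    mb_per_sec = peak_rps * response_kb / 1_000
    mbps = mb_per_sec * 8                  # megabytes/s -> megabits/s
    return mbps, mbps * (1 + headroom)

print(bandwidth_mbps(3_600, 10))
```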
### Step 5: Document and Review
**Create capacity plan document:**
- Current requirements
- Growth projections
- Infrastructure recommendations
- Cost estimates
- Review triggers (when to re-evaluate)
---
## 3. API Design Workflow
Use when designing new APIs or refactoring existing ones.
### Step 1: Identify Resources
**List the nouns in your domain:**
```
E-commerce example:
- Users
- Products
- Orders
- Payments
- Reviews
```
### Step 2: Define Operations
**Map CRUD to HTTP methods:**
| Operation | HTTP Method | URL Pattern |
|-----------|-------------|-------------|
| List | GET | /resources |
| Get one | GET | /resources/{id} |
| Create | POST | /resources |
| Update | PUT/PATCH | /resources/{id} |
| Delete | DELETE | /resources/{id} |
### Step 3: Design Request/Response Formats
**Request example:**
```json
POST /api/v1/orders
Content-Type: application/json
{
"customer_id": "cust-123",
"items": [
{"product_id": "prod-456", "quantity": 2}
],
"shipping_address": {
"street": "123 Main St",
"city": "San Francisco",
"state": "CA",
"zip": "94102"
}
}
```
**Response example:**
```json
HTTP/1.1 201 Created
Content-Type: application/json
{
"id": "ord-789",
"status": "pending",
"customer_id": "cust-123",
"items": [...],
"total": 99.99,
"created_at": "2024-01-15T10:30:00Z",
"_links": {
"self": "/api/v1/orders/ord-789",
"customer": "/api/v1/customers/cust-123"
}
}
```
### Step 4: Handle Errors Consistently
**Error response format:**
```json
HTTP/1.1 400 Bad Request
Content-Type: application/json
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid request parameters",
"details": [
{
"field": "quantity",
"message": "must be greater than 0"
}
]
},
"request_id": "req-abc123"
}
```
**Standard error codes:**
| HTTP Status | Use Case |
|-------------|----------|
| 400 | Validation errors |
| 401 | Authentication required |
| 403 | Permission denied |
| 404 | Resource not found |
| 409 | Conflict (duplicate, etc.) |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
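A small helper that builds the error envelope shown above keeps responses consistent across endpoints; a minimal sketch (field names follow the example, the function name is illustrative):

```python
def error_response(code, message, details=None, request_id=None):
    """Build the standard error envelope used across the API."""
    body = {"error": {"code": code, "message": message}}
    if details:
        body["error"]["details"] = details
    if request_id:
        body["request_id"] = request_id
    return body

resp = error_response(
    "VALIDATION_ERROR",
    "Invalid request parameters",
    details=[{"field": "quantity", "message": "must be greater than 0"}],
    request_id="req-abc123",
)
```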
### Step 5: Document the API
**Include:**
- Authentication method
- Base URL and versioning
- Endpoints with examples
- Error codes and meanings
- Rate limits
- Pagination format
---
## 4. Database Schema Design Workflow
Use when designing a new database or major schema changes.
### Step 1: Identify Entities
**List the things you need to store:**
```
E-commerce:
- User (id, email, name, created_at)
- Product (id, name, price, stock)
- Order (id, user_id, status, total)
- OrderItem (id, order_id, product_id, quantity, price)
```
### Step 2: Define Relationships
**Relationship types:**
```
User ──1:N──▶ Order (one user, many orders)
Order ──1:N──▶ OrderItem (one order, many items)
Product ──1:N──▶ OrderItem (one product, many order items)
```
### Step 3: Choose Primary Keys
**Options:**
| Type | Pros | Cons |
|------|------|------|
| Auto-increment | Simple, ordered | Not distributed-friendly |
| UUID | Globally unique | Larger, random |
| ULID | Globally unique, sortable | Larger |
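The sortable-unique property of ULIDs comes from a millisecond-timestamp prefix encoded before the random bits; an illustrative (not spec-complete) sketch of the idea, assuming the Crockford base32 alphabet:

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid(timestamp_ms=None):
    """26-char id: 48-bit ms timestamp (10 chars) + 80 random bits (16 chars)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):                 # 26 x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

# Lexicographic order follows creation time
assert ulid(timestamp_ms=1) < ulid(timestamp_ms=2)
```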
### Step 4: Add Indexes
**Index selection rules:**
```sql
-- Index columns used in WHERE clauses
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Index columns used in JOINs
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
-- Composite indexes serve common WHERE + ORDER BY combinations
-- Query: SELECT * FROM orders WHERE user_id = ? AND status = 'active'
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
```
### Step 5: Plan for Scale
**Partitioning strategies:**
```sql
-- Partition by date (time-series data)
CREATE TABLE events (
id BIGINT,
created_at TIMESTAMP,
data JSONB
) PARTITION BY RANGE (created_at);
-- Partition by hash (distribute evenly)
CREATE TABLE users (
id BIGINT,
email VARCHAR(255)
) PARTITION BY HASH (id);
```
**Sharding considerations:**
- Shard key selection (user_id, tenant_id, etc.)
- Cross-shard query limitations
- Rebalancing strategy
---
## 5. Scalability Assessment Workflow
Use when evaluating if current architecture can handle growth.
### Step 1: Profile Current System
**Metrics to collect:**
```
Current load:
- Average requests/sec: ___
- Peak requests/sec: ___
- Average latency: ___ ms
- P99 latency: ___ ms
- Error rate: ___%
Resource utilization:
- CPU: ___%
- Memory: ___%
- Disk I/O: ___%
- Network: ___%
```
### Step 2: Identify Bottlenecks
**Check each layer:**
| Layer | Bottleneck Signs |
|-------|------------------|
| Web servers | High CPU, connection limits |
| Application | Slow requests, thread pool exhaustion |
| Database | Slow queries, lock contention |
| Cache | High miss rate, memory pressure |
| Network | Bandwidth saturation, latency |
### Step 3: Load Test
**Test scenarios:**
```
1. Baseline: Current production load
2. 2x load: Expected growth in 6 months
3. 5x load: Stress test
4. Spike: Sudden 10x for 5 minutes
```
**Tools:**
- k6, Locust, JMeter for HTTP
- pgbench for PostgreSQL
- redis-benchmark for Redis
### Step 4: Identify Scaling Strategy
**Vertical scaling (scale up):**
- Add more CPU, memory, disk
- Simpler but has limits
- Use when: Single server can handle more
**Horizontal scaling (scale out):**
- Add more servers
- Requires stateless design
- Use when: Need linear scaling
### Step 5: Create Scaling Plan
**Document:**
```
Trigger: When average CPU > 70% for 15 minutes
Action:
1. Add 2 more web servers
2. Update load balancer
3. Verify health checks pass
Rollback:
1. Remove added servers
2. Update load balancer
3. Investigate issue
```
---
## 6. Migration Planning Workflow
Use when migrating to new infrastructure, database, or architecture.
### Step 1: Assess Current State
**Document:**
- Current architecture diagram
- Data volumes
- Dependencies
- Integration points
- Performance baselines
### Step 2: Define Target State
**Document:**
- New architecture diagram
- Technology changes
- Expected improvements
- Success criteria
### Step 3: Plan Migration Strategy
**Strategies:**
| Strategy | Risk | Downtime | Complexity |
|----------|------|----------|------------|
| Big bang | High | Yes | Low |
| Blue-green | Medium | Minimal | Medium |
| Canary | Low | None | High |
| Strangler fig | Low | None | High |
**Strangler fig pattern (recommended for large systems):**
```
1. Add facade in front of old system
2. Route small percentage of traffic to new system
3. Gradually increase traffic to new system
4. Retire old system when 100% migrated
```
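Step 2's percentage routing is usually done with a stable hash of a request attribute, so a given user consistently lands on the same system as the rollout grows; a sketch assuming a user-id string (names are illustrative):

```python
import hashlib

def route(user_id, rollout_percent):
    """Deterministically send rollout_percent of users to the new system."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100   # stable 0-99 bucket
    return "new" if bucket < rollout_percent else "old"

# The same user always lands on the same side for a given percentage
assert route("user-42", 25) == route("user-42", 25)
```

Because buckets are stable, raising the percentage only moves users from "old" to "new", never back and forth.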
### Step 4: Create Rollback Plan
**For each step, define:**
```
Step: Migrate user service to new database
Rollback trigger:
- Error rate > 1%
- Latency > 500ms P99
- Data inconsistency detected
Rollback steps:
1. Route traffic back to old database
2. Sync any new data back
3. Investigate root cause
Rollback time estimate: 15 minutes
```
### Step 5: Execute with Checkpoints
**Migration checklist:**
```
□ Backup current system
□ Verify backup restoration works
□ Deploy new infrastructure
□ Run smoke tests on new system
□ Migrate small percentage (1%)
□ Monitor for 24 hours
□ Increase to 10%
□ Monitor for 24 hours
□ Increase to 50%
□ Monitor for 24 hours
□ Complete migration (100%)
□ Decommission old system
□ Document lessons learned
```
---
## Quick Reference
| Task | Start Here |
|------|------------|
| New system design | [System Design Interview Approach](#1-system-design-interview-approach) |
| Infrastructure sizing | [Capacity Planning](#2-capacity-planning-workflow) |
| New API | [API Design](#3-api-design-workflow) |
| Database design | [Database Schema Design](#4-database-schema-design-workflow) |
| Handle growth | [Scalability Assessment](#5-scalability-assessment-workflow) |
| System migration | [Migration Planning](#6-migration-planning-workflow) |


@@ -1,103 +1,412 @@
# Technology Decision Guide
## Overview
Decision frameworks and comparison matrices for common technology choices.
## Decision Frameworks Index
1. [Database Selection](#1-database-selection)
2. [Caching Strategy](#2-caching-strategy)
3. [Message Queue Selection](#3-message-queue-selection)
4. [Authentication Strategy](#4-authentication-strategy)
5. [Frontend Framework Selection](#5-frontend-framework-selection)
6. [Cloud Provider Selection](#6-cloud-provider-selection)
7. [API Style Selection](#7-api-style-selection)
---
## 1. Database Selection
### SQL vs NoSQL Decision Matrix
| Factor | Choose SQL | Choose NoSQL |
|--------|-----------|--------------|
| Data relationships | Complex, many-to-many | Simple, denormalized OK |
| Schema | Well-defined, stable | Evolving, flexible |
| Transactions | ACID required | Eventual consistency OK |
| Query patterns | Complex joins, aggregations | Key-value, document lookups |
| Scale | Vertical (some horizontal) | Horizontal first |
| Team expertise | Strong SQL skills | Document/KV experience |
### Database Type Selection
**Relational (SQL):**
| Database | Best For | Avoid When |
|----------|----------|------------|
| PostgreSQL | General purpose, JSON support, extensions | Simple key-value only |
| MySQL | Web applications, read-heavy | Complex queries, JSON-heavy |
| SQLite | Embedded, development, small apps | Concurrent writes, scale |
**Document (NoSQL):**
| Database | Best For | Avoid When |
|----------|----------|------------|
| MongoDB | Flexible schema, rapid iteration | Complex transactions |
| CouchDB | Offline-first, sync required | High throughput |
**Key-Value:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Redis | Caching, sessions, real-time | Persistence critical |
| DynamoDB | Serverless, auto-scaling | Complex queries |
**Wide-Column:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Cassandra | Write-heavy, time-series | Complex queries, small scale |
| ScyllaDB | Cassandra alternative, performance | Small datasets |
**Time-Series:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| TimescaleDB | Time-series with SQL | Non-time-series data |
| InfluxDB | Metrics, monitoring | Relational queries |
**Search:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Elasticsearch | Full-text search, logs | Primary data store |
| Meilisearch | Simple search, fast setup | Complex analytics |
### Quick Decision Flow
```
Start
├─ Need ACID transactions? ──Yes──► PostgreSQL/MySQL
├─ Flexible schema needed? ──Yes──► MongoDB
├─ Write-heavy (>50K/sec)? ──Yes──► Cassandra/ScyllaDB
├─ Key-value access only? ──Yes──► Redis/DynamoDB
├─ Time-series data? ──Yes──► TimescaleDB/InfluxDB
├─ Full-text search? ──Yes──► Elasticsearch
└─ Default ──────────────────────► PostgreSQL
```
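The flow above is simple enough to encode as a first-pass default chooser; a sketch whose rules mirror the chart (thresholds are the ones stated, and it is no substitute for a real evaluation):

```python
def pick_database(acid=False, flexible_schema=False, writes_per_sec=0,
                  key_value_only=False, time_series=False, full_text=False):
    """First-pass default from the decision flow above."""
    if acid:
        return "PostgreSQL/MySQL"
    if flexible_schema:
        return "MongoDB"
    if writes_per_sec > 50_000:
        return "Cassandra/ScyllaDB"
    if key_value_only:
        return "Redis/DynamoDB"
    if time_series:
        return "TimescaleDB/InfluxDB"
    if full_text:
        return "Elasticsearch"
    return "PostgreSQL"
```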
---
## 2. Caching Strategy
### Cache Type Selection
| Type | Use Case | Invalidation | Complexity |
|------|----------|--------------|------------|
| Read-through | Frequent reads, tolerance for stale | On write/TTL | Low |
| Write-through | Data consistency critical | Automatic | Medium |
| Write-behind | High write throughput | Async | High |
| Cache-aside | Fine-grained control | Application | Medium |
### Cache Technology Selection
| Technology | Best For | Limitations |
|------------|----------|-------------|
| Redis | General purpose, data structures | Memory cost |
| Memcached | Simple key-value, high throughput | No persistence |
| CDN (CloudFront, Fastly) | Static assets, edge caching | Dynamic content |
| Application cache | Per-instance, small data | Not distributed |
### Cache Patterns
**Cache-Aside (Lazy Loading):**
```
Read:
1. Check cache
2. If miss, read from DB
3. Store in cache
4. Return data
Write:
1. Write to DB
2. Invalidate cache
```
**Write-Through:**
```
Write:
1. Write to cache
2. Cache writes to DB
3. Return success
Read:
1. Read from cache (always hit)
```
**TTL Guidelines:**
| Data Type | Suggested TTL |
|-----------|---------------|
| User sessions | 24-48 hours |
| API responses | 1-5 minutes |
| Static content | 24 hours - 1 week |
| Database queries | 5-60 minutes |
| Feature flags | 1-5 minutes |
---
## 3. Message Queue Selection
### Queue Technology Comparison
| Feature | RabbitMQ | Kafka | SQS | Redis Streams |
|---------|----------|-------|-----|---------------|
| Throughput | Medium (10K/s) | Very High (100K+/s) | Medium | High |
| Ordering | Per-queue | Per-partition | FIFO optional | Per-stream |
| Durability | Configurable | Strong | Strong | Configurable |
| Replay | No | Yes | No | Yes |
| Complexity | Medium | High | Low | Low |
| Cost | Self-hosted | Self-hosted | Pay-per-use | Self-hosted |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Simple task queue | SQS or Redis |
| Event streaming | Kafka |
| Complex routing | RabbitMQ |
| Log aggregation | Kafka |
| Serverless integration | SQS |
| Real-time analytics | Kafka |
| Request/reply pattern | RabbitMQ |
### When to Use Each
**RabbitMQ:**
- Complex routing logic (topic, fanout, headers)
- Request/reply patterns
- Priority queues
- Message acknowledgment critical
**Kafka:**
- Event sourcing
- High throughput requirements (>50K messages/sec)
- Message replay needed
- Stream processing
- Log aggregation
**SQS:**
- AWS-native applications
- Simple queue semantics
- Serverless architectures
- Don't want to manage infrastructure
**Redis Streams:**
- Already using Redis
- Moderate throughput
- Simple streaming needs
- Real-time features
---
## 4. Authentication Strategy
### Method Selection
| Method | Best For | Avoid When |
|--------|----------|------------|
| Session-based | Traditional web apps, server-rendered | Mobile apps, microservices |
| JWT | SPAs, mobile apps, microservices | Need immediate revocation |
| OAuth 2.0 | Third-party access, social login | Internal-only apps |
| API Keys | Server-to-server, simple auth | User authentication |
| mTLS | Service mesh, high security | Public APIs |
### JWT vs Sessions
| Factor | JWT | Sessions |
|--------|-----|----------|
| Scalability | Stateless, easy to scale | Requires session store |
| Revocation | Difficult (need blocklist) | Immediate |
| Payload | Can contain claims | Server-side only |
| Security | Token in client | Server-controlled |
| Mobile friendly | Yes | Requires cookies |
### OAuth 2.0 Flow Selection
| Flow | Use Case |
|------|----------|
| Authorization Code | Web apps with backend |
| Authorization Code + PKCE | SPAs, mobile apps |
| Client Credentials | Machine-to-machine |
| Device Code | Smart TVs, CLI tools |
**Avoid:** Implicit flow (deprecated), Resource Owner Password (legacy only)
### Token Lifetimes
| Token Type | Suggested Lifetime |
|------------|-------------------|
| Access token | 15-60 minutes |
| Refresh token | 7-30 days |
| API key | No expiry (rotate quarterly) |
| Session | 24 hours - 7 days |
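The access/refresh pairing can be modeled as two expiries minted together; a sketch using stdlib time with lifetimes from the table (constant and function names are illustrative):

```python
import time

ACCESS_TTL = 15 * 60            # 15 minutes
REFRESH_TTL = 7 * 24 * 3600     # 7 days

def mint_tokens(now=None):
    """Return expiry timestamps for an access/refresh token pair."""
    now = time.time() if now is None else now
    return {"access_expires": now + ACCESS_TTL,
            "refresh_expires": now + REFRESH_TTL}

def is_expired(expires_at, now=None):
    now = time.time() if now is None else now
    return now >= expires_at
```

When the access token expires, the still-valid refresh token is exchanged for a new pair, which is what lets access tokens stay short-lived without constant re-login.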
---
## 5. Frontend Framework Selection
### Framework Comparison
| Factor | React | Vue | Angular | Svelte |
|--------|-------|-----|---------|--------|
| Learning curve | Medium | Low | High | Low |
| Ecosystem | Largest | Large | Complete | Growing |
| Performance | Good | Good | Good | Excellent |
| Bundle size | Medium | Small | Large | Smallest |
| TypeScript | Good | Good | Native | Good |
| Job market | Largest | Growing | Enterprise | Niche |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Large team, enterprise | Angular |
| Startup, rapid iteration | React or Vue |
| Performance critical | Svelte or Solid |
| Existing React team | React |
| Progressive enhancement | Vue or Svelte |
| Component library needed | React (most options) |
### Meta-Framework Selection
| Framework | Best For |
|-----------|----------|
| Next.js (React) | Full-stack React, SSR/SSG |
| Nuxt (Vue) | Full-stack Vue, SSR/SSG |
| SvelteKit | Full-stack Svelte |
| Remix | Data-heavy React apps |
| Astro | Content sites, multi-framework |
### When to Use SSR vs SPA vs SSG
| Rendering | Use When |
|-----------|----------|
| SSR | SEO critical, dynamic content, auth-gated |
| SPA | Internal tools, highly interactive, no SEO |
| SSG | Content sites, blogs, documentation |
| ISR | Mix of static and dynamic |
---
## 6. Cloud Provider Selection
### Provider Comparison
| Factor | AWS | GCP | Azure |
|--------|-----|-----|-------|
| Market share | Largest | Growing | Enterprise strong |
| Service breadth | Most comprehensive | Strong ML/data | Best Microsoft integration |
| Pricing | Complex, volume discounts | Simpler, sustained use | EA discounts |
| Kubernetes | EKS | GKE (best managed) | AKS |
| Serverless | Lambda (mature) | Cloud Functions | Azure Functions |
| Database | RDS, DynamoDB | Cloud SQL, Spanner | SQL, Cosmos |
### Decision Factors
| If You Need | Consider |
|-------------|----------|
| Microsoft ecosystem | Azure |
| Best Kubernetes experience | GCP |
| Widest service selection | AWS |
| Machine learning focus | GCP or AWS |
| Government compliance | AWS GovCloud or Azure Gov |
| Startup credits | All offer programs |
### Multi-Cloud Considerations
**Go multi-cloud when:**
- Regulatory requirements mandate it
- Specific service (e.g., GCP BigQuery) is best-in-class
- Negotiating leverage with vendors
**Stay single-cloud when:**
- Team is small
- Want to minimize complexity
- Deep integration needed
### Service Mapping
| Need | AWS | GCP | Azure |
|------|-----|-----|-------|
| Compute | EC2 | Compute Engine | Virtual Machines |
| Containers | ECS, EKS | GKE, Cloud Run | AKS, Container Apps |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Object Storage | S3 | Cloud Storage | Blob Storage |
| SQL Database | RDS | Cloud SQL | Azure SQL |
| NoSQL | DynamoDB | Firestore | Cosmos DB |
| CDN | CloudFront | Cloud CDN | Azure CDN |
| DNS | Route 53 | Cloud DNS | Azure DNS |
---
## 7. API Style Selection
### REST vs GraphQL vs gRPC
| Factor | REST | GraphQL | gRPC |
|--------|------|---------|------|
| Use case | General purpose | Flexible queries | Microservices |
| Learning curve | Low | Medium | High |
| Over-fetching | Common | Solved | N/A |
| Caching | HTTP native | Complex | Custom |
| Browser support | Native | Native | Limited |
| Tooling | Mature | Growing | Strong |
| Performance | Good | Good | Excellent |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Public API | REST |
| Mobile apps with varied needs | GraphQL |
| Microservices communication | gRPC |
| Real-time updates | GraphQL subscriptions or WebSocket |
| File uploads | REST |
| Internal services only | gRPC |
| Third-party developers | REST + OpenAPI |
### When to Choose Each
**Choose REST when:**
- Building public APIs
- Need HTTP caching
- Simple CRUD operations
- Team experienced with REST
**Choose GraphQL when:**
- Multiple clients with different data needs
- Rapid frontend iteration
- Complex, nested data relationships
- Want to reduce API calls
**Choose gRPC when:**
- Service-to-service communication
- Performance critical
- Streaming required
- Strong typing important
### API Versioning Strategies
| Strategy | Pros | Cons |
|----------|------|------|
| URL path (`/v1/`) | Clear, easy to implement | URL pollution |
| Query param (`?version=1`) | Flexible | Easy to miss |
| Header (`Accept-Version: 1`) | Clean URLs | Less discoverable |
| No versioning (evolve) | Simple | Breaking changes risky |
**Recommendation:** URL path versioning for public APIs, header versioning for internal.
---
## Quick Reference
| Decision | Default Choice | Alternative When |
|----------|----------------|------------------|
| Database | PostgreSQL | Scale/flexibility → MongoDB, DynamoDB |
| Cache | Redis | Simple needs → Memcached |
| Queue | SQS (AWS) / RabbitMQ | Event streaming → Kafka |
| Auth | JWT + Refresh | Traditional web → Sessions |
| Frontend | React + Next.js | Simplicity → Vue, Performance → Svelte |
| Cloud | AWS | Microsoft shop → Azure, ML-first → GCP |
| API | REST | Mobile flexibility → GraphQL, Internal → gRPC |


@@ -1,81 +1,598 @@
#!/usr/bin/env python3
"""
Architecture Diagram Generator
Generates architecture diagrams from project structure in multiple formats:
- Mermaid (default)
- PlantUML
- ASCII
Supports diagram types:
- component: Shows modules and their relationships
- layer: Shows architectural layers
- deployment: Shows deployment topology
"""
import os
import sys
import json
import argparse
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple, Optional
from collections import defaultdict
class ProjectScanner:
"""Scans project structure to detect components and relationships."""
# Common architectural layer patterns
LAYER_PATTERNS = {
'presentation': ['controller', 'handler', 'view', 'page', 'component', 'ui'],
'api': ['api', 'route', 'endpoint', 'rest', 'graphql'],
'business': ['service', 'usecase', 'domain', 'logic', 'core'],
'data': ['repository', 'dao', 'model', 'entity', 'schema', 'migration'],
'infrastructure': ['config', 'util', 'helper', 'middleware', 'plugin'],
}
# File patterns for different technologies
TECH_PATTERNS = {
'react': ['jsx', 'tsx', 'package.json'],
'vue': ['vue', 'nuxt.config'],
'angular': ['component.ts', 'module.ts', 'angular.json'],
'node': ['package.json', 'express', 'fastify'],
'python': ['requirements.txt', 'pyproject.toml', 'setup.py'],
'go': ['go.mod', 'go.sum'],
'rust': ['Cargo.toml'],
'java': ['pom.xml', 'build.gradle'],
'docker': ['Dockerfile', 'docker-compose'],
'kubernetes': ['deployment.yaml', 'service.yaml', 'k8s'],
}
def __init__(self, project_path: Path):
self.project_path = project_path
self.components: Dict[str, Dict] = {}
self.relationships: List[Tuple[str, str, str]] = [] # (from, to, type)
self.layers: Dict[str, List[str]] = defaultdict(list)
self.technologies: Set[str] = set()
self.external_deps: Set[str] = set()
def scan(self) -> Dict:
"""Scan the project and return structure information."""
self._scan_directories()
self._detect_technologies()
self._detect_relationships()
self._classify_layers()
return {
'components': self.components,
'relationships': self.relationships,
'layers': dict(self.layers),
'technologies': list(self.technologies),
'external_deps': list(self.external_deps),
}
def _scan_directories(self):
"""Scan directory structure for components."""
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', '.nuxt', 'coverage', '.pytest_cache'}
for item in self.project_path.iterdir():
if item.is_dir() and item.name not in ignore_dirs and not item.name.startswith('.'):
component_info = self._analyze_directory(item)
if component_info['files'] > 0:
self.components[item.name] = component_info
def _analyze_directory(self, dir_path: Path) -> Dict:
"""Analyze a directory to understand its role."""
files = list(dir_path.rglob('*'))
code_files = [f for f in files if f.is_file() and f.suffix in
['.py', '.js', '.ts', '.jsx', '.tsx', '.go', '.rs', '.java', '.vue']]
# Count imports/dependencies within the directory
imports = set()
for f in code_files[:50]: # Limit to avoid large projects
imports.update(self._extract_imports(f))
return {
'path': str(dir_path.relative_to(self.project_path)),
'files': len(code_files),
'imports': list(imports)[:20], # Top 20 imports
'type': self._guess_component_type(dir_path.name),
}
def _extract_imports(self, file_path: Path) -> Set[str]:
"""Extract import statements from a file."""
imports = set()
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
# Python imports
py_imports = re.findall(r'^(?:from|import)\s+([\w.]+)', content, re.MULTILINE)
imports.update(py_imports)
# JS/TS imports
js_imports = re.findall(r'(?:import|require)\s*\(?[\'"]([^\'"\s]+)[\'"]', content)
imports.update(js_imports)
# Go imports
go_imports = re.findall(r'import\s+(?:\(\s*)?["\']([^"\']+)["\']', content)
imports.update(go_imports)
except Exception:
pass
return imports
def _guess_component_type(self, name: str) -> str:
"""Guess component type from directory name."""
name_lower = name.lower()
for layer, patterns in self.LAYER_PATTERNS.items():
for pattern in patterns:
if pattern in name_lower:
return layer
return 'unknown'
def _detect_technologies(self):
"""Detect technologies used in the project."""
for tech, patterns in self.TECH_PATTERNS.items():
for pattern in patterns:
matches = list(self.project_path.rglob(f'*{pattern}*'))
if matches:
self.technologies.add(tech)
break
# Detect external dependencies from package files
self._parse_package_json()
self._parse_requirements_txt()
self._parse_go_mod()
def _parse_package_json(self):
"""Parse package.json for dependencies."""
pkg_path = self.project_path / 'package.json'
if pkg_path.exists():
try:
data = json.loads(pkg_path.read_text())
deps = list(data.get('dependencies', {}).keys())[:10]
self.external_deps.update(deps)
except Exception:
pass
def _parse_requirements_txt(self):
"""Parse requirements.txt for dependencies."""
req_path = self.project_path / 'requirements.txt'
if req_path.exists():
try:
content = req_path.read_text()
deps = re.findall(r'^([a-zA-Z0-9_-]+)', content, re.MULTILINE)[:10]
self.external_deps.update(deps)
except Exception:
pass
def _parse_go_mod(self):
"""Parse go.mod for dependencies."""
mod_path = self.project_path / 'go.mod'
if mod_path.exists():
try:
content = mod_path.read_text()
deps = re.findall(r'^\s+([^\s]+)\s+v', content, re.MULTILINE)[:10]
self.external_deps.update([d.split('/')[-1] for d in deps])
except Exception:
pass
def _detect_relationships(self):
"""Detect relationships between components."""
component_names = set(self.components.keys())
for comp_name, comp_info in self.components.items():
for imp in comp_info.get('imports', []):
# Check if import references another component
for other_comp in component_names:
if other_comp != comp_name and other_comp.lower() in imp.lower():
self.relationships.append((comp_name, other_comp, 'uses'))
def _classify_layers(self):
"""Classify components into architectural layers."""
for comp_name, comp_info in self.components.items():
layer = comp_info.get('type', 'unknown')
if layer != 'unknown':
self.layers[layer].append(comp_name)
else:
self.layers['other'].append(comp_name)
class DiagramGenerator:
"""Base class for diagram generators."""
def __init__(self, scan_result: Dict):
self.components = scan_result['components']
self.relationships = scan_result['relationships']
self.layers = scan_result['layers']
self.technologies = scan_result['technologies']
self.external_deps = scan_result['external_deps']
def generate(self, diagram_type: str) -> str:
"""Generate diagram based on type."""
if diagram_type == 'component':
return self._generate_component_diagram()
elif diagram_type == 'layer':
return self._generate_layer_diagram()
elif diagram_type == 'deployment':
return self._generate_deployment_diagram()
else:
return self._generate_component_diagram()
def _generate_component_diagram(self) -> str:
raise NotImplementedError
def _generate_layer_diagram(self) -> str:
raise NotImplementedError
def _generate_deployment_diagram(self) -> str:
raise NotImplementedError
class MermaidGenerator(DiagramGenerator):
"""Generate Mermaid diagrams."""
def _generate_component_diagram(self) -> str:
lines = ['graph TD']
# Add components
for name, info in self.components.items():
safe_name = self._safe_id(name)
file_count = info.get('files', 0)
lines.append(f' {safe_name}["{name}<br/>{file_count} files"]')
# Add relationships
seen = set()
for src, dst, rel_type in self.relationships:
key = (src, dst)
if key not in seen:
seen.add(key)
lines.append(f' {self._safe_id(src)} --> {self._safe_id(dst)}')
# Add external dependencies if any
if self.external_deps:
lines.append('')
lines.append(' subgraph External')
for dep in list(self.external_deps)[:5]:
safe_dep = self._safe_id(dep)
lines.append(f' {safe_dep}(("{dep}"))')
lines.append(' end')
return '\n'.join(lines)
def _generate_layer_diagram(self) -> str:
lines = ['graph TB']
layer_order = ['presentation', 'api', 'business', 'data', 'infrastructure', 'other']
for layer in layer_order:
components = self.layers.get(layer, [])
if components:
lines.append(f' subgraph {layer.title()} Layer')
for comp in components:
safe_comp = self._safe_id(comp)
lines.append(f' {safe_comp}["{comp}"]')
lines.append(' end')
lines.append('')
# Add layer relationships (top-down)
prev_layer = None
for layer in layer_order:
if self.layers.get(layer):
if prev_layer and self.layers.get(prev_layer):
first_prev = self._safe_id(self.layers[prev_layer][0])
first_curr = self._safe_id(self.layers[layer][0])
lines.append(f' {first_prev} -.-> {first_curr}')
prev_layer = layer
return '\n'.join(lines)
def _generate_deployment_diagram(self) -> str:
lines = ['graph LR']
# Client
lines.append(' subgraph Client')
lines.append(' browser["Browser/Mobile"]')
lines.append(' end')
lines.append('')
# Determine if we have typical deployment components
has_api = any('api' in t for t in self.technologies)
has_docker = 'docker' in self.technologies
has_k8s = 'kubernetes' in self.technologies
# Application tier
lines.append(' subgraph Application')
if has_k8s:
lines.append(' k8s["Kubernetes Cluster"]')
elif has_docker:
lines.append(' docker["Docker Container"]')
else:
lines.append(' app["Application Server"]')
lines.append(' end')
lines.append('')
# Data tier
lines.append(' subgraph Data')
lines.append(' db[("Database")]')
if self.external_deps:
lines.append(' cache[("Cache")]')
lines.append(' end')
lines.append('')
# Connections
if has_k8s:
lines.append(' browser --> k8s')
lines.append(' k8s --> db')
elif has_docker:
lines.append(' browser --> docker')
lines.append(' docker --> db')
else:
lines.append(' browser --> app')
lines.append(' app --> db')
return '\n'.join(lines)
def _safe_id(self, name: str) -> str:
"""Convert name to safe Mermaid ID."""
return re.sub(r'[^a-zA-Z0-9]', '_', name)
class PlantUMLGenerator(DiagramGenerator):
"""Generate PlantUML diagrams."""
def _generate_component_diagram(self) -> str:
lines = ['@startuml', 'skinparam componentStyle rectangle', '']
# Add components
for name, info in self.components.items():
file_count = info.get('files', 0)
lines.append(f'component "{name}\\n({file_count} files)" as {self._safe_id(name)}')
lines.append('')
# Add relationships
seen = set()
for src, dst, rel_type in self.relationships:
key = (src, dst)
if key not in seen:
seen.add(key)
lines.append(f'{self._safe_id(src)} --> {self._safe_id(dst)}')
# External dependencies
if self.external_deps:
lines.append('')
lines.append('package "External Dependencies" {')
for dep in list(self.external_deps)[:5]:
lines.append(f' [{dep}]')
lines.append('}')
lines.append('')
lines.append('@enduml')
return '\n'.join(lines)
def _generate_layer_diagram(self) -> str:
lines = ['@startuml', 'skinparam packageStyle rectangle', '']
layer_order = ['presentation', 'api', 'business', 'data', 'infrastructure', 'other']
for layer in layer_order:
components = self.layers.get(layer, [])
if components:
lines.append(f'package "{layer.title()} Layer" {{')
for comp in components:
lines.append(f' [{comp}]')
lines.append('}')
lines.append('')
lines.append('@enduml')
return '\n'.join(lines)
def _generate_deployment_diagram(self) -> str:
lines = ['@startuml', '']
lines.append('node "Client" {')
lines.append(' [Browser/Mobile] as browser')
lines.append('}')
lines.append('')
has_docker = 'docker' in self.technologies
has_k8s = 'kubernetes' in self.technologies
lines.append('node "Application Server" {')
if has_k8s:
lines.append(' [Kubernetes Cluster] as app')
elif has_docker:
lines.append(' [Docker Container] as app')
else:
lines.append(' [Application] as app')
lines.append('}')
lines.append('')
lines.append('database "Data Store" {')
lines.append(' [Database] as db')
lines.append('}')
lines.append('')
lines.append('browser --> app')
lines.append('app --> db')
lines.append('')
lines.append('@enduml')
return '\n'.join(lines)
def _safe_id(self, name: str) -> str:
"""Convert name to safe PlantUML ID."""
return re.sub(r'[^a-zA-Z0-9]', '_', name)
class ASCIIGenerator(DiagramGenerator):
"""Generate ASCII diagrams."""
def _generate_component_diagram(self) -> str:
lines = []
lines.append('=' * 60)
lines.append('COMPONENT DIAGRAM')
lines.append('=' * 60)
lines.append('')
# Components
lines.append('Components:')
lines.append('-' * 40)
for name, info in self.components.items():
file_count = info.get('files', 0)
comp_type = info.get('type', 'unknown')
lines.append(f' [{name}]')
lines.append(f' Files: {file_count}')
lines.append(f' Type: {comp_type}')
lines.append('')
# Relationships
if self.relationships:
lines.append('Relationships:')
lines.append('-' * 40)
seen = set()
for src, dst, rel_type in self.relationships:
key = (src, dst)
if key not in seen:
seen.add(key)
lines.append(f' {src} --> {dst}')
lines.append('')
# External dependencies
if self.external_deps:
lines.append('External Dependencies:')
lines.append('-' * 40)
for dep in list(self.external_deps)[:10]:
lines.append(f' - {dep}')
lines.append('')
lines.append('=' * 60)
return '\n'.join(lines)
def _generate_layer_diagram(self) -> str:
lines = []
lines.append('=' * 60)
lines.append('LAYERED ARCHITECTURE')
lines.append('=' * 60)
lines.append('')
layer_order = ['presentation', 'api', 'business', 'data', 'infrastructure', 'other']
for layer in layer_order:
components = self.layers.get(layer, [])
if components:
lines.append(f'+{"-" * 56}+')
lines.append(f'| {layer.upper():^54} |')
lines.append(f'+{"-" * 56}+')
for comp in components:
lines.append(f'| [{comp:^48}] |')
lines.append(f'+{"-" * 56}+')
lines.append(' |')
lines.append(' v')
# Remove last arrow
if lines[-2:] == [' |', ' v']:
lines = lines[:-2]
lines.append('')
lines.append('=' * 60)
return '\n'.join(lines)
def _generate_deployment_diagram(self) -> str:
lines = []
lines.append('=' * 60)
lines.append('DEPLOYMENT DIAGRAM')
lines.append('=' * 60)
lines.append('')
has_docker = 'docker' in self.technologies
has_k8s = 'kubernetes' in self.technologies
# Client tier
lines.append('+----------------------+')
lines.append('| CLIENT |')
lines.append('| [Browser/Mobile] |')
lines.append('+----------+-----------+')
lines.append(' |')
lines.append(' v')
# Application tier
lines.append('+----------------------+')
lines.append('| APPLICATION |')
if has_k8s:
lines.append('| [Kubernetes Cluster] |')
elif has_docker:
lines.append('| [Docker Container] |')
else:
lines.append('| [App Server] |')
lines.append('+----------+-----------+')
lines.append(' |')
lines.append(' v')
# Data tier
lines.append('+----------------------+')
lines.append('| DATA |')
lines.append('| [(Database)] |')
lines.append('+----------------------+')
lines.append('')
# Technologies detected
if self.technologies:
lines.append('Technologies detected:')
lines.append('-' * 40)
for tech in sorted(self.technologies):
lines.append(f' - {tech}')
lines.append('')
lines.append('=' * 60)
return '\n'.join(lines)
def main():
"""Main entry point"""
parser = argparse.ArgumentParser(
description='Generate architecture diagrams from project structure',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog='''
Examples:
%(prog)s ./my-project --format mermaid
%(prog)s ./my-project --format plantuml --type layer
%(prog)s ./my-project --format ascii -o architecture.txt
Diagram types:
component - Shows modules and their relationships (default)
layer - Shows architectural layers
deployment - Shows deployment topology
Output formats:
mermaid - Mermaid.js format (default)
plantuml - PlantUML format
ascii - ASCII art format
'''
)
parser.add_argument(
'project_path',
help='Path to the project directory'
)
parser.add_argument(
'--format', '-f',
choices=['mermaid', 'plantuml', 'ascii'],
default='mermaid',
help='Output format (default: mermaid)'
)
parser.add_argument(
'--type', '-t',
choices=['component', 'layer', 'deployment'],
default='component',
help='Diagram type (default: component)'
)
parser.add_argument(
'--output', '-o',
help='Output file path (prints to stdout if not specified)'
)
parser.add_argument(
'--verbose', '-v',
        action='store_true',
        help='Enable verbose output'
    )
parser.add_argument(
'--json',
action='store_true',
help='Output raw scan results as JSON'
)
args = parser.parse_args()
project_path = Path(args.project_path).resolve()
if not project_path.exists():
print(f"Error: Project path does not exist: {project_path}", file=sys.stderr)
sys.exit(1)
if not project_path.is_dir():
print(f"Error: Project path is not a directory: {project_path}", file=sys.stderr)
sys.exit(1)
if args.verbose:
print(f"Scanning project: {project_path}")
# Scan project
scanner = ProjectScanner(project_path)
scan_result = scanner.scan()
if args.verbose:
print(f"Found {len(scan_result['components'])} components")
print(f"Found {len(scan_result['relationships'])} relationships")
print(f"Technologies: {', '.join(scan_result['technologies']) or 'none detected'}")
# Output raw JSON if requested
if args.json:
output = json.dumps(scan_result, indent=2)
if args.output:
Path(args.output).write_text(output)
print(f"Results written to {args.output}")
else:
print(output)
return
# Generate diagram
generators = {
'mermaid': MermaidGenerator,
'plantuml': PlantUMLGenerator,
'ascii': ASCIIGenerator,
}
generator = generators[args.format](scan_result)
diagram = generator.generate(args.type)
# Output
if args.output:
Path(args.output).write_text(diagram)
print(f"Diagram written to {args.output}")
else:
print(diagram)
if __name__ == '__main__':
main()
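For reference, here is a minimal standalone sketch of the Mermaid component-diagram logic implemented above. The helper names (`safe_id`, `component_diagram`) and the hand-built `scan_result` data are hypothetical; the real `MermaidGenerator` additionally emits an `External` subgraph for package dependencies.

```python
# Standalone sketch mirroring MermaidGenerator._generate_component_diagram:
# sanitize node IDs and collapse duplicate edges into a Mermaid graph.
import re

def safe_id(name: str) -> str:
    # Mermaid node IDs cannot contain punctuation; replace with '_'.
    return re.sub(r'[^a-zA-Z0-9]', '_', name)

def component_diagram(components, relationships):
    lines = ['graph TD']
    for name, info in components.items():
        lines.append(f'    {safe_id(name)}["{name}<br/>{info.get("files", 0)} files"]')
    seen = set()
    for src, dst, _rel in relationships:
        if (src, dst) not in seen:  # collapse duplicate edges
            seen.add((src, dst))
            lines.append(f'    {safe_id(src)} --> {safe_id(dst)}')
    return '\n'.join(lines)

diagram = component_diagram(
    {'web-ui': {'files': 30}, 'api-server': {'files': 12}},
    [('web-ui', 'api-server', 'uses'), ('web-ui', 'api-server', 'uses')],
)
print(diagram)
```

Pasting the printed text into any Mermaid renderer produces two labeled nodes with a single `web_ui --> api_server` edge, even though the relationship was detected twice.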

#!/usr/bin/env python3
"""
Dependency Analyzer
Analyzes project dependencies for:
- Dependency tree (direct and transitive)
- Circular dependencies between modules
- Coupling score (0-100)
- Outdated packages (basic detection)
Supports:
- npm/yarn (package.json)
- Python (requirements.txt, pyproject.toml)
- Go (go.mod)
- Rust (Cargo.toml)
"""
import os
import sys
import json
import argparse
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple, Optional
from collections import defaultdict
class DependencyAnalyzer:
"""Analyzes project dependencies and module coupling."""
def __init__(self, project_path: Path, verbose: bool = False):
self.project_path = project_path
self.verbose = verbose
# Results
self.direct_deps: Dict[str, str] = {} # name -> version
self.dev_deps: Dict[str, str] = {}
self.internal_modules: Dict[str, Set[str]] = defaultdict(set) # module -> imports
self.circular_deps: List[List[str]] = []
self.coupling_score: float = 0
self.issues: List[Dict] = []
self.recommendations: List[str] = []
self.package_manager: Optional[str] = None
def analyze(self) -> Dict:
"""Run full dependency analysis."""
self._detect_package_manager()
self._parse_dependencies()
self._scan_internal_modules()
self._detect_circular_dependencies()
self._calculate_coupling_score()
self._generate_recommendations()
return self._build_report()
def _detect_package_manager(self):
"""Detect which package manager is used."""
if (self.project_path / 'package.json').exists():
self.package_manager = 'npm'
elif (self.project_path / 'requirements.txt').exists():
self.package_manager = 'pip'
elif (self.project_path / 'pyproject.toml').exists():
self.package_manager = 'poetry'
elif (self.project_path / 'go.mod').exists():
self.package_manager = 'go'
elif (self.project_path / 'Cargo.toml').exists():
self.package_manager = 'cargo'
else:
self.package_manager = 'unknown'
if self.verbose:
print(f"Detected package manager: {self.package_manager}")
def _parse_dependencies(self):
"""Parse dependencies based on detected package manager."""
parsers = {
'npm': self._parse_npm,
'pip': self._parse_pip,
'poetry': self._parse_poetry,
'go': self._parse_go,
'cargo': self._parse_cargo,
}
parser = parsers.get(self.package_manager)
if parser:
parser()
def _parse_npm(self):
"""Parse package.json for npm dependencies."""
pkg_path = self.project_path / 'package.json'
try:
data = json.loads(pkg_path.read_text())
# Direct dependencies
for name, version in data.get('dependencies', {}).items():
self.direct_deps[name] = self._clean_version(version)
# Dev dependencies
for name, version in data.get('devDependencies', {}).items():
self.dev_deps[name] = self._clean_version(version)
if self.verbose:
print(f"Found {len(self.direct_deps)} direct deps, "
f"{len(self.dev_deps)} dev deps")
except Exception as e:
self.issues.append({
'type': 'parse_error',
'severity': 'error',
'message': f"Failed to parse package.json: {e}"
})
def _parse_pip(self):
"""Parse requirements.txt for Python dependencies."""
req_path = self.project_path / 'requirements.txt'
try:
content = req_path.read_text()
for line in content.strip().split('\n'):
line = line.strip()
if not line or line.startswith('#') or line.startswith('-'):
continue
# Parse name and version
match = re.match(r'^([a-zA-Z0-9_-]+)(?:[=<>!~]+(.+))?', line)
if match:
name = match.group(1)
version = match.group(2) or 'any'
self.direct_deps[name] = version
if self.verbose:
print(f"Found {len(self.direct_deps)} dependencies")
except Exception as e:
self.issues.append({
'type': 'parse_error',
'severity': 'error',
'message': f"Failed to parse requirements.txt: {e}"
})
def _parse_poetry(self):
"""Parse pyproject.toml for Poetry dependencies."""
toml_path = self.project_path / 'pyproject.toml'
try:
content = toml_path.read_text()
# Simple TOML parsing for dependencies section
in_deps = False
in_dev_deps = False
for line in content.split('\n'):
line = line.strip()
if line == '[tool.poetry.dependencies]':
in_deps = True
in_dev_deps = False
continue
elif line == '[tool.poetry.dev-dependencies]' or \
line == '[tool.poetry.group.dev.dependencies]':
in_deps = False
in_dev_deps = True
continue
elif line.startswith('['):
in_deps = False
in_dev_deps = False
continue
if (in_deps or in_dev_deps) and '=' in line:
match = re.match(r'^([a-zA-Z0-9_-]+)\s*=\s*["\']?([^"\']+)', line)
if match:
name = match.group(1)
version = match.group(2)
if name != 'python':
if in_deps:
self.direct_deps[name] = version
else:
self.dev_deps[name] = version
if self.verbose:
print(f"Found {len(self.direct_deps)} direct deps, "
f"{len(self.dev_deps)} dev deps")
except Exception as e:
self.issues.append({
'type': 'parse_error',
'severity': 'error',
'message': f"Failed to parse pyproject.toml: {e}"
})
def _parse_go(self):
"""Parse go.mod for Go dependencies."""
mod_path = self.project_path / 'go.mod'
try:
content = mod_path.read_text()
# Find require block
in_require = False
for line in content.split('\n'):
line = line.strip()
if line.startswith('require ('):
in_require = True
continue
elif line == ')' and in_require:
in_require = False
continue
elif line.startswith('require ') and '(' not in line:
# Single-line require
match = re.match(r'require\s+([^\s]+)\s+([^\s]+)', line)
if match:
self.direct_deps[match.group(1)] = match.group(2)
continue
if in_require:
match = re.match(r'([^\s]+)\s+([^\s]+)', line)
if match:
self.direct_deps[match.group(1)] = match.group(2)
if self.verbose:
print(f"Found {len(self.direct_deps)} dependencies")
except Exception as e:
self.issues.append({
'type': 'parse_error',
'severity': 'error',
'message': f"Failed to parse go.mod: {e}"
})
def _parse_cargo(self):
"""Parse Cargo.toml for Rust dependencies."""
cargo_path = self.project_path / 'Cargo.toml'
try:
content = cargo_path.read_text()
in_deps = False
in_dev_deps = False
for line in content.split('\n'):
line = line.strip()
if line == '[dependencies]':
in_deps = True
in_dev_deps = False
continue
elif line == '[dev-dependencies]':
in_deps = False
in_dev_deps = True
continue
elif line.startswith('['):
in_deps = False
in_dev_deps = False
continue
if (in_deps or in_dev_deps) and '=' in line:
match = re.match(r'^([a-zA-Z0-9_-]+)\s*=\s*["\']?([^"\']+)', line)
if match:
name = match.group(1)
version = match.group(2)
if in_deps:
self.direct_deps[name] = version
else:
self.dev_deps[name] = version
if self.verbose:
print(f"Found {len(self.direct_deps)} direct deps, "
f"{len(self.dev_deps)} dev deps")
except Exception as e:
self.issues.append({
'type': 'parse_error',
'severity': 'error',
'message': f"Failed to parse Cargo.toml: {e}"
})
def _clean_version(self, version: str) -> str:
"""Clean version string."""
return version.lstrip('^~>=<!')
def _scan_internal_modules(self):
"""Scan internal module imports for coupling analysis."""
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
# Find all code files
extensions = ['.py', '.js', '.ts', '.jsx', '.tsx', '.go', '.rs']
for ext in extensions:
for file_path in self.project_path.rglob(f'*{ext}'):
# Skip ignored directories
if any(ignored in file_path.parts for ignored in ignore_dirs):
continue
# Get module name (directory relative to project root)
try:
rel_path = file_path.relative_to(self.project_path)
module = rel_path.parts[0] if len(rel_path.parts) > 1 else 'root'
# Extract imports
imports = self._extract_imports(file_path)
self.internal_modules[module].update(imports)
except Exception:
continue
if self.verbose:
print(f"Scanned {len(self.internal_modules)} internal modules")
def _extract_imports(self, file_path: Path) -> Set[str]:
"""Extract import statements from a file."""
imports = set()
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
# Python imports
for match in re.finditer(r'^(?:from|import)\s+([\w.]+)', content, re.MULTILINE):
imports.add(match.group(1).split('.')[0])
# JS/TS imports
for match in re.finditer(r'(?:import|require)\s*\(?[\'"]([^\'"\s]+)[\'"]', content):
imp = match.group(1)
if imp.startswith('.') or imp.startswith('@/') or imp.startswith('~/'):
# Relative import - extract first path component
parts = imp.lstrip('./~@').split('/')
if parts:
imports.add(parts[0])
except Exception:
pass
return imports
def _detect_circular_dependencies(self):
"""Detect circular dependencies between internal modules."""
# Build dependency graph
graph = defaultdict(set)
modules = set(self.internal_modules.keys())
for module, imports in self.internal_modules.items():
for imp in imports:
# Check if import is an internal module
for internal_module in modules:
if internal_module.lower() in imp.lower() and internal_module != module:
graph[module].add(internal_module)
# Find cycles using DFS
visited = set()
rec_stack = set()
cycles = []
def find_cycles(node: str, path: List[str]):
visited.add(node)
rec_stack.add(node)
path.append(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
find_cycles(neighbor, path)
elif neighbor in rec_stack:
# Found cycle
cycle_start = path.index(neighbor)
cycle = path[cycle_start:] + [neighbor]
if cycle not in cycles:
cycles.append(cycle)
path.pop()
rec_stack.remove(node)
for module in modules:
if module not in visited:
find_cycles(module, [])
self.circular_deps = cycles
if cycles:
for cycle in cycles:
self.issues.append({
'type': 'circular_dependency',
'severity': 'warning',
'message': f"Circular dependency: {' -> '.join(cycle)}"
})
if self.verbose:
print(f"Found {len(self.circular_deps)} circular dependencies")
def _calculate_coupling_score(self):
"""Calculate coupling score (0-100, lower is better)."""
if not self.internal_modules:
self.coupling_score = 0
return
# Count connections between modules
total_modules = len(self.internal_modules)
total_connections = 0
modules = set(self.internal_modules.keys())
for module, imports in self.internal_modules.items():
for imp in imports:
for internal_module in modules:
if internal_module.lower() in imp.lower() and internal_module != module:
total_connections += 1
# Max possible connections (complete graph)
max_connections = total_modules * (total_modules - 1) if total_modules > 1 else 1
# Coupling score as percentage of max connections
self.coupling_score = min(100, int((total_connections / max_connections) * 100))
# Add penalty for circular dependencies
self.coupling_score = min(100, self.coupling_score + len(self.circular_deps) * 10)
if self.verbose:
print(f"Coupling score: {self.coupling_score}/100")
def _generate_recommendations(self):
"""Generate actionable recommendations."""
# Circular dependency recommendations
if self.circular_deps:
self.recommendations.append(
"Extract shared interfaces or create a common module to break circular dependencies"
)
# High coupling recommendations
if self.coupling_score > 70:
self.recommendations.append(
"High coupling detected. Consider applying SOLID principles and "
"introducing abstraction layers"
)
# Too many dependencies
if len(self.direct_deps) > 50:
self.recommendations.append(
f"Large dependency count ({len(self.direct_deps)}). "
"Review for unused dependencies and consider bundle size impact"
)
# Check for known problematic packages (simplified check)
problematic = {
'lodash': 'Consider lodash-es or native methods for smaller bundle',
'moment': 'Consider day.js or date-fns for smaller bundle',
'request': 'Deprecated. Use axios, node-fetch, or native fetch',
}
for pkg, suggestion in problematic.items():
if pkg in self.direct_deps:
self.recommendations.append(f"{pkg}: {suggestion}")
def _build_report(self) -> Dict:
"""Build the analysis report."""
return {
'project_path': str(self.project_path),
'package_manager': self.package_manager,
'summary': {
'direct_dependencies': len(self.direct_deps),
'dev_dependencies': len(self.dev_deps),
'internal_modules': len(self.internal_modules),
'coupling_score': self.coupling_score,
'circular_dependencies': len(self.circular_deps),
'issues': len(self.issues),
},
'dependencies': {
'direct': self.direct_deps,
'dev': self.dev_deps,
},
'internal_modules': {k: list(v) for k, v in self.internal_modules.items()},
'circular_dependencies': self.circular_deps,
'issues': self.issues,
'recommendations': self.recommendations,
}
def print_human_report(report: Dict):
"""Print human-readable report."""
print("\n" + "=" * 60)
print("DEPENDENCY ANALYSIS REPORT")
print("=" * 60)
print(f"\nProject: {report['project_path']}")
print(f"Package Manager: {report['package_manager']}")
summary = report['summary']
print("\n--- Summary ---")
print(f"Direct dependencies: {summary['direct_dependencies']}")
print(f"Dev dependencies: {summary['dev_dependencies']}")
print(f"Internal modules: {summary['internal_modules']}")
print(f"Coupling score: {summary['coupling_score']}/100 ", end='')
if summary['coupling_score'] < 30:
print("(low - good)")
elif summary['coupling_score'] < 70:
print("(moderate)")
else:
print("(high - consider refactoring)")
if report['circular_dependencies']:
print(f"\n--- Circular Dependencies ({len(report['circular_dependencies'])}) ---")
for cycle in report['circular_dependencies']:
print(f" {' -> '.join(cycle)}")
if report['issues']:
print(f"\n--- Issues ({len(report['issues'])}) ---")
for issue in report['issues']:
severity = issue['severity'].upper()
print(f" [{severity}] {issue['message']}")
if report['recommendations']:
print(f"\n--- Recommendations ---")
for i, rec in enumerate(report['recommendations'], 1):
print(f" {i}. {rec}")
# Show top dependencies
deps = report['dependencies']['direct']
if deps:
print(f"\n--- Top Dependencies (of {len(deps)}) ---")
for name, version in list(deps.items())[:10]:
print(f" {name}: {version}")
if len(deps) > 10:
print(f" ... and {len(deps) - 10} more")
print("\n" + "=" * 60)
def main():
"""Main entry point"""
parser = argparse.ArgumentParser(
description='Analyze project dependencies and module coupling',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog='''
Examples:
%(prog)s ./my-project
%(prog)s ./my-project --output json
%(prog)s ./my-project --check circular
%(prog)s ./my-project --verbose
Supported package managers:
- npm/yarn (package.json)
- pip (requirements.txt)
- poetry (pyproject.toml)
- go (go.mod)
- cargo (Cargo.toml)
'''
)
parser.add_argument(
'project_path',
help='Path to the project directory'
)
parser.add_argument(
'--output', '-o',
choices=['human', 'json'],
default='human',
help='Output format (default: human)'
)
parser.add_argument(
'--check',
choices=['all', 'circular', 'coupling'],
default='all',
help='What to check (default: all)'
)
parser.add_argument(
'--verbose', '-v',
        action='store_true',
help='Enable verbose output'
)
parser.add_argument(
'--save', '-s',
help='Save report to file'
)
args = parser.parse_args()
project_path = Path(args.project_path).resolve()
if not project_path.exists():
print(f"Error: Project path does not exist: {project_path}", file=sys.stderr)
sys.exit(1)
if not project_path.is_dir():
print(f"Error: Project path is not a directory: {project_path}", file=sys.stderr)
sys.exit(1)
# Run analysis
analyzer = DependencyAnalyzer(project_path, verbose=args.verbose)
report = analyzer.analyze()
# Filter report based on --check option
if args.check == 'circular':
if report['circular_dependencies']:
print("Circular dependencies found:")
for cycle in report['circular_dependencies']:
print(f" {' -> '.join(cycle)}")
sys.exit(1)
else:
print("No circular dependencies found.")
sys.exit(0)
elif args.check == 'coupling':
score = report['summary']['coupling_score']
print(f"Coupling score: {score}/100")
if score > 70:
print("WARNING: High coupling detected")
sys.exit(1)
sys.exit(0)
# Output report
if args.output == 'json':
output = json.dumps(report, indent=2)
if args.save:
Path(args.save).write_text(output)
print(f"Report saved to {args.save}")
else:
print(output)
else:
print_human_report(report)
if args.save:
Path(args.save).write_text(json.dumps(report, indent=2))
print(f"\nJSON report saved to {args.save}")
if __name__ == '__main__':
main()
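The circular-dependency pass in `_detect_circular_dependencies` above is a depth-first search with a recursion stack: a back edge to a node still on the stack marks a cycle. A standalone sketch (hypothetical `find_circular_deps` name, toy module graph):

```python
# DFS cycle detection over a module -> imported-modules adjacency map,
# mirroring the recursion-stack approach used by DependencyAnalyzer.
def find_circular_deps(graph):
    visited, rec_stack, cycles = set(), set(), []

    def dfs(node, path):
        visited.add(node)
        rec_stack.add(node)
        path.append(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                dfs(neighbor, path)
            elif neighbor in rec_stack:
                # Back edge: the slice of the path from neighbor onward is a cycle.
                cycle = path[path.index(neighbor):] + [neighbor]
                if cycle not in cycles:
                    cycles.append(cycle)
        path.pop()
        rec_stack.remove(node)

    for node in list(graph):
        if node not in visited:
            dfs(node, [])
    return cycles

cycles = find_circular_deps({'api': {'core'}, 'core': {'db'}, 'db': {'api'}})
print(cycles)  # one cycle through api, core, and db
```

Each detected cycle is reported as the closed path (first module repeated at the end), which is also the format the analyzer prints in its report.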

#!/usr/bin/env python3
"""
Project Architect
Analyzes project structure and detects:
- Architectural patterns (MVC, layered, hexagonal, microservices)
- Code organization issues (god classes, mixed concerns)
- Layer violations
- Missing architectural components
Provides architecture assessment and improvement recommendations.
"""
import os
import sys
import json
import argparse
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple, Optional
from collections import defaultdict
class PatternDetector:
"""Detects architectural patterns in a project."""
# Pattern signatures
PATTERNS = {
'layered': {
'indicators': ['controller', 'service', 'repository', 'dao', 'model', 'entity'],
'structure': ['controllers', 'services', 'repositories', 'models'],
'weight': 0,
},
'mvc': {
'indicators': ['model', 'view', 'controller'],
'structure': ['models', 'views', 'controllers'],
'weight': 0,
},
'hexagonal': {
'indicators': ['port', 'adapter', 'domain', 'infrastructure', 'application'],
'structure': ['ports', 'adapters', 'domain', 'infrastructure'],
'weight': 0,
},
'clean': {
'indicators': ['entity', 'usecase', 'interface', 'framework', 'adapter'],
'structure': ['entities', 'usecases', 'interfaces', 'frameworks'],
'weight': 0,
},
'microservices': {
'indicators': ['service', 'api', 'gateway', 'docker', 'kubernetes'],
'structure': ['services', 'api-gateway', 'docker-compose'],
'weight': 0,
},
'modular_monolith': {
'indicators': ['module', 'feature', 'bounded'],
'structure': ['modules', 'features'],
'weight': 0,
},
'feature_based': {
'indicators': ['feature', 'component', 'page'],
'structure': ['features', 'components', 'pages'],
'weight': 0,
},
}
# Layer definitions for violation detection
LAYER_HIERARCHY = {
'presentation': ['controller', 'handler', 'view', 'page', 'component', 'ui', 'route'],
'application': ['service', 'usecase', 'application', 'facade'],
'domain': ['domain', 'entity', 'model', 'aggregate', 'valueobject'],
'infrastructure': ['repository', 'dao', 'adapter', 'gateway', 'client', 'config'],
}
LAYER_ORDER = ['presentation', 'application', 'domain', 'infrastructure']
def __init__(self, project_path: Path):
self.project_path = project_path
self.directories: Set[str] = set()
self.files: Dict[str, List[str]] = defaultdict(list) # dir -> files
self.detected_pattern: Optional[str] = None
self.confidence: float = 0
self.layer_assignments: Dict[str, str] = {} # dir -> layer
def scan(self) -> Dict:
"""Scan project and detect patterns."""
self._scan_structure()
self._detect_pattern()
self._assign_layers()
return {
'detected_pattern': self.detected_pattern,
'confidence': self.confidence,
'directories': list(self.directories),
'layer_assignments': self.layer_assignments,
'pattern_scores': {p: d['weight'] for p, d in self.PATTERNS.items()},
}
def _scan_structure(self):
"""Scan directory structure."""
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage', '.pytest_cache'}
for item in self.project_path.iterdir():
if item.is_dir() and item.name not in ignore_dirs and not item.name.startswith('.'):
self.directories.add(item.name.lower())
# Scan files in directory
try:
for f in item.rglob('*'):
if f.is_file():
self.files[item.name.lower()].append(f.name.lower())
except PermissionError:
pass
def _detect_pattern(self):
"""Detect the primary architectural pattern."""
for pattern, config in self.PATTERNS.items():
score = 0
# Check directory structure
for struct in config['structure']:
if struct.lower() in self.directories:
score += 2
# Check indicator presence in directory names
for indicator in config['indicators']:
for dir_name in self.directories:
if indicator in dir_name:
score += 1
# Check file patterns
all_files = [f for files in self.files.values() for f in files]
for indicator in config['indicators']:
matching_files = sum(1 for f in all_files if indicator in f)
score += min(matching_files // 5, 3) # Cap contribution
config['weight'] = score
# Find best match
best_pattern = max(self.PATTERNS.items(), key=lambda x: x[1]['weight'])
if best_pattern[1]['weight'] > 3:
self.detected_pattern = best_pattern[0]
max_possible = len(best_pattern[1]['structure']) * 2 + len(best_pattern[1]['indicators']) * 2
self.confidence = min(100, int((best_pattern[1]['weight'] / max(max_possible, 1)) * 100))
else:
self.detected_pattern = 'unstructured'
self.confidence = 0
def _assign_layers(self):
"""Assign directories to architectural layers."""
for dir_name in self.directories:
for layer, indicators in self.LAYER_HIERARCHY.items():
for indicator in indicators:
if indicator in dir_name:
self.layer_assignments[dir_name] = layer
break
if dir_name in self.layer_assignments:
break
if dir_name not in self.layer_assignments:
self.layer_assignments[dir_name] = 'unknown'
class CodeAnalyzer:
"""Analyzes code for architectural issues."""
# Thresholds
MAX_FILE_LINES = 500
MAX_CLASS_LINES = 300
MAX_FUNCTION_LINES = 50
MAX_IMPORTS_PER_FILE = 30
def __init__(self, project_path: Path, verbose: bool = False):
self.project_path = project_path
self.verbose = verbose
self.issues: List[Dict] = []
self.metrics: Dict = {}
def analyze(self) -> Dict:
"""Run code analysis."""
self._analyze_file_sizes()
self._analyze_imports()
self._detect_god_classes()
self._check_naming_conventions()
return {
'issues': self.issues,
'metrics': self.metrics,
}
def _analyze_file_sizes(self):
"""Check for oversized files."""
extensions = ['.py', '.js', '.ts', '.jsx', '.tsx', '.go', '.rs', '.java']
large_files = []
total_lines = 0
file_count = 0
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
for ext in extensions:
for file_path in self.project_path.rglob(f'*{ext}'):
if any(ignored in file_path.parts for ignored in ignore_dirs):
continue
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
lines = len(content.split('\n'))
total_lines += lines
file_count += 1
if lines > self.MAX_FILE_LINES:
large_files.append({
'path': str(file_path.relative_to(self.project_path)),
'lines': lines,
})
self.issues.append({
'type': 'large_file',
'severity': 'warning',
'file': str(file_path.relative_to(self.project_path)),
'message': f"File has {lines} lines (threshold: {self.MAX_FILE_LINES})",
'suggestion': "Consider splitting into smaller, focused modules",
})
except Exception:
pass
self.metrics['total_lines'] = total_lines
self.metrics['file_count'] = file_count
self.metrics['avg_file_lines'] = total_lines // file_count if file_count > 0 else 0
self.metrics['large_files'] = large_files
def _analyze_imports(self):
"""Analyze import patterns."""
extensions = ['.py', '.js', '.ts', '.jsx', '.tsx']
high_import_files = []
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
for ext in extensions:
for file_path in self.project_path.rglob(f'*{ext}'):
if any(ignored in file_path.parts for ignored in ignore_dirs):
continue
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
                    # Count imports per language, so JS/TS import lines
                    # are not matched by both regexes and counted twice
                    if ext == '.py':
                        imports = len(re.findall(r'^(?:from|import)\s+', content, re.MULTILINE))
                    else:
                        imports = len(re.findall(r'^\s*import\s+', content, re.MULTILINE))
if imports > self.MAX_IMPORTS_PER_FILE:
high_import_files.append({
'path': str(file_path.relative_to(self.project_path)),
'imports': imports,
})
self.issues.append({
'type': 'high_imports',
'severity': 'info',
'file': str(file_path.relative_to(self.project_path)),
'message': f"File has {imports} imports (threshold: {self.MAX_IMPORTS_PER_FILE})",
'suggestion': "Consider if all imports are necessary or if the file has too many responsibilities",
})
except Exception:
pass
self.metrics['high_import_files'] = high_import_files
def _detect_god_classes(self):
"""Detect potential god classes (oversized classes)."""
extensions = ['.py', '.js', '.ts', '.java']
god_classes = []
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
for ext in extensions:
for file_path in self.project_path.rglob(f'*{ext}'):
if any(ignored in file_path.parts for ignored in ignore_dirs):
continue
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
lines = content.split('\n')
# Simple class detection
class_pattern = r'^\s*(?:export\s+)?(?:abstract\s+)?class\s+(\w+)'
in_class = False
class_name = None
class_start = 0
for i, line in enumerate(lines):
match = re.match(class_pattern, line)
if match:
if in_class and class_name:
# End previous class
                            class_lines = i - class_start
                            if class_lines > self.MAX_CLASS_LINES:
                                god_classes.append({
                                    'file': str(file_path.relative_to(self.project_path)),
                                    'class': class_name,
                                    'lines': class_lines,
                                })
                                self.issues.append({
                                    'type': 'god_class',
                                    'severity': 'warning',
                                    'file': str(file_path.relative_to(self.project_path)),
                                    'message': f"Class '{class_name}' has ~{class_lines} lines (threshold: {self.MAX_CLASS_LINES})",
                                    'suggestion': "Consider applying Single Responsibility Principle and splitting into smaller classes",
                                })
class_name = match.group(1)
class_start = i
in_class = True
# Check last class
if in_class and class_name:
class_lines = len(lines) - class_start
if class_lines > self.MAX_CLASS_LINES:
god_classes.append({
'file': str(file_path.relative_to(self.project_path)),
'class': class_name,
'lines': class_lines,
})
self.issues.append({
'type': 'god_class',
'severity': 'warning',
'file': str(file_path.relative_to(self.project_path)),
'message': f"Class '{class_name}' has ~{class_lines} lines (threshold: {self.MAX_CLASS_LINES})",
'suggestion': "Consider applying Single Responsibility Principle and splitting into smaller classes",
})
except Exception:
pass
self.metrics['god_classes'] = god_classes
def _check_naming_conventions(self):
"""Check for naming convention issues."""
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
naming_issues = []
# Check directory naming
for dir_path in self.project_path.rglob('*'):
if not dir_path.is_dir():
continue
if any(ignored in dir_path.parts for ignored in ignore_dirs):
continue
dir_name = dir_path.name
# Check for mixed case in directories (should be kebab-case or snake_case)
if re.search(r'[A-Z]', dir_name) and '-' not in dir_name and '_' not in dir_name:
rel_path = str(dir_path.relative_to(self.project_path))
if len(rel_path.split('/')) <= 3: # Only check top-level dirs
naming_issues.append({
'type': 'directory',
'path': rel_path,
'issue': 'PascalCase directory name',
})
if naming_issues:
self.issues.append({
'type': 'naming_convention',
'severity': 'info',
'message': f"Found {len(naming_issues)} naming convention inconsistencies",
'details': naming_issues[:5], # Show first 5
})
self.metrics['naming_issues'] = naming_issues
class LayerViolationDetector:
"""Detects architectural layer violations."""
LAYER_ORDER = ['presentation', 'application', 'domain', 'infrastructure']
# Valid dependency directions (key can depend on values)
VALID_DEPENDENCIES = {
'presentation': ['application', 'domain'],
'application': ['domain', 'infrastructure'],
'domain': [], # Domain should not depend on other layers
'infrastructure': ['domain'],
}
def __init__(self, project_path: Path, layer_assignments: Dict[str, str]):
self.project_path = project_path
self.layer_assignments = layer_assignments
self.violations: List[Dict] = []
def detect(self) -> List[Dict]:
"""Detect layer violations."""
self._analyze_imports()
return self.violations
def _analyze_imports(self):
"""Analyze imports for layer violations."""
extensions = ['.py', '.js', '.ts', '.jsx', '.tsx']
ignore_dirs = {'.git', 'node_modules', '__pycache__', '.venv', 'venv',
'dist', 'build', '.next', 'coverage'}
for ext in extensions:
for file_path in self.project_path.rglob(f'*{ext}'):
if any(ignored in file_path.parts for ignored in ignore_dirs):
continue
try:
rel_path = file_path.relative_to(self.project_path)
if len(rel_path.parts) < 2:
continue
source_dir = rel_path.parts[0].lower()
source_layer = self.layer_assignments.get(source_dir)
if not source_layer or source_layer == 'unknown':
continue
# Extract imports
content = file_path.read_text(encoding='utf-8', errors='ignore')
imports = self._extract_imports(content)
# Check each import for layer violations
for imp in imports:
target_dir = self._get_import_directory(imp)
if not target_dir:
continue
target_layer = self.layer_assignments.get(target_dir.lower())
if not target_layer or target_layer == 'unknown':
continue
if self._is_violation(source_layer, target_layer):
self.violations.append({
'type': 'layer_violation',
'severity': 'warning',
'file': str(rel_path),
'source_layer': source_layer,
'target_layer': target_layer,
'import': imp,
'message': f"{source_layer} layer should not depend on {target_layer} layer",
})
except Exception:
pass
def _extract_imports(self, content: str) -> List[str]:
"""Extract import statements."""
imports = []
# Python imports
imports.extend(re.findall(r'^(?:from|import)\s+([\w.]+)', content, re.MULTILINE))
# JS/TS imports
imports.extend(re.findall(r'(?:import|require)\s*\(?[\'"]([^\'"\s]+)[\'"]', content))
return imports
def _get_import_directory(self, imp: str) -> Optional[str]:
"""Get the directory from an import path."""
# Handle relative imports
if imp.startswith('.'):
return None # Skip relative imports
parts = imp.replace('@/', '').replace('~/', '').split('/')
if parts:
return parts[0].split('.')[0]
return None
def _is_violation(self, source_layer: str, target_layer: str) -> bool:
"""Check if the dependency is a violation."""
if source_layer == target_layer:
return False
        valid_deps = self.VALID_DEPENDENCIES.get(source_layer, [])
        return target_layer not in valid_deps
class ProjectArchitect:
    """Main class that orchestrates architecture analysis."""
    def __init__(self, project_path: Path, verbose: bool = False):
        self.project_path = project_path
        self.verbose = verbose
    def analyze(self) -> Dict:
        """Run full architecture analysis."""
        if self.verbose:
            print(f"Analyzing project: {self.project_path}")
        # Pattern detection
        pattern_detector = PatternDetector(self.project_path)
        pattern_result = pattern_detector.scan()
        if self.verbose:
            print(f"Detected pattern: {pattern_result['detected_pattern']} "
                  f"(confidence: {pattern_result['confidence']}%)")
        # Code analysis
        code_analyzer = CodeAnalyzer(self.project_path, self.verbose)
        code_result = code_analyzer.analyze()
        if self.verbose:
            print(f"Found {len(code_result['issues'])} code issues")
# Layer violation detection
violation_detector = LayerViolationDetector(
self.project_path,
pattern_result['layer_assignments']
)
violations = violation_detector.detect()
if self.verbose:
print(f"Found {len(violations)} layer violations")
# Generate recommendations
recommendations = self._generate_recommendations(
pattern_result, code_result, violations
)
return {
'project_path': str(self.project_path),
'architecture': {
'detected_pattern': pattern_result['detected_pattern'],
'confidence': pattern_result['confidence'],
'layer_assignments': pattern_result['layer_assignments'],
'pattern_scores': pattern_result['pattern_scores'],
},
'structure': {
'directories': pattern_result['directories'],
},
'code_quality': {
'metrics': code_result['metrics'],
'issues': code_result['issues'],
},
'layer_violations': violations,
'recommendations': recommendations,
'summary': {
'pattern': pattern_result['detected_pattern'],
'confidence': pattern_result['confidence'],
'total_issues': len(code_result['issues']) + len(violations),
'code_issues': len(code_result['issues']),
'layer_violations': len(violations),
},
}
def _generate_recommendations(self, pattern_result: Dict, code_result: Dict,
violations: List[Dict]) -> List[str]:
"""Generate actionable recommendations."""
recommendations = []
# Pattern recommendations
pattern = pattern_result['detected_pattern']
confidence = pattern_result['confidence']
if pattern == 'unstructured' or confidence < 30:
recommendations.append(
"Consider adopting a clear architectural pattern (Layered, Clean, or Hexagonal) "
"to improve code organization and maintainability"
)
# Layer violation recommendations
if violations:
recommendations.append(
f"Fix {len(violations)} layer violation(s) to maintain proper separation of concerns. "
"Dependencies should flow from presentation → application → domain ← infrastructure"
)
# God class recommendations
god_classes = code_result['metrics'].get('god_classes', [])
if god_classes:
recommendations.append(
f"Split {len(god_classes)} large class(es) into smaller, focused classes "
"following the Single Responsibility Principle"
)
# Large file recommendations
large_files = code_result['metrics'].get('large_files', [])
if large_files:
recommendations.append(
f"Consider refactoring {len(large_files)} large file(s) into smaller modules"
)
# Missing layer recommendations
assigned_layers = set(pattern_result['layer_assignments'].values())
if pattern in ['layered', 'clean', 'hexagonal']:
expected_layers = {'presentation', 'application', 'domain', 'infrastructure'}
missing = expected_layers - assigned_layers - {'unknown'}
if missing:
recommendations.append(
f"Consider adding missing architectural layer(s): {', '.join(missing)}"
)
return recommendations
def print_human_report(report: Dict):
"""Print human-readable report."""
print("\n" + "=" * 60)
print("ARCHITECTURE ASSESSMENT")
print("=" * 60)
print(f"\nProject: {report['project_path']}")
arch = report['architecture']
print(f"\n--- Architecture Pattern ---")
print(f"Detected: {arch['detected_pattern'].replace('_', ' ').title()}")
print(f"Confidence: {arch['confidence']}%")
if arch['layer_assignments']:
print(f"\nLayer Assignments:")
for dir_name, layer in sorted(arch['layer_assignments'].items()):
if layer != 'unknown':
status = "OK"
else:
status = "?"
print(f" {status} {dir_name:20} -> {layer}")
summary = report['summary']
print(f"\n--- Summary ---")
print(f"Total issues: {summary['total_issues']}")
print(f" Code issues: {summary['code_issues']}")
print(f" Layer violations: {summary['layer_violations']}")
if report['code_quality']['issues']:
print(f"\n--- Code Issues ---")
for issue in report['code_quality']['issues'][:10]:
severity = issue['severity'].upper()
print(f" [{severity}] {issue.get('file', 'N/A')}")
print(f" {issue['message']}")
if 'suggestion' in issue:
print(f" Suggestion: {issue['suggestion']}")
if report['layer_violations']:
print(f"\n--- Layer Violations ---")
for v in report['layer_violations'][:5]:
print(f" {v['file']}")
print(f" {v['message']}")
if report['recommendations']:
print(f"\n--- Recommendations ---")
for i, rec in enumerate(report['recommendations'], 1):
print(f" {i}. {rec}")
metrics = report['code_quality']['metrics']
print(f"\n--- Metrics ---")
print(f" Total lines: {metrics.get('total_lines', 'N/A')}")
print(f" File count: {metrics.get('file_count', 'N/A')}")
print(f" Avg lines/file: {metrics.get('avg_file_lines', 'N/A')}")
print("\n" + "=" * 60)
def main():
"""Main entry point"""
parser = argparse.ArgumentParser(
description='Analyze project architecture and detect patterns and issues',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog='''
Examples:
%(prog)s ./my-project
%(prog)s ./my-project --verbose
%(prog)s ./my-project --output json
%(prog)s ./my-project --check layers
Detects:
- Architectural patterns (Layered, MVC, Hexagonal, Clean, Microservices)
- Code organization issues (large files, god classes)
- Layer violations (incorrect dependencies between layers)
- Missing architectural components
'''
)
parser.add_argument(
'project_path',
help='Path to the project directory'
)
parser.add_argument(
'--output', '-o',
choices=['human', 'json'],
default='human',
help='Output format (default: human)'
)
parser.add_argument(
'--check',
choices=['all', 'pattern', 'layers', 'code'],
default='all',
help='What to check (default: all)'
)
parser.add_argument(
'--verbose', '-v',
help='Enable verbose output'
)
parser.add_argument(
'--save', '-s',
help='Save report to file'
)
args = parser.parse_args()
project_path = Path(args.project_path).resolve()
if not project_path.exists():
print(f"Error: Project path does not exist: {project_path}", file=sys.stderr)
sys.exit(1)
if not project_path.is_dir():
print(f"Error: Project path is not a directory: {project_path}", file=sys.stderr)
sys.exit(1)
# Run analysis
architect = ProjectArchitect(project_path, verbose=args.verbose)
report = architect.analyze()
# Handle specific checks
if args.check == 'pattern':
arch = report['architecture']
print(f"Pattern: {arch['detected_pattern']} (confidence: {arch['confidence']}%)")
sys.exit(0)
elif args.check == 'layers':
violations = report['layer_violations']
if violations:
print(f"Found {len(violations)} layer violation(s):")
for v in violations:
print(f" {v['file']}: {v['message']}")
sys.exit(1)
else:
print("No layer violations found.")
sys.exit(0)
elif args.check == 'code':
issues = report['code_quality']['issues']
if issues:
print(f"Found {len(issues)} code issue(s):")
for issue in issues[:10]:
print(f" [{issue['severity'].upper()}] {issue['message']}")
sys.exit(1 if any(i['severity'] == 'warning' for i in issues) else 0)
else:
print("No code issues found.")
sys.exit(0)
# Output report
if args.output == 'json':
output = json.dumps(report, indent=2)
if args.save:
Path(args.save).write_text(output)
print(f"Report saved to {args.save}")
else:
print(output)
else:
print_human_report(report)
if args.save:
Path(args.save).write_text(json.dumps(report, indent=2))
print(f"\nJSON report saved to {args.save}")
if __name__ == '__main__':
main()
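The layer rule enforced by `LayerViolationDetector` reduces to an allow-list lookup: each layer may only import from the layers listed for it, and the domain layer imports nothing. A standalone sketch of that check, with the table copied from the script's `VALID_DEPENDENCIES` (the free function name is illustrative):

```python
# Which layers each layer is allowed to depend on; domain stays pure.
VALID_DEPENDENCIES = {
    'presentation': ['application', 'domain'],
    'application': ['domain', 'infrastructure'],
    'domain': [],
    'infrastructure': ['domain'],
}

def is_violation(source_layer: str, target_layer: str) -> bool:
    """A same-layer import is never a violation; otherwise the target
    must appear in the source layer's allow-list."""
    if source_layer == target_layer:
        return False
    return target_layer not in VALID_DEPENDENCIES.get(source_layer, [])
```

For example, an import from `domain` into `infrastructure` is flagged (the domain layer must not depend outward), while `presentation` importing `application` is the expected direction and passes.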