fix(engineering): improve codebase-onboarding - add scripts + extract references

2026-03-11 20:21:34 +01:00
parent 60ad9d3873
commit abab3b528e
3 changed files with 427 additions and 417 deletions
--- a/engineering/codebase-onboarding/SKILL.md
+++ b/engineering/codebase-onboarding/SKILL.md
@@ -13,453 +13,72 @@ description: "Codebase Onboarding"

 ## Overview

-Analyze a codebase and generate comprehensive onboarding documentation tailored to your audience. Produces architecture overviews, key file maps, local setup guides, common task runbooks, debugging guides, and contribution guidelines. Outputs to Markdown, Notion, or Confluence.
+Analyze a codebase and generate onboarding documentation for engineers, tech leads, and contractors. This skill is optimized for fast fact-gathering and repeatable onboarding outputs.

 ## Core Capabilities

- **Architecture overview** — tech stack, system boundaries, data flow diagrams
- **Key file map** — what's important and why, with annotations
- **Local setup guide** — step-by-step from clone to running tests
- **Common developer tasks** — how to add a route, run migrations, create a component
- **Debugging guide** — common errors, log locations, useful queries
- **Contribution guidelines** — branch strategy, PR process, code style
- **Audience-aware output** — junior, senior, or contractor mode
+- Architecture and stack discovery from repository signals
+- Key file and config inventory for new contributors
+- Local setup and common-task guidance generation
+- Audience-aware documentation framing
+- Debugging and contribution checklist scaffolding

 ---

 ## When to Use

 - Onboarding a new team member or contractor
- After a major refactor that made existing docs stale
- Before open-sourcing a project
- Creating a team wiki page for a service
- Self-documenting before a long vacation
-
---
-
-## Codebase Analysis Commands
-
-Run these before generating docs to gather facts:
-
-```bash
-# Project overview
-cat package.json | jq '{name, version, scripts, dependencies: (.dependencies | keys), devDependencies: (.devDependencies | keys)}'
-
-# Directory structure (top 2 levels)
-find . -maxdepth 2 -not -path '*/node_modules/*' -not -path '*/.git/*' -not -path '*/.next/*' | sort | head -60
-
-# Largest files (often core modules)
-find src/ -name "*.ts" -not -path "*/test*" -exec wc -l {} + | sort -rn | head -20
-
-# All routes (Next.js App Router)
-find app/ -name "route.ts" -o -name "page.tsx" | sort
-
-# All routes (Express)
-grep -rn "router\.\(get\|post\|put\|patch\|delete\)" src/routes/ --include="*.ts"
-
-# Recent major changes
-git log --oneline --since="90 days ago" | grep -E "feat|refactor|breaking"
-
-# Top contributors
-git shortlog -sn --no-merges | head -10
-
-# Test coverage summary
-pnpm test:ci --coverage 2>&1 | tail -20
-```
-
---
-
-## Generated Documentation Template
-
-### README.md — Full Template
-
-```markdown
-# [Project Name]
-
-> One-sentence description of what this does and who uses it.
-
-[![CI](https://github.com/org/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/org/repo/actions/workflows/ci.yml)
-[![Coverage](https://codecov.io/gh/org/repo/branch/main/graph/badge.svg)](https://codecov.io/gh/org/repo)
-
-## What is this?
-
-[2-3 sentences: problem it solves, who uses it, current state]
-
-**Live:** https://myapp.com  
-**Staging:** https://staging.myapp.com  
-**Docs:** https://docs.myapp.com
+- Rebuilding stale project docs after large refactors
+- Preparing internal handoff documentation
+- Creating a standardized onboarding packet for services

 ---

 ## Quick Start

-### Prerequisites
-
-| Tool | Version | Install |
-|------|---------|---------|
-| Node.js | 20+ | `nvm install 20` |
-| pnpm | 8+ | `npm i -g pnpm` |
-| Docker | 24+ | [docker.com](https://docker.com) |
-| PostgreSQL | 16+ | via Docker (see below) |
-
-### Setup (5 minutes)
-
 ```bash
-# 1. Clone
-git clone https://github.com/org/repo
-cd repo
+# 1) Gather codebase facts
+python3 scripts/codebase_analyzer.py /path/to/repo

-# 2. Install dependencies
-pnpm install
+# 2) Export machine-readable output
+python3 scripts/codebase_analyzer.py /path/to/repo --json

-# 3. Start infrastructure
-docker compose up -d   # Starts Postgres, Redis
-
-# 4. Environment
-cp .env.example .env
-# Edit .env — ask a teammate for real values or see Vault
-
-# 5. Database setup
-pnpm db:migrate        # Run migrations
-pnpm db:seed           # Optional: load test data
-
-# 6. Start dev server
-pnpm dev               # → http://localhost:3000
-
-# 7. Verify
-pnpm test              # Should be all green
-```
-
-### Verify it works
-
- [ ] `http://localhost:3000` loads the app
- [ ] `http://localhost:3000/api/health` returns `{"status":"ok"}`
- [ ] `pnpm test` passes
-
---
-
-## Architecture
-
-### System Overview
-
-```
-Browser / Mobile
-    │
-    ▼
-[Next.js App] ←──── [Auth: NextAuth]
-    │
-    ├──→ [PostgreSQL] (primary data store)
-    ├──→ [Redis] (sessions, job queue)
-    └──→ [S3] (file uploads)
-         
-Background:
-[BullMQ workers] ←── Redis queue
-    └──→ [External APIs: Stripe, SendGrid]
-```
-
-### Tech Stack
-
-| Layer | Technology | Why |
-|-------|-----------|-----|
-| Frontend | Next.js 14 (App Router) | SSR, file-based routing |
-| Styling | Tailwind CSS + shadcn/ui | Rapid UI development |
-| API | Next.js Route Handlers | Co-located with frontend |
-| Database | PostgreSQL 16 | Relational, RLS for multi-tenancy |
-| ORM | Drizzle ORM | Type-safe, lightweight |
-| Auth | NextAuth v5 | OAuth + email/password |
-| Queue | BullMQ + Redis | Background jobs |
-| Storage | AWS S3 | File uploads |
-| Email | SendGrid | Transactional email |
-| Payments | Stripe | Subscriptions |
-| Deployment | Vercel (app) + Railway (workers) | |
-| Monitoring | Sentry + Datadog | |
-
---
-
-## Key Files
-
-| Path | Purpose |
-|------|---------|
-| `app/` | Next.js App Router — pages and API routes |
-| `app/api/` | API route handlers |
-| `app/(auth)/` | Auth pages (login, register, reset) |
-| `app/(app)/` | Protected app pages |
-| `src/db/` | Database schema, migrations, client |
-| `src/db/schema.ts` | **Drizzle schema — single source of truth** |
-| `src/lib/` | Shared utilities (auth, email, stripe) |
-| `src/lib/auth.ts` | **Auth configuration — read this first** |
-| `src/components/` | Reusable React components |
-| `src/hooks/` | Custom React hooks |
-| `src/types/` | Shared TypeScript types |
-| `workers/` | BullMQ background job processors |
-| `emails/` | React Email templates |
-| `tests/` | Test helpers, factories, integration tests |
-| `.env.example` | All env vars with descriptions |
-| `docker-compose.yml` | Local infrastructure |
-
---
-
-## Common Developer Tasks
-
-### Add a new API endpoint
-
-```bash
-# 1. Create route handler
-touch app/api/my-resource/route.ts
-```
-
-```typescript
-// app/api/my-resource/route.ts
-import { NextRequest, NextResponse } from 'next/server'
-import { auth } from '@/lib/auth'
-import { db } from '@/db/client'
-
-export async function GET(req: NextRequest) {
-  const session = await auth()
-  if (!session) {
-    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
-  }
-  
-  const data = await db.query.myResource.findMany({
-    where: (r, { eq }) => eq(r.userId, session.user.id),
-  })
-  
-  return NextResponse.json({ data })
-}
-```
-
-```bash
-# 2. Add tests
-touch tests/api/my-resource.test.ts
-
-# 3. Add to OpenAPI spec (if applicable)
-pnpm generate:openapi
-```
-
-### Run a database migration
-
-```bash
-# Create migration
-pnpm db:generate     # Generates SQL from schema changes
-
-# Review the generated SQL
-cat drizzle/migrations/0001_my_change.sql
-
-# Apply
-pnpm db:migrate
-
-# Roll back (manual — inspect generated SQL and revert)
-psql $DATABASE_URL -f scripts/rollback_0001.sql
-```
-
-### Add a new email template
-
-```bash
-# 1. Create template
-touch emails/my-email.tsx
-
-# 2. Preview in browser
-pnpm email:preview
-
-# 3. Send in code
-import { sendEmail } from '@/lib/email'
-await sendEmail({
-  to: user.email,
-  subject: 'Subject line',
-  template: 'my-email',
-  props: { name: "username"
-})
-```
-
-### Add a background job
-
-```typescript
-// 1. Define job in workers/jobs/my-job.ts
-import { Queue, Worker } from 'bullmq'
-import { redis } from '@/lib/redis'
-
-export const myJobQueue = new Queue('my-job', { connection: redis })
-
-export const myJobWorker = new Worker('my-job', async (job) => {
-  const { userId, data } = job.data
-  // do work
-}, { connection: redis })
-
-// 2. Enqueue
-await myJobQueue.add('process', { userId, data }, {
-  attempts: 3,
-  backoff: { type: 'exponential', delay: 1000 },
-})
+# 3) Use the template to draft onboarding docs
+# See references/onboarding-template.md
 ```

 ---

-## Debugging Guide
+## Recommended Workflow

-### Common Errors
-
-**`Error: DATABASE_URL is not set`**
-```bash
-# Check your .env file exists and has the var
-cat .env | grep DATABASE_URL
-
-# Start Postgres if not running
-docker compose up -d postgres
-```
-
-**`PrismaClientKnownRequestError: P2002 Unique constraint failed`**
-```
-User already exists with that email. Check: is this a duplicate registration?
-Run: SELECT * FROM users WHERE email = 'test@example.com';
-```
-
-**`Error: JWT expired`**
-```bash
-# Dev: extend token TTL in .env
-JWT_EXPIRES_IN=30d
-
-# Check clock skew between server and client
-date && docker exec postgres date
-```
-
-**`500 on /api/*` in local dev**
-```bash
-# 1. Check terminal for stack trace
-# 2. Check database connectivity
-psql $DATABASE_URL -c "SELECT 1"
-# 3. Check Redis
-redis-cli ping
-# 4. Check logs
-pnpm dev 2>&1 | grep -E "error|Error|ERROR"
-```
-
-### Useful SQL Queries
-
-```sql
-- Find slow queries (requires pg_stat_statements)
-SELECT query, mean_exec_time, calls, total_exec_time
-FROM pg_stat_statements
-ORDER BY mean_exec_time DESC
-LIMIT 20;
-
-- Check active connections
-SELECT count(*), state FROM pg_stat_activity GROUP BY state;
-
-- Find bloated tables
-SELECT relname, n_dead_tup, n_live_tup,
-  round(n_dead_tup::numeric/nullif(n_live_tup,0)*100, 2) AS dead_pct
-FROM pg_stat_user_tables
-ORDER BY n_dead_tup DESC;
-```
-
-### Debug Authentication
-
-```bash
-# Decode a JWT (no secret needed for header/payload)
-echo "YOUR_JWT" | cut -d. -f2 | base64 -d | jq .
-
-# Check session in DB
-psql $DATABASE_URL -c "SELECT * FROM sessions WHERE user_id = 'usr_...' ORDER BY expires_at DESC LIMIT 5;"
-```
-
-### Log Locations
-
-| Environment | Logs |
-|-------------|------|
-| Local dev | Terminal running `pnpm dev` |
-| Vercel production | Vercel dashboard → Logs |
-| Workers (Railway) | Railway dashboard → Deployments → Logs |
-| Database | `docker logs postgres` (local) |
-| Background jobs | `pnpm worker:dev` terminal |
+1. Run `scripts/codebase_analyzer.py` against the target repository.
+2. Capture key signals: file counts, detected languages, config files, top-level structure.
+3. Fill the onboarding template in `references/onboarding-template.md`.
+4. Tailor output depth by audience:
+   - Junior: setup + guardrails
+   - Senior: architecture + operational concerns
+   - Contractor: scoped ownership + integration boundaries

 ---

-## Contribution Guidelines
+## Onboarding Document Template

-### Branch Strategy
-
-```
-main           → production (protected, requires PR + CI)
-  └── feature/PROJ-123-short-desc
-  └── fix/PROJ-456-bug-description
-  └── chore/update-dependencies
-```
-
-### PR Requirements
-
- [ ] Branch name includes ticket ID (e.g., `feature/PROJ-123-...`)
- [ ] PR description explains the why
- [ ] All CI checks pass
- [ ] Test coverage doesn't decrease
- [ ] Self-reviewed (read your own diff before requesting review)
- [ ] Screenshots/video for UI changes
-
-### Commit Convention
-
-```
-feat(scope): short description       → new feature
-fix(scope): short description        → bug fix
-chore: update dependencies           → maintenance
-docs: update API reference           → documentation
-```
-
-### Code Style
-
-```bash
-# Lint + format
-pnpm lint
-pnpm format
-
-# Type check
-pnpm typecheck
-
-# All checks (run before pushing)
-pnpm validate
-```
+Detailed template and section examples live in:
+- `references/onboarding-template.md`
+- `references/output-format-templates.md`

 ---

-## Audience-Specific Notes
-
-### For Junior Developers
- Start with `src/lib/auth.ts` to understand authentication
- Read existing tests in `tests/api/` — they document expected behavior
- Ask before touching anything in `src/db/schema.ts` — schema changes affect everyone
- Use `pnpm db:seed` to get realistic local data
-
-### For Senior Engineers / Tech Leads
- Architecture decisions are documented in `docs/adr/` (Architecture Decision Records)
- Performance benchmarks: `pnpm bench` — baseline is in `tests/benchmarks/baseline.json`
- Security model: RLS policies in `src/db/rls.sql`, enforced at DB level
- Scaling notes: `docs/scaling.md`
-
-### For Contractors
- Scope is limited to `src/features/[your-feature]/` unless discussed
- Never push directly to `main`
- All external API calls go through `src/lib/` wrappers (for mocking in tests)
- Time estimates: log in Linear ticket comments daily
-
---
-
-## Output Formats
-→ See references/output-format-templates.md for details
-
 ## Common Pitfalls

- **Docs written once, never updated** — add doc updates to PR checklist
- **Missing local setup step** — test setup instructions on a fresh machine quarterly
- **No error troubleshooting** — debugging section is the most valuable part for new hires
- **Too much detail for contractors** — they need task-specific, not architecture-deep docs
- **No screenshots** — UI flows need screenshots; they go stale but are still valuable
- **Skipping the "why"** — document why decisions were made, not just what was decided
-
---
+- Writing docs without validating setup commands on a clean environment
+- Mixing architecture deep-dives into contractor-oriented docs
+- Omitting troubleshooting and verification steps
+- Letting onboarding docs drift from current repo state

 ## Best Practices

-1. **Keep setup under 10 minutes** — if it takes longer, fix the setup, not the docs
-2. **Test the docs** — have a new hire follow them literally, fix every gap they hit
-3. **Link, don't repeat** — link to ADRs, issues, and external docs instead of duplicating
-4. **Update in the same PR** — docs changes alongside code changes
-5. **Version-specific notes** — call out things that changed in recent versions
-6. **Runbooks over theory** — "run this command" beats "the system uses Redis for..."
+1. Keep setup instructions executable and time-bounded.
+2. Document the "why" for key architectural decisions.
+3. Update docs in the same PR as behavior changes.
+4. Treat onboarding docs as living operational assets, not one-time deliverables.
--- a/engineering/codebase-onboarding/references/onboarding-template.md
+++ b/engineering/codebase-onboarding/references/onboarding-template.md
@@ -0,0 +1,186 @@
+# Onboarding Document Template
+
+## README.md - Full Template
+
+```markdown
+# [Project Name]
+
+> One-sentence description of what this does and who uses it.
+
+[![CI](https://github.com/org/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/org/repo/actions/workflows/ci.yml)
+[![Coverage](https://codecov.io/gh/org/repo/branch/main/graph/badge.svg)](https://codecov.io/gh/org/repo)
+
+## What is this?
+
+[2-3 sentences: problem it solves, who uses it, current state]
+
+**Live:** https://myapp.com  
+**Staging:** https://staging.myapp.com  
+**Docs:** https://docs.myapp.com
+
+---
+
+## Quick Start
+
+### Prerequisites
+
+| Tool | Version | Install |
+|------|---------|---------|
+| Node.js | 20+ | `nvm install 20` |
+| pnpm | 8+ | `npm i -g pnpm` |
+| Docker | 24+ | [docker.com](https://docker.com) |
+| PostgreSQL | 16+ | via Docker (see below) |
+
+### Setup (5 minutes)
+
+```bash
+git clone https://github.com/org/repo
+cd repo
+pnpm install
+docker compose up -d
+cp .env.example .env
+pnpm db:migrate
+pnpm db:seed
+pnpm dev
+pnpm test
+```
+
+### Verify it works
+
+- [ ] App loads on localhost
+- [ ] Health endpoint returns ok
+- [ ] Tests pass
+
+---
+
+## Architecture
+
+### System Overview
+
+```
+Browser / Mobile
+    |
+    v
+[Next.js App] <- [Auth]
+    |
+    +-> [PostgreSQL]
+    +-> [Redis]
+    +-> [S3]
+```
+
+### Tech Stack
+
+| Layer | Technology | Why |
+|-------|-----------|-----|
+| Frontend | Next.js | SSR + routing |
+| Styling | Tailwind + shadcn/ui | Rapid UI |
+| API | Route handlers | Co-location |
+| Database | PostgreSQL | Relational |
+| Queue | BullMQ + Redis | Background jobs |
+
+---
+
+## Key Files
+
+| Path | Purpose |
+|------|---------|
+| `app/` | Pages and route handlers |
+| `src/db/` | Schema and migrations |
+| `src/lib/` | Shared utilities |
+| `tests/` | Test suites and helpers |
+| `.env.example` | Required variables |
+
+---
+
+## Common Developer Tasks
+
+### Add a new API endpoint
+
+```bash
+touch app/api/my-resource/route.ts
+touch tests/api/my-resource.test.ts
+```
+
+### Run a database migration
+
+```bash
+pnpm db:generate
+pnpm db:migrate
+```
+
+### Add a background job
+
+```bash
+# Create worker module and enqueue path
+```
+
+---
+
+## Debugging Guide
+
+### Common Errors
+
+- Missing environment variable
+- Database connectivity failure
+- Expired auth token
+- Generic 500 in local dev
+
+### Useful SQL Queries
+
+- Slow query checks
+- Connection status
+- Table bloat checks
+
+### Log Locations
+
+| Environment | Logs |
+|-------------|------|
+| Local dev | local terminal |
+| Production | platform logs |
+| Worker | worker process logs |
+
+---
+
+## Contribution Guidelines
+
+### Branch Strategy
+
+- `main` protected
+- feature/fix branches with ticket IDs
+
+### PR Requirements
+
+- CI green
+- Tests updated
+- Why documented
+- Self-review completed
+
+### Commit Convention
+
+- `feat(scope): ...`
+- `fix(scope): ...`
+- `docs: ...`
+
+---
+
+## Audience-Specific Notes
+
+### Junior Developers
+- Start with core auth/data modules
+- Follow tests as executable examples
+
+### Senior Engineers
+- Read ADRs and scaling notes first
+- Validate performance/security assumptions early
+
+### Contractors
+- Stay within scoped feature boundaries
+- Use wrappers for external integrations
+```
+
+## Usage Notes
+
+- Keep onboarding setup under 10 minutes where possible.
+- Include executable verification checks after each setup phase.
+- Prefer links to canonical docs instead of duplicating long content.
+- Update this template when stack conventions or tooling change.
--- a/engineering/codebase-onboarding/scripts/codebase_analyzer.py
+++ b/engineering/codebase-onboarding/scripts/codebase_analyzer.py
@@ -0,0 +1,205 @@
+#!/usr/bin/env python3
+"""Generate a compact onboarding summary for a codebase (stdlib only)."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+from collections import Counter
+from pathlib import Path
+from typing import Dict, Iterable, List
+
+IGNORED_DIRS = {
+    ".git",
+    "node_modules",
+    ".next",
+    "dist",
+    "build",
+    "coverage",
+    "venv",
+    ".venv",
+    "__pycache__",
+}
+
+EXT_TO_LANG = {
+    ".py": "Python",
+    ".ts": "TypeScript",
+    ".tsx": "TypeScript",
+    ".js": "JavaScript",
+    ".jsx": "JavaScript",
+    ".go": "Go",
+    ".rs": "Rust",
+    ".java": "Java",
+    ".kt": "Kotlin",
+    ".rb": "Ruby",
+    ".php": "PHP",
+    ".cs": "C#",
+    ".c": "C",
+    ".cpp": "C++",
+    ".h": "C/C++",
+    ".swift": "Swift",
+    ".sql": "SQL",
+    ".sh": "Shell",
+}
+
+KEY_CONFIG_FILES = [
+    "package.json",
+    "pnpm-workspace.yaml",
+    "turbo.json",
+    "nx.json",
+    "lerna.json",
+    "tsconfig.json",
+    "next.config.js",
+    "next.config.mjs",
+    "pyproject.toml",
+    "requirements.txt",
+    "go.mod",
+    "Cargo.toml",
+    "docker-compose.yml",
+    "Dockerfile",
+    ".github/workflows",
+]
+
+
+def iter_files(root: Path) -> Iterable[Path]:
+    for dirpath, dirnames, filenames in os.walk(root):
+        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
+        for name in filenames:
+            path = Path(dirpath) / name
+            if path.is_file():
+                yield path
+
+
+def detect_languages(paths: Iterable[Path]) -> Dict[str, int]:
+    counts: Counter[str] = Counter()
+    for path in paths:
+        lang = EXT_TO_LANG.get(path.suffix.lower())
+        if lang:
+            counts[lang] += 1
+    return dict(sorted(counts.items(), key=lambda item: (-item[1], item[0])))
+
+
+def find_key_configs(root: Path) -> List[str]:
+    found: List[str] = []
+    for rel in KEY_CONFIG_FILES:
+        if (root / rel).exists():
+            found.append(rel)
+    return found
+
+
+def top_level_structure(root: Path, max_depth: int) -> List[str]:
+    lines: List[str] = []
+    for dirpath, dirnames, filenames in os.walk(root):
+        rel = Path(dirpath).relative_to(root)
+        depth = 0 if str(rel) == "." else len(rel.parts)
+        if depth > max_depth:
+            dirnames[:] = []
+            continue
+
+        if any(part in IGNORED_DIRS for part in rel.parts):
+            dirnames[:] = []
+            continue
+
+        indent = "  " * depth
+        if str(rel) != ".":
+            lines.append(f"{indent}{rel.name}/")
+
+        visible_files = [f for f in sorted(filenames) if not f.startswith(".")]
+        for filename in visible_files[:10]:
+            lines.append(f"{indent}  {filename}")
+
+        dirnames[:] = sorted([d for d in dirnames if d not in IGNORED_DIRS])
+    return lines
+
+
+def build_report(root: Path, max_depth: int) -> Dict[str, object]:
+    files = list(iter_files(root))
+    languages = detect_languages(files)
+    total_files = len(files)
+    file_count_by_ext: Counter[str] = Counter(p.suffix.lower() or "<no-ext>" for p in files)
+
+    largest = sorted(
+        ((str(p.relative_to(root)), p.stat().st_size) for p in files),
+        key=lambda item: item[1],
+        reverse=True,
+    )[:20]
+
+    return {
+        "root": str(root),
+        "file_count": total_files,
+        "languages": languages,
+        "key_config_files": find_key_configs(root),
+        "top_extensions": dict(file_count_by_ext.most_common(12)),
+        "largest_files": largest,
+        "directory_structure": top_level_structure(root, max_depth),
+    }
+
+
+def format_size(num_bytes: int) -> str:
+    units = ["B", "KB", "MB", "GB"]
+    value = float(num_bytes)
+    for unit in units:
+        if value < 1024 or unit == units[-1]:
+            return f"{value:.1f}{unit}"
+        value /= 1024
+    return f"{num_bytes}B"
+
+
+def print_text(report: Dict[str, object]) -> None:
+    print("Codebase Onboarding Summary")
+    print(f"Root: {report['root']}")
+    print(f"Total files: {report['file_count']}")
+    print("")
+
+    print("Languages detected")
+    if report["languages"]:
+        for lang, count in report["languages"].items():
+            print(f"- {lang}: {count}")
+    else:
+        print("- No recognized source file extensions")
+    print("")
+
+    print("Key config files")
+    configs = report["key_config_files"]
+    if configs:
+        for cfg in configs:
+            print(f"- {cfg}")
+    else:
+        print("- None found from default checklist")
+    print("")
+
+    print("Largest files")
+    for rel, size in report["largest_files"][:10]:
+        print(f"- {rel}: {format_size(size)}")
+    print("")
+
+    print("Directory structure")
+    for line in report["directory_structure"][:200]:
+        print(line)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Scan a repository and generate onboarding summary facts.")
+    parser.add_argument("path", help="Path to project directory")
+    parser.add_argument("--max-depth", type=int, default=2, help="Max depth for structure output (default: 2)")
+    parser.add_argument("--json", action="store_true", help="Print JSON output")
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+    root = Path(args.path).expanduser().resolve()
+    if not root.exists() or not root.is_dir():
+        raise SystemExit(f"Path is not a directory: {root}")
+
+    report = build_report(root, max_depth=max(1, args.max_depth))
+    if args.json:
+        print(json.dumps(report, indent=2))
+    else:
+        print_text(report)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())