feat: Add WebSearch as Third Source (Zero-Config Fallback)

Overview

Add Claude's built-in WebSearch tool as a third research source for /last30days. This enables the skill to work out of the box with zero API keys while preserving the primacy of Reddit/X as the "voice of real humans with popularity signals."

Key principle: WebSearch is supplementary, not primary. Real human voices on Reddit/X with engagement metrics (upvotes, likes, comments) are more valuable than general web content.

Problem Statement

Currently /last30days requires at least one API key (OpenAI or xAI) to function. Users without API keys get an error. Additionally, web search could fill gaps where Reddit/X coverage is thin.

User requirements:

  • Work out of the box (no API key needed)
  • Must NOT overpower Reddit/X results
  • Needs proper weighting
  • Validate with before/after testing

Proposed Solution

Weighting Strategy: "Engagement-Adjusted Scoring"

Current formula (same for Reddit/X):

score = 0.45*relevance + 0.25*recency + 0.30*engagement - penalties

Problem: WebSearch has no engagement metrics. Giving it DEFAULT_ENGAGEMENT=35 with a -10 penalty still leaves an effective engagement of 25, which lets it compete unfairly with Reddit/X items that earned their engagement scores.

Solution: Source-specific scoring with engagement substitution:

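| Source    | Relevance | Recency | Engagement         | Source Penalty |
|-----------|-----------|---------|--------------------|----------------|
| Reddit    | 45%       | 25%     | 30% (real metrics) | 0              |
| X         | 45%       | 25%     | 30% (real metrics) | 0              |
| WebSearch | 55%       | 45%     | 0% (no data)       | -15 points     |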

Rationale:

  • WebSearch items compete on relevance + recency only (reweighted to 100%)
  • -15 point source penalty ensures WebSearch ranks below comparable Reddit/X items
  • High-quality WebSearch can still surface (score 60-70) but won't dominate (Reddit/X score 70-85)
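
As a worked example of the two formulas, the sketch below scores one Reddit/X item and one WebSearch item with identical relevance and recency. It uses the 0.55/0.45 WebSearch weights from the Phase 2 constants; the function names and the sub-score values (0-100 scale) are illustrative assumptions, and date-confidence penalties are left out.

```python
def reddit_x_score(relevance: float, recency: float, engagement: float) -> float:
    """Existing formula shared by Reddit and X, before any penalties."""
    return 0.45 * relevance + 0.25 * recency + 0.30 * engagement


def websearch_score(relevance: float, recency: float, source_penalty: float = 15) -> float:
    """Proposed engagement-free formula: weights sum to 1.0, then a flat penalty."""
    return 0.55 * relevance + 0.45 * recency - source_penalty


# Same relevance and recency for both items; the Reddit item has strong engagement.
reddit_item = reddit_x_score(relevance=80, recency=90, engagement=70)
web_item = websearch_score(relevance=80, recency=90)

print(f"Reddit/X:  {reddit_item:.1f}")  # 36 + 22.5 + 21 = 79.5
print(f"WebSearch: {web_item:.1f}")     # 44 + 40.5 - 15 = 69.5
```

The size of the gap depends on the Reddit/X item's engagement: with engagement at 35 the same Reddit item would score 69.0, roughly level with the WebSearch item before any date-confidence penalty, which is why the before/after tests are still needed.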

Mode Behavior

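| API Keys Available | Default Behavior | --include-web          |
|--------------------|------------------|------------------------|
| None               | WebSearch only   | n/a                    |
| OpenAI only        | Reddit only      | Reddit + WebSearch     |
| xAI only           | X only           | X + WebSearch          |
| Both               | Reddit + X       | Reddit + X + WebSearch |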

CLI flag: --include-web (default: false when other sources available)
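
The mode table can be expressed as a small lookup. This is a minimal sketch, assuming a hypothetical helper named resolve_sources(); the mode strings are the ones used elsewhere in this plan ("both", "reddit", "x", "web" in get_available_sources() and "all", "reddit-web", "x-web" in run_research()).

```python
def resolve_sources(has_openai: bool, has_xai: bool, include_web: bool = False) -> str:
    """Map key availability and --include-web to a source mode (hypothetical helper)."""
    if has_openai and has_xai:
        return "all" if include_web else "both"
    if has_openai:
        return "reddit-web" if include_web else "reddit"
    if has_xai:
        return "x-web" if include_web else "x"
    # No keys: WebSearch is the only source, so the flag has no effect.
    return "web"


print(resolve_sources(has_openai=True, has_xai=False, include_web=True))
```

With only an OpenAI key and --include-web set, this returns "reddit-web", matching the second row of the table.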

Technical Approach

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     last30days.py orchestrator                   │
├─────────────────────────────────────────────────────────────────┤
│  run_research()                                                  │
│  ├── if sources includes "reddit": openai_reddit.search_reddit()│
│  ├── if sources includes "x": xai_x.search_x()                  │
│  └── if sources includes "web": websearch.search_web() ← NEW    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Processing Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│  normalize_websearch_items() → WebSearchItem schema ← NEW        │
│  score_websearch_items() → engagement-free scoring ← NEW         │
│  dedupe_websearch() → deduplication ← NEW                        │
│  render_websearch_section() → output formatting ← NEW            │
└─────────────────────────────────────────────────────────────────┘
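
The pipeline names dedupe_websearch() without specifying how duplicates are detected. One possible approach, sketched here under the assumption that items are plain dicts with "url" and "score" keys, is to key on a normalized URL and keep the highest-scoring item. The helper names are hypothetical, and ignoring the query string is a simplification that could merge distinct pages on some sites.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return f"{host}{parts.path.rstrip('/')}"


def dedupe_by_url(items: list[dict]) -> list[dict]:
    """Keep the highest-scoring item per normalized URL, in first-seen order."""
    best: dict[str, dict] = {}
    for item in items:
        key = normalize_url(item["url"])
        if key not in best or item.get("score", 0) > best[key].get("score", 0):
            best[key] = item
    return list(best.values())


items = [
    {"url": "https://www.example.com/post/", "score": 50},
    {"url": "http://example.com/post", "score": 60},
    {"url": "https://example.com/other", "score": 10},
]
print(dedupe_by_url(items))
```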

Implementation Phases

Phase 1: Schema & Core Infrastructure

Files to create/modify:

# scripts/lib/websearch.py (NEW)
"""Claude WebSearch API client for general web discovery."""

WEBSEARCH_PROMPT = """Search the web for content about: {topic}

CRITICAL: Only include results from the last 30 days (after {from_date}).

Find {min_items}-{max_items} high-quality, relevant web pages. Prefer:
- Blog posts, tutorials, documentation
- News articles, announcements
- Authoritative sources (official docs, reputable publications)

AVOID:
- Reddit (covered separately)
- X/Twitter (covered separately)
- YouTube without transcripts
- Forum threads without clear answers

Return ONLY valid JSON:
{{
  "items": [
    {{
      "title": "Page title",
      "url": "https://...",
      "source_domain": "example.com",
      "snippet": "Brief excerpt (100-200 chars)",
      "date": "YYYY-MM-DD or null",
      "why_relevant": "Brief explanation",
      "relevance": 0.85
    }}
  ]
}}
"""

def search_web(topic: str, from_date: str, to_date: str, depth: str = "default") -> dict:
    """Search web using Claude's built-in WebSearch tool.

    NOTE: This runs INSIDE Claude Code, so we use the WebSearch tool directly.
    No API key needed - uses Claude's session.
    """
    # Implementation uses Claude's web_search_20250305 tool
    pass

def parse_websearch_response(response: dict) -> list[dict]:
    """Parse WebSearch results into normalized format."""
    pass

# scripts/lib/schema.py - ADD WebSearchItem

@dataclass
class WebSearchItem:
    """Normalized web search item."""
    id: str
    title: str
    url: str
    source_domain: str  # e.g., "medium.com", "github.com"
    snippet: str
    date: Optional[str] = None
    date_confidence: str = "low"
    relevance: float = 0.5
    why_relevant: str = ""
    subs: SubScores = field(default_factory=SubScores)
    score: int = 0

    def to_dict(self) -> Dict[str, Any]:
        return {
            'id': self.id,
            'title': self.title,
            'url': self.url,
            'source_domain': self.source_domain,
            'snippet': self.snippet,
            'date': self.date,
            'date_confidence': self.date_confidence,
            'relevance': self.relevance,
            'why_relevant': self.why_relevant,
            'subs': self.subs.to_dict(),
            'score': self.score,
        }
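
parse_websearch_response() is left as a stub above. The sketch below shows one way the parsing step could work, assuming the model's reply text has already been extracted as a string in the JSON shape requested by WEBSEARCH_PROMPT. The function name parse_items_from_text() and the EXCLUDED_DOMAINS set are assumptions for illustration, not part of the existing code.

```python
import json
from urllib.parse import urlsplit

# Domains covered by the dedicated Reddit and X sources (assumed list).
EXCLUDED_DOMAINS = {"reddit.com", "x.com", "twitter.com"}


def parse_items_from_text(raw_text: str) -> list[dict]:
    """Extract well-formed items from a JSON reply, dropping Reddit/X URLs."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return []  # Malformed reply: treat as no results rather than failing the run
    raw_items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(raw_items, list):
        return []

    items = []
    for raw in raw_items:
        if not isinstance(raw, dict):
            continue
        url = str(raw.get("url", ""))
        host = urlsplit(url).netloc.lower().removeprefix("www.")
        # Skip items without a host, and anything on an excluded domain or subdomain.
        if not host or any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS):
            continue
        try:
            relevance = float(raw.get("relevance", 0.5))
        except (TypeError, ValueError):
            relevance = 0.5
        items.append({
            "title": str(raw.get("title", "")),
            "url": url,
            "source_domain": host,
            "snippet": str(raw.get("snippet", "")),
            "date": raw.get("date"),  # "YYYY-MM-DD" or None
            "why_relevant": str(raw.get("why_relevant", "")),
            "relevance": min(1.0, max(0.0, relevance)),  # Clamp to [0, 1]
        })
    return items
```

Filtering on the parsed host rather than trusting the prompt's AVOID list means Reddit/X links are dropped even if the model returns them anyway.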

Phase 2: Scoring System Updates

# scripts/lib/score.py - ADD websearch scoring

# New constants
WEBSEARCH_SOURCE_PENALTY = 15  # Points deducted for lacking engagement

# Reweighted for no engagement
WEBSEARCH_WEIGHT_RELEVANCE = 0.55
WEBSEARCH_WEIGHT_RECENCY = 0.45

def score_websearch_items(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]:
    """Score WebSearch items WITHOUT engagement metrics.

    Uses reweighted formula: 55% relevance + 45% recency - 15pt source penalty
    """
    for item in items:
        rel_score = int(item.relevance * 100)
        rec_score = dates.recency_score(item.date)

        item.subs = schema.SubScores(
            relevance=rel_score,
            recency=rec_score,
            engagement=0,  # Explicitly zero - no engagement data
        )

        overall = (
            WEBSEARCH_WEIGHT_RELEVANCE * rel_score +
            WEBSEARCH_WEIGHT_RECENCY * rec_score
        )

        # Apply source penalty (WebSearch < Reddit/X)
        overall -= WEBSEARCH_SOURCE_PENALTY

        # Apply date confidence penalty (same as other sources)
        if item.date_confidence == "low":
            overall -= 10
        elif item.date_confidence == "med":
            overall -= 5

        item.score = max(0, min(100, int(overall)))

    return items

Phase 3: Orchestrator Integration

# scripts/last30days.py - UPDATE run_research()

def run_research(...) -> tuple:
    """Run the research pipeline.

    Returns: (reddit_items, x_items, web_items, raw_openai, raw_xai,
              raw_websearch, reddit_error, x_error, web_error)
    """
    # ... existing Reddit/X code ...

    # WebSearch (new)
    web_items = []
    raw_websearch = None
    web_error = None

    if sources in ("all", "web", "reddit-web", "x-web"):
        if progress:
            progress.start_web()

        try:
            raw_websearch = websearch.search_web(topic, from_date, to_date, depth)
            web_items = websearch.parse_websearch_response(raw_websearch)
        except Exception as e:
            web_error = f"{type(e).__name__}: {e}"

        if progress:
            progress.end_web(len(web_items))

    return (reddit_items, x_items, web_items, raw_openai, raw_xai,
            raw_websearch, reddit_error, x_error, web_error)

Phase 4: CLI & Environment Updates

# scripts/last30days.py - ADD CLI flag

parser.add_argument(
    "--include-web",
    action="store_true",
    help="Include general web search alongside Reddit/X (lower weighted)",
)

# scripts/lib/env.py - UPDATE get_available_sources()

def get_available_sources(config: dict) -> str:
    """Determine available sources. WebSearch always available (no API key)."""
    has_openai = bool(config.get('OPENAI_API_KEY'))
    has_xai = bool(config.get('XAI_API_KEY'))

    if has_openai and has_xai:
        return 'both'  # WebSearch available but not default
    elif has_openai:
        return 'reddit'
    elif has_xai:
        return 'x'
    else:
        return 'web'  # Fallback: WebSearch only (no keys needed)

Acceptance Criteria

Functional Requirements

  • Skill works with zero API keys (WebSearch-only mode)
  • --include-web flag adds WebSearch to Reddit/X searches
  • WebSearch items have lower average scores than Reddit/X items with similar relevance
  • WebSearch results exclude Reddit/X URLs (handled separately)
  • Date filtering uses natural language ("last 30 days") in prompt
  • Output clearly labels source type: [WEB], [Reddit], [X]
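
To illustrate the labeling requirement, here is a minimal sketch that merges items from all three sources, ranks them by score, and prefixes each line with its source label. The tuple layout, the render_ranked() name, and the line format are assumptions; only the three labels come from this plan.

```python
SOURCE_LABELS = {"reddit": "[Reddit]", "x": "[X]", "web": "[WEB]"}


def render_ranked(items: list[tuple[str, str, int]]) -> list[str]:
    """Sort (source, title, score) tuples by score, highest first, and label each line."""
    ranked = sorted(items, key=lambda item: item[2], reverse=True)
    return [f"{SOURCE_LABELS[source]} {title} (score {score})"
            for source, title, score in ranked]


lines = render_ranked([
    ("web", "Blog post", 62),
    ("reddit", "Thread", 81),
    ("x", "Post", 74),
])
print("\n".join(lines))
```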

Non-Functional Requirements

  • WebSearch adds <10s latency to total research time (0s in the script itself, since the search is deferred to Claude)
  • Graceful degradation if WebSearch fails
  • Cache includes WebSearch results appropriately

Quality Gates

  • Before/after testing shows WebSearch doesn't dominate rankings (via -15pt penalty)
  • Test: 10 Reddit + 10 X + 10 WebSearch → WebSearch avg score 15-20pts lower (scoring formula verified)
  • Test: WebSearch-only mode produces useful results for common topics

Testing Plan

Before/After Comparison Script

# tests/test_websearch_weighting.py

"""
Test harness to validate WebSearch doesn't overpower Reddit/X.

Run same queries with:
1. Reddit + X only (baseline)
2. Reddit + X + WebSearch (comparison)

Verify: WebSearch items rank lower on average.
"""

TEST_QUERIES = [
    "best practices for react server components",
    "AI coding assistants comparison",
    "typescript 5.5 new features",
]

def test_websearch_weighting():
    for query in TEST_QUERIES:
        # Run without WebSearch
        baseline = run_research(query, sources="both")
        baseline_scores = [item.score for item in baseline.reddit + baseline.x]

        # Run with WebSearch
        with_web = run_research(query, sources="both", include_web=True)
        web_scores = [item.score for item in with_web.web]
        reddit_x_scores = [item.score for item in with_web.reddit + with_web.x]

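        # Assertions (fail with a clear message instead of dividing by zero)
        assert reddit_x_scores, f"No Reddit/X items returned for {query!r}"
        avg_reddit_x = sum(reddit_x_scores) / len(reddit_x_scores)
        avg_web = sum(web_scores) / len(web_scores) if web_scores else 0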

        assert avg_web < avg_reddit_x - 10, \
            f"WebSearch avg ({avg_web}) too close to Reddit/X avg ({avg_reddit_x})"

        # Check top 5 aren't all WebSearch
        top_5 = sorted(with_web.reddit + with_web.x + with_web.web,
                       key=lambda x: -x.score)[:5]
        web_in_top_5 = sum(1 for item in top_5 if isinstance(item, WebSearchItem))
        assert web_in_top_5 <= 2, f"Too many WebSearch items in top 5: {web_in_top_5}"

Manual Test Scenarios

| Scenario                                         | Expected Outcome                               |
|--------------------------------------------------|------------------------------------------------|
| No API keys, run /last30days AI tools            | WebSearch-only results, useful output          |
| Both keys + --include-web, run /last30days react | Mix of all 3 sources, Reddit/X dominate top 10 |
| Niche topic (no Reddit/X coverage)               | WebSearch fills gap, becomes primary           |
| Popular topic (lots of Reddit/X)                 | WebSearch present but lower-ranked             |

Dependencies & Prerequisites

  • Claude Code's WebSearch tool (web_search_20250305) - already available
  • No new API keys required
  • Existing test infrastructure in tests/

Risk Analysis & Mitigation

| Risk                            | Likelihood | Impact | Mitigation                                           |
|---------------------------------|------------|--------|------------------------------------------------------|
| WebSearch returns stale content | Medium     | Medium | Enforce date in prompt, apply low-confidence penalty |
| WebSearch dominates rankings    | Low        | High   | Source penalty (-15pts), testing validates           |
| WebSearch adds spam/low-quality | Medium     | Medium | Exclude social media domains, domain filtering       |
| Date parsing unreliable         | High       | Medium | Accept "low" confidence as normal for WebSearch      |

Future Considerations

  1. Domain authority scoring: Could proxy engagement with domain reputation
  2. User-configurable weights: Let users adjust WebSearch penalty
  3. Domain whitelist/blacklist: Filter WebSearch to trusted sources
  4. Parallel execution: Run all 3 sources concurrently for speed
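
For the parallel-execution idea, a thread pool is one option, since each source call is I/O-bound. This is a minimal sketch only: run_sources_concurrently() is a hypothetical helper, the search callables are stand-ins, and whether the WebSearch step can run in a worker thread depends on how it is invoked inside Claude's session. Errors are captured per source in the same "TypeName: message" style as run_research().

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_sources_concurrently(
    searchers: dict[str, Callable[[], list]],
) -> tuple[dict[str, list], dict[str, str]]:
    """Run each search callable in its own thread; collect results and errors per source."""
    results: dict[str, list] = {}
    errors: dict[str, str] = {}
    if not searchers:
        return results, errors
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in searchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as e:
                # One failing source should not take down the others.
                results[name] = []
                errors[name] = f"{type(e).__name__}: {e}"
    return results, errors


def fake_reddit() -> list:
    return ["thread-1", "thread-2"]


def fake_x() -> list:
    raise ValueError("no key")


results, errors = run_sources_concurrently({"reddit": fake_reddit, "x": fake_x})
print(results, errors)
```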

References

Internal References

  • Scoring algorithm: scripts/lib/score.py:8-15
  • Source detection: scripts/lib/env.py:57-72
  • Schema patterns: scripts/lib/schema.py:76-138
  • Orchestrator: scripts/last30days.py:54-164

External References

Research Findings

  • Reddit upvotes are ~12% of ranking value in SEO (strong signal)
  • E-E-A-T framework: Engagement metrics = trust signal
  • MSA2C2 approach: Dynamic weight learning for multi-source aggregation