
Data Transforms Reference

Patterns for cleaning, normalizing, deduplicating, and enriching extracted web data. Apply these transforms in Phase 5 (Transform) between extraction and validation.


Automatic Transforms

Always apply these to every extraction result.

Whitespace Cleanup

# Remove zero-width characters and non-breaking spaces first,
# then collapse and trim all remaining whitespace
import re
value = re.sub(r'[\u200b\u200c\u200d\ufeff]', '', value)  # zero-width: remove entirely
value = value.replace('\u00a0', ' ')                      # non-breaking space -> regular space
value = ' '.join(value.split())                           # collapse internal whitespace, trim ends

Patterns to handle:

  • \n, \r, \t inside cell values -> single space
  • Multiple consecutive spaces -> single space
  • Non-breaking spaces (&nbsp;, \u00a0) -> regular space
  • Zero-width characters -> remove

HTML Entity Decode

| Entity | Character | Entity | Character |
|--------|-----------|--------|-----------|
| `&amp;` | & | `&quot;` | " |
| `&lt;` | < | `&apos;` | ' |
| `&gt;` | > | `&#39;` | ' |
| `&nbsp;` | (space) | `&#8217;` | ’ (curly apostrophe) |
| `&mdash;` | — | `&#8212;` | — |
import html
value = html.unescape(value)

Unicode Normalization

import unicodedata
value = unicodedata.normalize('NFKC', value)

This handles:

  • Fancy quotes -> standard quotes
  • Ligatures -> separate characters (e.g. ﬁ -> fi)
  • Full-width characters -> standard (e.g. Ａ -> A)
  • Superscript/subscript numbers -> regular numbers

Empty Value Standardization

| Input | Markdown Output | JSON Output |
|-------|-----------------|-------------|
| "" (empty string) | N/A | null |
| "-" or "--" | N/A | null |
| "N/A", "n/a", "NA" | N/A | null |
| "None", "null" | N/A | null |
| "TBD", "TBA" | TBD | "TBD" |

Price Normalization

Apply when extracting product, pricing, or financial data.

Extraction Pattern

import re

def normalize_price(raw):
    if not raw:
        return None
    # Remove currency words
    cleaned = re.sub(r'(?i)(USD|EUR|GBP|BRL|R\$|US\$)', '', raw)
    # Extract numeric value (handles 1,234.56 and 1.234,56 formats)
    match = re.search(r'[\d.,]+', cleaned)
    if not match:
        return None
    num_str = match.group()
    # Detect format: if last separator is comma with 2 digits after, it's decimal
    if re.search(r',\d{2}$', num_str):
        num_str = num_str.replace('.', '').replace(',', '.')
    else:
        num_str = num_str.replace(',', '')
    return float(num_str)

Currency Detection

| Symbol/Code | Currency | Symbol/Code | Currency |
|-------------|----------|-------------|----------|
| $, US$, USD | US Dollar | R$, BRL | Brazilian Real |
| €, EUR | Euro | £, GBP | British Pound |
| ¥, JPY | Yen | ₹, INR | Indian Rupee |
| C$, CAD | Canadian Dollar | A$, AUD | Australian Dollar |
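One way to turn the table into code is a longest-match-first lookup, so that R$ resolves before the bare $; the map and function below are a sketch, not a fixed part of the skill:

```python
# Symbol/code -> ISO currency, built from the table above
CURRENCY_MAP = {
    'US$': 'USD', 'R$': 'BRL', 'C$': 'CAD', 'A$': 'AUD',
    '$': 'USD', '€': 'EUR', '£': 'GBP', '¥': 'JPY', '₹': 'INR',
    'USD': 'USD', 'EUR': 'EUR', 'GBP': 'GBP', 'BRL': 'BRL',
    'JPY': 'JPY', 'INR': 'INR', 'CAD': 'CAD', 'AUD': 'AUD',
}

def detect_currency(raw):
    """Return the ISO code of the first symbol/code found, longest match first."""
    for token in sorted(CURRENCY_MAP, key=len, reverse=True):
        if token in raw:
            return CURRENCY_MAP[token]
    return None
```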

Output Format

{
  "price": 29.99,
  "currency": "USD",
  "rawPrice": "$29.99"
}

For Markdown, show formatted: $29.99 (right-aligned in table).


Date Normalization

Normalize all dates to ISO-8601 format.

Common Formats to Handle

| Input Format | Example | Normalized |
|--------------|---------|------------|
| Full text | February 25, 2026 | 2026-02-25 |
| Short text | Feb 25, 2026 | 2026-02-25 |
| US numeric | 02/25/2026 | 2026-02-25 |
| EU numeric | 25/02/2026 | 2026-02-25 |
| ISO already | 2026-02-25 | 2026-02-25 |
| Relative | 3 days ago | (compute from now) |
| Relative | Yesterday | (compute from now) |
| Timestamp | 1740441600 | 2025-02-25 |
| With time | 2026-02-25T14:30:00Z | 2026-02-25 14:30 |
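A minimal parser for the non-relative rows might look like this (the format list and function name are illustrative; relative dates are handled separately below):

```python
from datetime import datetime, timezone

# Formats from the table above; order matters (US tried before EU for ambiguous input)
DATE_FORMATS = ['%Y-%m-%d', '%B %d, %Y', '%b %d, %Y', '%m/%d/%Y', '%d/%m/%Y']

def normalize_date(raw):
    """Return an ISO-8601 date string, or None if no known format matches."""
    raw = raw.strip()
    if raw.isdigit() and len(raw) == 10:  # Unix timestamp in seconds
        return datetime.fromtimestamp(int(raw), tz=timezone.utc).strftime('%Y-%m-%d')
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return None
```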

Ambiguous Dates

When format is ambiguous (e.g. 03/04/2026):

  • Default to US format (MM/DD/YYYY) unless site is clearly non-US
  • Check page lang attribute or URL TLD for locale hints
  • Note ambiguity in delivery notes

Relative Date Resolution

from datetime import datetime, timedelta
import re

def resolve_relative_date(text):
    text = text.lower().strip()
    today = datetime.now()

    if 'today' in text: return today.strftime('%Y-%m-%d')
    if 'yesterday' in text: return (today - timedelta(days=1)).strftime('%Y-%m-%d')

    match = re.search(r'(\d+)\s*(hour|day|week|month|year)s?\s*ago', text)
    if match:
        n, unit = int(match.group(1)), match.group(2)
        deltas = {'hour': 0, 'day': n, 'week': n*7, 'month': n*30, 'year': n*365}
        return (today - timedelta(days=deltas.get(unit, 0))).strftime('%Y-%m-%d')

    return text  # Return as-is if can't parse

URL Resolution

Convert relative URLs to absolute.

Patterns

| Input | Base URL | Resolved |
|-------|----------|----------|
| /products/item-1 | https://example.com/shop | https://example.com/products/item-1 |
| item-1 | https://example.com/shop/ | https://example.com/shop/item-1 |
| //cdn.example.com/img | https://example.com | https://cdn.example.com/img |
| https://other.com/page | (any) | https://other.com/page (already absolute) |

JavaScript Resolution

function resolveUrl(relative, base) {
  try { return new URL(relative, base || window.location.href).href; }
  catch { return relative; }
}
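For Python-side pipelines, the standard library's urllib.parse.urljoin applies the same resolution rules:

```python
from urllib.parse import urljoin

def resolve_url(relative, base):
    """Resolve a relative URL against a base; absolute URLs pass through unchanged."""
    try:
        return urljoin(base, relative)
    except ValueError:
        return relative
```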

Phone Normalization

For contact mode extraction.

Pattern

import re

def normalize_phone(raw):
    if not raw:
        return None
    # Keep a leading + only; strip every other non-digit character
    # (a bare [^\d+] class would also preserve stray '+' mid-string)
    has_plus = raw.lstrip().startswith('+')
    digits = re.sub(r'\D', '', raw)
    if len(digits) < 7:
        return None
    # Add + prefix if explicitly present or if it looks international
    if has_plus or len(digits) >= 11:
        return '+' + digits
    return digits

Format by Context

| Context | Format Example |
|---------|----------------|
| JSON output | "+5511999998888" |
| Markdown table | +55 11 99999-8888 |
| CSV output | "+5511999998888" |

Deduplication

Exact Deduplication

def deduplicate(records, key_fields=None):
    """Remove exact duplicate records.
    If key_fields provided, deduplicate by those fields only.
    """
    seen = set()
    unique = []
    for record in records:
        if key_fields:
            key = tuple(record.get(f) for f in key_fields)
        else:
            key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique, len(records) - len(unique)  # returns (unique_list, removed_count)

Near-Duplicate Detection

When records share key fields but differ in details:

  1. Group by key fields (e.g. product name + source)
  2. For each group, keep the record with fewest null values
  3. If tie, keep the first occurrence
  4. Report in notes: "Merged N near-duplicate records"
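The steps above can be sketched as follows (function names are illustrative):

```python
def _null_count(record):
    return sum(1 for v in record.values() if v is None)

def merge_near_duplicates(records, key_fields):
    """Group by key fields and keep the most complete record per group.

    Strict < keeps the first occurrence on ties. Returns (merged, removed_count).
    """
    groups = {}
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if key not in groups or _null_count(record) < _null_count(groups[key]):
            groups[key] = record
    return list(groups.values()), len(records) - len(groups)
```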

Dedup Key Selection by Mode

| Mode | Key Fields |
|------|------------|
| product | name + source (or name + brand) |
| contact | name + email (or name + org) |
| jobs | title + company + location |
| events | title + date + location |
| table | all fields (exact match) |
| list | first 2-3 identifying fields |

Text Cleaning

Remove Noise

Common noise patterns to strip from extracted text:

| Pattern | Action |
|---------|--------|
| \[edit\], \[citation needed\] | Remove (Wikipedia artifacts) |
| Read more..., See more | Remove (truncation markers) |
| Sponsored, Ad, Promoted | Remove or flag |
| Cookie consent text | Remove |
| Navigation breadcrumbs | Remove |
| Footer boilerplate | Remove |
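A regex pass over the removable rows might look like this (the pattern list is an illustrative subset; site chrome such as cookie banners and breadcrumbs is usually better excluded at extraction time than stripped afterward):

```python
import re

# Illustrative noise patterns drawn from the table above
NOISE_PATTERNS = [
    r'\[edit\]', r'\[citation needed\]',
    r'\bRead more\.{3}', r'\bSee more\b',
]

def strip_noise(text):
    """Remove known noise markers, then re-collapse whitespace."""
    for pattern in NOISE_PATTERNS:
        text = re.sub(pattern, '', text)
    return ' '.join(text.split())
```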

Sentence Case Normalization

When extracting ALL-CAPS or inconsistent-case text:

def normalize_case(text):
    if text.isupper() and len(text) > 3:
        return text.title()  # ALL CAPS -> Title Case
    return text

Only apply when the source field is clearly ALL-CAPS (common on older sites), the user requests it, or the data reads better normalized.


Data Type Coercion

Automatic Type Detection

| Raw Value | Detected Type | Coerced Value |
|-----------|---------------|---------------|
| "123" | integer | 123 |
| "12.99" | float | 12.99 |
| "true" | boolean | true |
| "false" | boolean | false |
| "2026-02-25" | date | "2026-02-25" (kept as string) |
| "$29.99" | price | 29.99 + currency |
| "4.5/5" | rating | 4.5 |
| "1,234" | integer | 1234 |

Rating Normalization

import re

def normalize_rating(raw):
    if not raw:
        return None
    match = re.search(r'([\d.]+)\s*(?:/\s*([\d.]+))?', str(raw))
    if match:
        score = float(match.group(1))
        max_score = float(match.group(2)) if match.group(2) else 5.0
        return round(score / max_score * 5, 1)  # Normalize to /5 scale
    return None

Enrichment Patterns

Domain Extraction

Add domain from full URLs:

from urllib.parse import urlparse

def extract_domain(url):
    try:
        domain = urlparse(url).netloc
        # Strip a leading "www." only; replace() would also mangle e.g. "awww.example.com"
        if domain.startswith('www.'):
            domain = domain[4:]
        return domain or None
    except (ValueError, AttributeError):
        return None

Word Count

For article mode:

def word_count(text):
    return len(text.split()) if text else 0

Relative Time

Add human-readable time since date:

def time_since(date_str):
    from datetime import datetime
    try:
        dt = datetime.fromisoformat(date_str)
        delta = datetime.now() - dt
        if delta.days < 0: return None  # Future date: leave unannotated
        if delta.days == 0: return "Today"
        if delta.days == 1: return "Yesterday"
        if delta.days < 7: return f"{delta.days} days ago"
        if delta.days < 30: return f"{delta.days // 7} weeks ago"
        if delta.days < 365: return f"{delta.days // 30} months ago"
        return f"{delta.days // 365} years ago"
    except (ValueError, TypeError):
        return None

Transform Pipeline Order

Apply transforms in this sequence:

  1. HTML entity decode - raw text cleanup
  2. Unicode normalization - character standardization
  3. Whitespace cleanup - spacing normalization
  4. Empty value standardization - null/N/A handling
  5. URL resolution - relative to absolute
  6. Data type coercion - strings to numbers/dates
  7. Price normalization - if applicable
  8. Date normalization - if applicable
  9. Phone normalization - if applicable
  10. Text cleaning - noise removal
  11. Deduplication - remove duplicates
  12. Sorting - user-requested order
  13. Enrichment - domain, word count, etc.

Not all steps apply to every extraction. Apply only what's relevant to the data type and extraction mode.
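In code, the ordered sequence reduces to threading each value through a list of transform functions; the helper below is a sketch showing the first three stages:

```python
import html
import unicodedata

def run_pipeline(value, steps):
    """Thread a raw string through an ordered list of transforms."""
    for step in steps:
        value = step(value)
    return value

# Stages 1-3: entity decode, Unicode normalization, whitespace cleanup
steps = [
    html.unescape,
    lambda v: unicodedata.normalize('NFKC', v),
    lambda v: ' '.join(v.split()),
]
```

Later per-field stages (price, date, phone) plug in the same way, while record-level steps like deduplication and sorting run once over the full result set.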