daymade 2896870061 feat: add financial-data-collector skill for US equity data collection
New skill that collects real financial data for any US publicly traded company
via yfinance. Outputs structured JSON with market data, historical financials,
WACC inputs, and analyst estimates. Includes 9-check validation script and
reference docs for yfinance pitfalls (NaN years, field aliases, FCF mismatch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 19:40:52 +08:00


yfinance Pitfalls & Field Mapping

NaN Year Patterns

yfinance frequently returns NaN for older fiscal years. Observed patterns:

| Ticker | NaN years | Notes |
|---|---|---|
| META | 2020, 2021 | All fields NaN; must supplement from 10-K |
| General | Varies | Older years (more than 3 years back) are less reliable |

Workaround: check every field with pd.notna(), report NaN years to the user, and never fill gaps with estimates.
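The sketch below shows one way to do the check-and-report step. It assumes the statement is a yfinance DataFrame with fields as rows and period-end Timestamps as columns. `nan_year_report` is a hypothetical helper, not part of yfinance. It only reports gaps and never fills them.

```python
import pandas as pd


def nan_year_report(df: pd.DataFrame) -> dict:
    """Report missing values per fiscal-year column; never fills anything in.

    Returns {"all_nan": [years where every field is NaN],
             "partial_nan": {year: [fields that are NaN]}}.
    """
    all_nan, partial_nan = [], {}
    for col in df.columns:
        year = col.year if hasattr(col, "year") else col  # Timestamp -> int year
        missing = df.index[df[col].isna().to_numpy()].tolist()
        if len(df.index) and len(missing) == len(df.index):
            all_nan.append(year)
        elif missing:
            partial_nan[year] = missing
    return {"all_nan": sorted(all_nan), "partial_nan": partial_nan}
```

For example, `nan_year_report(yf.Ticker("META").financials)["all_nan"]` would list the fiscal years to surface to the user and supplement from the 10-K.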

Field Name Variants

yfinance row index names are not fully stable across versions. Use fallback chains:

```python
import pandas as pd

# Ordered fallback chains: the first label found in the statement's row index wins.
FIELD_ALIASES = {
    "revenue": ["Total Revenue", "Revenue", "Operating Revenue"],
    "ebit": ["Operating Income", "EBIT"],
    "ebitda": ["EBITDA", "Normalized EBITDA"],
    # "Tax Effect Of Unusual Items" is excluded on purpose: it is only the tax on
    # unusual items, not total income tax expense.
    "tax": ["Tax Provision", "Income Tax Expense"],
    "net_income": ["Net Income", "Net Income Common Stockholders"],
    "capex": ["Capital Expenditure", "Capital Expenditures"],
    "ocf": ["Operating Cash Flow", "Cash Flow From Continuing Operating Activities"],
    "da": ["Depreciation And Amortization", "Depreciation Amortization Depletion"],
    "fcf": ["Free Cash Flow"],
    "nwc": ["Change In Working Capital", "Changes In Working Capital"],
    "total_debt": ["Total Debt"],
    "cash": ["Cash And Cash Equivalents"],
    "short_investments": ["Other Short Term Investments", "Short Term Investments"],
    "sbc": ["Stock Based Compensation"],
}


def safe_get(df, aliases, col):
    """Return the value for the first alias present in df.index, as float, else None.

    If the first matching label holds NaN, None is returned and later aliases
    are not tried, so a missing value is reported rather than substituted.
    """
    for alias in aliases:
        if alias in df.index:
            val = df.loc[alias, col]
            return float(val) if pd.notna(val) else None
    return None
```
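The alias chain above resolves one cell at a time. A possible extension, sketched here, resolves a field for every year at once and also returns which label matched. That label can then be recorded in output metadata, so label drift between yfinance versions stays visible. `field_by_year` is a hypothetical helper, not part of yfinance.

```python
import pandas as pd


def field_by_year(df: pd.DataFrame, aliases: list):
    """Return (matched_label, {fiscal_year: float or None}) for the first alias found.

    Returns (None, {}) when no alias is present in df.index. NaN cells map to None.
    """
    for alias in aliases:
        if alias in df.index:
            row = df.loc[alias]
            if isinstance(row, pd.DataFrame):  # duplicate labels: take the first row
                row = row.iloc[0]
            values = {
                col.year: (float(val) if pd.notna(val) else None)
                for col, val in row.items()
            }
            return alias, values
    return None, {}
```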

Datetime Column Index

yfinance returns DataFrame columns as pandas.Timestamp, not integer years:

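```python
import yfinance as yf

financials = yf.Ticker("META").financials  # columns are fiscal period-end dates

# ❌ WRONG
financials[2024]  # KeyError: there is no integer column 2024

# ✅ RIGHT: match on Timestamp.year
year_col = [c for c in financials.columns if c.year == 2024][0]  # IndexError if FY2024 is absent
financials.loc["Total Revenue", year_col]
```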
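The list-comprehension lookup raises `IndexError` when a fiscal year is absent, which is common given the NaN patterns above. A small wrapper that returns `None` instead may be easier to use. `year_column` is a hypothetical helper. It matches on the calendar year of the period-end date. For a fiscal year ending in, say, September, that is the year in which the fiscal year closes.

```python
from typing import Optional

import pandas as pd


def year_column(df: pd.DataFrame, year: int) -> Optional[pd.Timestamp]:
    """Return the first column whose period-end date falls in `year`, else None."""
    matches = [c for c in df.columns if c.year == year]
    # With quarterly statements several columns can match; this keeps the first in column order.
    return matches[0] if matches else None
```

Usage: `col = year_column(financials, 2024)`, then read `financials.loc["Total Revenue", col]` only if `col is not None`.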

Shares Outstanding Variants

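```python
import yfinance as yf

info = yf.Ticker("META").info

# Diluted shares are preferred for per-share valuation; sharesOutstanding is a basic count.
shares = info.get("sharesOutstanding")  # Basic shares
# Alternative: may be more accurate in some cases; cross-check against the latest 10-K/10-Q
shares = info.get("impliedSharesOutstanding")
```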
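The shares guidance prefers diluted shares, but `sharesOutstanding` is a basic count. One option is to read a diluted figure from the income statement first and fall back to `info`. This sketch assumes the statement has a `"Diluted Average Shares"` row, as recent yfinance income statements do. Verify that label in your installed version. The figure is a weighted-average count for the period, not a point-in-time share count. `pick_share_count` is a hypothetical helper.

```python
from typing import Optional, Tuple

import pandas as pd


def pick_share_count(income_stmt: Optional[pd.DataFrame], info: dict) -> Tuple[Optional[float], str]:
    """Return (share_count, source_label), preferring diluted over basic counts."""
    label = "Diluted Average Shares"  # assumed yfinance row label
    if income_stmt is not None and label in income_stmt.index:
        row = income_stmt.loc[label].dropna()
        if not row.empty:
            latest = max(row.index)  # most recent period-end Timestamp
            return float(row[latest]), f"income_stmt:{label}"
    for key in ("sharesOutstanding", "impliedSharesOutstanding"):
        val = info.get(key)
        if val:
            return float(val), f"info:{key}"
    return None, "unavailable"
```

With `t = yf.Ticker("META")`, a call such as `pick_share_count(t.income_stmt, t.info)` returns the count together with its source. Recording that source in the output metadata records which count was used.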

Risk-Free Rate via ^TNX

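```python
import yfinance as yf

tnx = yf.Ticker("^TNX")  # CBOE 10-year Treasury yield index
hist = tnx.history(period="5d")  # a few sessions of buffer in case the latest day has no row
risk_free_rate = hist["Close"].iloc[-1] / 100  # Convert from percentage
```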

Pitfall: ^TNX returns yield as percentage (e.g., 4.3), not decimal (0.043). Divide by 100.
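A missed or doubled percent-to-decimal conversion still yields a number that looks plausible, so a range check can help catch it. The 0.1% to 20% bounds are an assumption about plausible 10-year Treasury yields, not a yfinance rule. `tnx_to_decimal` is a hypothetical helper.

```python
def tnx_to_decimal(close: float) -> float:
    """Convert a ^TNX close quoted in percent (e.g. 4.3) to a decimal rate (0.043)."""
    rate = close / 100
    # Assumed plausibility band for a 10-year Treasury yield: 0.1% to 20%.
    # A NaN close also fails this check, because comparisons with NaN are False.
    if not (0.001 < rate < 0.20):
        raise ValueError(f"implausible risk-free rate {rate!r} from ^TNX close {close!r}")
    return rate
```

Usage: `risk_free_rate = tnx_to_decimal(hist["Close"].iloc[-1])`.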

Analyst Estimates

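```python
import yfinance as yf

ticker = yf.Ticker("META")

try:
    # Revenue estimates: DataFrame with columns avg, low, high, ...
    # Rows: "0q" (current quarter), "+1q" (next quarter), "0y" (current year), "+1y" (next year)
    rev_est = ticker.revenue_estimate

    # EPS estimates: same rows, with avg/low/high-style columns
    eps_est = ticker.earnings_estimate

    # EPS trend: same rows, but columns track how consensus moved (current, 7daysAgo, 30daysAgo, ...)
    eps_trend = ticker.eps_trend
except Exception:  # attribute names and payloads shift between yfinance versions
    rev_est = eps_est = eps_trend = None
```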

Pitfall: These APIs change between yfinance versions. Always wrap in try/except.
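The sketch below wraps each attribute in its own try/except, so one failing endpoint does not discard the others. It treats a missing attribute, a network or parsing error, and an empty DataFrame all as "no estimate available". `fetch_optional` is a hypothetical helper. The broad `except Exception` is a deliberate trade-off: any upstream schema change degrades to `None` instead of crashing the collector.

```python
from typing import Any

import pandas as pd


def fetch_optional(obj: Any, attr: str) -> Any:
    """Read an attribute such as "revenue_estimate" from a yfinance Ticker defensively.

    Returns None if the attribute is missing in the installed version,
    the request or parsing fails, or the result is an empty DataFrame.
    """
    try:
        value = getattr(obj, attr)
    except Exception:  # AttributeError on older versions; network/parse errors otherwise
        return None
    if value is None or (isinstance(value, pd.DataFrame) and value.empty):
        return None
    return value
```

Usage: `rev_est = fetch_optional(yf.Ticker("META"), "revenue_estimate")`, then record in the output metadata whether the value is `None`.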

FCF Definition Mismatch

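| Source | FCF definition | META 2024 |
|---|---|---|
| yfinance | Operating CF + CapEx (CapEx is reported as a negative number) | ~$54.1B |
| Morgan Stanley DCF | EBITDA - Taxes - CapEx - change in NWC - SBC | ~$37.9B |
| Difference | SBC (~$22B) and other adjustments, net | ~$16.2B (~30% gap) |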

Always flag which FCF definition was used in the output metadata. Downstream DCF skills then need to decide whether to use the yfinance FCF or to reconstruct FCF from its components.
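To make the mismatch explicit in the output, both definitions can be computed side by side. This is a sketch under stated sign conventions, not Morgan Stanley's model. It takes `capital_expenditure` as yfinance reports it (negative). Taxes, SBC, and the increase in net working capital are passed as positive amounts. `fcf_both_ways` is a hypothetical helper.

```python
from typing import Optional


def fcf_both_ways(ocf: float, capital_expenditure: float, ebitda: float,
                  taxes: float, nwc_increase: float, sbc: float) -> dict:
    """Return yfinance-style FCF, an SBC-adjusted FCF, and the gap between them."""
    capex = abs(capital_expenditure)  # accept yfinance's negative sign or a positive amount
    fcf_yfinance = ocf - capex        # same as Operating CF + (negative) CapEx
    fcf_adjusted = ebitda - taxes - capex - nwc_increase - sbc
    gap = fcf_yfinance - fcf_adjusted
    gap_pct: Optional[float] = gap / fcf_yfinance if fcf_yfinance else None
    return {
        "fcf_yfinance": fcf_yfinance,
        "fcf_adjusted": fcf_adjusted,
        "gap": gap,
        "gap_pct_of_yfinance": gap_pct,
        # Metadata so downstream DCF steps know which definition each number uses
        "fcf_definitions": {
            "fcf_yfinance": "Operating CF + CapEx",
            "fcf_adjusted": "EBITDA - Taxes - CapEx - NWC increase - SBC",
        },
    }
```

yfinance's `Change In Working Capital` usually follows the cash-flow-statement convention, where the value is negative when working capital absorbs cash. Under that convention, `nwc_increase` would be its negation. Check the sign against the filing before relying on it.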