Files
claude-code-skills-reference/financial-data-collector/references/yfinance-pitfalls.md
daymade 2896870061 feat: add financial-data-collector skill for US equity data collection
New skill that collects real financial data for any US publicly traded company
via yfinance. Outputs structured JSON with market data, historical financials,
WACC inputs, and analyst estimates. Includes 9-check validation script and
reference docs for yfinance pitfalls (NaN years, field aliases, FCF mismatch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 19:40:52 +08:00

100 lines
3.1 KiB
Markdown

# yfinance Pitfalls & Field Mapping
## NaN Year Patterns
yfinance frequently returns NaN for older fiscal years. Observed patterns:
| Ticker | NaN Years | Notes |
|--------|-----------|-------|
| META | 2020, 2021 | All fields NaN; must supplement from 10-K |
| General | Varies | Older years (>3 years back) are less reliable |
**Workaround**: Check every field with `pd.notna()`. Report NaN years to user. Never fill with estimates.
## Field Name Variants
yfinance row index names are not fully stable across versions. Use fallback chains:
```python
FIELD_ALIASES = {
"revenue": ["Total Revenue", "Revenue", "Operating Revenue"],
"ebit": ["Operating Income", "EBIT"],
"ebitda": ["EBITDA", "Normalized EBITDA"],
"tax": ["Tax Provision", "Income Tax Expense", "Tax Effect Of Unusual Items"],
"net_income": ["Net Income", "Net Income Common Stockholders"],
"capex": ["Capital Expenditure", "Capital Expenditures"],
"ocf": ["Operating Cash Flow", "Cash Flow From Continuing Operating Activities"],
"da": ["Depreciation And Amortization", "Depreciation Amortization Depletion"],
"fcf": ["Free Cash Flow"],
"nwc": ["Change In Working Capital", "Changes In Working Capital"],
"total_debt": ["Total Debt"],
"cash": ["Cash And Cash Equivalents"],
"short_investments": ["Other Short Term Investments", "Short Term Investments"],
"sbc": ["Stock Based Compensation"],
}
def safe_get(df, aliases, col):
for alias in aliases:
if alias in df.index:
val = df.loc[alias, col]
return float(val) if pd.notna(val) else None
return None
```
## Datetime Column Index
yfinance returns DataFrame columns as `pandas.Timestamp`, not integer years:
```python
# ❌ WRONG
financials[2024] # KeyError
# ✅ RIGHT
year_col = [c for c in financials.columns if c.year == 2024][0]
financials.loc["Total Revenue", year_col]
```
## Shares Outstanding Variants
```python
# Preferred: diluted
shares = info.get("sharesOutstanding") # Basic shares
# Alternative
shares = info.get("impliedSharesOutstanding") # May be more accurate
```
## Risk-Free Rate via ^TNX
```python
tnx = yf.Ticker("^TNX")
hist = tnx.history(period="1d")
risk_free_rate = hist["Close"].iloc[-1] / 100 # Convert from percentage
```
**Pitfall**: ^TNX returns yield as percentage (e.g., 4.3), not decimal (0.043). Divide by 100.
## Analyst Estimates
```python
ticker = yf.Ticker("META")
# Revenue estimates
rev_est = ticker.revenue_estimate # DataFrame with columns: avg, low, high, ...
# Rows: "0q" (current quarter), "+1q", "0y" (current year), "+1y"
# EPS estimates
eps_est = ticker.eps_trend # Similar structure
```
**Pitfall**: These APIs change between yfinance versions. Always wrap in try/except.
## FCF Definition Mismatch
| Source | FCF Definition | META 2024 |
|--------|---------------|-----------|
| yfinance | Operating CF + CapEx | ~$54.1B |
| Morgan Stanley DCF | EBITDA - Taxes - CapEx - NWC - SBC | ~$37.9B |
| Difference | SBC (~$22B) + other adjustments | ~30% gap |
**Always flag this in output metadata.** Downstream DCF skills need to decide whether to use yfinance FCF or reconstruct from components.