314 lines
10 KiB
Markdown
314 lines
10 KiB
Markdown
---
|
|
name: excel-automation
|
|
description: Create, parse, and control Excel files on macOS. Professional formatting with openpyxl, complex xlsm parsing with stdlib zipfile+xml for investment bank financial models, and Excel window control via AppleScript. Use when creating formatted Excel reports, parsing financial models that openpyxl cannot handle, or automating Excel on macOS.
|
|
---
|
|
|
|
# Excel Automation
|
|
|
|
Create professional Excel files, parse complex financial models, and control Excel on macOS.
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Create a formatted Excel report
|
|
uv run --with openpyxl scripts/create_formatted_excel.py output.xlsx
|
|
|
|
# Parse a complex xlsm that openpyxl can't handle
|
|
uv run scripts/parse_complex_excel.py model.xlsm # List sheets
|
|
uv run scripts/parse_complex_excel.py model.xlsm "DCF" # Extract a sheet
|
|
uv run scripts/parse_complex_excel.py model.xlsm --fix # Fix corrupted names
|
|
|
|
# Control Excel via AppleScript (with timeout to prevent hangs)
|
|
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
|
|
```
|
|
|
|
## Overview
|
|
|
|
Three capabilities:
|
|
|
|
| Capability | Tool | When to Use |
|
|
|-----------|------|-------------|
|
|
| **Create** formatted Excel | `openpyxl` | Reports, mockups, dashboards |
|
|
| **Parse** complex xlsm/xlsx | `zipfile` + `xml.etree` | Financial models, VBA workbooks, >1MB files |
|
|
| **Control** Excel window | AppleScript (`osascript`) | Zoom, scroll, select cells programmatically |
|
|
|
|
## Tool Selection Decision Tree
|
|
|
|
```
|
|
Is the file simple (data export, no VBA, <1MB)?
|
|
├─ YES → openpyxl or pandas
|
|
└─ NO
|
|
├─ Is it .xlsm or from investment bank / >1MB?
|
|
│ └─ YES → zipfile + xml.etree.ElementTree (stdlib)
|
|
└─ Is it truly .xls (BIFF format)?
|
|
└─ YES → xlrd
|
|
```
|
|
|
|
**Signals of "complex" Excel**: file >1MB, `.xlsm` extension, from investment bank/broker, contains VBA macros.
|
|
|
|
**IMPORTANT**: Always run `file <path>` first — extensions lie. A `.xls` file may actually be a ZIP-based xlsx.
|
|
|
|
## Creating Excel Files (openpyxl)
|
|
|
|
### Professional Color Convention (Investment Banking Standard)
|
|
|
|
| Color | RGB Code | Meaning |
|
|
|-------|----------|---------|
|
|
| Blue | `0000FF` | User input / assumption |
|
|
| Black | `000000` | Calculated value |
|
|
| Green | `008000` | Cross-sheet reference |
|
|
| White on dark blue | `FFFFFF` on `4472C4` | Section headers |
|
|
| Dark blue text | `1F4E79` | Title |
|
|
|
|
### Core Formatting Patterns
|
|
|
|
```python
|
|
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
|
|
|
|
# Fonts
|
|
BLUE_FONT = Font(color="0000FF", size=10, name="Calibri")
|
|
BLACK_FONT_BOLD = Font(color="000000", size=10, name="Calibri", bold=True)
|
|
GREEN_FONT = Font(color="008000", size=10, name="Calibri")
|
|
HEADER_FONT = Font(color="FFFFFF", size=12, name="Calibri", bold=True)
|
|
|
|
# Fills
|
|
DARK_BLUE_FILL = PatternFill("solid", fgColor="4472C4")
|
|
LIGHT_BLUE_FILL = PatternFill("solid", fgColor="D9E1F2")
|
|
INPUT_GREEN_FILL = PatternFill("solid", fgColor="E2EFDA")
|
|
LIGHT_GRAY_FILL = PatternFill("solid", fgColor="F2F2F2")
|
|
|
|
# Borders
|
|
THIN_BORDER = Border(bottom=Side(style="thin", color="B2B2B2"))
|
|
BOTTOM_DOUBLE = Border(bottom=Side(style="double", color="000000"))
|
|
```
|
|
|
|
### Number Format Codes
|
|
|
|
| Format | Code | Example |
|
|
|--------|------|---------|
|
|
| Currency | `'$#,##0'` | $1,234 |
|
|
| Currency with decimals | `'$#,##0.00'` | $1,234.56 |
|
|
| Percentage | `'0.0%'` | 12.3% |
|
|
| Percentage (2 decimal) | `'0.00%'` | 12.34% |
|
|
| Number with commas | `'#,##0'` | 1,234 |
|
|
| Multiplier | `'0.0x'` | 1.5x |
|
|
|
|
### Conditional Formatting (Sensitivity Tables)
|
|
|
|
Red-to-green gradient for sensitivity analysis:
|
|
|
|
```python
|
|
from openpyxl.formatting.rule import ColorScaleRule
|
|
|
|
rule = ColorScaleRule(
|
|
start_type="min", start_color="F8696B", # Red (low)
|
|
mid_type="percentile", mid_value=50, mid_color="FFEB84", # Yellow (mid)
|
|
end_type="max", end_color="63BE7B" # Green (high)
|
|
)
|
|
ws.conditional_formatting.add(f"B2:F6", rule)
|
|
```
|
|
|
|
### Execution
|
|
|
|
```bash
|
|
uv run --with openpyxl scripts/create_formatted_excel.py
|
|
```
|
|
|
|
Full template script: See `scripts/create_formatted_excel.py`
|
|
|
|
## Parsing Complex Excel (zipfile + xml)
|
|
|
|
When openpyxl fails on complex xlsm files (corrupted DefinedNames, complex VBA), use stdlib directly.
|
|
|
|
### XLSX Internal ZIP Structure
|
|
|
|
```
|
|
file.xlsx (ZIP archive)
|
|
├── [Content_Types].xml
|
|
├── xl/
|
|
│ ├── workbook.xml ← Sheet names + order
|
|
│ ├── sharedStrings.xml ← All text values (lookup table)
|
|
│ ├── worksheets/
|
|
│ │ ├── sheet1.xml ← Cell data for sheet 1
|
|
│ │ ├── sheet2.xml ← Cell data for sheet 2
|
|
│ │ └── ...
|
|
│ └── _rels/
|
|
│ └── workbook.xml.rels ← Maps rId → sheetN.xml
|
|
└── _rels/.rels
|
|
```
|
|
|
|
### Sheet Name Resolution (Two-Step)
|
|
|
|
Sheet names in `workbook.xml` link to physical files via `_rels/workbook.xml.rels`:
|
|
|
|
```python
|
|
import zipfile
|
|
import xml.etree.ElementTree as ET
|
|
|
|
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
|
|
REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
|
|
RELS_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'
|
|
|
|
def get_sheet_path(zf, sheet_name):
|
|
"""Resolve sheet name to physical XML file path inside ZIP."""
|
|
# Step 1: workbook.xml → find rId for the sheet name
|
|
wb_xml = ET.fromstring(zf.read('xl/workbook.xml'))
|
|
sheets = wb_xml.findall(f'.//{{{MAIN_NS}}}sheet')
|
|
rid = None
|
|
for s in sheets:
|
|
if s.get('name') == sheet_name:
|
|
rid = s.get(f'{{{REL_NS}}}id')
|
|
break
|
|
if not rid:
|
|
raise ValueError(f"Sheet '{sheet_name}' not found")
|
|
|
|
# Step 2: workbook.xml.rels → map rId to file path
|
|
rels_xml = ET.fromstring(zf.read('xl/_rels/workbook.xml.rels'))
|
|
for rel in rels_xml.findall(f'{{{RELS_NS}}}Relationship'):
|
|
if rel.get('Id') == rid:
|
|
return 'xl/' + rel.get('Target')
|
|
|
|
raise ValueError(f"No file mapping for {rid}")
|
|
```
|
|
|
|
### Cell Data Extraction
|
|
|
|
```python
|
|
def extract_cells(zf, sheet_path):
|
|
"""Extract all cell values from a sheet XML."""
|
|
# Build shared strings lookup
|
|
shared = []
|
|
try:
|
|
ss_xml = ET.fromstring(zf.read('xl/sharedStrings.xml'))
|
|
for si in ss_xml.findall(f'{{{MAIN_NS}}}si'):
|
|
texts = si.itertext()
|
|
shared.append(''.join(texts))
|
|
except KeyError:
|
|
pass # No shared strings
|
|
|
|
# Parse sheet cells
|
|
sheet_xml = ET.fromstring(zf.read(sheet_path))
|
|
rows = sheet_xml.findall(f'.//{{{MAIN_NS}}}row')
|
|
|
|
data = {}
|
|
for row in rows:
|
|
for cell in row.findall(f'{{{MAIN_NS}}}c'):
|
|
ref = cell.get('r') # e.g., "A1"
|
|
cell_type = cell.get('t') # "s" = shared string, None = number
|
|
val_el = cell.find(f'{{{MAIN_NS}}}v')
|
|
|
|
if val_el is not None and val_el.text:
|
|
if cell_type == 's':
|
|
data[ref] = shared[int(val_el.text)]
|
|
else:
|
|
try:
|
|
data[ref] = float(val_el.text)
|
|
except ValueError:
|
|
data[ref] = val_el.text
|
|
return data
|
|
```
|
|
|
|
### Fixing Corrupted DefinedNames
|
|
|
|
Investment bank xlsm files often have corrupted `<definedName>` entries containing "Formula removed":
|
|
|
|
```python
|
|
def fix_defined_names(zf_in_path, zf_out_path):
|
|
"""Remove corrupted DefinedNames and repackage."""
|
|
import shutil, tempfile
|
|
with tempfile.TemporaryDirectory() as tmp:
|
|
tmp = Path(tmp)
|
|
with zipfile.ZipFile(zf_in_path, 'r') as zf:
|
|
zf.extractall(tmp)
|
|
|
|
wb_xml_path = tmp / 'xl' / 'workbook.xml'
|
|
tree = ET.parse(wb_xml_path)
|
|
root = tree.getroot()
|
|
|
|
ns = {'main': MAIN_NS}
|
|
defined_names = root.find('.//main:definedNames', ns)
|
|
if defined_names is not None:
|
|
for name in list(defined_names):
|
|
if name.text and "Formula removed" in name.text:
|
|
defined_names.remove(name)
|
|
|
|
tree.write(wb_xml_path, encoding='utf-8', xml_declaration=True)
|
|
|
|
with zipfile.ZipFile(zf_out_path, 'w', zipfile.ZIP_DEFLATED) as zf:
|
|
for fp in tmp.rglob('*'):
|
|
if fp.is_file():
|
|
zf.write(fp, fp.relative_to(tmp))
|
|
```
|
|
|
|
Full template script: See `scripts/parse_complex_excel.py`
|
|
|
|
## Controlling Excel on macOS (AppleScript)
|
|
|
|
All commands verified on macOS with Microsoft Excel.
|
|
|
|
### Verified Commands
|
|
|
|
```bash
|
|
# Activate Excel (bring to front)
|
|
osascript -e 'tell application "Microsoft Excel" to activate'
|
|
|
|
# Open a file
|
|
osascript -e 'tell application "Microsoft Excel" to open POSIX file "/path/to/file.xlsx"'
|
|
|
|
# Set zoom level (percentage)
|
|
osascript -e 'tell application "Microsoft Excel"
|
|
set zoom of active window to 120
|
|
end tell'
|
|
|
|
# Scroll to specific row
|
|
osascript -e 'tell application "Microsoft Excel"
|
|
set scroll row of active window to 45
|
|
end tell'
|
|
|
|
# Scroll to specific column
|
|
osascript -e 'tell application "Microsoft Excel"
|
|
set scroll column of active window to 3
|
|
end tell'
|
|
|
|
# Select a cell range
|
|
osascript -e 'tell application "Microsoft Excel"
|
|
select range "A1" of active sheet
|
|
end tell'
|
|
|
|
# Select a specific sheet by name
|
|
osascript -e 'tell application "Microsoft Excel"
|
|
activate object sheet "DCF" of active workbook
|
|
end tell'
|
|
```
|
|
|
|
### Timing and Timeout
|
|
|
|
Always add `sleep 1` between AppleScript commands and subsequent operations (e.g., screenshot) to allow UI rendering.
|
|
|
|
**IMPORTANT**: `osascript` will hang indefinitely if Excel is not running or not responding. Always wrap with `timeout`:
|
|
|
|
```bash
|
|
# Safe pattern: 5-second timeout
|
|
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
|
|
|
|
# Check exit code: 124 = timed out
|
|
if [ $? -eq 124 ]; then
|
|
echo "Excel not responding — is it running?"
|
|
fi
|
|
```
|
|
|
|
## Common Mistakes
|
|
|
|
| Mistake | Correction |
|
|
|---------|-----------|
|
|
| openpyxl fails on complex xlsm → try monkey-patching | Switch to `zipfile` + `xml.etree` immediately |
|
|
| Count Chinese characters with `wc -c` | Use `wc -m` (chars, not bytes; Chinese = 3 bytes/char) |
|
|
| Trust file extension | Run `file <path>` first to confirm actual format |
|
|
| openpyxl `load_workbook` hangs on large xlsm | Use `zipfile` for targeted extraction instead of loading entire workbook |
|
|
|
|
## Important Notes
|
|
|
|
- Execute Python scripts with `uv run --with openpyxl` (never use system Python)
|
|
- LibreOffice (`soffice --headless`) can convert formats and recalculate formulas
|
|
- Detailed formatting reference: See `references/formatting-reference.md`
|