name: bdistill-knowledge-extraction
description: Extract structured domain knowledge from AI models in-session or from local open-source models via Ollama. No API key needed.
category: ai-research
risk: safe
source: community
date_added: 2026-03-20
author: FrancyJGLisboa
tags: ai, knowledge-extraction, domain-specific, data-moat, mcp, reference-data
tools: claude, cursor, codex, copilot

Knowledge Extraction

Extract structured, quality-scored domain knowledge from any AI model — in-session from closed models (no API key) or locally from open-source models via Ollama.

Overview

bdistill (repository: https://github.com/FrancyJGLisboa/bdistill) turns your AI subscription sessions into a compounding knowledge base. The agent answers targeted domain questions, bdistill structures and quality-scores the responses, and the output accumulates into a searchable, exportable reference dataset.

Adversarial mode challenges the agent's claims — forcing evidence, corrections, and acknowledged limitations — producing validated knowledge entries.
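The adversarial loop can be sketched roughly as follows. This is an illustrative outline only, not bdistill's actual implementation; the `challenge` callback and field names are hypothetical:

```python
# Hypothetical sketch of an adversarial-validation step (NOT bdistill's
# real code): each claim is challenged, and the stored entry records
# whether evidence was produced and whether a correction was made.
def adversarial_validate(claim, challenge):
    """`challenge(claim)` returns (evidence_or_None, correction_or_None)."""
    evidence, correction = challenge(claim)
    return {
        "answer": correction if correction is not None else claim,
        "validated": evidence is not None,    # claim survived with evidence
        "corrected": correction is not None,  # agent revised its claim
    }

# Example: a toy challenger that corrects an overstated claim.
entry = adversarial_validate(
    "Aspirin cures heart disease",
    lambda c: ("meta-analyses", "Aspirin reduces risk of some cardiac events"),
)
```

The point of the shape: a claim is never stored as-is once challenged; it is either kept with evidence attached or replaced by the agent's own correction.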

When to Use This Skill

  • Use when you need structured reference data on any domain (medical, legal, finance, cybersecurity)
  • Use when building lookup tables, Q&A datasets, or research corpora
  • Use when generating training data for traditional ML models (regression, classification — NOT competing LLMs)
  • Use when you want cross-model comparison on domain knowledge

How It Works

Step 1: Install

pip install bdistill
claude mcp add bdistill -- bdistill-mcp   # Claude Code

Step 2: Extract knowledge in-session

/distill medical cardiology                    # Preset domain
/distill --custom kubernetes docker helm       # Custom terms
/distill --adversarial medical                 # With adversarial validation

Step 3: Search, export, compound

bdistill kb list                               # Show all domains
bdistill kb search "atrial fibrillation"       # Keyword search
bdistill kb export -d medical -f csv           # Export as spreadsheet
bdistill kb export -d medical -f markdown      # Readable knowledge document

Output Format

Structured reference JSONL — not training data:

{
  "question": "What causes myocardial infarction?",
  "answer": "Myocardial infarction results from acute coronary artery occlusion...",
  "domain": "medical",
  "category": "cardiology",
  "tags": ["mechanistic", "evidence-based"],
  "quality_score": 0.73,
  "confidence": 1.08,
  "validated": true,
  "source_model": "Claude Sonnet 4"
}
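Because each line is a standalone JSON object, the export is easy to post-process with the standard library alone. A minimal sketch, using the field names from the record above (the sample lines and threshold are fabricated for illustration):

```python
import json

# Filter a bdistill JSONL export down to validated, high-quality entries.
# Field names match the record shown above; the 0.7 cutoff is arbitrary.
def high_quality(lines, min_score=0.7):
    entries = (json.loads(line) for line in lines if line.strip())
    return [e for e in entries
            if e.get("validated") and e.get("quality_score", 0) >= min_score]

sample = (
    '{"question": "q1", "answer": "a1", "quality_score": 0.73, "validated": true}\n'
    '{"question": "q2", "answer": "a2", "quality_score": 0.41, "validated": true}\n'
)
kept = high_quality(sample.splitlines())  # only the 0.73 entry survives
```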

Tabular ML Data Generation

Generate structured training data for traditional ML models:

/schema sepsis | hr:float, bp:float, temp:float, wbc:float | risk:category[low,moderate,high,critical]

Exports as CSV ready for pandas/sklearn. Each row tracks source_model for cross-model analysis.
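Assuming the column names from the `/schema` example above, the exported CSV needs nothing beyond the standard library to read; the file contents here are fabricated stand-ins for a real export:

```python
import csv
import io
from collections import Counter

# Toy stand-in for a bdistill CSV export using the sepsis schema above.
export = """hr,bp,temp,wbc,risk,source_model
110,85,39.2,14.5,high,Claude Sonnet 4
72,120,36.8,7.0,low,qwen3:4b
"""

rows = list(csv.DictReader(io.StringIO(export)))
# Each row carries source_model, so per-model label counts are one line:
by_model = Counter((r["source_model"], r["risk"]) for r in rows)
```

The same `DictReader` output drops straight into `pandas.DataFrame(rows)` when you move to sklearn.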

Local Model Extraction (Ollama)

For open-source models running locally:

# Install Ollama from https://ollama.com
ollama serve
ollama pull qwen3:4b

bdistill extract --domain medical --model qwen3:4b
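bdistill drives Ollama for you, but for reference, Ollama's local HTTP API can also be queried directly. A minimal sketch against the default endpoint, using the model pulled above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama to return the whole answer in one JSON body.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (requires `ollama serve` running locally):
#   answer = ask("qwen3:4b", "What causes myocardial infarction?")
```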

Security & Safety Notes

  • In-session extraction uses your existing subscription — no additional API keys
  • Local extraction runs entirely on your machine via Ollama
  • No data is sent to external services
  • Output is reference data, not LLM training format

Related Skills

  • @bdistill-behavioral-xray - X-ray a model's behavioral patterns