fix: Enforce min_chunk_size in RAG chunker

- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)

Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were being created despite min_chunk_size=100 setting.

Test: pytest tests/test_rag_chunker.py -v
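
A minimal sketch of the filtering rule described above, not the chunker's actual code. The function names, the `target_size` default of 512, and the whitespace-based token counter are all assumptions made for illustration; only `min_chunk_size` and its default of 100 come from this commit message.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words stand in for tokens."""
    return len(text.split())


def enforce_min_chunk_size(
    chunks: List[str],
    min_chunk_size: int = 100,   # default taken from the commit message
    target_size: int = 512,      # assumed value, not stated in the commit
    token_counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Drop chunks below min_chunk_size unless the whole document is small."""
    total = sum(token_counter(chunk) for chunk in chunks)

    # Exception: a document smaller than the target size keeps all of its
    # chunks, since filtering could otherwise discard the entire document.
    if total < target_size:
        return list(chunks)

    # Normal case: keep only chunks that meet the minimum size.
    return [chunk for chunk in chunks if token_counter(chunk) >= min_chunk_size]
```

With these assumptions, a stray 'Short.' chunk between two long chunks would be dropped, while a document consisting only of 'Short.' would be kept as-is.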
2026-02-07 20:59:03 +03:00
parent 3a769a27cd
commit 8b3f31409e
65 changed files with 16133 additions and 7 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,41 @@
+# Skill Seekers Docker Environment Configuration
+# Copy this file to .env and fill in your API keys
+
+# Claude AI / Anthropic API
+# Required for AI enhancement features
+# Get your key from: https://console.anthropic.com/
+ANTHROPIC_API_KEY=sk-ant-your-key-here
+
+# Google Gemini API (Optional)
+# Required for Gemini platform support
+# Get your key from: https://makersuite.google.com/app/apikey
+GOOGLE_API_KEY=
+
+# OpenAI API (Optional)
+# Required for OpenAI/ChatGPT platform support
+# Get your key from: https://platform.openai.com/api-keys
+OPENAI_API_KEY=
+
+# GitHub Token (Optional, but recommended)
+# Increases rate limits from 60/hour to 5000/hour
+# Create token at: https://github.com/settings/tokens
+# Required scopes: public_repo (for public repos)
+GITHUB_TOKEN=
+
+# MCP Server Configuration
+MCP_TRANSPORT=http
+MCP_PORT=8765
+
+# Docker Resource Limits (Optional)
+# Uncomment to set custom limits
+# DOCKER_CPU_LIMIT=2.0
+# DOCKER_MEMORY_LIMIT=4g
+
+# Vector Database Ports (Optional - change if needed)
+# WEAVIATE_PORT=8080
+# QDRANT_PORT=6333
+# CHROMA_PORT=8000
+
+# Logging (Optional)
+# SKILL_SEEKERS_LOG_LEVEL=INFO
+# SKILL_SEEKERS_LOG_FILE=/data/logs/skill-seekers.log