fix: Enforce min_chunk_size in RAG chunker
- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)

Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were being created despite min_chunk_size=100 setting.

Test: pytest tests/test_rag_chunker.py -v
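
The filtering rule described in this message can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the function name `enforce_min_chunk_size`, the whitespace-based `count_tokens` helper, and the `target_chunk_size` default of 512 are all assumptions made for the example, and the real chunker may count tokens differently.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words stand in for tokens."""
    return len(text.split())


def enforce_min_chunk_size(
    chunks: List[str],
    min_chunk_size: int = 100,      # default taken from the commit message
    target_chunk_size: int = 512,   # assumed value, not stated in the commit
    token_counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Drop chunks below min_chunk_size unless the whole document is small."""
    total_tokens = sum(token_counter(chunk) for chunk in chunks)

    # Exception: a document smaller than the target size keeps every chunk,
    # otherwise a short document could end up with no chunks at all.
    if total_tokens < target_chunk_size:
        return list(chunks)

    # Normal case: filter out chunks that fall below the minimum.
    return [chunk for chunk in chunks if token_counter(chunk) >= min_chunk_size]


if __name__ == "__main__":
    long_chunk = "word " * 600
    # The tiny trailing chunk is dropped because the document is large.
    print(len(enforce_min_chunk_size([long_chunk, "Short."])))
    # A document that is small overall keeps all of its chunks.
    print(len(enforce_min_chunk_size(["Short.", "Also short."])))
```

Checking the document total before filtering is what keeps the exception cheap: the minimum is only enforced once the document is known to be large enough to produce full-sized chunks.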
41
.env.example
Normal file
@@ -0,0 +1,41 @@
# Skill Seekers Docker Environment Configuration
# Copy this file to .env and fill in your API keys

# Claude AI / Anthropic API
# Required for AI enhancement features
# Get your key from: https://console.anthropic.com/
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Google Gemini API (Optional)
# Required for Gemini platform support
# Get your key from: https://makersuite.google.com/app/apikey
GOOGLE_API_KEY=

# OpenAI API (Optional)
# Required for OpenAI/ChatGPT platform support
# Get your key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=

# GitHub Token (Optional, but recommended)
# Increases rate limits from 60/hour to 5000/hour
# Create token at: https://github.com/settings/tokens
# Required scopes: public_repo (for public repos)
GITHUB_TOKEN=

# MCP Server Configuration
MCP_TRANSPORT=http
MCP_PORT=8765

# Docker Resource Limits (Optional)
# Uncomment to set custom limits
# DOCKER_CPU_LIMIT=2.0
# DOCKER_MEMORY_LIMIT=4g

# Vector Database Ports (Optional - change if needed)
# WEAVIATE_PORT=8080
# QDRANT_PORT=6333
# CHROMA_PORT=8000

# Logging (Optional)
# SKILL_SEEKERS_LOG_LEVEL=INFO
# SKILL_SEEKERS_LOG_FILE=/data/logs/skill-seekers.log