fix: Enforce min_chunk_size in RAG chunker
- Filter out chunks smaller than min_chunk_size (default 100 tokens) - Exception: Keep all chunks if entire document is smaller than target size - All 15 tests passing (100% pass rate) Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were being created despite min_chunk_size=100 setting. Test: pytest tests/test_rag_chunker.py -v
This commit is contained in:
83
.dockerignore
Normal file
83
.dockerignore
Normal file
@@ -0,0 +1,83 @@
|
||||
# Python artifacts
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.so
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
|
||||
# Virtual environments
|
||||
venv/
|
||||
env/
|
||||
ENV/
|
||||
.venv
|
||||
|
||||
# Testing
|
||||
.pytest_cache/
|
||||
.coverage
|
||||
.coverage.*
|
||||
htmlcov/
|
||||
.tox/
|
||||
.hypothesis/
|
||||
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
.DS_Store
|
||||
|
||||
# Git
|
||||
.git/
|
||||
.gitignore
|
||||
.gitattributes
|
||||
|
||||
# Documentation
|
||||
docs/
|
||||
*.md
|
||||
!README.md
|
||||
|
||||
# CI/CD
|
||||
.github/
|
||||
.gitlab-ci.yml
|
||||
.travis.yml
|
||||
|
||||
# Output directories
|
||||
output/
|
||||
data/
|
||||
*.zip
|
||||
*.tar.gz
|
||||
|
||||
# Logs
|
||||
*.log
|
||||
logs/
|
||||
|
||||
# Environment files
|
||||
.env
|
||||
.env.*
|
||||
!.env.example
|
||||
|
||||
# Test files
|
||||
tests/
|
||||
test_*.py
|
||||
*_test.py
|
||||
|
||||
# Docker
|
||||
Dockerfile*
|
||||
docker-compose*.yml
|
||||
.dockerignore
|
||||
Reference in New Issue
Block a user