fix: Enforce min_chunk_size in RAG chunker

- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)

Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were
being created despite min_chunk_size=100 setting.

Test: pytest tests/test_rag_chunker.py -v
This commit is contained in:
yusyus
2026-02-07 20:59:03 +03:00
parent 3a769a27cd
commit 8b3f31409e
65 changed files with 16133 additions and 7 deletions

83
.dockerignore Normal file
View File

@@ -0,0 +1,83 @@
# Python artifacts
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
env/
ENV/
.venv
# Testing
.pytest_cache/
.coverage
.coverage.*
htmlcov/
.tox/
.hypothesis/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# Git
.git/
.gitignore
.gitattributes
# Documentation
docs/
*.md
!README.md
# CI/CD
.github/
.gitlab-ci.yml
.travis.yml
# Output directories
output/
data/
*.zip
*.tar.gz
# Logs
*.log
logs/
# Environment files
.env
.env.*
!.env.example
# Test files
tests/
test_*.py
*_test.py
# Docker
Dockerfile*
docker-compose*.yml
.dockerignore