- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if the entire document is smaller than the target size
- All 15 tests passing (100% pass rate)

Fixes an edge case where very small chunks (e.g., 'Short.' = 6 chars) were being created despite the min_chunk_size=100 setting.

Test: pytest tests/test_rag_chunker.py -v
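
The filtering rule described in this commit message can be sketched roughly as follows. This is only an illustration, not the project's actual chunker: the names `filter_small_chunks`, `count_tokens`, and `target_size` (including its default of 512) are hypothetical, and tokens are approximated here by whitespace-separated words.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: approximates tokens as whitespace-separated words."""
    return len(text.split())


def filter_small_chunks(
    chunks: List[str],
    min_chunk_size: int = 100,   # default taken from the commit message
    target_size: int = 512,      # assumed value; the real default is not stated
    size_fn: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Drop chunks below min_chunk_size, unless the whole document is small."""
    total = sum(size_fn(chunk) for chunk in chunks)

    # Exception: a document smaller than the target size keeps all its chunks,
    # so a short document is never filtered down to nothing.
    if total < target_size:
        return list(chunks)

    return [chunk for chunk in chunks if size_fn(chunk) >= min_chunk_size]


# A long document: the tiny 'Short.' chunk is dropped.
long_doc = ["word " * 150, "Short.", "word " * 200]
kept_long = filter_small_chunks(long_doc, min_chunk_size=100, target_size=300)

# A document that is small overall: every chunk is kept.
kept_short = filter_small_chunks(["Short."], min_chunk_size=100, target_size=300)
```

Checking the whole-document size first is what keeps a very short input from producing an empty result.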
84 lines
697 B
Plaintext
# Python artifacts
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv

# Testing
.pytest_cache/
.coverage
.coverage.*
htmlcov/
.tox/
.hypothesis/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Git
.git/
.gitignore
.gitattributes

# Documentation
docs/
*.md
!README.md

# CI/CD
.github/
.gitlab-ci.yml
.travis.yml

# Output directories
output/
data/
*.zip
*.tar.gz

# Logs
*.log
logs/

# Environment files
.env
.env.*
!.env.example

# Test files
tests/
test_*.py
*_test.py

# Docker
Dockerfile*
docker-compose*.yml
.dockerignore