Merge branch 'development' into feature/video-scraper-pipeline
Sync with latest development changes including ruff formatting, bug fixes, and pinecone adaptor additions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -309,6 +309,15 @@ package_path = adaptor.package(
)
```

#### Shared Embedding Methods

The base `SkillAdaptor` class provides two shared embedding methods inherited by all vector database adaptors (chroma, weaviate, pinecone):

- `_generate_openai_embeddings(texts, model)` -- Generate embeddings via the OpenAI API.
- `_generate_st_embeddings(texts, model)` -- Generate embeddings using a local sentence-transformers model.

These methods are available on any adaptor instance returned by `get_adaptor()` for vector database targets, so you do not need to implement embedding logic per-adaptor.
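
For illustration, here is a minimal sketch of the inheritance pattern described above: the two helpers are defined once on `SkillAdaptor`, and each vector database adaptor gains them simply by subclassing. The subclass names (`ChromaAdaptor`, `WeaviateAdaptor`, `PineconeAdaptor`), the default model names, and the method bodies are assumptions, not the project's code; the calls themselves follow the public `openai` (v1 client) and `sentence-transformers` APIs.

```python
"""Sketch: shared embedding helpers inherited by vector DB adaptors.

Hypothetical: subclass names, default model names, and method bodies.
"""


class SkillAdaptor:
    """Base adaptor: embedding helpers are defined once and inherited."""

    def _generate_openai_embeddings(
        self, texts: list[str], model: str = "text-embedding-3-small"
    ) -> list[list[float]]:
        # Deferred import: only required when the OpenAI backend is used.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.embeddings.create(model=model, input=texts)
        # Sort by index so embeddings line up with the input texts.
        return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]

    def _generate_st_embeddings(
        self, texts: list[str], model: str = "all-MiniLM-L6-v2"
    ) -> list[list[float]]:
        # Deferred import: sentence-transformers is an optional dependency.
        from sentence_transformers import SentenceTransformer

        encoder = SentenceTransformer(model)
        # encode() returns a NumPy array by default; convert it to plain lists.
        return encoder.encode(texts).tolist()


# Hypothetical vector database adaptors: no embedding code of their own.
class ChromaAdaptor(SkillAdaptor):
    pass


class WeaviateAdaptor(SkillAdaptor):
    pass


class PineconeAdaptor(SkillAdaptor):
    pass
```

Both imports are deferred into the method bodies, so the module imports cleanly even when neither optional package is installed; each dependency is needed only when its backend is actually called.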

---

### 6. Skill Upload API

@@ -620,7 +620,8 @@ skill-seekers package SKILL_DIRECTORY [options]
| | `--batch-size` | 100 | Chunks per batch |
| | `--chunk-for-rag` | | Enable RAG chunking |
| | `--chunk-tokens` | 512 | Max tokens per chunk |
| | `--no-preserve-code` | | Allow code block splitting |
| | `--chunk-overlap-tokens` | 50 | Overlap between chunks (tokens) |
| | `--no-preserve-code-blocks` | | Allow code block splitting |
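
To make the relationship between `--chunk-tokens` and `--chunk-overlap-tokens` concrete, here is a minimal sliding-window sketch: each chunk holds at most `chunk_tokens` tokens, and each new chunk starts `chunk_tokens - overlap` tokens after the previous one, so neighbouring chunks share `overlap` tokens. Whitespace splitting stands in for a real tokenizer, code-block preservation is not modelled, and the function name `chunk_with_overlap` is hypothetical; this is not the tool's own chunker.

```python
def chunk_with_overlap(text: str, chunk_tokens: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-delimited tokens.

    Hypothetical helper: whitespace splitting approximates a real tokenizer,
    and code blocks are not kept intact.
    """
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    if not 0 <= overlap < chunk_tokens:
        raise ValueError("overlap must be in [0, chunk_tokens)")

    tokens = text.split()
    step = chunk_tokens - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_tokens]
        chunks.append(" ".join(window))
        # Stop once a window reaches the end; later windows would only repeat the tail.
        if start + chunk_tokens >= len(tokens):
            break
    return chunks
```

For example, ten tokens `t0` through `t9` with `chunk_tokens=4` and `overlap=1` produce three chunks, `t0 t1 t2 t3`, `t3 t4 t5 t6`, and `t6 t7 t8 t9`, with `t3` and `t6` each appearing in two chunks.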

**Supported Platforms:**

@@ -194,7 +194,9 @@ skill-seekers package output/my-skill/ \
| `--chunk-for-rag` | auto | Enable chunking |
| `--chunk-tokens` | 512 | Tokens per chunk |
| `--chunk-overlap-tokens` | 50 | Overlap between chunks (tokens) |
| `--no-preserve-code` | - | Allow splitting code blocks |
| `--no-preserve-code-blocks` | - | Allow splitting code blocks |

> **Auto-scaling overlap:** When `--chunk-tokens` is set to a non-default value but `--chunk-overlap-tokens` is left at default (50), the overlap automatically scales to `max(50, chunk_tokens / 10)` for better context preservation with larger chunks.
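
The auto-scaling rule above can be expressed as a short function. This is a sketch assuming the documented defaults (512 for `--chunk-tokens`, 50 for `--chunk-overlap-tokens`) and integer division for `chunk_tokens / 10`, since token counts are whole numbers; the function and constant names are hypothetical.

```python
# Defaults taken from the options table; names are hypothetical.
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_OVERLAP_TOKENS = 50


def effective_overlap(
    chunk_tokens: int = DEFAULT_CHUNK_TOKENS,
    chunk_overlap_tokens: int = DEFAULT_OVERLAP_TOKENS,
) -> int:
    """Return the overlap used after applying the auto-scaling rule.

    Hypothetical helper; integer division is an assumption.
    """
    # Scale only when the chunk size was customised but the overlap was not.
    if chunk_tokens != DEFAULT_CHUNK_TOKENS and chunk_overlap_tokens == DEFAULT_OVERLAP_TOKENS:
        return max(DEFAULT_OVERLAP_TOKENS, chunk_tokens // 10)
    # Otherwise the given overlap (explicit, or the all-default pair) is kept.
    return chunk_overlap_tokens


for size, overlap in [(512, 50), (256, 50), (2048, 50), (2048, 100)]:
    print(size, overlap, "->", effective_overlap(size, overlap))
# 512 50 -> 50     (default chunk size: no scaling)
# 256 50 -> 50     (256 // 10 = 25, floored at 50)
# 2048 50 -> 204   (2048 // 10 = 204)
# 2048 100 -> 100  (explicit overlap is kept)
```

Because the rule compares against the default value, this sketch cannot tell an explicitly passed `--chunk-overlap-tokens 50` apart from an omitted flag.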

---

@@ -598,7 +598,8 @@ skill-seekers package SKILL_DIRECTORY [options]
| | `--batch-size` | 100 | Chunks per batch |
| | `--chunk-for-rag` | | Enable RAG chunking |
| | `--chunk-tokens` | 512 | Max tokens per chunk |
| | `--no-preserve-code` | | Allow code block splitting |
| | `--chunk-overlap-tokens` | 50 | Overlap between chunks (tokens) |
| | `--no-preserve-code-blocks` | | Allow code block splitting |

**Supported Platforms:**

@@ -194,7 +194,9 @@ skill-seekers package output/my-skill/ \
| `--chunk-for-rag` | auto | Enable chunking |
| `--chunk-tokens` | 512 | Tokens per chunk |
| `--chunk-overlap-tokens` | 50 | Overlap between chunks (tokens) |
| `--no-preserve-code` | - | Allow splitting code blocks |
| `--no-preserve-code-blocks` | - | Allow splitting code blocks |

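> **Auto-scaling overlap:** When `--chunk-tokens` is set to a non-default value but `--chunk-overlap-tokens` keeps its default value (50), the overlap automatically scales to `max(50, chunk_tokens / 10)` for better context preservation in larger chunks.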

---