feat: Add Official Microsoft & Gemini Skills (845+ Total)
## 🚀 Impact

Significantly expands the capabilities of **Antigravity Awesome Skills** by integrating official skill collections from **Microsoft** and **Google Gemini**. This update increases the total skill count to **845+**, making the library even more comprehensive for AI coding assistants.

## ✨ Key Changes

### 1. New Official Skills

- **Microsoft Skills**: Added a massive collection of official skills from [microsoft/skills](https://github.com/microsoft/skills).
  - Includes Azure, .NET, Python, TypeScript, and Semantic Kernel skills.
  - Preserves the original directory structure under `skills/official/microsoft/`.
  - Includes plugin skills from the `.github/plugins` directory.
- **Gemini Skills**: Added official Gemini API development skills under `skills/gemini-api-dev/`.

### 2. New Scripts & Tooling

- **`scripts/sync_microsoft_skills.py`**: A robust synchronization script that:
  - Clones the official Microsoft repository.
  - Preserves the original directory hierarchy.
  - Handles symlinks and plugin locations.
  - Generates attribution metadata.
- **`scripts/tests/inspect_microsoft_repo.py`**: Debug tool to inspect the remote repository structure.
- **`scripts/tests/test_comprehensive_coverage.py`**: Verification script to ensure 100% of skills are captured during sync.

### 3. Core Improvements

- **`scripts/generate_index.py`**: Enhanced frontmatter parsing to safely handle unquoted values containing `@` symbols and commas (fixing issues with some Microsoft skill descriptions).
- **`package.json`**: Added `sync:microsoft` and `sync:all-official` scripts for easy maintenance.

### 4. Documentation

- Updated `README.md` to reflect the new skill count (845+) and added Microsoft/Gemini to the provider list.
- Updated `CATALOG.md` and `skills_index.json` with the new skills.

## 🧪 Verification

- Ran `scripts/tests/test_comprehensive_coverage.py` to verify all Microsoft skills are detected.
- Validated `generate_index.py` fixes by successfully indexing the new skills.
219
skills/official/microsoft/python/data/blob/SKILL.md
Normal file
@@ -0,0 +1,219 @@
---
name: azure-storage-blob-py
description: |
  Azure Blob Storage SDK for Python. Use for uploading, downloading, listing blobs, managing containers, and blob lifecycle.
  Triggers: "blob storage", "BlobServiceClient", "ContainerClient", "BlobClient", "upload blob", "download blob".
package: azure-storage-blob
---

# Azure Blob Storage SDK for Python

Client library for Azure Blob Storage — object storage for unstructured data.

## Installation

```bash
pip install azure-storage-blob azure-identity
```

## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_NAME=<your-storage-account>
# Or use full URL
AZURE_STORAGE_ACCOUNT_URL=https://<account>.blob.core.windows.net
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.blob.core.windows.net"

blob_service_client = BlobServiceClient(account_url, credential=credential)
```

## Client Hierarchy

| Client | Purpose | Get From |
|--------|---------|----------|
| `BlobServiceClient` | Account-level operations | Direct instantiation |
| `ContainerClient` | Container operations | `blob_service_client.get_container_client()` |
| `BlobClient` | Single blob operations | `container_client.get_blob_client()` |

## Core Workflow

### Create Container

```python
container_client = blob_service_client.get_container_client("mycontainer")
container_client.create_container()
```

### Upload Blob

```python
# From file path
blob_client = blob_service_client.get_blob_client(
    container="mycontainer",
    blob="sample.txt"
)

with open("./local-file.txt", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)

# From bytes/string
blob_client.upload_blob(b"Hello, World!", overwrite=True)

# From stream
import io
stream = io.BytesIO(b"Stream content")
blob_client.upload_blob(stream, overwrite=True)
```

### Download Blob

```python
blob_client = blob_service_client.get_blob_client(
    container="mycontainer",
    blob="sample.txt"
)

# To file
with open("./downloaded.txt", "wb") as file:
    download_stream = blob_client.download_blob()
    file.write(download_stream.readall())

# To memory
download_stream = blob_client.download_blob()
content = download_stream.readall()  # bytes

# Read into existing buffer
stream = io.BytesIO()
num_bytes = blob_client.download_blob().readinto(stream)
```

### List Blobs

```python
container_client = blob_service_client.get_container_client("mycontainer")

# List all blobs
for blob in container_client.list_blobs():
    print(f"{blob.name} - {blob.size} bytes")

# List with prefix (folder-like)
for blob in container_client.list_blobs(name_starts_with="logs/"):
    print(blob.name)

# Walk blob hierarchy (virtual directories)
for item in container_client.walk_blobs(delimiter="/"):
    if item.get("prefix"):
        print(f"Directory: {item['prefix']}")
    else:
        print(f"Blob: {item.name}")
```

### Delete Blob

```python
blob_client.delete_blob()

# Delete with snapshots
blob_client.delete_blob(delete_snapshots="include")
```

## Performance Tuning

```python
from azure.storage.blob import BlobClient

# Configure chunk sizes for large uploads/downloads
blob_client = BlobClient(
    account_url=account_url,
    container_name="mycontainer",
    blob_name="large-file.zip",
    credential=credential,
    max_block_size=4 * 1024 * 1024,  # 4 MiB blocks
    max_single_put_size=64 * 1024 * 1024  # 64 MiB single upload limit
)

# Parallel upload
blob_client.upload_blob(data, max_concurrency=4)

# Parallel download
download_stream = blob_client.download_blob(max_concurrency=4)
```

## SAS Tokens

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas_token = generate_blob_sas(
    account_name="<account>",
    container_name="mycontainer",
    blob_name="sample.txt",
    account_key="<account-key>",  # Or use user delegation key
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1)
)

# Use SAS token
blob_url = f"https://<account>.blob.core.windows.net/mycontainer/sample.txt?{sas_token}"
```
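The `# Or use user delegation key` comment above can be sketched as follows. This is a minimal illustration, not part of the original skill file: the helper names are assumed, and it relies on `BlobServiceClient.get_user_delegation_key` plus the `user_delegation_key` parameter of `generate_blob_sas`, which avoid storing an account key in code.

```python
from datetime import datetime, timedelta, timezone

def sas_validity_window(hours: int = 1):
    """Return a (start, expiry) pair in UTC for a short-lived SAS."""
    start = datetime.now(timezone.utc)
    return start, start + timedelta(hours=hours)

def make_user_delegation_sas(blob_service_client, account, container, blob, hours=1):
    # Imports kept local so the sketch reads without the SDK installed.
    from azure.storage.blob import generate_blob_sas, BlobSasPermissions

    start, expiry = sas_validity_window(hours)
    # The delegation key comes from the AAD-backed credential on the
    # service client, so no shared account key ever appears in code.
    udk = blob_service_client.get_user_delegation_key(start, expiry)
    return generate_blob_sas(
        account_name=account,
        container_name=container,
        blob_name=blob,
        user_delegation_key=udk,
        permission=BlobSasPermissions(read=True),
        start=start,
        expiry=expiry,
    )
```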

## Blob Properties and Metadata

```python
# Get properties
properties = blob_client.get_blob_properties()
print(f"Size: {properties.size}")
print(f"Content-Type: {properties.content_settings.content_type}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
blob_client.set_blob_metadata(metadata={"category": "logs", "year": "2024"})

# Set content type
from azure.storage.blob import ContentSettings
blob_client.set_http_headers(
    content_settings=ContentSettings(content_type="application/json")
)
```

## Async Client

```python
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient

async def upload_async():
    credential = DefaultAzureCredential()

    async with BlobServiceClient(account_url, credential=credential) as client:
        blob_client = client.get_blob_client("mycontainer", "sample.txt")

        with open("./file.txt", "rb") as data:
            await blob_client.upload_blob(data, overwrite=True)

# Download async
async def download_async():
    async with BlobServiceClient(account_url, credential=credential) as client:
        blob_client = client.get_blob_client("mycontainer", "sample.txt")

        stream = await blob_client.download_blob()
        data = await stream.readall()
```

## Best Practices

1. **Use DefaultAzureCredential** instead of connection strings
2. **Use context managers** for async clients
3. **Set `overwrite=True`** explicitly when re-uploading
4. **Use `max_concurrency`** for large file transfers
5. **Prefer `readinto()`** over `readall()` for memory efficiency
6. **Use `walk_blobs()`** for hierarchical listing
7. **Set appropriate content types** for web-served blobs
239
skills/official/microsoft/python/data/cosmos-db/SKILL.md
Normal file
@@ -0,0 +1,239 @@
---
name: azure-cosmos-db-py
description: Build Azure Cosmos DB NoSQL services with Python/FastAPI following production-grade patterns. Use when implementing database client setup with dual auth (DefaultAzureCredential + emulator), service layer classes with CRUD operations, partition key strategies, parameterized queries, or TDD patterns for Cosmos. Triggers on phrases like "Cosmos DB", "NoSQL database", "document store", "add persistence", "database service layer", or "Python Cosmos SDK".
package: azure-cosmos
---

# Cosmos DB Service Implementation

Build production-grade Azure Cosmos DB NoSQL services following clean code, security best practices, and TDD principles.

## Installation

```bash
pip install azure-cosmos azure-identity
```

## Environment Variables

```bash
COSMOS_ENDPOINT=https://<account>.documents.azure.com:443/
COSMOS_DATABASE_NAME=<database-name>
COSMOS_CONTAINER_ID=<container-id>
# For emulator only (not production)
COSMOS_KEY=<emulator-key>
```

## Authentication

**DefaultAzureCredential (preferred)**:

```python
import os

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

client = CosmosClient(
    url=os.environ["COSMOS_ENDPOINT"],
    credential=DefaultAzureCredential()
)
```

**Emulator (local development)**:

```python
import os

from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://localhost:8081",
    credential=os.environ["COSMOS_KEY"],
    connection_verify=False
)
```

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Router                          │
│ - Auth dependencies (get_current_user, get_current_user_required) │
│ - HTTP error responses (HTTPException)                          │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                         Service Layer                           │
│ - Business logic and validation                                 │
│ - Document ↔ Model conversion                                   │
│ - Graceful degradation when Cosmos unavailable                  │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                     Cosmos DB Client Module                     │
│ - Singleton container initialization                            │
│ - Dual auth: DefaultAzureCredential (Azure) / Key (emulator)    │
│ - Async wrapper via run_in_threadpool                           │
└─────────────────────────────────────────────────────────────────┘
```

## Quick Start

### 1. Client Module Setup

Create a singleton Cosmos client with dual authentication:

```python
# db/cosmos.py
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential
from starlette.concurrency import run_in_threadpool

_cosmos_container = None

def _is_emulator_endpoint(endpoint: str) -> bool:
    return "localhost" in endpoint or "127.0.0.1" in endpoint

async def get_container():
    global _cosmos_container
    if _cosmos_container is None:
        if _is_emulator_endpoint(settings.cosmos_endpoint):
            client = CosmosClient(
                url=settings.cosmos_endpoint,
                credential=settings.cosmos_key,
                connection_verify=False
            )
        else:
            client = CosmosClient(
                url=settings.cosmos_endpoint,
                credential=DefaultAzureCredential()
            )
        db = client.get_database_client(settings.cosmos_database_name)
        _cosmos_container = db.get_container_client(settings.cosmos_container_id)
    return _cosmos_container
```

**Full implementation**: See [references/client-setup.md](references/client-setup.md)
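The "async wrapper via run_in_threadpool" mentioned in the diagram can be sketched as below. The function name and signature are illustrative; the sketch uses the stdlib `asyncio.to_thread` (which behaves like Starlette's `run_in_threadpool`) so it runs without Starlette installed:

```python
import asyncio

async def get_document(container, item_id: str, partition_key: str):
    """Run a blocking SDK point read without stalling the event loop."""
    def _read():
        # The sync azure-cosmos SDK call would block FastAPI's event loop
        # if awaited directly, so it runs on a worker thread instead.
        return container.read_item(item=item_id, partition_key=partition_key)

    # In the FastAPI app this would be `await run_in_threadpool(_read)`.
    return await asyncio.to_thread(_read)
```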

### 2. Pydantic Model Hierarchy

Use a five-tier model pattern for clean separation:

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field

class ProjectBase(BaseModel):        # Shared fields
    name: str = Field(..., min_length=1, max_length=200)

class ProjectCreate(ProjectBase):    # Creation request
    workspace_id: str = Field(..., alias="workspaceId")

class ProjectUpdate(BaseModel):      # Partial updates (all optional)
    name: Optional[str] = Field(None, min_length=1)

class Project(ProjectBase):          # API response
    id: str
    created_at: datetime = Field(..., alias="createdAt")

class ProjectInDB(Project):          # Internal with docType
    doc_type: str = "project"
```

### 3. Service Layer Pattern

```python
class ProjectService:
    def _use_cosmos(self) -> bool:
        return get_container() is not None

    async def get_by_id(self, project_id: str, workspace_id: str) -> Project | None:
        if not self._use_cosmos():
            return None
        doc = await get_document(project_id, partition_key=workspace_id)
        if doc is None:
            return None
        return self._doc_to_model(doc)
```

**Full patterns**: See [references/service-layer.md](references/service-layer.md)

## Core Principles

### Security Requirements

1. **RBAC Authentication**: Use `DefaultAzureCredential` in Azure — never store keys in code
2. **Emulator-Only Keys**: Hardcode the well-known emulator key only for local development
3. **Parameterized Queries**: Always use `@parameter` syntax — never string concatenation
4. **Partition Key Validation**: Validate partition key access matches user authorization
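Principles 3 and 4 together can be sketched in one service method. This is an illustrative example, not code from the skill file: the entity and field names are assumed, and the partition key passed to the query is the same workspace the caller is authorized for.

```python
def list_projects(container, workspace_id: str):
    """Parameterized, single-partition query — no string concatenation."""
    query = "SELECT * FROM c WHERE c.workspaceId = @wid AND c.docType = @type"
    return list(container.query_items(
        query=query,
        parameters=[
            {"name": "@wid", "value": workspace_id},
            {"name": "@type", "value": "project"},
        ],
        partition_key=workspace_id,  # scopes reads to the caller's workspace
    ))
```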

### Clean Code Conventions

1. **Single Responsibility**: Client module handles connection; services handle business logic
2. **Graceful Degradation**: Services return `None`/`[]` when Cosmos unavailable
3. **Consistent Naming**: `_doc_to_model()`, `_model_to_doc()`, `_use_cosmos()`
4. **Type Hints**: Full typing on all public methods
5. **CamelCase Aliases**: Use `Field(alias="camelCase")` for JSON serialization

### TDD Requirements

Write tests BEFORE implementation using these patterns:

```python
@pytest.fixture
def mock_cosmos_container(mocker):
    container = mocker.MagicMock()
    mocker.patch("app.db.cosmos.get_container", return_value=container)
    return container

@pytest.mark.asyncio
async def test_get_project_by_id_returns_project(mock_cosmos_container):
    # Arrange
    mock_cosmos_container.read_item.return_value = {"id": "123", "name": "Test"}

    # Act
    result = await project_service.get_by_id("123", "workspace-1")

    # Assert
    assert result.id == "123"
    assert result.name == "Test"
```

**Full testing guide**: See [references/testing.md](references/testing.md)

## Reference Files

| File | When to Read |
|------|--------------|
| [references/client-setup.md](references/client-setup.md) | Setting up Cosmos client with dual auth, SSL config, singleton pattern |
| [references/service-layer.md](references/service-layer.md) | Implementing full service class with CRUD, conversions, graceful degradation |
| [references/testing.md](references/testing.md) | Writing pytest tests, mocking Cosmos, integration test setup |
| [references/partitioning.md](references/partitioning.md) | Choosing partition keys, cross-partition queries, move operations |
| [references/error-handling.md](references/error-handling.md) | Handling CosmosResourceNotFoundError, logging, HTTP error mapping |

## Template Files

| File | Purpose |
|------|---------|
| [assets/cosmos_client_template.py](assets/cosmos_client_template.py) | Ready-to-use client module |
| [assets/service_template.py](assets/service_template.py) | Service class skeleton |
| [assets/conftest_template.py](assets/conftest_template.py) | pytest fixtures for Cosmos mocking |

## Quality Attributes (NFRs)

### Reliability

- Graceful degradation when Cosmos unavailable
- Retry logic with exponential backoff for transient failures
- Connection pooling via singleton pattern

### Security

- Zero secrets in code (RBAC via DefaultAzureCredential)
- Parameterized queries prevent injection
- Partition key isolation enforces data boundaries

### Maintainability

- Five-tier model pattern enables schema evolution
- Service layer decouples business logic from storage
- Consistent patterns across all entity services

### Testability

- Dependency injection via `get_container()`
- Easy mocking with module-level globals
- Clear separation enables unit testing without Cosmos

### Performance

- Partition key queries avoid cross-partition scans
- Async wrapping prevents blocking FastAPI event loop
- Minimal document conversion overhead
280
skills/official/microsoft/python/data/cosmos/SKILL.md
Normal file
@@ -0,0 +1,280 @@
---
name: azure-cosmos-py
description: |
  Azure Cosmos DB SDK for Python (NoSQL API). Use for document CRUD, queries, containers, and globally distributed data.
  Triggers: "cosmos db", "CosmosClient", "container", "document", "NoSQL", "partition key".
package: azure-cosmos
---

# Azure Cosmos DB SDK for Python

Client library for Azure Cosmos DB NoSQL API — globally distributed, multi-model database.

## Installation

```bash
pip install azure-cosmos azure-identity
```

## Environment Variables

```bash
COSMOS_ENDPOINT=https://<account>.documents.azure.com:443/
COSMOS_DATABASE=mydb
COSMOS_CONTAINER=mycontainer
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

credential = DefaultAzureCredential()
endpoint = "https://<account>.documents.azure.com:443/"

client = CosmosClient(url=endpoint, credential=credential)
```

## Client Hierarchy

| Client | Purpose | Get From |
|--------|---------|----------|
| `CosmosClient` | Account-level operations | Direct instantiation |
| `DatabaseProxy` | Database operations | `client.get_database_client()` |
| `ContainerProxy` | Container/item operations | `database.get_container_client()` |

## Core Workflow

### Setup Database and Container

```python
from azure.cosmos import PartitionKey

# Get or create database
database = client.create_database_if_not_exists(id="mydb")

# Get or create container with partition key
container = database.create_container_if_not_exists(
    id="mycontainer",
    partition_key=PartitionKey(path="/category")
)

# Get existing
database = client.get_database_client("mydb")
container = database.get_container_client("mycontainer")
```

### Create Item

```python
item = {
    "id": "item-001",            # Required: unique within partition
    "category": "electronics",   # Partition key value
    "name": "Laptop",
    "price": 999.99,
    "tags": ["computer", "portable"]
}

created = container.create_item(body=item)
print(f"Created: {created['id']}")
```

### Read Item

```python
# Read requires id AND partition key
item = container.read_item(
    item="item-001",
    partition_key="electronics"
)
print(f"Name: {item['name']}")
```

### Update Item (Replace)

```python
item = container.read_item(item="item-001", partition_key="electronics")
item["price"] = 899.99
item["on_sale"] = True

updated = container.replace_item(item=item["id"], body=item)
```

### Upsert Item

```python
# Create if not exists, replace if exists
item = {
    "id": "item-002",
    "category": "electronics",
    "name": "Tablet",
    "price": 499.99
}

result = container.upsert_item(body=item)
```

### Delete Item

```python
container.delete_item(
    item="item-001",
    partition_key="electronics"
)
```

## Queries

### Basic Query

```python
# Query within a partition (efficient)
query = "SELECT * FROM c WHERE c.price < @max_price"
items = container.query_items(
    query=query,
    parameters=[{"name": "@max_price", "value": 500}],
    partition_key="electronics"
)

for item in items:
    print(f"{item['name']}: ${item['price']}")
```

### Cross-Partition Query

```python
# Cross-partition (more expensive, use sparingly)
query = "SELECT * FROM c WHERE c.price < @max_price"
items = container.query_items(
    query=query,
    parameters=[{"name": "@max_price", "value": 500}],
    enable_cross_partition_query=True
)

for item in items:
    print(item)
```

### Query with Projection

```python
query = "SELECT c.id, c.name, c.price FROM c WHERE c.category = @category"
items = container.query_items(
    query=query,
    parameters=[{"name": "@category", "value": "electronics"}],
    partition_key="electronics"
)
```

### Read All Items

```python
# Read all in a partition
items = container.read_all_items()  # Cross-partition
# Or with partition key
items = container.query_items(
    query="SELECT * FROM c",
    partition_key="electronics"
)
```
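Large result sets are usually consumed page by page rather than all at once. A minimal sketch under the assumption that `query_items` returns an azure-core `ItemPaged`, whose `by_page(continuation_token)` yields pages and exposes the token for the next page (the helper name is illustrative):

```python
def fetch_page(container, query, partition_key, token=None, page_size=100):
    """Fetch one page of results plus the continuation token for the next."""
    pager = container.query_items(
        query=query,
        partition_key=partition_key,
        max_item_count=page_size,  # server-side page size hint
    ).by_page(token)
    page = next(pager, None)
    items = list(page) if page is not None else []
    return items, pager.continuation_token  # None once results are exhausted
```

A caller hands the returned token back on the next request to resume where it left off.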

## Partition Keys

**Critical**: Always include partition key for efficient operations.

```python
from azure.cosmos import PartitionKey

# Single partition key
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customer_id")
)

# Hierarchical partition key (preview)
container = database.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path=["/tenant_id", "/user_id"])
)
```

## Throughput

```python
# Create container with provisioned throughput
container = database.create_container_if_not_exists(
    id="mycontainer",
    partition_key=PartitionKey(path="/pk"),
    offer_throughput=400  # RU/s
)

# Read current throughput
offer = container.read_offer()
print(f"Throughput: {offer.offer_throughput} RU/s")

# Update throughput
container.replace_throughput(throughput=1000)
```

## Async Client

```python
from azure.cosmos.aio import CosmosClient
from azure.identity.aio import DefaultAzureCredential

async def cosmos_operations():
    credential = DefaultAzureCredential()

    async with CosmosClient(endpoint, credential=credential) as client:
        database = client.get_database_client("mydb")
        container = database.get_container_client("mycontainer")

        # Create
        await container.create_item(body={"id": "1", "pk": "test"})

        # Read
        item = await container.read_item(item="1", partition_key="test")

        # Query
        async for item in container.query_items(
            query="SELECT * FROM c",
            partition_key="test"
        ):
            print(item)

import asyncio
asyncio.run(cosmos_operations())
```

## Error Handling

```python
from azure.cosmos.exceptions import CosmosHttpResponseError

try:
    item = container.read_item(item="nonexistent", partition_key="pk")
except CosmosHttpResponseError as e:
    if e.status_code == 404:
        print("Item not found")
    elif e.status_code == 429:
        print(f"Rate limited. Retry after: {e.headers.get('x-ms-retry-after-ms')}ms")
    else:
        raise
```
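The 429 branch above only reports throttling; callers often want to retry with backoff instead. A minimal, hedged sketch of app-level retry (the SDK also retries throttled requests internally, so this sits on top of that; all names here are illustrative — it retries any exception carrying a `status_code` of 429):

```python
import time

def with_cosmos_retry(op, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a zero-argument operation on 429s with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Only throttling is retried; anything else propagates at once,
            # as does the final failed attempt.
            if status != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_cosmos_retry(lambda: container.read_item(item="1", partition_key="pk"))`.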

## Best Practices

1. **Always specify partition key** for point reads and queries
2. **Use parameterized queries** to prevent injection and improve caching
3. **Avoid cross-partition queries** when possible
4. **Use `upsert_item`** for idempotent writes
5. **Use async client** for high-throughput scenarios
6. **Design partition key** for even data distribution
7. **Use `read_item`** instead of query for single document retrieval

## Reference Files

| File | Contents |
|------|----------|
| [references/partitioning.md](references/partitioning.md) | Partition key strategies, hierarchical keys, hot partition detection and mitigation |
| [references/query-patterns.md](references/query-patterns.md) | Query optimization, aggregations, pagination, transactions, change feed |
| [scripts/setup_cosmos_container.py](scripts/setup_cosmos_container.py) | CLI tool for creating containers with partitioning, throughput, and indexing |
211
skills/official/microsoft/python/data/datalake/SKILL.md
Normal file
@@ -0,0 +1,211 @@
---
name: azure-storage-file-datalake-py
description: |
  Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
  Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
package: azure-storage-file-datalake
---

# Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.

## Installation

```bash
pip install azure-storage-file-datalake azure-identity
```

## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```

## Client Hierarchy

| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |

## File System Operations

```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```

## Directory Operations

```python
file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

# Delete directory
directory_client.delete_directory()

# Rename/move directory
directory_client.rename_directory(new_name="myfilesystem/newname")
```

## File Operations

### Upload File

```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit the data
```
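The append/flush pattern above generalizes to any number of chunks by tracking a running offset and flushing once at the end. A minimal sketch built only on the `append_data`/`flush_data` calls shown above (the helper name is illustrative):

```python
def upload_in_chunks(file_client, chunks):
    """Append each chunk at its running offset, then flush once to commit."""
    offset = 0
    for chunk in chunks:
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # flush_data takes the total length; nothing is visible until this commit.
    file_client.flush_data(offset)
    return offset
```

Until `flush_data` runs, the appended data is staged but not part of the file, so a failed upload never leaves a half-written file behind.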

### Download File

```python
file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download range
download = file_client.download_file(offset=0, length=100)
```

### Delete File

```python
file_client.delete_file()
```

## List Contents

```python
# List paths (files and directories)
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)
```

## File/Directory Properties

```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```

## Access Control (ACL)

```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries recursively (returns an AccessControlChangeResult)
from azure.storage.filedatalake import AccessControlChangeResult

result = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
```

## Async Client

```python
from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential

async def datalake_operations():
    credential = DefaultAzureCredential()

    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")

        await file_client.upload_data(b"async content", overwrite=True)

        download = await file_client.download_file()
        content = await download.readall()

import asyncio
asyncio.run(datalake_operations())
```

## Best Practices

1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and inherit to children
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases

skills/official/microsoft/python/data/fileshare/SKILL.md

---
name: azure-storage-file-share-py
description: |
  Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.
  Triggers: "azure-storage-file-share", "ShareServiceClient", "ShareClient", "file share", "SMB".
---

# Azure Storage File Share SDK for Python

Manage SMB file shares for cloud-native and lift-and-shift scenarios.

## Installation

```bash
pip install azure-storage-file-share
```

## Environment Variables

```bash
AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...
# Or
AZURE_STORAGE_ACCOUNT_URL=https://<account>.file.core.windows.net
```

## Authentication

### Connection String

```python
import os

from azure.storage.fileshare import ShareServiceClient

service = ShareServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
```

### Entra ID

```python
import os

from azure.storage.fileshare import ShareServiceClient
from azure.identity import DefaultAzureCredential

service = ShareServiceClient(
    account_url=os.environ["AZURE_STORAGE_ACCOUNT_URL"],
    credential=DefaultAzureCredential()
)
```

## Share Operations

### Create Share

```python
share = service.create_share("my-share")
```

### List Shares

```python
for share in service.list_shares():
    print(f"{share.name}: {share.quota} GB")
```

### Get Share Client

```python
share_client = service.get_share_client("my-share")
```

### Delete Share

```python
service.delete_share("my-share")
```

## Directory Operations

### Create Directory

```python
share_client = service.get_share_client("my-share")
share_client.create_directory("my-directory")

# Nested directory
share_client.create_directory("my-directory/sub-directory")
```

### List Directories and Files

```python
directory_client = share_client.get_directory_client("my-directory")

for item in directory_client.list_directories_and_files():
    if item["is_directory"]:
        print(f"[DIR] {item['name']}")
    else:
        print(f"[FILE] {item['name']} ({item['size']} bytes)")
```

### Delete Directory

```python
share_client.delete_directory("my-directory")
```

## File Operations

### Upload File

```python
file_client = share_client.get_file_client("my-directory/file.txt")

# From string
file_client.upload_file("Hello, World!")

# From file
with open("local-file.txt", "rb") as f:
    file_client.upload_file(f)

# From bytes
file_client.upload_file(b"Binary content")
```

### Download File

```python
file_client = share_client.get_file_client("my-directory/file.txt")

# To bytes
data = file_client.download_file().readall()

# To file
with open("downloaded.txt", "wb") as f:
    data = file_client.download_file()
    data.readinto(f)

# Stream chunks
download = file_client.download_file()
for chunk in download.chunks():
    process(chunk)
```

### Get File Properties

```python
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Content type: {properties.content_settings.content_type}")
print(f"Last modified: {properties.last_modified}")
```

### Delete File

```python
file_client.delete_file()
```

### Copy File

```python
source_url = "https://account.file.core.windows.net/share/source.txt"
dest_client = share_client.get_file_client("destination.txt")
dest_client.start_copy_from_url(source_url)
```

## Range Operations

### Upload Range

```python
# Upload to specific range
file_client.upload_range(data=b"content", offset=0, length=7)
```

### Download Range

```python
# Download specific range
download = file_client.download_file(offset=0, length=100)
data = download.readall()
```

## Snapshot Operations

### Create Snapshot

```python
snapshot = share_client.create_snapshot()
print(f"Snapshot: {snapshot['snapshot']}")
```

### Access Snapshot

```python
snapshot_client = service.get_share_client(
    "my-share",
    snapshot=snapshot["snapshot"]
)
```

## Async Client

```python
import os

from azure.storage.fileshare.aio import ShareServiceClient
from azure.identity.aio import DefaultAzureCredential

async def upload_file():
    credential = DefaultAzureCredential()
    account_url = os.environ["AZURE_STORAGE_ACCOUNT_URL"]
    service = ShareServiceClient(account_url, credential=credential)

    share = service.get_share_client("my-share")
    file_client = share.get_file_client("test.txt")

    await file_client.upload_file("Hello!")

    await service.close()
    await credential.close()
```

## Client Types

| Client | Purpose |
|--------|---------|
| `ShareServiceClient` | Account-level operations |
| `ShareClient` | Share operations |
| `ShareDirectoryClient` | Directory operations |
| `ShareFileClient` | File operations |

## Best Practices

1. **Use connection string** for simplest setup
2. **Use Entra ID** for production with RBAC
3. **Stream large files** using `chunks()` to avoid memory issues
4. **Create snapshots** before major changes
5. **Set quotas** to prevent unexpected storage costs
6. **Use ranges** for partial file updates
7. **Close async clients** explicitly

skills/official/microsoft/python/data/queue/SKILL.md

---
name: azure-storage-queue-py
description: |
  Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.
  Triggers: "queue storage", "QueueServiceClient", "QueueClient", "message queue", "dequeue".
package: azure-storage-queue
---

# Azure Queue Storage SDK for Python

Simple, cost-effective message queuing for asynchronous communication.

## Installation

```bash
pip install azure-storage-queue azure-identity
```

## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.queue.core.windows.net
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.queue import QueueServiceClient, QueueClient

credential = DefaultAzureCredential()
account_url = "https://<account>.queue.core.windows.net"

# Service client
service_client = QueueServiceClient(account_url=account_url, credential=credential)

# Queue client
queue_client = QueueClient(account_url=account_url, queue_name="myqueue", credential=credential)
```

## Queue Operations

```python
# Create queue
service_client.create_queue("myqueue")

# Get queue client
queue_client = service_client.get_queue_client("myqueue")

# Delete queue
service_client.delete_queue("myqueue")

# List queues
for queue in service_client.list_queues():
    print(queue.name)
```

## Send Messages

```python
# Send message (string)
queue_client.send_message("Hello, Queue!")

# Send with options
queue_client.send_message(
    content="Delayed message",
    visibility_timeout=60,  # Hidden for 60 seconds
    time_to_live=3600  # Expires in 1 hour
)

# Send JSON
import json
data = {"task": "process", "id": 123}
queue_client.send_message(json.dumps(data))
```

## Receive Messages

```python
# Receive messages (makes them invisible temporarily)
messages = queue_client.receive_messages(
    messages_per_page=10,
    visibility_timeout=30  # 30 seconds to process
)

for message in messages:
    print(f"ID: {message.id}")
    print(f"Content: {message.content}")
    print(f"Dequeue count: {message.dequeue_count}")

    # Process message...

    # Delete after processing
    queue_client.delete_message(message)
```

## Peek Messages

```python
# Peek without hiding (doesn't affect visibility)
messages = queue_client.peek_messages(max_messages=5)

for message in messages:
    print(message.content)
```

## Update Message

```python
# Extend visibility or update content
messages = queue_client.receive_messages()
for message in messages:
    # Extend timeout (need more time)
    queue_client.update_message(
        message,
        visibility_timeout=60
    )

    # Update content and timeout
    queue_client.update_message(
        message,
        content="Updated content",
        visibility_timeout=60
    )
```

## Delete Message

```python
# Delete after successful processing
messages = queue_client.receive_messages()
for message in messages:
    try:
        # Process...
        queue_client.delete_message(message)
    except Exception:
        # Message becomes visible again after timeout
        pass
```
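A common hardening of the delete-after-processing loop is poison-message detection: a message whose `dequeue_count` keeps growing is failing repeatedly and should be routed aside instead of retried forever. A minimal sketch, where the threshold and the side queue are application choices rather than SDK features:

```python
MAX_DEQUEUE_COUNT = 5  # application-chosen retry budget, not an SDK setting

def is_poison(dequeue_count: int, max_dequeue: int = MAX_DEQUEUE_COUNT) -> bool:
    """True when a message has been delivered more times than allowed."""
    return dequeue_count > max_dequeue

# Sketch of the receive loop with the check applied
# (poison_queue_client and process are hypothetical):
# for message in queue_client.receive_messages():
#     if is_poison(message.dequeue_count):
#         poison_queue_client.send_message(message.content)
#         queue_client.delete_message(message)
#         continue
#     process(message)
#     queue_client.delete_message(message)
```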

## Clear Queue

```python
# Delete all messages
queue_client.clear_messages()
```

## Queue Properties

```python
# Get queue properties
properties = queue_client.get_queue_properties()
print(f"Approximate message count: {properties.approximate_message_count}")

# Set/get metadata
queue_client.set_queue_metadata(metadata={"environment": "production"})
properties = queue_client.get_queue_properties()
print(properties.metadata)
```

## Async Client

```python
from azure.storage.queue.aio import QueueServiceClient, QueueClient
from azure.identity.aio import DefaultAzureCredential

async def queue_operations():
    credential = DefaultAzureCredential()

    async with QueueClient(
        account_url="https://<account>.queue.core.windows.net",
        queue_name="myqueue",
        credential=credential
    ) as client:
        # Send
        await client.send_message("Async message")

        # Receive
        async for message in client.receive_messages():
            print(message.content)
            await client.delete_message(message)

import asyncio
asyncio.run(queue_operations())
```

## Base64 Encoding

```python
from azure.storage.queue import QueueClient, BinaryBase64EncodePolicy, BinaryBase64DecodePolicy

# For binary data
queue_client = QueueClient(
    account_url=account_url,
    queue_name="myqueue",
    credential=credential,
    message_encode_policy=BinaryBase64EncodePolicy(),
    message_decode_policy=BinaryBase64DecodePolicy()
)

# Send bytes
queue_client.send_message(b"Binary content")
```
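The two policies exist because queue message bodies are transported as text; the encode policy turns raw bytes into Base64 text on send and the decode policy reverses it on receive. Conceptually the round trip is just this stdlib sketch (the function names are illustrative, not SDK APIs):

```python
import base64

def encode_message(payload: bytes) -> str:
    """What a Base64 encode policy does conceptually: bytes -> ASCII text."""
    return base64.b64encode(payload).decode("ascii")

def decode_message(text: str) -> bytes:
    """The inverse, as done by the decode policy on receive."""
    return base64.b64decode(text)

# Round trip preserves the original bytes
assert decode_message(encode_message(b"Binary content")) == b"Binary content"
```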

## Best Practices

1. **Delete messages after processing** to prevent reprocessing
2. **Set appropriate visibility timeout** based on processing time
3. **Handle `dequeue_count`** for poison message detection
4. **Use async client** for high-throughput scenarios
5. **Use `peek_messages`** for monitoring without affecting the queue
6. **Set `time_to_live`** to prevent stale messages
7. **Consider Service Bus** for advanced features (sessions, topics)

skills/official/microsoft/python/data/tables/SKILL.md

---
name: azure-data-tables-py
description: |
  Azure Tables SDK for Python (Storage and Cosmos DB). Use for NoSQL key-value storage, entity CRUD, and batch operations.
  Triggers: "table storage", "TableServiceClient", "TableClient", "entities", "PartitionKey", "RowKey".
package: azure-data-tables
---

# Azure Tables SDK for Python

NoSQL key-value store for structured data (Azure Storage Tables or Cosmos DB Table API).

## Installation

```bash
pip install azure-data-tables azure-identity
```

## Environment Variables

```bash
# Azure Storage Tables
AZURE_STORAGE_ACCOUNT_URL=https://<account>.table.core.windows.net

# Cosmos DB Table API
COSMOS_TABLE_ENDPOINT=https://<account>.table.cosmos.azure.com
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.data.tables import TableServiceClient, TableClient

credential = DefaultAzureCredential()
endpoint = "https://<account>.table.core.windows.net"

# Service client (manage tables)
service_client = TableServiceClient(endpoint=endpoint, credential=credential)

# Table client (work with entities)
table_client = TableClient(endpoint=endpoint, table_name="mytable", credential=credential)
```

## Client Types

| Client | Purpose |
|--------|---------|
| `TableServiceClient` | Create/delete tables, list tables |
| `TableClient` | Entity CRUD, queries |

## Table Operations

```python
# Create table
service_client.create_table("mytable")

# Create if not exists
service_client.create_table_if_not_exists("mytable")

# Delete table
service_client.delete_table("mytable")

# List tables
for table in service_client.list_tables():
    print(table.name)

# Get table client
table_client = service_client.get_table_client("mytable")
```

## Entity Operations

**Important**: Every entity requires `PartitionKey` and `RowKey`; together they form the entity's unique ID within a table.

### Create Entity

```python
entity = {
    "PartitionKey": "sales",
    "RowKey": "order-001",
    "product": "Widget",
    "quantity": 5,
    "price": 9.99,
    "shipped": False
}

# Create (fails if exists)
table_client.create_entity(entity=entity)

# Upsert (create or replace)
table_client.upsert_entity(entity=entity)
```

### Get Entity

```python
# Get by key (fastest)
entity = table_client.get_entity(
    partition_key="sales",
    row_key="order-001"
)
print(f"Product: {entity['product']}")
```

### Update Entity

```python
# Replace entire entity
entity["quantity"] = 10
table_client.update_entity(entity=entity, mode="replace")

# Merge (update specific fields only)
update = {
    "PartitionKey": "sales",
    "RowKey": "order-001",
    "shipped": True
}
table_client.update_entity(entity=update, mode="merge")
```
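The difference between the two modes mirrors plain dictionary semantics: `replace` makes the stored entity exactly what you send (dropping any property you omit), while `merge` overlays only the properties you send on top of what is stored. A stdlib sketch of the distinction (the keys are illustrative):

```python
stored = {"PartitionKey": "sales", "RowKey": "order-001",
          "product": "Widget", "quantity": 5, "shipped": False}
update = {"PartitionKey": "sales", "RowKey": "order-001", "shipped": True}

# mode="replace": the entity becomes exactly `update`;
# "product" and "quantity" no longer exist afterwards
replaced = dict(update)

# mode="merge": only the sent properties change; the rest survive
merged = {**stored, **update}

assert "product" not in replaced
assert merged["product"] == "Widget" and merged["shipped"] is True
```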

### Delete Entity

```python
table_client.delete_entity(
    partition_key="sales",
    row_key="order-001"
)
```

## Query Entities

### Query Within Partition

```python
# Query by partition (efficient)
entities = table_client.query_entities(
    query_filter="PartitionKey eq 'sales'"
)
for entity in entities:
    print(entity)
```

### Query with Filters

```python
# Filter by properties
entities = table_client.query_entities(
    query_filter="PartitionKey eq 'sales' and quantity gt 3"
)

# With parameters (safer)
entities = table_client.query_entities(
    query_filter="PartitionKey eq @pk and price lt @max_price",
    parameters={"pk": "sales", "max_price": 50.0}
)
```

### Select Specific Properties

```python
entities = table_client.query_entities(
    query_filter="PartitionKey eq 'sales'",
    select=["RowKey", "product", "price"]
)
```

### List All Entities

```python
# List all (cross-partition - use sparingly)
for entity in table_client.list_entities():
    print(entity)
```

## Batch Operations

```python
from azure.data.tables import TableTransactionError

# Batch operations (same partition only!)
operations = [
    ("create", {"PartitionKey": "batch", "RowKey": "1", "data": "first"}),
    ("create", {"PartitionKey": "batch", "RowKey": "2", "data": "second"}),
    ("upsert", {"PartitionKey": "batch", "RowKey": "3", "data": "third"}),
]

try:
    table_client.submit_transaction(operations)
except TableTransactionError as e:
    print(f"Transaction failed: {e}")
```

## Async Client

```python
from azure.data.tables.aio import TableServiceClient, TableClient
from azure.identity.aio import DefaultAzureCredential

async def table_operations():
    credential = DefaultAzureCredential()

    async with TableClient(
        endpoint="https://<account>.table.core.windows.net",
        table_name="mytable",
        credential=credential
    ) as client:
        # Create
        await client.create_entity(entity={
            "PartitionKey": "async",
            "RowKey": "1",
            "data": "test"
        })

        # Query
        async for entity in client.query_entities("PartitionKey eq 'async'"):
            print(entity)

import asyncio
asyncio.run(table_operations())
```

## Data Types

| Python Type | Table Storage Type |
|-------------|--------------------|
| `str` | String |
| `int` | Int64 |
| `float` | Double |
| `bool` | Boolean |
| `datetime` | DateTime |
| `bytes` | Binary |
| `UUID` | Guid |

## Best Practices

1. **Design partition keys** for query patterns and even distribution
2. **Query within partitions** whenever possible (cross-partition is expensive)
3. **Use batch operations** for multiple entities in same partition
4. **Use `upsert_entity`** for idempotent writes
5. **Use parameterized queries** to prevent injection
6. **Keep entities small** (max 1 MB per entity)
7. **Use async client** for high-throughput scenarios
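For the first point, one common approach is to derive the partition key deterministically from a stable attribute, so related entities land together while load spreads across many partitions. A sketch under stated assumptions: the bucket count and `customer-NN` naming scheme are application choices, not anything prescribed by the SDK.

```python
import hashlib

def partition_for(customer_id: str, buckets: int = 16) -> str:
    """Deterministically map an ID to one of `buckets` partition keys.

    Same ID always yields the same partition; different IDs spread
    roughly evenly across the buckets.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return f"customer-{int(digest, 16) % buckets:02d}"

entity = {
    "PartitionKey": partition_for("cust-42"),
    "RowKey": "order-001",
}
```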