feat: Add Official Microsoft & Gemini Skills (845+ Total)

🚀 Impact

Significantly expands the capabilities of **Antigravity Awesome Skills** by integrating official skill collections from **Microsoft** and **Google Gemini**. This update increases the total skill count to **845+**, making the library even more comprehensive for AI coding assistants.

 Key Changes

1. New Official Skills

- **Microsoft Skills**: Added a massive collection of official skills from [microsoft/skills](https://github.com/microsoft/skills).
  - Includes Azure, .NET, Python, TypeScript, and Semantic Kernel skills.
  - Preserves the original directory structure under `skills/official/microsoft/`.
  - Includes plugin skills from the `.github/plugins` directory.
- **Gemini Skills**: Added official Gemini API development skills under `skills/gemini-api-dev/`.

2. New Scripts & Tooling

- **`scripts/sync_microsoft_skills.py`**: A robust synchronization script that:
  - Clones the official Microsoft repository.
  - Preserves the original directory heirarchy.
  - Handles symlinks and plugin locations.
  - Generates attribution metadata.
- **`scripts/tests/inspect_microsoft_repo.py`**: Debug tool to inspect the remote repository structure.
- **`scripts/tests/test_comprehensive_coverage.py`**: Verification script to ensure 100% of skills are captured during sync.

3. Core Improvements

- **`scripts/generate_index.py`**: Enhanced frontmatter parsing to safely handle unquoted values containing `@` symbols and commas (fixing issues with some Microsoft skill descriptions).
- **`package.json`**: Added `sync:microsoft` and `sync:all-official` scripts for easy maintenance.

4. Documentation

- Updated `README.md` to reflect the new skill counts (845+) and added Microsoft/Gemini to the provider list.
- Updated `CATALOG.md` and `skills_index.json` with the new skills.

🧪 Verification

- Ran `scripts/tests/test_comprehensive_coverage.py` to verify all Microsoft skills are detected.
- Validated `generate_index.py` fixes by successfully indexing the new skills.
This commit is contained in:
Ahmed Rehan
2026-02-11 20:16:23 +05:00
parent 167d7c97c7
commit 17bce709de
145 changed files with 44081 additions and 72 deletions

View File

@@ -0,0 +1,219 @@
---
name: azure-storage-blob-py
description: |
Azure Blob Storage SDK for Python. Use for uploading, downloading, listing blobs, managing containers, and blob lifecycle.
Triggers: "blob storage", "BlobServiceClient", "ContainerClient", "BlobClient", "upload blob", "download blob".
package: azure-storage-blob
---
# Azure Blob Storage SDK for Python
Client library for Azure Blob Storage — object storage for unstructured data.
## Installation
```bash
pip install azure-storage-blob azure-identity
```
## Environment Variables
```bash
AZURE_STORAGE_ACCOUNT_NAME=<your-storage-account>
# Or use full URL
AZURE_STORAGE_ACCOUNT_URL=https://<account>.blob.core.windows.net
```
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
credential = DefaultAzureCredential()
account_url = "https://<account>.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=credential)
```
## Client Hierarchy
| Client | Purpose | Get From |
|--------|---------|----------|
| `BlobServiceClient` | Account-level operations | Direct instantiation |
| `ContainerClient` | Container operations | `blob_service_client.get_container_client()` |
| `BlobClient` | Single blob operations | `container_client.get_blob_client()` |
## Core Workflow
### Create Container
```python
container_client = blob_service_client.get_container_client("mycontainer")
container_client.create_container()
```
### Upload Blob
```python
# From file path
blob_client = blob_service_client.get_blob_client(
container="mycontainer",
blob="sample.txt"
)
with open("./local-file.txt", "rb") as data:
blob_client.upload_blob(data, overwrite=True)
# From bytes/string
blob_client.upload_blob(b"Hello, World!", overwrite=True)
# From stream
import io
stream = io.BytesIO(b"Stream content")
blob_client.upload_blob(stream, overwrite=True)
```
### Download Blob
```python
blob_client = blob_service_client.get_blob_client(
container="mycontainer",
blob="sample.txt"
)
# To file
with open("./downloaded.txt", "wb") as file:
download_stream = blob_client.download_blob()
file.write(download_stream.readall())
# To memory
download_stream = blob_client.download_blob()
content = download_stream.readall() # bytes
# Read into existing buffer
stream = io.BytesIO()
num_bytes = blob_client.download_blob().readinto(stream)
```
### List Blobs
```python
container_client = blob_service_client.get_container_client("mycontainer")
# List all blobs
for blob in container_client.list_blobs():
print(f"{blob.name} - {blob.size} bytes")
# List with prefix (folder-like)
for blob in container_client.list_blobs(name_starts_with="logs/"):
print(blob.name)
# Walk blob hierarchy (virtual directories)
for item in container_client.walk_blobs(delimiter="/"):
if item.get("prefix"):
print(f"Directory: {item['prefix']}")
else:
print(f"Blob: {item.name}")
```
### Delete Blob
```python
blob_client.delete_blob()
# Delete with snapshots
blob_client.delete_blob(delete_snapshots="include")
```
## Performance Tuning
```python
# Configure chunk sizes for large uploads/downloads
blob_client = BlobClient(
account_url=account_url,
container_name="mycontainer",
blob_name="large-file.zip",
credential=credential,
max_block_size=4 * 1024 * 1024, # 4 MiB blocks
max_single_put_size=64 * 1024 * 1024 # 64 MiB single upload limit
)
# Parallel upload
blob_client.upload_blob(data, max_concurrency=4)
# Parallel download
download_stream = blob_client.download_blob(max_concurrency=4)
```
## SAS Tokens
```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions
sas_token = generate_blob_sas(
account_name="<account>",
container_name="mycontainer",
blob_name="sample.txt",
account_key="<account-key>", # Or use user delegation key
permission=BlobSasPermissions(read=True),
expiry=datetime.now(timezone.utc) + timedelta(hours=1)
)
# Use SAS token
blob_url = f"https://<account>.blob.core.windows.net/mycontainer/sample.txt?{sas_token}"
```
## Blob Properties and Metadata
```python
# Get properties
properties = blob_client.get_blob_properties()
print(f"Size: {properties.size}")
print(f"Content-Type: {properties.content_settings.content_type}")
print(f"Last modified: {properties.last_modified}")
# Set metadata
blob_client.set_blob_metadata(metadata={"category": "logs", "year": "2024"})
# Set content type
from azure.storage.blob import ContentSettings
blob_client.set_http_headers(
content_settings=ContentSettings(content_type="application/json")
)
```
## Async Client
```python
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient
async def upload_async():
credential = DefaultAzureCredential()
async with BlobServiceClient(account_url, credential=credential) as client:
blob_client = client.get_blob_client("mycontainer", "sample.txt")
with open("./file.txt", "rb") as data:
await blob_client.upload_blob(data, overwrite=True)
# Download async
async def download_async():
async with BlobServiceClient(account_url, credential=credential) as client:
blob_client = client.get_blob_client("mycontainer", "sample.txt")
stream = await blob_client.download_blob()
data = await stream.readall()
```
## Best Practices
1. **Use DefaultAzureCredential** instead of connection strings
2. **Use context managers** for async clients
3. **Set `overwrite=True`** explicitly when re-uploading
4. **Use `max_concurrency`** for large file transfers
5. **Prefer `readinto()`** over `readall()` for memory efficiency
6. **Use `walk_blobs()`** for hierarchical listing
7. **Set appropriate content types** for web-served blobs

View File

@@ -0,0 +1,239 @@
---
name: azure-cosmos-db-py
description: Build Azure Cosmos DB NoSQL services with Python/FastAPI following production-grade patterns. Use when implementing database client setup with dual auth (DefaultAzureCredential + emulator), service layer classes with CRUD operations, partition key strategies, parameterized queries, or TDD patterns for Cosmos. Triggers on phrases like "Cosmos DB", "NoSQL database", "document store", "add persistence", "database service layer", or "Python Cosmos SDK".
package: azure-cosmos
---
# Cosmos DB Service Implementation
Build production-grade Azure Cosmos DB NoSQL services following clean code, security best practices, and TDD principles.
## Installation
```bash
pip install azure-cosmos azure-identity
```
## Environment Variables
```bash
COSMOS_ENDPOINT=https://<account>.documents.azure.com:443/
COSMOS_DATABASE_NAME=<database-name>
COSMOS_CONTAINER_ID=<container-id>
# For emulator only (not production)
COSMOS_KEY=<emulator-key>
```
## Authentication
**DefaultAzureCredential (preferred)**:
```python
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential
client = CosmosClient(
url=os.environ["COSMOS_ENDPOINT"],
credential=DefaultAzureCredential()
)
```
**Emulator (local development)**:
```python
from azure.cosmos import CosmosClient
client = CosmosClient(
url="https://localhost:8081",
credential=os.environ["COSMOS_KEY"],
connection_verify=False
)
```
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ FastAPI Router │
│ - Auth dependencies (get_current_user, get_current_user_required)
│ - HTTP error responses (HTTPException) │
└──────────────────────────────┬──────────────────────────────────┘
┌──────────────────────────────▼──────────────────────────────────┐
│ Service Layer │
│ - Business logic and validation │
│ - Document ↔ Model conversion │
│ - Graceful degradation when Cosmos unavailable │
└──────────────────────────────┬──────────────────────────────────┘
┌──────────────────────────────▼──────────────────────────────────┐
│ Cosmos DB Client Module │
│ - Singleton container initialization │
│ - Dual auth: DefaultAzureCredential (Azure) / Key (emulator) │
│ - Async wrapper via run_in_threadpool │
└─────────────────────────────────────────────────────────────────┘
```
## Quick Start
### 1. Client Module Setup
Create a singleton Cosmos client with dual authentication:
```python
# db/cosmos.py
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential
from starlette.concurrency import run_in_threadpool
_cosmos_container = None
def _is_emulator_endpoint(endpoint: str) -> bool:
return "localhost" in endpoint or "127.0.0.1" in endpoint
async def get_container():
global _cosmos_container
if _cosmos_container is None:
if _is_emulator_endpoint(settings.cosmos_endpoint):
client = CosmosClient(
url=settings.cosmos_endpoint,
credential=settings.cosmos_key,
connection_verify=False
)
else:
client = CosmosClient(
url=settings.cosmos_endpoint,
credential=DefaultAzureCredential()
)
db = client.get_database_client(settings.cosmos_database_name)
_cosmos_container = db.get_container_client(settings.cosmos_container_id)
return _cosmos_container
```
**Full implementation**: See [references/client-setup.md](references/client-setup.md)
### 2. Pydantic Model Hierarchy
Use five-tier model pattern for clean separation:
```python
class ProjectBase(BaseModel): # Shared fields
name: str = Field(..., min_length=1, max_length=200)
class ProjectCreate(ProjectBase): # Creation request
workspace_id: str = Field(..., alias="workspaceId")
class ProjectUpdate(BaseModel): # Partial updates (all optional)
name: Optional[str] = Field(None, min_length=1)
class Project(ProjectBase): # API response
id: str
created_at: datetime = Field(..., alias="createdAt")
class ProjectInDB(Project): # Internal with docType
doc_type: str = "project"
```
### 3. Service Layer Pattern
```python
class ProjectService:
def _use_cosmos(self) -> bool:
return get_container() is not None
async def get_by_id(self, project_id: str, workspace_id: str) -> Project | None:
if not self._use_cosmos():
return None
doc = await get_document(project_id, partition_key=workspace_id)
if doc is None:
return None
return self._doc_to_model(doc)
```
**Full patterns**: See [references/service-layer.md](references/service-layer.md)
## Core Principles
### Security Requirements
1. **RBAC Authentication**: Use `DefaultAzureCredential` in Azure — never store keys in code
2. **Emulator-Only Keys**: Hardcode the well-known emulator key only for local development
3. **Parameterized Queries**: Always use `@parameter` syntax — never string concatenation
4. **Partition Key Validation**: Validate partition key access matches user authorization
### Clean Code Conventions
1. **Single Responsibility**: Client module handles connection; services handle business logic
2. **Graceful Degradation**: Services return `None`/`[]` when Cosmos unavailable
3. **Consistent Naming**: `_doc_to_model()`, `_model_to_doc()`, `_use_cosmos()`
4. **Type Hints**: Full typing on all public methods
5. **CamelCase Aliases**: Use `Field(alias="camelCase")` for JSON serialization
### TDD Requirements
Write tests BEFORE implementation using these patterns:
```python
@pytest.fixture
def mock_cosmos_container(mocker):
container = mocker.MagicMock()
mocker.patch("app.db.cosmos.get_container", return_value=container)
return container
@pytest.mark.asyncio
async def test_get_project_by_id_returns_project(mock_cosmos_container):
# Arrange
mock_cosmos_container.read_item.return_value = {"id": "123", "name": "Test"}
# Act
result = await project_service.get_by_id("123", "workspace-1")
# Assert
assert result.id == "123"
assert result.name == "Test"
```
**Full testing guide**: See [references/testing.md](references/testing.md)
## Reference Files
| File | When to Read |
|------|--------------|
| [references/client-setup.md](references/client-setup.md) | Setting up Cosmos client with dual auth, SSL config, singleton pattern |
| [references/service-layer.md](references/service-layer.md) | Implementing full service class with CRUD, conversions, graceful degradation |
| [references/testing.md](references/testing.md) | Writing pytest tests, mocking Cosmos, integration test setup |
| [references/partitioning.md](references/partitioning.md) | Choosing partition keys, cross-partition queries, move operations |
| [references/error-handling.md](references/error-handling.md) | Handling CosmosResourceNotFoundError, logging, HTTP error mapping |
## Template Files
| File | Purpose |
|------|---------|
| [assets/cosmos_client_template.py](assets/cosmos_client_template.py) | Ready-to-use client module |
| [assets/service_template.py](assets/service_template.py) | Service class skeleton |
| [assets/conftest_template.py](assets/conftest_template.py) | pytest fixtures for Cosmos mocking |
## Quality Attributes (NFRs)
### Reliability
- Graceful degradation when Cosmos unavailable
- Retry logic with exponential backoff for transient failures
- Connection pooling via singleton pattern
### Security
- Zero secrets in code (RBAC via DefaultAzureCredential)
- Parameterized queries prevent injection
- Partition key isolation enforces data boundaries
### Maintainability
- Five-tier model pattern enables schema evolution
- Service layer decouples business logic from storage
- Consistent patterns across all entity services
### Testability
- Dependency injection via `get_container()`
- Easy mocking with module-level globals
- Clear separation enables unit testing without Cosmos
### Performance
- Partition key queries avoid cross-partition scans
- Async wrapping prevents blocking FastAPI event loop
- Minimal document conversion overhead

View File

@@ -0,0 +1,280 @@
---
name: azure-cosmos-py
description: |
Azure Cosmos DB SDK for Python (NoSQL API). Use for document CRUD, queries, containers, and globally distributed data.
Triggers: "cosmos db", "CosmosClient", "container", "document", "NoSQL", "partition key".
package: azure-cosmos
---
# Azure Cosmos DB SDK for Python
Client library for Azure Cosmos DB NoSQL API — globally distributed, multi-model database.
## Installation
```bash
pip install azure-cosmos azure-identity
```
## Environment Variables
```bash
COSMOS_ENDPOINT=https://<account>.documents.azure.com:443/
COSMOS_DATABASE=mydb
COSMOS_CONTAINER=mycontainer
```
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient
credential = DefaultAzureCredential()
endpoint = "https://<account>.documents.azure.com:443/"
client = CosmosClient(url=endpoint, credential=credential)
```
## Client Hierarchy
| Client | Purpose | Get From |
|--------|---------|----------|
| `CosmosClient` | Account-level operations | Direct instantiation |
| `DatabaseProxy` | Database operations | `client.get_database_client()` |
| `ContainerProxy` | Container/item operations | `database.get_container_client()` |
## Core Workflow
### Setup Database and Container
```python
# Get or create database
database = client.create_database_if_not_exists(id="mydb")
# Get or create container with partition key
container = database.create_container_if_not_exists(
id="mycontainer",
partition_key=PartitionKey(path="/category")
)
# Get existing
database = client.get_database_client("mydb")
container = database.get_container_client("mycontainer")
```
### Create Item
```python
item = {
"id": "item-001", # Required: unique within partition
"category": "electronics", # Partition key value
"name": "Laptop",
"price": 999.99,
"tags": ["computer", "portable"]
}
created = container.create_item(body=item)
print(f"Created: {created['id']}")
```
### Read Item
```python
# Read requires id AND partition key
item = container.read_item(
item="item-001",
partition_key="electronics"
)
print(f"Name: {item['name']}")
```
### Update Item (Replace)
```python
item = container.read_item(item="item-001", partition_key="electronics")
item["price"] = 899.99
item["on_sale"] = True
updated = container.replace_item(item=item["id"], body=item)
```
### Upsert Item
```python
# Create if not exists, replace if exists
item = {
"id": "item-002",
"category": "electronics",
"name": "Tablet",
"price": 499.99
}
result = container.upsert_item(body=item)
```
### Delete Item
```python
container.delete_item(
item="item-001",
partition_key="electronics"
)
```
## Queries
### Basic Query
```python
# Query within a partition (efficient)
query = "SELECT * FROM c WHERE c.price < @max_price"
items = container.query_items(
query=query,
parameters=[{"name": "@max_price", "value": 500}],
partition_key="electronics"
)
for item in items:
print(f"{item['name']}: ${item['price']}")
```
### Cross-Partition Query
```python
# Cross-partition (more expensive, use sparingly)
query = "SELECT * FROM c WHERE c.price < @max_price"
items = container.query_items(
query=query,
parameters=[{"name": "@max_price", "value": 500}],
enable_cross_partition_query=True
)
for item in items:
print(item)
```
### Query with Projection
```python
query = "SELECT c.id, c.name, c.price FROM c WHERE c.category = @category"
items = container.query_items(
query=query,
parameters=[{"name": "@category", "value": "electronics"}],
partition_key="electronics"
)
```
### Read All Items
```python
# Read all in a partition
items = container.read_all_items() # Cross-partition
# Or with partition key
items = container.query_items(
query="SELECT * FROM c",
partition_key="electronics"
)
```
## Partition Keys
**Critical**: Always include partition key for efficient operations.
```python
from azure.cosmos import PartitionKey
# Single partition key
container = database.create_container_if_not_exists(
id="orders",
partition_key=PartitionKey(path="/customer_id")
)
# Hierarchical partition key (preview)
container = database.create_container_if_not_exists(
id="events",
partition_key=PartitionKey(path=["/tenant_id", "/user_id"])
)
```
## Throughput
```python
# Create container with provisioned throughput
container = database.create_container_if_not_exists(
id="mycontainer",
partition_key=PartitionKey(path="/pk"),
offer_throughput=400 # RU/s
)
# Read current throughput
offer = container.read_offer()
print(f"Throughput: {offer.offer_throughput} RU/s")
# Update throughput
container.replace_throughput(throughput=1000)
```
## Async Client
```python
from azure.cosmos.aio import CosmosClient
from azure.identity.aio import DefaultAzureCredential
async def cosmos_operations():
credential = DefaultAzureCredential()
async with CosmosClient(endpoint, credential=credential) as client:
database = client.get_database_client("mydb")
container = database.get_container_client("mycontainer")
# Create
await container.create_item(body={"id": "1", "pk": "test"})
# Read
item = await container.read_item(item="1", partition_key="test")
# Query
async for item in container.query_items(
query="SELECT * FROM c",
partition_key="test"
):
print(item)
import asyncio
asyncio.run(cosmos_operations())
```
## Error Handling
```python
from azure.cosmos.exceptions import CosmosHttpResponseError
try:
item = container.read_item(item="nonexistent", partition_key="pk")
except CosmosHttpResponseError as e:
if e.status_code == 404:
print("Item not found")
elif e.status_code == 429:
print(f"Rate limited. Retry after: {e.headers.get('x-ms-retry-after-ms')}ms")
else:
raise
```
## Best Practices
1. **Always specify partition key** for point reads and queries
2. **Use parameterized queries** to prevent injection and improve caching
3. **Avoid cross-partition queries** when possible
4. **Use `upsert_item`** for idempotent writes
5. **Use async client** for high-throughput scenarios
6. **Design partition key** for even data distribution
7. **Use `read_item`** instead of query for single document retrieval
## Reference Files
| File | Contents |
|------|----------|
| [references/partitioning.md](references/partitioning.md) | Partition key strategies, hierarchical keys, hot partition detection and mitigation |
| [references/query-patterns.md](references/query-patterns.md) | Query optimization, aggregations, pagination, transactions, change feed |
| [scripts/setup_cosmos_container.py](scripts/setup_cosmos_container.py) | CLI tool for creating containers with partitioning, throughput, and indexing |

View File

@@ -0,0 +1,211 @@
---
name: azure-storage-file-datalake-py
description: |
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
package: azure-storage-file-datalake
---
# Azure Data Lake Storage Gen2 SDK for Python
Hierarchical file system for big data analytics workloads.
## Installation
```bash
pip install azure-storage-file-datalake azure-identity
```
## Environment Variables
```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"
service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```
## Client Hierarchy
| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |
## File System Operations
```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")
# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")
# Delete
service_client.delete_file_system("myfilesystem")
# List file systems
for fs in service_client.list_file_systems():
print(fs.name)
```
## Directory Operations
```python
file_system_client = service_client.get_file_system_client("myfilesystem")
# Create directory
directory_client = file_system_client.create_directory("mydir")
# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")
# Get directory client
directory_client = file_system_client.get_directory_client("mydir")
# Delete directory
directory_client.delete_directory()
# Rename/move directory
directory_client.rename_directory(new_name="myfilesystem/newname")
```
## File Operations
### Upload File
```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")
# Upload from local file
with open("local-file.txt", "rb") as data:
file_client.upload_data(data, overwrite=True)
# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)
# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12) # Commit the data
```
### Download File
```python
file_client = file_system_client.get_file_client("path/to/file.txt")
# Download all content
download = file_client.download_file()
content = download.readall()
# Download to file
with open("downloaded.txt", "wb") as f:
download = file_client.download_file()
download.readinto(f)
# Download range
download = file_client.download_file(offset=0, length=100)
```
### Delete File
```python
file_client.delete_file()
```
## List Contents
```python
# List paths (files and directories)
for path in file_system_client.get_paths():
print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")
# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
print(path.name)
# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
print(path.name)
```
## File/Directory Properties
```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")
# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```
## Access Control (ACL)
```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")
# Set ACL
directory_client.set_access_control(
owner="user-id",
permissions="rwxr-x---"
)
# Update ACL entries
from azure.storage.filedatalake import AccessControlChangeResult
directory_client.update_access_control_recursive(
acl="user:user-id:rwx"
)
```
## Async Client
```python
from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential
async def datalake_operations():
credential = DefaultAzureCredential()
async with DataLakeServiceClient(
account_url="https://<account>.dfs.core.windows.net",
credential=credential
) as service_client:
file_system_client = service_client.get_file_system_client("myfilesystem")
file_client = file_system_client.get_file_client("test.txt")
await file_client.upload_data(b"async content", overwrite=True)
download = await file_client.download_file()
content = await download.readall()
import asyncio
asyncio.run(datalake_operations())
```
## Best Practices
1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and inherit to children
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases

View File

@@ -0,0 +1,238 @@
---
name: azure-storage-file-share-py
description: |
Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.
Triggers: "azure-storage-file-share", "ShareServiceClient", "ShareClient", "file share", "SMB".
---
# Azure Storage File Share SDK for Python
Manage SMB file shares for cloud-native and lift-and-shift scenarios.
## Installation
```bash
pip install azure-storage-file-share
```
## Environment Variables
```bash
AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...
# Or
AZURE_STORAGE_ACCOUNT_URL=https://<account>.file.core.windows.net
```
## Authentication
### Connection String
```python
from azure.storage.fileshare import ShareServiceClient
service = ShareServiceClient.from_connection_string(
os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
```
### Entra ID
```python
from azure.storage.fileshare import ShareServiceClient
from azure.identity import DefaultAzureCredential
service = ShareServiceClient(
account_url=os.environ["AZURE_STORAGE_ACCOUNT_URL"],
credential=DefaultAzureCredential()
)
```
## Share Operations
### Create Share
```python
share = service.create_share("my-share")
```
### List Shares
```python
for share in service.list_shares():
print(f"{share.name}: {share.quota} GB")
```
### Get Share Client
```python
share_client = service.get_share_client("my-share")
```
### Delete Share
```python
service.delete_share("my-share")
```
## Directory Operations
### Create Directory
```python
share_client = service.get_share_client("my-share")
share_client.create_directory("my-directory")
# Nested directory
share_client.create_directory("my-directory/sub-directory")
```
### List Directories and Files
```python
directory_client = share_client.get_directory_client("my-directory")
for item in directory_client.list_directories_and_files():
if item["is_directory"]:
print(f"[DIR] {item['name']}")
else:
print(f"[FILE] {item['name']} ({item['size']} bytes)")
```
### Delete Directory
```python
share_client.delete_directory("my-directory")
```
## File Operations
### Upload File
```python
file_client = share_client.get_file_client("my-directory/file.txt")
# From string
file_client.upload_file("Hello, World!")
# From file
with open("local-file.txt", "rb") as f:
file_client.upload_file(f)
# From bytes
file_client.upload_file(b"Binary content")
```
### Download File
```python
file_client = share_client.get_file_client("my-directory/file.txt")
# To bytes
data = file_client.download_file().readall()
# To file
with open("downloaded.txt", "wb") as f:
data = file_client.download_file()
data.readinto(f)
# Stream chunks
download = file_client.download_file()
for chunk in download.chunks():
process(chunk)
```
### Get File Properties
```python
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Content type: {properties.content_settings.content_type}")
print(f"Last modified: {properties.last_modified}")
```
### Delete File
```python
file_client.delete_file()
```
### Copy File
```python
source_url = "https://account.file.core.windows.net/share/source.txt"
dest_client = share_client.get_file_client("destination.txt")
dest_client.start_copy_from_url(source_url)
```
## Range Operations
### Upload Range
```python
# Upload to specific range
file_client.upload_range(data=b"content", offset=0, length=7)
```
### Download Range
```python
# Download specific range
download = file_client.download_file(offset=0, length=100)
data = download.readall()
```
## Snapshot Operations
### Create Snapshot
```python
snapshot = share_client.create_snapshot()
print(f"Snapshot: {snapshot['snapshot']}")
```
### Access Snapshot
```python
snapshot_client = service.get_share_client(
"my-share",
snapshot=snapshot["snapshot"]
)
```
## Async Client
```python
from azure.storage.fileshare.aio import ShareServiceClient
from azure.identity.aio import DefaultAzureCredential
async def upload_file():
credential = DefaultAzureCredential()
service = ShareServiceClient(account_url, credential=credential)
share = service.get_share_client("my-share")
file_client = share.get_file_client("test.txt")
await file_client.upload_file("Hello!")
await service.close()
await credential.close()
```
## Client Types
| Client | Purpose |
|--------|---------|
| `ShareServiceClient` | Account-level operations |
| `ShareClient` | Share operations |
| `ShareDirectoryClient` | Directory operations |
| `ShareFileClient` | File operations |
## Best Practices
1. **Use connection string** for simplest setup
2. **Use Entra ID** for production with RBAC
3. **Stream large files** using chunks() to avoid memory issues
4. **Create snapshots** before major changes
5. **Set quotas** to prevent unexpected storage costs
6. **Use ranges** for partial file updates
7. **Close async clients** explicitly

View File

@@ -0,0 +1,213 @@
---
name: azure-storage-queue-py
description: |
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.
Triggers: "queue storage", "QueueServiceClient", "QueueClient", "message queue", "dequeue".
package: azure-storage-queue
---
# Azure Queue Storage SDK for Python
Simple, cost-effective message queuing for asynchronous communication.
## Installation
```bash
pip install azure-storage-queue azure-identity
```
## Environment Variables
```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.queue.core.windows.net
```
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.storage.queue import QueueServiceClient, QueueClient
credential = DefaultAzureCredential()
account_url = "https://<account>.queue.core.windows.net"
# Service client
service_client = QueueServiceClient(account_url=account_url, credential=credential)
# Queue client
queue_client = QueueClient(account_url=account_url, queue_name="myqueue", credential=credential)
```
## Queue Operations
```python
# Create queue
service_client.create_queue("myqueue")
# Get queue client
queue_client = service_client.get_queue_client("myqueue")
# Delete queue
service_client.delete_queue("myqueue")
# List queues
for queue in service_client.list_queues():
print(queue.name)
```
## Send Messages
```python
# Send message (string)
queue_client.send_message("Hello, Queue!")
# Send with options
queue_client.send_message(
content="Delayed message",
visibility_timeout=60, # Hidden for 60 seconds
time_to_live=3600 # Expires in 1 hour
)
# Send JSON
import json
data = {"task": "process", "id": 123}
queue_client.send_message(json.dumps(data))
```
## Receive Messages
```python
# Receive messages (makes them invisible temporarily)
messages = queue_client.receive_messages(
messages_per_page=10,
visibility_timeout=30 # 30 seconds to process
)
for message in messages:
print(f"ID: {message.id}")
print(f"Content: {message.content}")
print(f"Dequeue count: {message.dequeue_count}")
# Process message...
# Delete after processing
queue_client.delete_message(message)
```
## Peek Messages
```python
# Peek without hiding (doesn't affect visibility)
messages = queue_client.peek_messages(max_messages=5)
for message in messages:
print(message.content)
```
## Update Message
```python
# Extend visibility or update content
messages = queue_client.receive_messages()
for message in messages:
# Extend timeout (need more time)
queue_client.update_message(
message,
visibility_timeout=60
)
# Update content and timeout
queue_client.update_message(
message,
content="Updated content",
visibility_timeout=60
)
```
## Delete Message
```python
# Delete after successful processing
messages = queue_client.receive_messages()
for message in messages:
try:
# Process...
queue_client.delete_message(message)
except Exception:
# Message becomes visible again after timeout
pass
```
## Clear Queue
```python
# Delete all messages
queue_client.clear_messages()
```
## Queue Properties
```python
# Get queue properties
properties = queue_client.get_queue_properties()
print(f"Approximate message count: {properties.approximate_message_count}")
# Set/get metadata
queue_client.set_queue_metadata(metadata={"environment": "production"})
properties = queue_client.get_queue_properties()
print(properties.metadata)
```
## Async Client
```python
from azure.storage.queue.aio import QueueServiceClient, QueueClient
from azure.identity.aio import DefaultAzureCredential
async def queue_operations():
credential = DefaultAzureCredential()
async with QueueClient(
account_url="https://<account>.queue.core.windows.net",
queue_name="myqueue",
credential=credential
) as client:
# Send
await client.send_message("Async message")
# Receive
async for message in client.receive_messages():
print(message.content)
await client.delete_message(message)
import asyncio
asyncio.run(queue_operations())
```
## Base64 Encoding
```python
from azure.storage.queue import QueueClient, BinaryBase64EncodePolicy, BinaryBase64DecodePolicy
# For binary data
queue_client = QueueClient(
account_url=account_url,
queue_name="myqueue",
credential=credential,
message_encode_policy=BinaryBase64EncodePolicy(),
message_decode_policy=BinaryBase64DecodePolicy()
)
# Send bytes
queue_client.send_message(b"Binary content")
```
## Best Practices
1. **Delete messages after processing** to prevent reprocessing
2. **Set appropriate visibility timeout** based on processing time
3. **Handle `dequeue_count`** for poison message detection
4. **Use async client** for high-throughput scenarios
5. **Use `peek_messages`** for monitoring without affecting queue
6. **Set `time_to_live`** to prevent stale messages
7. **Consider Service Bus** for advanced features (sessions, topics)

View File

@@ -0,0 +1,243 @@
---
name: azure-data-tables-py
description: |
Azure Tables SDK for Python (Storage and Cosmos DB). Use for NoSQL key-value storage, entity CRUD, and batch operations.
Triggers: "table storage", "TableServiceClient", "TableClient", "entities", "PartitionKey", "RowKey".
package: azure-data-tables
---
# Azure Tables SDK for Python
NoSQL key-value store for structured data (Azure Storage Tables or Cosmos DB Table API).
## Installation
```bash
pip install azure-data-tables azure-identity
```
## Environment Variables
```bash
# Azure Storage Tables
AZURE_STORAGE_ACCOUNT_URL=https://<account>.table.core.windows.net
# Cosmos DB Table API
COSMOS_TABLE_ENDPOINT=https://<account>.table.cosmos.azure.com
```
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.data.tables import TableServiceClient, TableClient
credential = DefaultAzureCredential()
endpoint = "https://<account>.table.core.windows.net"
# Service client (manage tables)
service_client = TableServiceClient(endpoint=endpoint, credential=credential)
# Table client (work with entities)
table_client = TableClient(endpoint=endpoint, table_name="mytable", credential=credential)
```
## Client Types
| Client | Purpose |
|--------|---------|
| `TableServiceClient` | Create/delete tables, list tables |
| `TableClient` | Entity CRUD, queries |
## Table Operations
```python
# Create table
service_client.create_table("mytable")
# Create if not exists
service_client.create_table_if_not_exists("mytable")
# Delete table
service_client.delete_table("mytable")
# List tables
for table in service_client.list_tables():
print(table.name)
# Get table client
table_client = service_client.get_table_client("mytable")
```
## Entity Operations
**Important**: Every entity requires `PartitionKey` and `RowKey` (together form unique ID).
### Create Entity
```python
entity = {
"PartitionKey": "sales",
"RowKey": "order-001",
"product": "Widget",
"quantity": 5,
"price": 9.99,
"shipped": False
}
# Create (fails if exists)
table_client.create_entity(entity=entity)
# Upsert (create or replace)
table_client.upsert_entity(entity=entity)
```
### Get Entity
```python
# Get by key (fastest)
entity = table_client.get_entity(
partition_key="sales",
row_key="order-001"
)
print(f"Product: {entity['product']}")
```
### Update Entity
```python
# Replace entire entity
entity["quantity"] = 10
table_client.update_entity(entity=entity, mode="replace")
# Merge (update specific fields only)
update = {
"PartitionKey": "sales",
"RowKey": "order-001",
"shipped": True
}
table_client.update_entity(entity=update, mode="merge")
```
### Delete Entity
```python
table_client.delete_entity(
partition_key="sales",
row_key="order-001"
)
```
## Query Entities
### Query Within Partition
```python
# Query by partition (efficient)
entities = table_client.query_entities(
query_filter="PartitionKey eq 'sales'"
)
for entity in entities:
print(entity)
```
### Query with Filters
```python
# Filter by properties
entities = table_client.query_entities(
query_filter="PartitionKey eq 'sales' and quantity gt 3"
)
# With parameters (safer)
entities = table_client.query_entities(
query_filter="PartitionKey eq @pk and price lt @max_price",
parameters={"pk": "sales", "max_price": 50.0}
)
```
### Select Specific Properties
```python
entities = table_client.query_entities(
query_filter="PartitionKey eq 'sales'",
select=["RowKey", "product", "price"]
)
```
### List All Entities
```python
# List all (cross-partition - use sparingly)
for entity in table_client.list_entities():
print(entity)
```
## Batch Operations
```python
from azure.data.tables import TableTransactionError
# Batch operations (same partition only!)
operations = [
("create", {"PartitionKey": "batch", "RowKey": "1", "data": "first"}),
("create", {"PartitionKey": "batch", "RowKey": "2", "data": "second"}),
("upsert", {"PartitionKey": "batch", "RowKey": "3", "data": "third"}),
]
try:
table_client.submit_transaction(operations)
except TableTransactionError as e:
print(f"Transaction failed: {e}")
```
## Async Client
```python
from azure.data.tables.aio import TableServiceClient, TableClient
from azure.identity.aio import DefaultAzureCredential
async def table_operations():
credential = DefaultAzureCredential()
async with TableClient(
endpoint="https://<account>.table.core.windows.net",
table_name="mytable",
credential=credential
) as client:
# Create
await client.create_entity(entity={
"PartitionKey": "async",
"RowKey": "1",
"data": "test"
})
# Query
async for entity in client.query_entities("PartitionKey eq 'async'"):
print(entity)
import asyncio
asyncio.run(table_operations())
```
## Data Types
| Python Type | Table Storage Type |
|-------------|-------------------|
| `str` | String |
| `int` | Int64 |
| `float` | Double |
| `bool` | Boolean |
| `datetime` | DateTime |
| `bytes` | Binary |
| `UUID` | Guid |
## Best Practices
1. **Design partition keys** for query patterns and even distribution
2. **Query within partitions** whenever possible (cross-partition is expensive)
3. **Use batch operations** for multiple entities in same partition
4. **Use `upsert_entity`** for idempotent writes
5. **Use parameterized queries** to prevent injection
6. **Keep entities small** — max 1MB per entity
7. **Use async client** for high-throughput scenarios