🚀 Impact

Significantly expands the capabilities of **Antigravity Awesome Skills** by integrating official skill collections from **Microsoft** and **Google Gemini**. This update increases the total skill count to **845+**, making the library even more comprehensive for AI coding assistants.

✨ Key Changes

1. New Official Skills
   - **Microsoft Skills**: Added a massive collection of official skills from [microsoft/skills](https://github.com/microsoft/skills).
     - Includes Azure, .NET, Python, TypeScript, and Semantic Kernel skills.
     - Preserves the original directory structure under `skills/official/microsoft/`.
     - Includes plugin skills from the `.github/plugins` directory.
   - **Gemini Skills**: Added official Gemini API development skills under `skills/gemini-api-dev/`.
2. New Scripts & Tooling
   - **`scripts/sync_microsoft_skills.py`**: A robust synchronization script that:
     - Clones the official Microsoft repository.
     - Preserves the original directory hierarchy.
     - Handles symlinks and plugin locations.
     - Generates attribution metadata.
   - **`scripts/tests/inspect_microsoft_repo.py`**: Debug tool to inspect the remote repository structure.
   - **`scripts/tests/test_comprehensive_coverage.py`**: Verification script to ensure 100% of skills are captured during sync.
3. Core Improvements
   - **`scripts/generate_index.py`**: Enhanced frontmatter parsing to safely handle unquoted values containing `@` symbols and commas (fixing issues with some Microsoft skill descriptions).
   - **`package.json`**: Added `sync:microsoft` and `sync:all-official` scripts for easy maintenance.
4. Documentation
   - Updated `README.md` to reflect the new skill count (845+) and added Microsoft/Gemini to the provider list.
   - Updated `CATALOG.md` and `skills_index.json` with the new skills.

🧪 Verification

- Ran `scripts/tests/test_comprehensive_coverage.py` to verify all Microsoft skills are detected.
- Validated the `generate_index.py` fixes by successfully indexing the new skills.
---
name: azure-storage-file-datalake-py
description: |
  Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
  Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
package: azure-storage-file-datalake
---
# Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.
## Installation

```bash
pip install azure-storage-file-datalake azure-identity
```
## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```
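A minimal sketch of reading this variable at startup. The helper name and its validation are illustrative, not part of the SDK; failing fast here beats a confusing error on the first network call:

```python
import os

def get_account_url() -> str:
    """Read the Data Lake account URL from the environment (illustrative helper)."""
    url = os.environ.get("AZURE_STORAGE_ACCOUNT_URL")
    if not url:
        raise RuntimeError("AZURE_STORAGE_ACCOUNT_URL is not set")
    if ".dfs.core.windows.net" not in url:
        # Blob endpoints use .blob.core.windows.net; Data Lake needs the dfs endpoint
        raise RuntimeError(f"Expected a Data Lake (dfs) endpoint, got: {url}")
    return url
```

The resulting URL is what the `DataLakeServiceClient` constructor below expects.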
## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```
## Client Hierarchy

| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |
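Each level of this hierarchy addresses one more segment of the account's DFS endpoint URL (`https://<account>.dfs.core.windows.net/<file-system>/<path>`). As a stdlib-only illustration (the helper is an assumption for this doc, not an SDK function), the resource each client points at can be sketched as:

```python
def dfs_resource_url(account: str, file_system: str = "", path: str = "") -> str:
    """Compose the DFS endpoint URL addressed by each client level (illustrative)."""
    url = f"https://{account}.dfs.core.windows.net"  # DataLakeServiceClient level
    if file_system:
        url += f"/{file_system}"                     # FileSystemClient level
    if path:
        url += f"/{path.strip('/')}"                 # Directory/File client level
    return url
```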
## File System Operations

```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```
## Directory Operations

```python
file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

# Delete directory
directory_client.delete_directory()

# Rename/move directory (the new name includes the file system name)
directory_client.rename_directory(new_name="myfilesystem/newname")
```
## File Operations

### Upload File

```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit the data
```
### Download File

```python
file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download range
download = file_client.download_file(offset=0, length=100)
```
### Delete File

```python
file_client.delete_file()
```
## List Contents

```python
# List paths (files and directories); recursive by default
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths under a directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Immediate children only (non-recursive)
for path in file_system_client.get_paths(path="mydir", recursive=False):
    print(path.name)
```
## File/Directory Properties

```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```
## Access Control (ACL)

```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries recursively
directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
```
## Async Client

```python
from azure.identity.aio import DefaultAzureCredential
from azure.storage.filedatalake.aio import DataLakeServiceClient

async def datalake_operations():
    credential = DefaultAzureCredential()

    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")

        await file_client.upload_data(b"async content", overwrite=True)

        download = await file_client.download_file()
        content = await download.readall()

    await credential.close()  # async credentials must be closed explicitly

import asyncio
asyncio.run(datalake_operations())
```
## Best Practices

1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and inherit to children
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases