feat: Add Official Microsoft & Gemini Skills (845+ Total)
## 🚀 Impact

Significantly expands the capabilities of **Antigravity Awesome Skills** by integrating official skill collections from **Microsoft** and **Google Gemini**. This update increases the total skill count to **845+**, making the library even more comprehensive for AI coding assistants.

## ✨ Key Changes

### 1. New Official Skills

- **Microsoft Skills**: Added a large collection of official skills from [microsoft/skills](https://github.com/microsoft/skills).
  - Includes Azure, .NET, Python, TypeScript, and Semantic Kernel skills.
  - Preserves the original directory structure under `skills/official/microsoft/`.
  - Includes plugin skills from the `.github/plugins` directory.
- **Gemini Skills**: Added official Gemini API development skills under `skills/gemini-api-dev/`.

### 2. New Scripts & Tooling

- **`scripts/sync_microsoft_skills.py`**: A robust synchronization script that:
  - Clones the official Microsoft repository.
  - Preserves the original directory hierarchy.
  - Handles symlinks and plugin locations.
  - Generates attribution metadata.
- **`scripts/tests/inspect_microsoft_repo.py`**: Debug tool to inspect the remote repository structure.
- **`scripts/tests/test_comprehensive_coverage.py`**: Verification script to ensure 100% of skills are captured during sync.

### 3. Core Improvements

- **`scripts/generate_index.py`**: Enhanced frontmatter parsing to safely handle unquoted values containing `@` symbols and commas, fixing issues with some Microsoft skill descriptions (sketched below).
- **`package.json`**: Added `sync:microsoft` and `sync:all-official` scripts for easy maintenance.

### 4. Documentation

- Updated `README.md` to reflect the new skill count (845+) and added Microsoft/Gemini to the provider list.
- Updated `CATALOG.md` and `skills_index.json` with the new skills.

## 🧪 Verification

- Ran `scripts/tests/test_comprehensive_coverage.py` to verify all Microsoft skills are detected.
- Validated the `generate_index.py` fixes by successfully indexing the new skills.
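For context on the `generate_index.py` change, here is a minimal sketch of the kind of tolerant parsing such a fix involves, assuming flat `key: value` frontmatter delimited by `---` lines. This is illustrative only, not the actual implementation:

```python
# Illustrative sketch only -- not the actual generate_index.py code.
# Strict YAML loaders reject plain scalars that start with "@", and naive
# parsers can mishandle commas; taking everything after the first ":"
# verbatim sidesteps both for flat key/value frontmatter.
def parse_frontmatter(text: str) -> dict:
    meta = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return meta  # no frontmatter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter
        if ":" not in line or line.startswith((" ", "\t")):
            continue  # skip indented continuation lines (block scalars)
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip("\"'")
    return meta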
**New file:** `skills/official/microsoft/python/data/datalake/SKILL.md` (+211 lines)
---
name: azure-storage-file-datalake-py
description: |
  Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
  Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
package: azure-storage-file-datalake
---

# Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.
## Installation

```bash
pip install azure-storage-file-datalake azure-identity
```
## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```
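A small sketch of picking this variable up from Python, assuming it is exported as shown above:

```python
import os

# Fail fast if the endpoint is not configured.
account_url = os.environ["AZURE_STORAGE_ACCOUNT_URL"]
```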
## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```
## Client Hierarchy

| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |
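The levels compose top-down, with each client handing out the one below it. A short sketch (file system, directory, and file names are placeholders):

```python
# Drill down: account -> file system -> directory -> file
file_system_client = service_client.get_file_system_client("myfilesystem")
directory_client = file_system_client.get_directory_client("mydir")
file_client = directory_client.get_file_client("file.txt")
```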
## File System Operations

```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```
## Directory Operations

```python
file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

# Delete directory
directory_client.delete_directory()

# Rename/move directory (new_name is prefixed with the file system name)
directory_client.rename_directory(new_name="myfilesystem/newname")
```
## File Operations

### Upload File

```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit the data
```
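To make the append/flush pattern practical for genuinely large files, here is a minimal sketch (file names and chunk size are illustrative) that streams a local file in fixed-size chunks and commits once at the end:

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per append (illustrative)

file_client = file_system_client.get_file_client("path/to/large-file.bin")
file_client.create_file()  # start from an empty, uncommitted file

offset = 0
with open("large-local-file.bin", "rb") as data:
    while chunk := data.read(CHUNK_SIZE):
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)

file_client.flush_data(offset)  # commit everything appended so far
```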
### Download File

```python
file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download range
download = file_client.download_file(offset=0, length=100)
```
### Delete File

```python
file_client.delete_file()
```
## List Contents

```python
# List paths (files and directories)
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)
```
## File/Directory Properties

```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```
## Access Control (ACL)

```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries recursively
from azure.storage.filedatalake import AccessControlChangeResult

result: AccessControlChangeResult = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
```
## Async Client

```python
from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential

async def datalake_operations():
    credential = DefaultAzureCredential()

    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")

        await file_client.upload_data(b"async content", overwrite=True)

        download = await file_client.download_file()
        content = await download.readall()

import asyncio
asyncio.run(datalake_operations())
```
## Best Practices

1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and inherit to children (see the sketch after this list)
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases
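For practice 3, one way to have children inherit permissions is POSIX default ACL entries: `default:`-scoped entries on a directory are applied to items created under it. A minimal sketch, with a placeholder ACL string:

```python
# "default:"-scoped entries are inherited by new children of this directory.
directory_client.set_access_control(
    acl="user::rwx,group::r-x,other::---,"
        "default:user::rwx,default:group::r-x,default:other::---"
)
```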