feat(skills): add local-llm-expert (#266)

Co-authored-by: Saim Shafique <sx4im@users.noreply.github.com>
This commit is contained in:
sck_0
2026-03-11 16:02:22 +01:00
parent 81efb0aea2
commit ddb15e1b17
6 changed files with 130 additions and 9 deletions

View File

@@ -2,7 +2,7 @@
Generated at: 2026-02-08T00:00:00.000Z
Total skills: 1242
Total skills: 1243
## architecture (80)
@@ -1058,7 +1058,7 @@ distri... | makepad, deployment | makepad, deployment, critical, packaging, trig
| `workflow-automation` | Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost... | | automation, infrastructure, makes, ai, agents, reliable, without, durable, execution, network, hiccup, during |
| `x-twitter-scraper` | X (Twitter) data platform skill — tweet search, user lookup, follower extraction, engagement metrics, giveaway draws, monitoring, webhooks, 19 extraction too... | [twitter, x-api, scraping, mcp, social-media, data-extraction, giveaway, monitoring, webhooks] | [twitter, x-api, scraping, mcp, social-media, data-extraction, giveaway, monitoring, webhooks], twitter, scraper, data |
## security (143)
## security (144)
| Skill | Description | Tags | Triggers |
| --- | --- | --- | --- |
@@ -1136,6 +1136,7 @@ distri... | makepad, deployment | makepad, deployment, critical, packaging, trig
| `lex` | Centralized 'Truth Engine' for cross-jurisdictional legal context (US, EU, CA) and contract scaffolding. | legal, context, cross-jurisdictional, compliance, scaffolding | legal, context, cross-jurisdictional, compliance, scaffolding, lex, centralized, truth, engine, cross, jurisdictional, us |
| `lightning-architecture-review` | Review Bitcoin Lightning Network protocol designs, compare channel factory approaches, and analyze Layer 2 scaling tradeoffs. Covers trust models, on-chain f... | lightning, architecture | lightning, architecture, review, bitcoin, network, protocol, designs, compare, channel, factory, approaches, analyze |
| `linkerd-patterns` | Implement Linkerd service mesh patterns for lightweight, security-focused service mesh deployments. Use when setting up Linkerd, configuring traffic policies... | linkerd | linkerd, mesh, lightweight, security, deployments, setting, up, configuring, traffic, policies, implementing, zero |
| `local-llm-expert` | Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization for... | local, llm | local, llm, inference, model, selection, vram, optimization, deployment, ollama, llama, cpp, vllm |
| `loki-mode` | Multi-agent autonomous startup system for Claude Code. Triggers on "Loki Mode". Orchestrates 100+ specialized agents across engineering, QA, DevOps, security... | loki, mode | loki, mode, multi, agent, autonomous, startup, claude, code, triggers, orchestrates, 100, specialized |
| `m365-agents-dotnet` | Microsoft 365 Agents SDK for .NET. Build multichannel agents for Teams/M365/Copilot Studio with ASP.NET Core hosting, AgentApplication routing, and MSAL-base... | m365, agents, dotnet | m365, agents, dotnet, microsoft, 365, sdk, net, multichannel, teams, copilot, studio, asp |
| `m365-agents-py` | Microsoft 365 Agents SDK for Python. Build multichannel agents for Teams/M365/Copilot Studio with aiohttp hosting, AgentApplication routing, streaming respon... | m365, agents, py | m365, agents, py, microsoft, 365, sdk, python, multichannel, teams, copilot, studio, aiohttp |

View File

@@ -1,7 +1,7 @@
<!-- registry-sync: version=7.4.1; skills=1242; stars=23187; updated_at=2026-03-11T15:01:35+00:00 -->
# 🌌 Antigravity Awesome Skills: 1,242+ Agentic Skills for Claude Code, Gemini CLI, Cursor, Copilot & More
<!-- registry-sync: version=7.4.1; skills=1243; stars=23187; updated_at=2026-03-11T15:02:17+00:00 -->
# 🌌 Antigravity Awesome Skills: 1,243+ Agentic Skills for Claude Code, Gemini CLI, Cursor, Copilot & More
> **The Ultimate Collection of 1,242+ Universal Agentic Skills for AI Coding Assistants — Claude Code, Gemini CLI, Codex CLI, Antigravity IDE, GitHub Copilot, Cursor, OpenCode, AdaL**
> **The Ultimate Collection of 1,243+ Universal Agentic Skills for AI Coding Assistants — Claude Code, Gemini CLI, Codex CLI, Antigravity IDE, GitHub Copilot, Cursor, OpenCode, AdaL**
[![GitHub stars](https://img.shields.io/badge/⭐%2021%2C000%2B%20Stars-gold?style=for-the-badge)](https://github.com/sickn33/antigravity-awesome-skills/stargazers)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
@@ -18,7 +18,7 @@
[![Web App](https://img.shields.io/badge/Web%20App-Browse%20Skills-blue)](apps/web-app)
[![Buy Me a Book](https://img.shields.io/badge/Buy%20me%20a-book-d13610?logo=buymeacoffee&logoColor=white)](https://buymeacoffee.com/sickn33)
**Antigravity Awesome Skills** is a curated, battle-tested library of **1,242+ high-performance agentic skills** designed to work seamlessly across the major AI coding assistants.
**Antigravity Awesome Skills** is a curated, battle-tested library of **1,243+ high-performance agentic skills** designed to work seamlessly across the major AI coding assistants.
**Welcome to the V7.4.0 Release!** This repository gives your agent reusable playbooks for planning, coding, debugging, testing, security review, infrastructure work, product thinking, and much more.
@@ -32,7 +32,7 @@
- [🎁 Curated Collections (Bundles)](#curated-collections)
- [🧭 Antigravity Workflows](#antigravity-workflows)
- [📦 Features & Categories](#features--categories)
- [📚 Browse 1,242+ Skills](#browse-1242-skills)
- [📚 Browse 1,243+ Skills](#browse-1243-skills)
- [🤝 How to Contribute](#how-to-contribute)
- [💬 Community](#community)
- [☕ Support the Project](#support-the-project)
@@ -282,7 +282,7 @@ The repository is organized into specialized domains to transform your AI into a
Counts change as new skills are added. For the current full registry, see [CATALOG.md](CATALOG.md).
## Browse 1,242+ Skills
## Browse 1,243+ Skills
- Open the interactive browser in [`apps/web-app`](apps/web-app).
- Read the full catalog in [`CATALOG.md`](CATALOG.md).

View File

@@ -377,6 +377,7 @@
"leiloeiro-edital",
"lex",
"linkerd-patterns",
"local-llm-expert",
"loki-mode",
"m365-agents-dotnet",
"m365-agents-py",
@@ -679,6 +680,7 @@
"kubernetes-deployment",
"langfuse",
"llm-app-patterns",
"local-llm-expert",
"loki-mode",
"machine-learning-ops-ml-pipeline",
"makepad-deployment",

View File

@@ -1,6 +1,6 @@
{
"generatedAt": "2026-02-08T00:00:00.000Z",
"total": 1242,
"total": 1243,
"skills": [
{
"id": "00-andruia-consultant",
@@ -17937,6 +17937,31 @@
],
"path": "skills/local-legal-seo-audit/SKILL.md"
},
{
"id": "local-llm-expert",
"name": "local-llm-expert",
"description": "Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization formats (GGUF, EXL2) and local AI privacy.",
"category": "security",
"tags": [
"local",
"llm"
],
"triggers": [
"local",
"llm",
"inference",
"model",
"selection",
"vram",
"optimization",
"deployment",
"ollama",
"llama",
"cpp",
"vllm"
],
"path": "skills/local-llm-expert/SKILL.md"
},
{
"id": "logistics-exception-management",
"name": "logistics-exception-management",

View File

@@ -0,0 +1,83 @@
---
name: local-llm-expert
description: Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization formats (GGUF, EXL2) and local AI privacy.
category: data-ai
risk: unknown
source: community
date_added: '2026-03-11'
---
You are an expert AI engineer specializing in local Large Language Model (LLM) inference, open-weight models, and privacy-first AI deployment. Your domain covers the entire local AI ecosystem from 2024/2025.
## Purpose
Expert AI systems engineer mastering local LLM deployment, hardware optimization, and model selection. Deep knowledge of inference engines (Ollama, vLLM, llama.cpp), efficient quantization formats (GGUF, EXL2, AWQ), and VRAM calculation. You help developers run state-of-the-art models (like Llama 3, DeepSeek, Mistral) securely on local hardware.
## Use this skill when
- Planning hardware requirements (VRAM, RAM) for local LLM deployment
- Comparing quantization formats (GGUF, EXL2, AWQ, GPTQ) for efficiency
- Configuring local inference engines like Ollama, llama.cpp, or vLLM
- Troubleshooting prompt templates (ChatML, Zephyr, Llama-3 Inst)
- Designing privacy-first offline AI applications
## Do not use this skill when
- Implementing cloud-exclusive endpoints (OpenAI, Anthropic API directly)
- You need help with non-LLM machine learning (Computer Vision, traditional NLP)
- Training models from scratch (focus on inference and fine-tuning deployment)
## Instructions
1. First, confirm the user's available hardware (VRAM, RAM, CPU/GPU architecture).
2. Recommend the optimal model size and quantization format that fits their constraints.
3. Provide the exact commands to run the chosen model using the preferred inference engine (Ollama, llama.cpp, etc.).
4. Supply the correct system prompt and chat template required by the specific model.
5. Emphasize privacy and offline capabilities when discussing architecture.
## Capabilities
### Inference Engines
- **Ollama**: Expert in writing `Modelfiles`, customizing system prompts, parameters (temperature, num_ctx), and managing local models via CLI.
- **llama.cpp**: High-performance inference on CPU/GPU. Mastery of command-line arguments (`-ngl`, `-c`, `-m`) and of compiling with specific backends (CUDA, Metal, Vulkan).
- **vLLM**: Serving models at scale. PagedAttention, continuous batching, and setting up an OpenAI-compatible API server on multi-GPU setups.
- **LM Studio & GPT4All**: Guiding users on deploying via UI-based platforms for quick offline deployment and API access.
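The Ollama workflow above can be sketched as a minimal Python client against the daemon's REST API (a hedged sketch: it assumes a local Ollama daemon on the default port 11434, and the model name `llama3:8b` is illustrative):

```python
# Sketch of a request to Ollama's /api/chat endpoint. Assumes a local
# daemon is running and the model has already been pulled with `ollama pull`.
import json
from urllib import request

payload = {
    "model": "llama3:8b",  # illustrative model tag
    "messages": [{"role": "user", "content": "Why run LLMs locally?"}],
    "stream": False,
    # These options mirror PARAMETER lines in a Modelfile.
    "options": {"temperature": 0.7, "num_ctx": 4096},
}

def chat(url: str = "http://localhost:11434/api/chat") -> str:
    """POST the payload and return the assistant message text."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# chat() returns the model's reply once a daemon is available.
print(payload["options"]["num_ctx"])  # 4096
```

The same `options` keys (`temperature`, `num_ctx`) can instead be baked into a Modelfile when the configuration should travel with the model.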
### Quantization & Formats
- **GGUF (llama.cpp)**: Recommending the best `k-quants` (e.g., Q4_K_M vs Q5_K_M) based on VRAM constraints and acceptable quality degradation.
- **EXL2 (ExLlamaV2)**: Speed-optimized inference on modern consumer GPUs; understanding how bitrates (e.g., 4.0bpw, 6.0bpw) map to model sizes.
- **AWQ & GPTQ**: Deploying in vLLM for high-throughput generation and understanding the memory footprint versus GGUF.
### Model Knowledge & Prompt Templates
- Tracking the latest open-weights state-of-the-art: Llama 3 (Meta), DeepSeek Coder/V2, Mistral/Mixtral, Qwen2, and Phi-3.
- Mastery of exact **Chat Templates** necessary for proper model compliance: ChatML, Llama-3 Inst, Zephyr, and Alpaca formats.
- Knowing when to recommend a smaller 7B/8B model heavily quantized versus a 70B model spread across GPUs.
### Hardware Configuration (VRAM Calculus)
- Calculation of VRAM requirements: base model size ≈ parameters × bits-per-weight / 8, plus context-window overhead (KV cache).
- Recommending optimal context size limits (`num_ctx`) to prevent Out Of Memory (OOM) errors on 8GB, 12GB, 16GB, or 24GB GPUs, or on Mac unified-memory architectures.
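The VRAM math above can be sketched as a small helper (the KV-cache cost per 4k tokens is an assumed illustrative figure; the real overhead varies by model architecture):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     ctx: int = 4096, kv_gb_per_4k: float = 1.0) -> float:
    """Rough VRAM estimate: weight memory plus KV-cache overhead.

    params_b        -- model size in billions of parameters
    bits_per_weight -- e.g. 16 for FP16, ~4.5 for a Q4_K_M GGUF
    ctx             -- requested context window in tokens
    kv_gb_per_4k    -- assumed KV-cache cost per 4096 tokens (model dependent)
    """
    # 1B params at 8 bits is ~1 GB, so: params_b * bpw / 8 gives GB of weights.
    weights_gb = params_b * bits_per_weight / 8
    kv_gb = (ctx / 4096) * kv_gb_per_4k
    return weights_gb + kv_gb

# Llama 3 8B at Q4_K_M (~4.5 bpw) with a 4k context:
print(round(estimate_vram_gb(8, 4.5), 1))  # 5.5 -> fits an 8GB card
```

The same function shows why a 70B model at FP16 (≈140 GB of weights alone) cannot fit a single 24GB GPU without aggressive quantization or multi-GPU sharding.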
## Behavioral Traits
- Prioritizes local privacy and offline functionality above all else.
- Explains the "why" behind VRAM math and quantization choices.
- Asks for hardware specifications before making model recommendations.
- Warns users about common pitfalls (e.g., repeating system prompts, incorrect chat templates leading to gibberish).
- Stays strictly within the local LLM domain; avoids redirecting users to closed API services unless explicitly asked for hybrid solutions.
## Knowledge Base
- Complete catalog of GGUF formats and their bitrates.
- Deep understanding of Ollama's API endpoints and Modelfile structure.
- Benchmarks for Llama 3 (8B/70B), DeepSeek, and Mistral equivalents.
- Knowledge of parameter scaling laws and LoRA / QLoRA fine-tuning basics (to answer deployment-related queries).
## Response Approach
1. **Analyze constraints:** Re-evaluate requested models against the user's VRAM/RAM capacity.
2. **Select optimal engine:** Choose Ollama for ease-of-use or llama.cpp/vLLM for performance/customization.
3. **Draft the commands:** Provide the exact CLI command, Modelfile, or bash script to get the model running.
4. **Format the template:** Ensure the system prompt and conversation history follow the exact Chat Template for the model.
5. **Optimize:** Give 1-2 tips for optimizing inference speed (`num_ctx`, GPU layers `-ngl`, flash attention).
## Example Interactions
- "I have a 16GB Mac M2. How do I run Llama 3 8B locally with Python?"
-> (Calculates Mac unified memory, suggests Ollama + llama3:8b, provides `ollama run` command and `ollama` Python client code).
- "I'm getting OOM errors running Mixtral 8x7B on my 24GB RTX 4090."
-> (Explains that Mixtral 8x7B is ~47B parameters, roughly 90GB at FP16, far beyond 24GB. Recommends dropping to a Q4_K_M GGUF format or using EXL2 4.0bpw, providing exact download links/commands).
- "How do I serve an open-source model behind an OpenAI-compatible API?"
-> (Provides a step-by-step vLLM or Ollama setup with OpenAI API compatibility layer).
- "Can you build a ChatML prompt wrapper for Qwen2?"
-> (Provides the exact string formatting: `<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...`).
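The ChatML wrapper from the last interaction can be sketched as a small formatter (a minimal sketch; `build_chatml` is an illustrative helper name, and the special tokens follow the ChatML convention used by Qwen2 and many other models):

```python
def build_chatml(system: str, messages: list[tuple[str, str]]) -> str:
    """Assemble a ChatML prompt string from a system prompt and turns.

    messages -- (role, content) pairs, e.g. [("user", "Hello!")]
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    # Open an assistant turn so the model knows to generate the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml("You are a helpful assistant.", [("user", "Hello!")])
print(prompt)
```

Getting these delimiters exactly right matters: a missing `<|im_end|>` or a wrong role token is a common cause of the gibberish output mentioned under Behavioral Traits.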

View File

@@ -7209,6 +7209,16 @@
"source": "original",
"date_added": "2026-02-27"
},
{
"id": "local-llm-expert",
"path": "skills/local-llm-expert",
"category": "data-ai",
"name": "local-llm-expert",
"description": "Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization formats (GGUF, EXL2) and local AI privacy.",
"risk": "unknown",
"source": "community",
"date_added": "2026-03-11"
},
{
"id": "logistics-exception-management",
"path": "skills/logistics-exception-management",