sickn33 bdcfbb9625 feat(hugging-face): Add official ecosystem skills

Import the official Hugging Face ecosystem skills and sync the existing local coverage with upstream metadata and assets.

Regenerate the canonical catalog, plugin mirrors, docs, and release notes after the maintainer merge batch so main stays in sync.

Fixes #417

2026-03-29 18:31:46 +02:00

Supported Model Architectures

This document lists the model architectures currently supported by Transformers.js.

Natural Language Processing

Text Models

ALBERT - A Lite BERT for Self-supervised Learning
BERT - Bidirectional Encoder Representations from Transformers
CamemBERT - French language model based on RoBERTa
CodeGen - Code generation models
CodeLlama - Code-focused Llama models
Cohere - Command-R models for RAG
DeBERTa - Decoding-enhanced BERT with Disentangled Attention
DeBERTa-v2 - Improved version of DeBERTa
DistilBERT - Distilled version of BERT (smaller, faster)
GPT-2 - Generative Pre-trained Transformer 2
GPT-Neo - Open-source GPT-3-style models from EleutherAI
GPT-NeoX - EleutherAI's larger-scale successor to GPT-Neo
LLaMA - Large Language Model Meta AI
Mistral - Mistral AI language models
MPNet - Masked and Permuted Pre-training
MobileBERT - Compressed BERT for mobile devices
RoBERTa - Robustly Optimized BERT
T5 - Text-to-Text Transfer Transformer
XLM-RoBERTa - Multilingual RoBERTa

Sequence-to-Sequence

BART - Denoising Sequence-to-Sequence Pre-training
Blenderbot - Open-domain chatbot
BlenderbotSmall - Smaller Blenderbot variant
M2M100 - Many-to-Many multilingual translation
MarianMT - Neural machine translation
mBART - Multilingual BART
NLLB - No Language Left Behind (200 languages)
Pegasus - Pre-training with extracted gap-sentences

Computer Vision

Image Classification

BEiT - BERT Pre-Training of Image Transformers
ConvNeXT - Modern ConvNet architecture
ConvNeXTV2 - Improved ConvNeXT
DeiT - Data-efficient Image Transformers
DINOv2 - Self-supervised Vision Transformer
DINOv3 - Successor to DINOv2
EfficientNet - Efficient convolutional networks
MobileNet - Lightweight models for mobile
MobileViT - Mobile Vision Transformer
ResNet - Residual Networks
SegFormer - Semantic segmentation transformer (its encoder can also be used for image classification)
Swin - Shifted Window Transformer
ViT - Vision Transformer

Object Detection

DETR - Detection Transformer
D-FINE - Fine-grained Distribution Refinement for object detection
DINO - DETR with Improved deNoising anchOr boxes
Grounding DINO - Open-set object detection
YOLOS - You Only Look at One Sequence

Segmentation

CLIPSeg - Image segmentation with text prompts
Mask2Former - Universal image segmentation
SAM - Segment Anything Model
EdgeTAM - On-Device Track Anything Model

Depth & Pose

DPT - Dense Prediction Transformer
Depth Anything - Monocular depth estimation
Depth Pro - Sharp monocular metric depth
GLPN - Global-Local Path Networks for depth

Audio

Speech Recognition

Wav2Vec2 - Self-supervised speech representations
Whisper - Robust speech recognition (multilingual)
HuBERT - Self-supervised speech representation learning

Audio Processing

Audio Spectrogram Transformer - Audio classification
DAC - Descript Audio Codec

Text-to-Speech

SpeechT5 - Unified speech and text pre-training
VITS - Conditional Variational Autoencoder with adversarial learning

Multimodal

Vision-Language

CLIP - Contrastive Language-Image Pre-training
Chinese-CLIP - Chinese version of CLIP
ALIGN - Image-text model trained on large-scale noisy image-text pairs
BLIP - Bootstrapping Language-Image Pre-training
Florence-2 - Unified vision foundation model
LLaVA - Large Language and Vision Assistant
Moondream - Tiny vision-language model

Document Understanding

DiT - Document Image Transformer
Donut - OCR-free Document Understanding
LayoutLM - Pre-training for document understanding
TrOCR - Transformer-based OCR

Audio-Language

CLAP - Contrastive Language-Audio Pre-training

Embeddings & Similarity

Sentence Transformers - Sentence embeddings
all-MiniLM - Efficient sentence embeddings
all-mpnet-base - High-quality sentence embeddings
E5 - Text embeddings by Microsoft
BGE - General embedding models
nomic-embed - Long context embeddings
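
Embeddings from the models above are usually compared with cosine similarity. The following is a minimal sketch in plain JavaScript, assuming the embeddings are already available as arrays of numbers. The helper name cosineSimilarity and the toy vectors are illustrative and are not part of Transformers.js.

```javascript
// Cosine similarity between two equal-length embedding vectors.
// Hypothetical helper for illustration; not part of the Transformers.js API.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings.
const parallel = cosineSimilarity([1, 2, 3], [2, 4, 6]); // same direction, close to 1
const orthogonal = cosineSimilarity([1, 0], [0, 1]);     // unrelated directions, 0
```

If the embeddings are already normalized to unit length, the plain dot product gives the same ranking and the norm computation can be skipped.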

Specialized Models

Code

CodeBERT - Pre-trained model for code
GraphCodeBERT - Code structure understanding
StarCoder - Code generation

Scientific

SciBERT - Scientific text
BioBERT - Biomedical text

Retrieval

ColBERT - Contextualized late interaction over BERT
DPR - Dense Passage Retrieval

Model Selection Tips

For Text Tasks

Small & Fast: DistilBERT, MobileBERT
Balanced: BERT-base, RoBERTa-base
High Accuracy: RoBERTa-large, DeBERTa-v3-large
Multilingual: XLM-RoBERTa, mBERT

For Vision Tasks

Mobile/Browser: MobileNet, EfficientNet-B0
Balanced: DeiT-base, ConvNeXT-tiny
High Accuracy: Swin-large, DINOv2-large

For Audio Tasks

Speech Recognition: Whisper-tiny (fast), Whisper-large (accurate)
Audio Classification: Audio Spectrogram Transformer

For Multimodal

Vision-Language: CLIP (general), Florence-2 (comprehensive)
Document AI: Donut, LayoutLM
OCR: TrOCR
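
The tips above can be kept as a small lookup table in application code. This is a sketch only: the object layout, the key names, and the helpers suggestModels and loadPipeline are hypothetical. The entries are architecture names copied from the tips, not Hub repository IDs, so a concrete repository still has to be chosen on the Hub. The loadPipeline function assumes Transformers.js is installed under the package name @huggingface/transformers.

```javascript
// Suggestions copied from the tips above, keyed by task group and priority.
// The layout and key names are hypothetical.
const MODEL_TIPS = {
  text: {
    'small-fast': ['DistilBERT', 'MobileBERT'],
    balanced: ['BERT-base', 'RoBERTa-base'],
    'high-accuracy': ['RoBERTa-large', 'DeBERTa-v3-large'],
    multilingual: ['XLM-RoBERTa', 'mBERT'],
  },
  vision: {
    'mobile-browser': ['MobileNet', 'EfficientNet-B0'],
    balanced: ['DeiT-base', 'ConvNeXT-tiny'],
    'high-accuracy': ['Swin-large', 'DINOv2-large'],
  },
  audio: {
    'speech-recognition': ['Whisper-tiny', 'Whisper-large'],
    'audio-classification': ['Audio Spectrogram Transformer'],
  },
  multimodal: {
    'vision-language': ['CLIP', 'Florence-2'],
    'document-ai': ['Donut', 'LayoutLM'],
    ocr: ['TrOCR'],
  },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Return the suggested architectures, or an empty array for unknown keys.
function suggestModels(group, priority) {
  if (!hasOwn(MODEL_TIPS, group)) return [];
  const byPriority = MODEL_TIPS[group];
  return hasOwn(byPriority, priority) ? byPriority[priority] : [];
}

// Assumption: the package name is "@huggingface/transformers" and the given
// Hub repository ships ONNX weights. Defined here but not called.
async function loadPipeline(task, modelId) {
  const { pipeline } = await import('@huggingface/transformers');
  return pipeline(task, modelId);
}
```

For example, suggestModels('audio', 'speech-recognition') returns the two Whisper sizes from the tips. A call such as loadPipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en') would then load one concrete checkpoint, where the repository ID is an assumption to be verified on the Hub.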

Finding Models on Hugging Face Hub

Search for compatible models:

https://huggingface.co/models?library=transformers.js

Filter by task:

https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js
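
Both URLs above follow the same pattern, so they can be generated instead of typed by hand. A minimal sketch using the standard URL API is shown here. The helper name modelSearchUrl is hypothetical, and the only query parameters used are the two that appear in the URLs above.

```javascript
// Build a Hub search URL restricted to Transformers.js-compatible models.
// Hypothetical helper; pipelineTag is optional.
function modelSearchUrl(pipelineTag) {
  const url = new URL('https://huggingface.co/models');
  if (pipelineTag) {
    url.searchParams.set('pipeline_tag', pipelineTag);
  }
  url.searchParams.set('library', 'transformers.js');
  return url.toString();
}

const allCompatible = modelSearchUrl();
const textClassification = modelSearchUrl('text-classification');
```

Using URLSearchParams rather than string concatenation keeps any task name with special characters correctly encoded.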

To check for ONNX support, look for an onnx/ folder in the model repository.
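
The same check can be scripted. The sketch below filters a list of repository file paths for .onnx files under onnx/. The helper names and the sample file listing are hypothetical. The optional listRepoFiles function assumes the Hub's model info endpoint returns a siblings array with rfilename entries and that a global fetch is available (Node 18+ or a browser).

```javascript
// Return the .onnx files located under an onnx/ folder.
function findOnnxFiles(filePaths) {
  return filePaths.filter(
    (path) => path.startsWith('onnx/') && path.endsWith('.onnx')
  );
}

// Assumption: GET https://huggingface.co/api/models/<modelId> responds with
// JSON containing `siblings: [{ rfilename }]`. Defined here but not called.
async function listRepoFiles(modelId) {
  const response = await fetch(`https://huggingface.co/api/models/${modelId}`);
  if (!response.ok) {
    throw new Error(`Hub request failed with status ${response.status}`);
  }
  const info = await response.json();
  return (info.siblings ?? []).map((sibling) => sibling.rfilename);
}

// Hypothetical file listing for illustration.
const sampleFiles = [
  'README.md',
  'config.json',
  'tokenizer.json',
  'onnx/model.onnx',
  'onnx/model_quantized.onnx',
];
const onnxFiles = findOnnxFiles(sampleFiles);
const hasOnnx = onnxFiles.length > 0;
```

With a real repository, the result of listRepoFiles(modelId) would replace sampleFiles.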
