Hugging Face Transformers
Hugging Face Transformers is one of the most widely used open-source libraries for NLP and AI. It provides thousands of pre-trained models for text, vision, audio, and multimodal tasks.
Its core value is wrapping complex model loading, inference, and training workflows into just a few lines of code.
### Supported Task Types
* * *
## Core Principles of Transformer Architecture
Understanding the underlying architecture before using the library makes it easier to see why the parameters discussed below are set the way they are.
### Overall Architecture: Encoder-Decoder
![Transformer encoder-decoder architecture](https://example.com/wp-content/uploads/2026/05/tutorial-cf08ae6f-a6fc-4a1b-8977-364df.png)
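Each encoder and decoder layer in this architecture is built around scaled dot-product attention, softmax(QKᵀ/√d_k)·V. The sketch below implements that formula in plain Python for a single head, without batching or masking, purely to show the arithmetic; real models compute it as batched tensor operations. The helper names and toy matrices are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                      # (n_q x n_k) similarity scores
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]             # each row sums to 1
    return matmul(weights, v), weights                     # (n_q x d_v) weighted values

# Toy example: 2 query positions, 3 key/value positions, d_k = d_v = 2
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, weights = scaled_dot_product_attention(q, k, v)
print(weights)
print(out)
```

Each output row is a weighted average of the value rows, with the weights determined by how similar the query is to each key.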
### Three Major Model Families
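The three families differ mainly in which positions a token may attend to. Encoder-only models such as BERT use bidirectional attention, so every token sees every other token. Decoder-only models such as GPT-2 use a causal mask, so a token sees only itself and earlier tokens. Encoder-decoder models such as T5 and BART combine a bidirectional encoder with a causal decoder. The plain-Python sketch below builds both mask types as 0/1 matrices; the helper names are illustrative, not library functions.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for name, mask in [("encoder-only (BERT)", bidirectional_mask(4)),
                   ("decoder-only (GPT-2)", causal_mask(4))]:
    print(name)
    for row in mask:
        print("  ", row)
```

Inside a model, positions marked 0 receive a large negative value before the softmax, so they end up with (near-)zero attention weight.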
* * *
## Installation and Environment Setup
### Installation
```bash
# Basic installation
pip install transformers

# Installation with a deep learning backend (choose one)
pip install "transformers[torch]"   # PyTorch backend (recommended)
pip install "transformers[tf]"      # TensorFlow backend
pip install "transformers[flax]"    # JAX/Flax backend

# Common companion libraries
pip install datasets        # Hugging Face datasets library
pip install evaluate        # Model evaluation metrics
pip install accelerate      # Multi-GPU / mixed-precision training
pip install peft            # Parameter-efficient fine-tuning (LoRA, etc.)
pip install tokenizers      # High-performance tokenizers
pip install sentencepiece   # Required by some models (T5, LLaMA)

# Verify installation
python -c "import transformers; print(transformers.__version__)"
```
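To see which of these packages are present in the current environment without importing them, you can query package metadata with the standard library. A minimal sketch using `importlib.metadata` (Python 3.8+); the package list simply mirrors the install commands above, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import version, PackageNotFoundError

PACKAGES = ["transformers", "torch", "datasets", "evaluate",
            "accelerate", "peft", "tokenizers", "sentencepiece"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:15} {ver or 'not installed'}")
```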
### Environment Variable Configuration
```bash
# Model cache directory (downloaded models are cached here; default ~/.cache/huggingface)
export HF_HOME=/data/huggingface_cache

# Users in mainland China: a mirror such as hf-mirror.com can speed up downloads
export HF_ENDPOINT=https://hf-mirror.com

# Offline mode (no network access; only previously cached models are used)
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

# Disable download progress bars (e.g., in CI/CD environments)
export HF_HUB_DISABLE_PROGRESS_BARS=1
```
```python
# The same variables can be set from Python, before transformers is imported
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# View the current model cache directory
from huggingface_hub import constants
print(constants.HF_HUB_CACHE)
```
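These variables are read when `huggingface_hub` is first imported, which is why they must be set before importing `transformers`. The lookup order for the download cache can be sketched as follows. This is a simplified model, assuming the precedence `HF_HUB_CACHE` > `$HF_HOME/hub` > `~/.cache/huggingface/hub`; it ignores `XDG_CACHE_HOME` and legacy variables, and `resolve_hub_cache` is a hypothetical helper, not a library function.

```python
import os

def resolve_hub_cache(env=None):
    """Simplified sketch of where downloaded models are cached (hypothetical helper).

    HF_HUB_CACHE wins; otherwise $HF_HOME/hub; otherwise ~/.cache/huggingface/hub.
    XDG_CACHE_HOME and legacy variables are deliberately ignored.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or os.path.join("~", ".cache", "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")

print(resolve_hub_cache())                                      # current environment
print(resolve_hub_cache({"HF_HOME": "/data/huggingface_cache"}))
```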
* * *
## Pipeline: Run AI in Five Lines of Code
`pipeline` is the highest-level abstraction in Transformers: it wraps model loading, preprocessing, inference, and post-processing, so most inference tasks take just three to five lines of code.
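Conceptually, every pipeline runs the same three stages. In the library, task pipelines subclass `Pipeline` and implement `preprocess`, `_forward`, and `postprocess`. The toy class below mirrors that structure in plain Python, with a hypothetical keyword-counting "model" standing in for a neural network so that it runs without downloading anything; it is not the library's implementation.

```python
import math

class ToySentimentPipeline:
    """Illustrative three-stage pipeline; not the transformers implementation."""

    POSITIVE = {"love", "great", "amazing", "good"}
    NEGATIVE = {"hate", "bad", "terrible", "awful"}

    def preprocess(self, text):
        # Stand-in for tokenization: lowercase, split, strip punctuation
        return [w.strip(".,!?") for w in text.lower().split()]

    def _forward(self, tokens):
        # Stand-in for the model: two "logits" in the order [negative, positive]
        pos = sum(t in self.POSITIVE for t in tokens)
        neg = sum(t in self.NEGATIVE for t in tokens)
        return [float(neg), float(pos)]

    def postprocess(self, logits):
        # Softmax over the logits, then map the best index to a label
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": ["NEGATIVE", "POSITIVE"][best], "score": probs[best]}

    def __call__(self, text):
        return self.postprocess(self._forward(self.preprocess(text)))

clf = ToySentimentPipeline()
print(clf("I love using Hugging Face Transformers!"))
```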
### Pipeline Quick Examples Collection
```python
from transformers import pipeline

# 1. Sentiment analysis (text classification)
classifier = pipeline("sentiment-analysis")
result = classifier("I love using Hugging Face Transformers!")
# -> [{'label': 'POSITIVE', 'score': 0.9998}]

# 2. Text generation (temperature only takes effect when sampling is enabled)
generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time in a land far away,",
                   max_new_tokens=50, num_return_sequences=1,
                   do_sample=True, temperature=0.8)

# 3. Fill in the blank (masked language model)
unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("The capital of France is [MASK].")
# -> [{'token_str': 'paris', 'score': 0.9823}, ...]

# 4. Named entity recognition (NER)
ner = pipeline("ner", aggregation_strategy="simple")
result = ner("My name is John and I work at Google in New York.")
# -> [{'entity_group': 'PER', 'word': 'John', 'score': 0.998}, ...]

# 5. Extractive question answering
qa = pipeline("question-answering")
result = qa(question="Who invented Python?",
            context="Python was created by Guido van Rossum in 1991.")
# -> {'answer': 'Guido van Rossum', 'score': 0.9887}

# 6. Text summarization
article = "Paste a long English news article here."  # placeholder input text
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
result = summarizer(article, max_length=60, min_length=20)

# 7. Machine translation (English -> Chinese)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
result = translator("Hello, how are you today?")
# -> [{'translation_text': <Chinese translation of the input>}]

# 8. Zero-shot classification (no task-specific training required)
zero_shot = pipeline("zero-shot-classification")
result = zero_shot("I love playing football",
                   candidate_labels=["sports", "politics", "technology"])
# -> {'labels': ['sports', ...], 'scores': [0.972, ...]}
```
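Zero-shot classification needs no task-specific training because its default model is trained for natural language inference (NLI). The pipeline pairs the input with one hypothesis per candidate label, built by default from the template `"This example is {}."`, and reads the model's entailment logit for each pair; with the default `multi_label=False`, those logits are softmaxed across the labels. The sketch below reproduces only that scoring step in plain Python. The logit values are invented for illustration, not real model outputs, and `zero_shot_scores` is a hypothetical helper.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scores(entailment_logits, labels):
    """Rank labels by a softmax over per-label entailment logits (multi_label=False)."""
    probs = softmax(entailment_logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return {"labels": [lab for lab, _ in ranked], "scores": [p for _, p in ranked]}

labels = ["sports", "politics", "technology"]
hypotheses = [f"This example is {label}." for label in labels]  # one NLI pair per label
fake_entailment_logits = [3.1, -1.2, -0.4]                     # hypothetical numbers
print(hypotheses)
print(zero_shot_scores(fake_entailment_logits, labels))
```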
### Advanced Pipeline Configuration
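```python
import torch
from transformers import pipeline

# Run on a specific device (device=0 is the first CUDA GPU, device=-1 is CPU)
pipe = pipeline("text-generation", model="gpt2", device=0)

# Half precision to save VRAM; device_map="auto" (requires accelerate) places
# the model across the available devices.
# Llama 2 is a gated model: request access on the Hub and log in first.
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf",
                torch_dtype=torch.float16, device_map="auto")

# Batch processing to improve throughput
large_text_list = ["Great product!", "Terrible service."] * 100  # example inputs
pipe = pipeline("sentiment-analysis", batch_size=32)
results = pipe(large_text_list)  # inputs are sent to the model 32 at a time

# Long audio: split into 30 s chunks with 5 s strides on each side
asr = pipeline("automatic-speech-recognition",
               model="openai/whisper-large-v2",
               chunk_length_s=30, stride_length_s=5)
result = asr("long_audio.wav", return_timestamps=True)
```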
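With `chunk_length_s=30` and `stride_length_s=5`, each 30 s window carries 5 s of extra context on each side, so consecutive windows start 20 s apart, and the pipeline uses those overlapping strides when merging the per-chunk transcripts. The sketch below computes the window boundaries in seconds. It is a simplified model of that behaviour (it works in seconds rather than audio samples and ignores the feature extractor), and `chunk_windows` is a hypothetical helper.

```python
def chunk_windows(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end, stride_left, stride_right) tuples, all in seconds."""
    step = chunk_s - 2 * stride_s            # 30 - 5 - 5 = 20 s between window starts
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    windows, start = [], 0.0
    while start < total_s:
        end = min(start + chunk_s, float(total_s))
        is_last = start + chunk_s >= total_s
        left = 0.0 if start == 0 else stride_s   # nothing to overlap before the first window
        right = 0.0 if is_last else stride_s     # nothing to overlap after the last window
        windows.append((start, end, left, right))
        if is_last:
            break
        start += step
    return windows

# A 75-second recording with the settings used above
for window in chunk_windows(75):
    print(window)
```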
* * *
## Deep Dive into Tokenizers
The tokenizer is the first step of any NLP pipeline: it converts raw text into a sequence of integer IDs that the model can understand.
### Complete Tokenization Workflow
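For BERT-style models the workflow is roughly: normalize the text (lowercasing for uncased models), split on whitespace and punctuation, break each word into subwords with WordPiece, add the special tokens `[CLS]` and `[SEP]`, map every token to its vocabulary ID, and build an attention mask. The sketch below walks through those steps using WordPiece's greedy longest-match-first rule and a tiny hypothetical vocabulary; the real `bert-base-uncased` tokenizer uses a 30,522-entry vocabulary and a Rust implementation, so the IDs here do not match it.

```python
import re

# Tiny hypothetical vocabulary; "##" marks a piece that continues a word
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "i": 4, "'": 5, "m": 6, "learn": 7, "##ing": 8,
         "transform": 9, "##ers": 10, "!": 11, ",": 12, "hello": 13}

def pre_tokenize(text):
    """Normalize (lowercase) and split into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:          # no prefix matched: the whole word becomes [UNK]
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

def encode(text, vocab=VOCAB):
    tokens = ["[CLS]"]
    for word in pre_tokenize(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    input_ids = [vocab[t] for t in tokens]
    attention_mask = [1] * len(input_ids)   # no padding yet, so every position is real
    return tokens, input_ids, attention_mask

tokens, ids, mask = encode("Hello, I'm learning Transformers!")
print(tokens)
print(ids)
```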
### Core Tokenizer Usage
```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# One-step encoding
encoding = tokenizer(
    "Hello, I'm learning Transformers!",
    return_tensors="pt",  # Return PyTorch tensors
    padding=True,         # Pad to the longest sequence in the batch
    truncation=True,      # Truncate sequences longer than max_length
    max_length=128,       # Maximum length in tokens
)
print(encoding.keys())
# -> dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(encoding["input_ids"][0][:8])
# -> tensor([101, 7592, 1010, 1045, 1005, 1049, 4083, 19081])
print(encoding["attention_mask"][0][:8])
# -> tensor([1, 1, 1, 1, 1, 1, 1, 1])  # 1 = real token, 0 = padding

# Decode (IDs -> text)
decoded = tokenizer.decode(encoding["input_ids"][0], skip_special_tokens=True)
print(decoded)  # -> "hello, i'm learning transformers!"

# Batch encoding (sequences are padded to a common length)
texts = ["Short.", "This is a much longer sentence for testing."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)  # -> torch.Size([2, 11])

# Vocabulary information
print(f"Vocabulary size: {tokenizer.vocab_size}")   # -> 30522
print(f"[CLS] ID: {tokenizer.cls_token_id}")        # -> 101
print(f"[SEP] ID: {tokenizer.sep_token_id}")        # -> 102
print(f"Max length: {tokenizer.model_max_length}")  # -> 512
```
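With `padding=True`, shorter sequences in a batch are padded to the length of the longest one, and the attention mask marks real tokens with 1 and padding with 0 so the model can ignore the padding. A plain-Python sketch of that behaviour, assuming right-side padding and pad ID 0 (the `[PAD]` ID in `bert-base-uncased`); `pad_batch` is a hypothetical helper, not a library function.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad ID lists to the longest one; return input_ids and attention_mask."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# [CLS]=101 ... [SEP]=102; the IDs in between are illustrative
batch = pad_batch([[101, 2460, 1012, 102],
                   [101, 2023, 2003, 1037, 2172, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Real tokenizers expose the padding direction as `tokenizer.padding_side`; decoder-only models are usually padded on the left for batched generation.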
### Common Tokenizer Type Comparison
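The main tokenizer families are WordPiece (BERT), byte-pair encoding or BPE (GPT-2, RoBERTa), and SentencePiece-based models (Unigram for T5, BPE for LLaMA 1 and 2). To make the BPE idea concrete, the sketch below learns merges from a toy corpus by repeatedly counting adjacent symbol pairs and merging the most frequent one. It is a simplified character-level version with an invented word-frequency table; GPT-2's tokenizer operates on bytes and uses its own pre-tokenization, and `learn_bpe` is a hypothetical helper.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from {word: frequency}; each word starts as a tuple of characters."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair; first seen wins ties
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges, vocab

corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}   # toy frequencies
merges, segmented = learn_bpe(corpus, num_merges=3)
print(merges)
print(segmented)
```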
* * *
## Model Loading and Inference
### AutoClass: Automatically Select the Correct Model Class
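AutoClasses choose the concrete class from the checkpoint's `config.json`: its `model_type` field (for example `"bert"`) is looked up in a registry that maps it to a class such as `BertForSequenceClassification`. The sketch below imitates that dispatch with a small hand-written registry of class names, kept as strings so nothing is downloaded or imported; the real registry lives inside the library and covers far more architectures, and `resolve_class` is a hypothetical helper.

```python
import json

# Hand-written excerpt of a model_type -> class-name registry (illustrative)
SEQ_CLS_REGISTRY = {
    "bert": "BertForSequenceClassification",
    "distilbert": "DistilBertForSequenceClassification",
    "roberta": "RobertaForSequenceClassification",
    "gpt2": "GPT2ForSequenceClassification",
}

def resolve_class(config_json: str, registry=SEQ_CLS_REGISTRY) -> str:
    """Pick a class name from the model_type field of a config.json string."""
    model_type = json.loads(config_json)["model_type"]
    if model_type not in registry:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return registry[model_type]

config = '{"model_type": "bert", "hidden_size": 768, "num_hidden_layers": 12}'
print(resolve_class(config))
```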
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: bert-base-uncased has no trained classification head, so the head is
# randomly initialized and predictions are meaningless until fine-tuning.
# float16 is intended for GPUs; device_map="auto" requires accelerate.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, torch_dtype=torch.float16, device_map="auto"
)

# Complete manual inference workflow
text = "Transformers is an amazing library!"

# 1. Encode
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# 2. Forward pass (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# 3. Parse the output
logits = outputs.logits  # shape: [1, 2]
probs = torch.softmax(logits, dim=-1)
pred = torch.argmax(probs, dim=-1).item()
id2label = model.config.id2label  # {0: 'LABEL_0', 1: 'LABEL_1'}
print(f"Prediction: {id2label[pred]} ({probs[0, pred].item():.4f})")
```
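Step 3 is plain arithmetic: softmax turns the logits into probabilities that sum to 1, argmax picks the most likely class, and `id2label` maps the index to a name. The sketch below repeats that post-processing without PyTorch; the logit values are made up, the labels are the defaults a two-label head gets when no names are configured, and `postprocess` is a hypothetical helper.

```python
import math

def postprocess(logits, id2label):
    """Softmax over one row of logits, then argmax -> (label, probability, all probs)."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=probs.__getitem__)
    return id2label[pred], probs[pred], probs

id2label = {0: "LABEL_0", 1: "LABEL_1"}
label, score, probs = postprocess([-1.3, 2.1], id2label)   # hypothetical logits
print(label, round(score, 4))
```

Because softmax preserves ordering, the argmax of the raw logits gives the same class; the softmax is only needed when you want the probability score itself.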