---
license: apache-2.0
tags:
  - object-detection
  - yolo
  - edge-ai
  - quantization
datasets:
  - MTID
  - Roboflow
  - VisDrone
library_name: ultralytics
---

# YOLO11 GhostConv + Knowledge Distillation + Quantization

This notebook implements a complete model optimization pipeline for YOLO11 on edge devices: a custom GhostConv architecture, knowledge distillation, and quantization.

## 📋 Table of Contents

- [Overview](#overview)
- [Notebook Structure](#notebook-structure)
- [System Requirements](#system-requirements)
- [Installation](#installation)
- [Usage Guide](#usage-guide)
- [Results](#results)
- [References](#references)

## 🎯 Overview

This notebook implements a 3-stage YOLO11 optimization pipeline:

### 1. Custom Architecture (YOLO11n-GhostConv)
- Replace Conv layers with **GhostConv** to reduce parameters
- Retain C3k2 and C2PSA blocks for feature extraction
- Architecture tuned for a 5-class traffic dataset
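
To get a feel for the savings, the arithmetic below counts convolution weights for a single layer. It is a rough sketch that assumes the GhostConv layout used in Ultralytics (a primary convolution producing half of the output channels, then a 5x5 depthwise convolution producing the other half). The channel sizes are illustrative, and batch-norm parameters are ignored.

```python
def conv_weights(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free 2D convolution."""
    return (c_in // groups) * c_out * k * k


def ghostconv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights of a GhostConv: primary conv plus a 5x5 depthwise 'cheap' conv."""
    hidden = c_out // 2  # the primary conv produces half of the output channels
    primary = conv_weights(c_in, hidden, k)
    cheap = conv_weights(hidden, hidden, 5, groups=hidden)  # depthwise 5x5
    return primary + cheap


# Illustrative layer: 64 -> 128 channels with a 3x3 kernel
standard = conv_weights(64, 128, 3)
ghost = ghostconv_weights(64, 128, 3)
print(standard, ghost, round(ghost / standard, 3))  # 73728 38464 0.522
```

For this layer the GhostConv variant keeps roughly half of the weights; the exact ratio depends on the channel counts and kernel size.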

### 2. Knowledge Distillation (KD)
- Teacher model: YOLO11l (large model)
- Student model: YOLO11n-GhostConv (custom lightweight)
- Techniques:
  - Feature-based distillation (MSE loss)
  - Logit-based distillation (KL divergence)
  - Temperature scaling (T=4.0)
  - Progressive KD with warmup epochs

### 3. Quantization
- FP32 → INT8 quantization with TFLite
- FP32 → FP16 quantization
- Calibration dataset for INT8
- Performance comparison: FP32 vs INT8 vs FP16

## 📁 Notebook Structure

### Section 1: Initialization
- Mount Google Drive
- Setup project directories
- Import Ultralytics modules (GhostConv, C3k2, C2PSA)
- Clone and install Ultralytics from source

### Section 2: Custom Architecture
- Define YOLO11_TinyGhost architecture in YAML
- Backbone with GhostConv layers
- Head with Detect layer for 5 classes
- Train baseline model (50 epochs)

### Section 3: Knowledge Distillation
**Class implementations:**
- `KDConfig`: Configuration for KD training
- `KnowledgeDistillationTrainer`: Custom trainer inheriting from DetectionTrainer
  - Forward hooks to capture intermediate features
  - Feature distillation loss (normalized MSE)
  - Logit distillation loss (KL divergence with temperature)
  - Combined loss: `(1-α-β)*L_hard + α*L_feature + β*L_logit`

**Training strategy:**
- Warmup phase (8 epochs): hard loss only
- After warmup: combine hard + KD losses
- KD layers: ["model.4", "model.6", "model.10"] (P3, P4, PSA)
- Hyperparameters: α=0.3, β=0.2, T=4.0
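
The warmup schedule above can be sketched as a small weight function. This is a minimal illustration, not the notebook's trainer code: the function name is hypothetical, and it assumes zero-based epoch indices and a hard-loss weight of 1.0 during warmup.

```python
from typing import Tuple


def kd_loss_weights(epoch: int, warmup_epochs: int = 8,
                    alpha: float = 0.3, beta: float = 0.2) -> Tuple[float, float, float]:
    """Return (w_hard, w_feature, w_logit) for a zero-based epoch index."""
    if epoch < warmup_epochs:
        # Warmup: hard (ground-truth) loss only
        return 1.0, 0.0, 0.0
    # After warmup: (1 - alpha - beta) * hard + alpha * feature + beta * logit
    return 1.0 - alpha - beta, alpha, beta


for epoch in (0, 7, 8):
    print(epoch, kd_loss_weights(epoch))
```

With α=0.3 and β=0.2, the hard loss keeps a weight of about 0.5 once distillation is switched on.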

### Section 4: Visualization
- Training metrics plotting (mAP, loss curves)
- F1 score tracking
- Precision/Recall curves
- Box/Class/DFL loss comparison

### Section 5: Fine-tuning
- Load best KD checkpoint
- Fine-tune on multi-view intersection dataset
- Freeze 3 backbone layers
- Low learning rate (1e-5) with cosine scheduler
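
A possible shape for this fine-tuning call, using standard Ultralytics training arguments (`freeze`, `lr0`, `cos_lr`, `optimizer`). The helper names and paths are hypothetical, and the optimizer is set explicitly on the assumption that the automatic optimizer selection would otherwise override `lr0`.

```python
def finetune_overrides(data_yaml: str) -> dict:
    """Training arguments matching the fine-tuning settings listed above."""
    return {
        "data": data_yaml,
        "freeze": 3,           # freeze the first 3 layers of the model
        "lr0": 1e-5,           # low initial learning rate
        "cos_lr": True,        # cosine learning-rate schedule
        "optimizer": "AdamW",  # explicit optimizer so lr0 is respected (assumption)
    }


def run_finetune(weights_path: str, data_yaml: str):
    """Load the best KD checkpoint and fine-tune it (requires ultralytics)."""
    from ultralytics import YOLO  # imported lazily so the helper above stays importable

    model = YOLO(weights_path)
    return model.train(**finetune_overrides(data_yaml))
```

Calling `run_finetune("path/to/kd_best.pt", "path/to/data.yaml")` would start training; both paths are placeholders.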

### Section 6: Quantization
**Export formats:**
- INT8 TFLite (with calibration dataset)
- FP16 TFLite

**Evaluation:**
- Compare mAP50 and mAP50-95
- FP32 vs INT8 vs FP16
- Image size: 416x416

## 🔧 System Requirements

### Hardware
- GPU: CUDA-compatible (T4 or better recommended)
- RAM: 16GB+ 
- Storage: 10GB+ for datasets and models

### Software
```
Python >= 3.8
PyTorch >= 1.13
CUDA >= 11.3
Google Colab (recommended)
```

## 📦 Installation

### 1. Clone Ultralytics from source
```bash
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -e .
```

### 2. Dependencies
```bash
!pip install torch torchvision
!pip install matplotlib pandas
!pip install opencv-python pillow
```

### 3. Dataset structure
```
dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
```
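
Before training, it can help to confirm that the layout above is in place. The sketch below lists images that have no matching label file; the function name and the set of image suffixes are assumptions. Images without labels are typically treated as background, so a non-empty result is a prompt to double-check rather than a hard error.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image types


def find_unlabeled_images(dataset_dir: str, split: str) -> List[str]:
    """Return names of images in images/<split> without a labels/<split>/<stem>.txt file."""
    root = Path(dataset_dir)
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    missing = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        if not (label_dir / f"{image_path.stem}.txt").exists():
            missing.append(image_path.name)
    return missing
```

For example, `find_unlabeled_images("dataset", "train")` checks the training split of the tree shown above.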

## 🚀 Usage Guide

### Step 1: Prepare Data
```python
PROJECT_DIR = "/content/drive/MyDrive/yolo_ghostblock"
DATASET_DIR = "/content/drive/MyDrive/dataset/yolo_mtid_motor/dataset"
```

### Step 2: Train Baseline GhostConv Model
```python
from ultralytics import YOLO

# Build the student from the custom architecture YAML
model = YOLO("yolo11_tinyghost.yaml")
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    epochs=50,
    imgsz=640,
    device=0
)
```

### Step 3: Knowledge Distillation
```python
# Load teacher and student
teacher = YOLO("path/to/teacher.pt")
student = YOLO("path/to/student.pt")

# Create KD trainer
TrainerClass = create_kd_trainer_class(
    teacher_model=teacher,
    kd_alpha=0.3,
    kd_beta=0.2,
    kd_temperature=4.0,
    kd_layers=["model.4", "model.6", "model.10"]
)

# Train with KD
trainer = TrainerClass(overrides={...})
trainer.train()
```

### Step 4: Quantization
```python
# Export INT8
model.export(
    format="tflite",
    int8=True,
    data=CALIB_YAML,
    imgsz=416
)

# Evaluate quantized model
model_int8 = YOLO("best_int8.tflite")
metrics = model_int8.val(data=DATA_YAML, imgsz=416)
```

## 📊 Results

### Model Comparison

| Model | Parameters | Size | mAP50 | mAP50-95 |
|-------|-----------|------|-------|----------|
| YOLO11l (Teacher) | ~20M | ~40MB | 0.95+ | 0.80+ |
| YOLO11n-Ghost | ~2M | ~4MB | 0.92+ | 0.75+ |
| + KD | ~2M | ~4MB | 0.94+ | 0.78+ |
| + INT8 | ~2M | ~1MB | 0.93+ | 0.76+ |

### Quantization Impact
- **FP32 → INT8**: ~75% size reduction, ~1-2% mAP drop
- **FP32 → FP16**: ~50% size reduction, ~0.5% mAP drop

### Training Curves
- Box Loss: converges after ~30 epochs
- mAP50: plateaus at ~35-40 epochs
- F1 Score: 0.85-0.90 range

## 📖 Technical Details

### GhostConv Architecture
```yaml
backbone:
  - [-1, 1, GhostConv, [64, 3, 2]]
  - [-1, 1, GhostConv, [128, 3, 2]]
  - [-1, 1, C3k2, [256, False, 0.25]]
  ...
```

### KD Loss Formula
```
L_total = (1 - α - β) * L_hard + α * L_feature + β * L_logit

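L_feature = MSE(normalize(S_feat), normalize(T_feat))
L_logit = KL(softmax(z_S / T), softmax(z_T / T)) * T²

# S_feat, T_feat: student and teacher feature maps
# z_S, z_T: student and teacher logits; T: temperature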
```
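
The loss terms can be illustrated on plain Python lists. This is a minimal sketch, not the notebook's trainer code: it assumes L2 normalization for the feature term and treats the teacher distribution as the KL target (the usual convention in Hinton et al.). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logit_kd_loss(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    """KL between softened teacher and student distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student) * temperature ** 2


def l2_normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # avoid division by zero
    return [x / norm for x in vector]


def feature_kd_loss(student_feat, teacher_feat) -> float:
    """MSE between L2-normalized feature vectors of equal length."""
    s, t = l2_normalize(student_feat), l2_normalize(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(l_hard, l_feature, l_logit, alpha: float = 0.3, beta: float = 0.2) -> float:
    """Weighted sum of hard, feature, and logit losses."""
    return (1 - alpha - beta) * l_hard + alpha * l_feature + beta * l_logit


# Toy values: identical logits give zero logit loss
print(logit_kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(logit_kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)
```

Normalizing before the MSE makes the feature term insensitive to the overall scale of the activations, which matters when teacher and student differ greatly in capacity.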

### Quantization Config
- **INT8**: Post-training quantization with calibration
- **Calibration**: 100-200 images from training set
- **Input**: uint8 [0, 255] or float32 normalized
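
To illustrate what calibration provides, the sketch below derives an int8 scale and zero-point from an observed value range and round-trips a value through them. It follows the common affine scheme `real = (q - zero_point) * scale`; the actual TFLite calibrator may choose ranges differently, and the function names are hypothetical.

```python
from typing import Tuple


def int8_quant_params(min_val: float, max_val: float) -> Tuple[float, int]:
    """Scale and zero-point mapping [min_val, max_val] onto int8 [-128, 127]."""
    # Extend the range to include zero so that 0.0 is exactly representable
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / 255.0 or 1.0  # fall back to 1.0 for an empty range
    zero_point = int(round(-128 - min_val / scale))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an int8 value back to an approximate real value."""
    return (q - zero_point) * scale


# Range observed during calibration (illustrative): activations in [0, 1]
scale, zero_point = int8_quant_params(0.0, 1.0)
print(quantize(0.0, scale, zero_point), quantize(1.0, scale, zero_point))  # -128 127
```

Values outside the calibrated range are clamped, which is one reason an unrepresentative calibration set can hurt accuracy.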

## ⚙️ Hyperparameters

### Training
- **Epochs**: 40-50
- **Batch size**: 16
- **Image size**: 640x640
- **Learning rate**: 5e-5 (baseline), 1e-5 (fine-tune)
- **Optimizer**: AdamW with cosine scheduler

### Knowledge Distillation
- **α (feature)**: 0.3
- **β (logit)**: 0.2
- **Temperature**: 4.0
- **Warmup epochs**: 8
- **KD layers**: P3, P4, PSA output

### Quantization
- **Format**: TFLite
- **Input size**: 416x416 (edge deployment)
- **Calibration samples**: 100

## 🐛 Troubleshooting

### Issue 1: CUDA Out of Memory
```python
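# Reduce the batch size and enable mixed precision via the train arguments
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    batch=8,   # halve the default batch size of 16
    amp=True,  # automatic mixed precision
)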
```

### Issue 2: Feature Shape Mismatch in KD
- Check teacher and student architecture compatibility
- Verify KD layer names match between models
- Ensure input sizes are consistent
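
When debugging, comparing the shapes captured by the hooks usually narrows the problem down quickly. The helper below is a hypothetical sketch that assumes `(batch, channels, height, width)` shapes; the example shapes are made up for illustration.

```python
from typing import List, Sequence


def diagnose_feature_pair(student_shape: Sequence[int],
                          teacher_shape: Sequence[int]) -> List[str]:
    """Describe how two (batch, channels, height, width) shapes differ."""
    issues = []
    if student_shape[0] != teacher_shape[0]:
        issues.append("batch size differs: both models must see the same batch")
    if student_shape[1] != teacher_shape[1]:
        issues.append("channels differ: a projection (e.g. a 1x1 convolution) is needed")
    if tuple(student_shape[2:]) != tuple(teacher_shape[2:]):
        issues.append("spatial size differs: check input size and the stride of the hooked layers")
    return issues


# Made-up shapes: same resolution, different channel widths
for issue in diagnose_feature_pair((16, 64, 80, 80), (16, 256, 80, 80)):
    print(issue)
```

A channel difference alone is expected when the teacher is much wider than the student; a spatial difference more often points to mismatched layer names or input sizes.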

### Issue 3: INT8 Quantization Accuracy Drop
- Increase number of calibration samples
- Use representative dataset (diverse conditions)
- Consider QAT (Quantization-Aware Training)
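
One simple way to vary the calibration set is to draw a reproducible random sample from the training images. The sketch below is an assumption about how such a subset could be built, not the notebook's procedure; the function name and suffix list are hypothetical, and a plain random sample only helps if the source folder already covers diverse conditions.

```python
import random
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image types


def sample_calibration_images(image_dir: str, n: int = 100, seed: int = 0) -> List[Path]:
    """Pick up to n images from image_dir, reproducibly for a given seed."""
    paths = sorted(p for p in Path(image_dir).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES)
    if len(paths) <= n:
        return paths  # not enough images to subsample
    rng = random.Random(seed)  # local generator so the global RNG state is untouched
    return sorted(rng.sample(paths, n))
```

The returned paths can then be copied or listed into whatever dataset definition the export step reads for calibration.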

## 📚 References

### Papers
- [YOLO11](https://docs.ultralytics.com/models/yolo11/)
- [GhostNet: More Features from Cheap Operations](https://arxiv.org/abs/1911.11907)
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper](https://arxiv.org/abs/1806.08342)

### Resources
- [Ultralytics Documentation](https://docs.ultralytics.com/)
- [TFLite Quantization Guide](https://www.tensorflow.org/lite/performance/post_training_quantization)

## 🎯 Key Features

### Architecture Optimization
- **GhostConv**: Reduces FLOPs by ~50% compared to standard convolutions
- **Lightweight backbone**: Maintains accuracy while reducing parameters
- **Flexible head**: Supports multiple detection tasks

### Knowledge Distillation
- **Multi-level distillation**: Combines feature and logit knowledge transfer
- **Temperature-scaled softmax**: Smooths probability distributions
- **Progressive training**: Warmup phase for stable convergence

### Model Compression
- **INT8 quantization**: 4x memory reduction
- **FP16 quantization**: 2x memory reduction
- **Edge-ready**: Optimized for mobile/embedded deployment

## 💡 Best Practices

### Training
1. Start with pre-trained weights when possible
2. Use data augmentation (mosaic, mixup, etc.)
3. Monitor validation metrics closely
4. Apply early stopping (patience=10-15)

### Knowledge Distillation
1. Ensure teacher model is well-trained (mAP > 90%)
2. Match batch normalization statistics
3. Use appropriate temperature (T=3-5 for object detection)
4. Gradually increase KD loss weight

### Quantization
1. Use diverse calibration dataset
2. Test on representative test set
3. Profile inference speed on target device
4. Consider hybrid quantization (some layers FP32)
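
For the profiling step, a small timing harness is often enough. The sketch below is a generic helper, not part of the notebook: it times any zero-argument callable after a few warmup runs and reports latency in milliseconds. The function name and default counts are assumptions.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_ms(run_once: Callable[[], object],
                 warmup: int = 10, iterations: int = 50) -> Dict[str, float]:
    """Time a zero-argument callable and summarize latency in milliseconds."""
    for _ in range(warmup):
        run_once()  # warmup runs are not measured
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(timings)
    return {
        "min_ms": ordered[0],
        "median_ms": statistics.median(ordered),
        "p90_ms": ordered[int(0.9 * (len(ordered) - 1))],
    }


# Stand-in workload; on a device this would wrap a single inference call
print(benchmark_ms(lambda: sum(range(10_000)), warmup=2, iterations=20))
```

On a GPU the callable should include any synchronization needed for the timing to reflect completed work, and measurements should be taken on the target device rather than on the training machine.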

## 📈 Performance Metrics

### Speed Benchmarks
| Model | FP32 (ms) | FP16 (ms) | INT8 (ms) | Device |
|-------|-----------|-----------|-----------|---------|
| YOLO11l | 45 | 28 | N/A | T4 GPU |
| YOLO11n-Ghost | 12 | 8 | N/A | T4 GPU |
| INT8 TFLite | N/A | N/A | 25 | Edge TPU |

### Accuracy vs Efficiency
- **YOLO11l**: Highest accuracy, largest model
- **YOLO11n-Ghost**: Best accuracy/size trade-off
- **+ KD**: Closes gap with teacher
- **+ INT8**: Minimal accuracy loss, deployable

## 🔄 Workflow Summary

```mermaid
graph LR
A[YOLO11l Teacher] --> B[Design GhostConv Student]
B --> C[Train Baseline]
C --> D[Knowledge Distillation]
D --> E[Fine-tune]
E --> F[Quantize INT8/FP16]
F --> G[Deploy to Edge]
```

## 🚀 Deployment

### TFLite Conversion
```python
# Export to TFLite INT8
model.export(
    format="tflite",
    int8=True,
    data="calibration.yaml",
    imgsz=416
)
```

### Inference Example
```python
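import numpy as np
import tensorflow as tf
from PIL import Image

# Load TFLite model and look up its input/output tensors
interpreter = tf.lite.Interpreter(model_path="best_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess image: RGB, 416x416, with a leading batch dimension
img = Image.open("test.jpg").convert("RGB").resize((416, 416))
input_data = np.expand_dims(np.asarray(img), axis=0)

# Match the input type the exported model expects:
# float32 normalized to [0, 1], otherwise uint8 in [0, 255]
if input_details[0]["dtype"] == np.float32:
    input_data = input_data.astype(np.float32) / 255.0
else:
    input_data = input_data.astype(np.uint8)

# Run inference
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])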
```

## 👥 Contributing

Contributions are welcome! Areas for improvement:
- Additional distillation techniques (attention transfer, etc.)
- QAT implementation
- More lightweight architectures
- Deployment examples for different platforms

## 📄 License

This notebook follows the Ultralytics AGPL-3.0 License.

## 🙏 Acknowledgments

- [Ultralytics](https://ultralytics.com/) for YOLO11 framework
- [GhostNet](https://github.com/huawei-noah/ghostnet) for efficient convolution design
- Google Colab for compute resources

---

**Note**: This notebook is designed to run on Google Colab with GPU runtime. Adjust paths and configurations for local environments as needed.

**Last Updated**: January 2026  
**Version**: v11  
**Compatibility**: Ultralytics 8.3+ (YOLO11 support)