---
license: apache-2.0
tags:
- object-detection
- yolo
- edge-ai
- quantization
datasets:
- MTID
- Roboflow
- VisDrone
library_name: ultralytics
---

# YOLO11 GhostConv + Knowledge Distillation + Quantization

This notebook implements a complete model optimization pipeline for YOLO11 targeting edge devices: a custom architecture built on GhostConv, knowledge distillation, and quantization.

## 📋 Table of Contents

- [Overview](#overview)
- [Notebook Structure](#notebook-structure)
- [System Requirements](#system-requirements)
- [Installation](#installation)
- [Usage Guide](#usage-guide)
- [Results](#results)
- [References](#references)

## 🎯 Overview

This notebook implements a 3-stage YOLO11 optimization pipeline:

### 1. Custom Architecture (YOLO11n-GhostConv)

- Replace Conv layers with **GhostConv** to reduce parameters
- Retain C3k2 and C2PSA blocks for feature extraction
- Architecture optimized for a traffic dataset (5 classes)

### 2. Knowledge Distillation (KD)

- Teacher model: YOLO11l (large model)
- Student model: YOLO11n-GhostConv (custom lightweight)
- Techniques:
  - Feature-based distillation (MSE loss)
  - Logit-based distillation (KL divergence)
  - Temperature scaling (T=4.0)
  - Progressive KD with warmup epochs

### 3. Quantization

- FP32 → INT8 quantization with TFLite
- FP32 → FP16 quantization
- Calibration dataset for INT8
- Performance comparison: FP32 vs INT8 vs FP16

## 📁 Notebook Structure

### Section 1: Initialization

- Mount Google Drive
- Set up project directories
- Import Ultralytics modules (GhostConv, C3k2, C2PSA)
- Clone and install Ultralytics from source

### Section 2: Custom Architecture

- Define YOLO11_TinyGhost architecture in YAML
- Backbone with GhostConv layers
- Head with Detect layer for 5 classes
- Train baseline model (50 epochs)

### Section 3: Knowledge Distillation

**Class implementations:**

- `KDConfig`: Configuration for KD training
- `KnowledgeDistillationTrainer`: Custom trainer inheriting from `DetectionTrainer`
  - Forward hooks to capture intermediate features
  - Feature distillation loss (normalized MSE)
  - Logit distillation loss (KL divergence with temperature)
  - Combined loss: `(1-α-β)*L_hard + α*L_feature + β*L_logit`

**Training strategy:**

- Warmup phase (8 epochs): hard loss only
- After warmup: combine hard + KD losses
- KD layers: `["model.4", "model.6", "model.10"]` (P3, P4, PSA)
- Hyperparameters: α=0.3, β=0.2, T=4.0

### Section 4: Visualization

- Training metrics plotting (mAP, loss curves)
- F1 score tracking
- Precision/Recall curves
- Box/Class/DFL loss comparison

### Section 5: Fine-tuning

- Load best KD checkpoint
- Fine-tune on multi-view intersection dataset
- Freeze 3 backbone layers
- Low learning rate (1e-5) with cosine scheduler

### Section 6: Quantization

**Export formats:**

- INT8 TFLite (with calibration dataset)
- FP16 TFLite

**Evaluation:**

- Compare mAP50 and mAP50-95
- FP32 vs INT8 vs FP16
- Image size: 416x416

## 🔧 System Requirements

### Hardware

- GPU: CUDA-compatible (T4 or better recommended)
- RAM: 16GB+
- Storage: 10GB+ for datasets and models

### Software

```
Python >= 3.8
PyTorch >= 1.13
CUDA >= 11.3
Google Colab (recommended)
```

## 📦 Installation

Run the commands below in a Colab cell. In a regular shell, drop the leading `!` and `%`.

### 1. Clone Ultralytics from source

```bash
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -e .
```

### 2. Dependencies

```bash
!pip install torch torchvision
!pip install matplotlib pandas
!pip install opencv-python pillow
```

### 3. Dataset structure

```
dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
```

## 🚀 Usage Guide

### Step 1: Prepare Data

```python
PROJECT_DIR = "/content/drive/MyDrive/yolo_ghostblock"
DATASET_DIR = "/content/drive/MyDrive/dataset/yolo_mtid_motor/dataset"
```

### Step 2: Train Baseline GhostConv Model

```python
from ultralytics import YOLO

# Build the student from the custom architecture YAML and train the baseline
model = YOLO("yolo11_tinyghost.yaml")
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    epochs=50,
    imgsz=640,
    device=0,
)
```

### Step 3: Knowledge Distillation

```python
# Load teacher and student
teacher = YOLO("path/to/teacher.pt")
student = YOLO("path/to/student.pt")

# Create the KD trainer class (create_kd_trainer_class is a helper from this notebook)
TrainerClass = create_kd_trainer_class(
    teacher_model=teacher,
    kd_alpha=0.3,
    kd_beta=0.2,
    kd_temperature=4.0,
    kd_layers=["model.4", "model.6", "model.10"],
)

# Train with KD
trainer = TrainerClass(overrides={...})
trainer.train()
```

### Step 4: Quantization

```python
# Export INT8 (CALIB_YAML is the dataset config used for calibration)
int8_path = model.export(
    format="tflite",
    int8=True,
    data=CALIB_YAML,
    imgsz=416,
)

# Evaluate the quantized model; export() returns the path of the exported file
model_int8 = YOLO(int8_path)
metrics = model_int8.val(data=DATA_YAML, imgsz=416)
```

## 📊 Results

### Model Comparison

| Model | Parameters | Size | mAP50 | mAP50-95 |
|-------|-----------|------|-------|----------|
| YOLO11l (Teacher) | ~25M | ~50MB | 0.95+ | 0.80+ |
| YOLO11n-Ghost | ~2M | ~4MB | 0.92+ | 0.75+ |
| + KD | ~2M | ~4MB | 0.94+ | 0.78+ |
| + INT8 | ~2M | ~1MB | 0.93+ | 0.76+ |

### Quantization Impact

- **FP32 → INT8**: ~75% size reduction, ~1-2% mAP drop
- **FP32 → FP16**: ~50% size reduction, ~0.5% mAP drop

### Training Curves

- Box Loss: converges after ~30 epochs
- mAP50: plateaus after ~35-40 epochs
- F1 Score: 0.85-0.90 range

## 📖 Technical Details

### GhostConv Architecture
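To put a rough number on what GhostConv saves, here is a minimal sketch that counts convolution weights for a standard convolution and for a GhostConv of the same shape. It assumes the Ultralytics layout of GhostConv (a primary convolution produces half of the output channels, and a 5x5 depthwise convolution derives the other half), ignores BatchNorm parameters, and uses the nominal channel counts from the backbone excerpt in this section before any width scaling. The function names are illustrative and not part of the notebook.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (no bias, BatchNorm ignored)."""
    return c_in * c_out * k * k


def ghost_conv_weights(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weights of a GhostConv with the same input/output shape.

    Assumed layout: a primary k x k conv produces c_out // 2 channels, and a
    depthwise cheap_k x cheap_k conv derives the remaining half from them.
    """
    hidden = c_out // 2
    primary = c_in * hidden * k * k     # ordinary convolution to half the channels
    cheap = hidden * cheap_k * cheap_k  # depthwise: one filter per hidden channel
    return primary + cheap


# Second GhostConv layer of the backbone: 64 -> 128 channels, 3x3 kernel
standard = conv_weights(64, 128, 3)
ghost = ghost_conv_weights(64, 128, 3)
ratio = ghost / standard
print(standard, ghost, round(ratio, 3))  # 73728 38464 0.522
```

Under these assumptions the ratio is `0.5 + cheap_k² / (2 * c_in * k²)`, so it approaches 0.5 as the input channel count grows. That is in line with the roughly 50% reduction usually attributed to GhostConv.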
```yaml
backbone:
  - [-1, 1, GhostConv, [64, 3, 2]]
  - [-1, 1, GhostConv, [128, 3, 2]]
  - [-1, 1, C3k2, [256, False, 0.25]]
  ...
```

### KD Loss Formula

```
L_total   = (1 - α - β) * L_hard + α * L_feature + β * L_logit
L_feature = MSE(normalize(f_s), normalize(f_t))
L_logit   = KL(softmax(z_s / T), softmax(z_t / T)) * T²
```

Here `f_s` and `f_t` are the student and teacher feature maps at the KD layers, `z_s` and `z_t` are the student and teacher logits, and `T` is the temperature. With α=0.3 and β=0.2, the hard loss keeps a weight of 0.5.

### Quantization Config

- **INT8**: Post-training quantization with calibration
- **Calibration**: 100-200 images from the training set
- **Input**: uint8 [0, 255] or float32 normalized to [0, 1]

## ⚙️ Hyperparameters

### Training

- **Epochs**: 40-50
- **Batch size**: 16
- **Image size**: 640x640
- **Learning rate**: 5e-5 (baseline), 1e-5 (fine-tune)
- **Optimizer**: AdamW with cosine scheduler

### Knowledge Distillation

- **α (feature)**: 0.3
- **β (logit)**: 0.2
- **Temperature**: 4.0
- **Warmup epochs**: 8
- **KD layers**: P3, P4, PSA output

### Quantization

- **Format**: TFLite
- **Input size**: 416x416 (edge deployment)
- **Calibration samples**: 100

## 🐛 Troubleshooting

### Issue 1: CUDA Out of Memory

```python
# Pass these as training arguments, e.g. model.train(..., batch=8, amp=True)
batch = 8   # reduce batch size
amp = True  # enable mixed precision
```

### Issue 2: Feature Shape Mismatch in KD

- Check teacher and student architecture compatibility
- Verify KD layer names match between models
- Ensure input sizes are consistent

### Issue 3: INT8 Quantization Accuracy Drop

- Increase the number of calibration samples
- Use a representative dataset (diverse conditions)
- Consider QAT (Quantization-Aware Training)

## 📚 References

### Papers

- [YOLO11](https://docs.ultralytics.com/models/yolo11/)
- [GhostNet: More Features from Cheap Operations](https://arxiv.org/abs/1911.11907)
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper](https://arxiv.org/abs/1806.08342)

### Resources

- [Ultralytics Documentation](https://docs.ultralytics.com/)
- [TFLite Quantization Guide](https://www.tensorflow.org/lite/performance/post_training_quantization)

## 🎯 Key Features

### Architecture Optimization

- **GhostConv**: Reduces FLOPs by ~50% compared to standard convolutions
- **Lightweight backbone**: Maintains accuracy while reducing parameters
- **Flexible head**: Supports multiple detection tasks

### Knowledge Distillation

- **Multi-level distillation**: Combines feature and logit knowledge transfer
- **Temperature-scaled softmax**: Smooths probability distributions
- **Progressive training**: Warmup phase for stable convergence

### Model Compression

- **INT8 quantization**: 4x memory reduction
- **FP16 quantization**: 2x memory reduction
- **Edge-ready**: Optimized for mobile/embedded deployment

## 💡 Best Practices

### Training

1. Start with pre-trained weights when possible
2. Use data augmentation (mosaic, mixup, etc.)
3. Monitor validation metrics closely
4. Apply early stopping (patience=10-15)

### Knowledge Distillation

1. Ensure the teacher model is well-trained (mAP50 > 0.90)
2. Match batch normalization statistics
3. Use an appropriate temperature (T=3-5 for object detection)
4. Gradually increase the KD loss weight

### Quantization

1. Use a diverse calibration dataset
2. Test on a representative test set
3. Profile inference speed on the target device
4. Consider hybrid quantization (some layers FP32)

## 📈 Performance Metrics

### Speed Benchmarks

| Model | FP32 (ms) | FP16 (ms) | INT8 (ms) | Device |
|-------|-----------|-----------|-----------|---------|
| YOLO11l | 45 | 28 | N/A | T4 GPU |
| YOLO11n-Ghost | 12 | 8 | N/A | T4 GPU |
| INT8 TFLite | N/A | N/A | 25 | Edge TPU |

### Accuracy vs Efficiency

- **YOLO11l**: Highest accuracy, largest model
- **YOLO11n-Ghost**: Best accuracy/size trade-off
- **+ KD**: Closes gap with teacher
- **+ INT8**: Minimal accuracy loss, deployable

## 🔄 Workflow Summary

```mermaid
graph LR
    A[YOLO11l Teacher] --> B[Design GhostConv Student]
    B --> C[Train Baseline]
    C --> D[Knowledge Distillation]
    D --> E[Fine-tune]
    E --> F[Quantize INT8/FP16]
    F --> G[Deploy to Edge]
```

## 🚀 Deployment

### TFLite Conversion

```python
# Export to TFLite INT8
model.export(
    format="tflite",
    int8=True,
    data="calibration.yaml",
    imgsz=416,
)
```

### Inference Example

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model and look up its input/output tensors
interpreter = tf.lite.Interpreter(model_path="best_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess: RGB, 416x416, NHWC layout, scaled to [0, 1]
img = Image.open("test.jpg").convert("RGB").resize((416, 416))
input_data = np.asarray(img, dtype=np.float32)[np.newaxis] / 255.0

# Integer-input models expect quantized values; float-input models take [0, 1] directly
dtype = input_details[0]["dtype"]
if dtype != np.float32:
    scale, zero_point = input_details[0]["quantization"]
    input_data = np.round(input_data / scale + zero_point)
input_data = input_data.astype(dtype)

# Run inference; the output holds raw predictions that still need decoding and NMS
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
```

## 👥 Contributing

Contributions are welcome! Areas for improvement:

- Additional distillation techniques (attention transfer, etc.)
- QAT implementation
- More lightweight architectures
- Deployment examples for different platforms

## 📄 License

This notebook follows the Ultralytics AGPL-3.0 License.

## 🙏 Acknowledgments

- [Ultralytics](https://ultralytics.com/) for YOLO11 framework
- [GhostNet](https://github.com/huawei-noah/ghostnet) for efficient convolution design
- Google Colab for compute resources

---

**Note**: This notebook is designed to run on Google Colab with GPU runtime.
Adjust paths and configurations for local environments as needed.

**Last Updated**: January 2026

**Version**: v11

**Compatibility**: Ultralytics 8.3.0+ (the first release with YOLO11)