# Mini-Vision-V2: MNIST Handwritten Digit Classifier
Welcome to Mini-Vision-V2, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model focuses on the classic MNIST handwritten digit recognition task. It features a lightweight CNN architecture optimized for grayscale images, achieving high accuracy with extremely low computational cost.
## Model Description
Mini-Vision-V2 is a custom 2-layer CNN architecture tailored for 28x28 grayscale images. Despite having only 0.82M parameters (significantly smaller than V1), it achieves 99.3% accuracy on the MNIST test set. This project serves as an excellent example of how efficient CNNs can be for simpler, structured datasets.
- Dataset: MNIST (28x28 grayscale images, 10 classes)
- Framework: PyTorch
- Total Parameters: 0.82M
## Model Architecture
The network utilizes a compact structure with two convolutional blocks and a fully connected classifier. Batch Normalization and Dropout are used to ensure generalization.
| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
|---|---|---|---|---|---|---|---|
| Conv Block 1 | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| Conv Block 2 | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| Flatten | - | - | - | - | - | - | Output: 3136 |
| Linear 1 | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| Linear 2 | 256 | 10 | - | - | - | - | - |
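The table above can be sketched in PyTorch as follows. The class and layer names are illustrative; the actual `model.py` may organize the blocks differently, but the shapes and parameter counts follow the table.

```python
import torch
import torch.nn as nn

class MiniVisionV2(nn.Module):
    """Compact two-block CNN for 28x28 grayscale MNIST images."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Conv Block 1: 1 -> 32 channels, 28x28 -> 14x14 after pooling
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(32),
            # Conv Block 2: 32 -> 64 channels, 14x14 -> 7x7 after pooling
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(64),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),             # 64 * 7 * 7 = 3136 features
            nn.Linear(3136, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

With these choices the parameter count sums to 824,650, consistent with the 0.82M figure above.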
## Training Strategy
The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.
- Optimizer: SGD (Momentum=0.8)
- Initial Learning Rate: 0.01
- Scheduler: StepLR (Step size=3, Gamma=0.5)
- Loss Function: CrossEntropyLoss
- Batch Size: 256
- Epochs: 40 (Best model)
- Data Augmentation:
  - Random Crop (28x28 with padding=2)
  - Random Rotation (10 degrees)
## Performance
The model achieved outstanding results on the MNIST test set:
| Metric | Value |
|---|---|
| Test Accuracy | 99.3% |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |
## Training Visualization (TensorBoard)
Below are the training and testing curves visualized via TensorBoard.
1. Training Loss
2. Test Loss & Accuracy
## Quick Start
### Dependencies
- Python 3.x
- PyTorch
- Torchvision
- Gradio (for demo)
- Datasets
### Inference / Web Demo
Run the Gradio demo to draw digits and see predictions in real time:
```bash
python demo.py
```
## File Structure
```
.
├── model.py              # Model architecture definition
├── train.py              # Training script
├── demo.py               # Gradio web interface
├── Mini-Vision-V2.pth    # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png    # Training loss curve
    └── test_loss.png     # Test loss curve
```
## License
This project is licensed under the MIT License.