Mini-Vision-V2: MNIST Handwritten Digit Classifier

Welcome to Mini-Vision-V2, the second model in the Mini-Vision series. Where V1 tackled CIFAR-10 classification, this model targets the classic MNIST handwritten digit recognition task. It uses a lightweight CNN designed for grayscale images that reaches high accuracy at very low computational cost.

Model Description

Mini-Vision-V2 is a custom CNN with two convolutional blocks followed by two fully connected layers, tailored for 28x28 grayscale images. Despite having only 0.82M parameters (significantly fewer than V1), it achieves 99.3% accuracy on the MNIST test set. The project shows how far a compact CNN can go on a simple, well-structured dataset.

Dataset: MNIST (28x28 grayscale images, 10 classes)
Framework: PyTorch
Total Parameters: 0.82M

Model Architecture

The network uses a compact structure: two convolutional blocks followed by a fully connected classifier. Batch Normalization and Dropout are included to help generalization.

Layer	Input Channels	Output Channels	Kernel Size	Stride	Padding	Activation	Other
Conv Block 1	1	32	3	1	1	ReLU	MaxPool(2), BatchNorm
Conv Block 2	32	64	3	1	1	ReLU	MaxPool(2), BatchNorm
Flatten	-	-	-	-	-	-	Output: 3136 (64x7x7)
Linear 1	3136	256	-	-	-	ReLU	Dropout(0.3)
Linear 2	256	10	-	-	-	-	-
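The table can be read as the following PyTorch sketch. It is a hypothetical re-creation, not the contents of model.py: the class name `MiniVisionV2Sketch` is invented here, and the order Conv, BatchNorm, ReLU, MaxPool inside each block is an assumption, since the table only lists which layers each block contains.

```python
import torch
from torch import nn


class MiniVisionV2Sketch(nn.Module):
    """Hypothetical re-creation of the architecture table (not the repo's model.py)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Conv Block 1: 1 -> 32 channels, 28x28 -> 14x14 after pooling
            # (assumed order: Conv -> BatchNorm -> ReLU -> MaxPool)
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Conv Block 2: 32 -> 64 channels, 14x14 -> 7x7 after pooling
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),             # 64 * 7 * 7 = 3136 features
            nn.Linear(3136, 256),     # Linear 1
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),  # Linear 2: raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = MiniVisionV2Sketch().eval()
    print(sum(p.numel() for p in model.parameters()))
    print(model(torch.zeros(2, 1, 28, 28)).shape)
```

Summing `numel()` over the parameters gives 824,650, consistent with the reported 0.82M. About 803K of those (roughly 97%) belong to Linear 1, so the convolutional blocks account for only a small fraction of the model.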

Training Strategy

The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.

Optimizer: SGD (Momentum=0.8)
Initial Learning Rate: 0.01
Scheduler: StepLR (Step size=3, Gamma=0.5)
Loss Function: CrossEntropyLoss
Batch Size: 256
Epochs: 40 (Best model)
Data Augmentation:
- Random Crop (28x28 with padding=2)
- Random Rotation (10 degrees)
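A minimal sketch of the optimizer, scheduler, and loss listed above, assuming the scheduler is stepped once per epoch (the usual StepLR pattern; the source does not say). `make_training_objects` and `lr_per_epoch` are hypothetical helper names, and a dummy `nn.Linear` stands in for the real model. The augmentation entries most likely correspond to torchvision's `transforms.RandomCrop(28, padding=2)` and `transforms.RandomRotation(10)`, but that mapping is also an assumption.

```python
import torch
from torch import nn


def make_training_objects(params):
    """Hypothetical helper: optimizer, scheduler and loss with the listed settings."""
    optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.8)
    # Halve the learning rate every 3 scheduler steps (assumed: one step per epoch)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)
    criterion = nn.CrossEntropyLoss()
    return optimizer, scheduler, criterion


def lr_per_epoch(num_epochs: int = 40):
    """Record the learning rate used in each epoch."""
    dummy = nn.Linear(1, 1)  # stand-in model; only its parameters are needed
    optimizer, scheduler, _ = make_training_objects(dummy.parameters())
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        # ... one epoch of training batches would run here ...
        optimizer.step()   # no gradients exist, so parameters are left unchanged
        scheduler.step()   # advance the schedule at the end of the epoch
    return lrs


if __name__ == "__main__":
    lrs = lr_per_epoch()
    print(lrs[0], lrs[3], lrs[-1])
```

With step_size=3 and gamma=0.5, the learning rate in epoch e (counting from 0) is 0.01 × 0.5^(e // 3): 0.01 for epochs 0 to 2, 0.005 for epochs 3 to 5, and so on. By the last of 40 epochs it has been halved 13 times, to roughly 1.2e-6, so the final epochs make only very small updates.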

Performance

Final results of the best model (accuracy and test loss are measured on the MNIST test set):

Metric	Value
Test Accuracy	99.3%
Test Loss	0.0235
Train Loss	0.0615
Parameters	0.82M
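One way the test metrics above could be computed is sketched below. It assumes "Test Loss" means mean cross-entropy per sample and "Test Accuracy" means top-1 accuracy over the whole test set; the `evaluate` helper is hypothetical and not taken from train.py.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model: nn.Module, loader: DataLoader, device: str = "cpu"):
    """Return (mean cross-entropy per sample, top-1 accuracy) over a loader."""
    model.eval()  # disable Dropout, use BatchNorm running statistics
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum, then divide once at the end
    total_loss, correct, count = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        count += labels.size(0)
    return total_loss / count, correct / count
```

Summing the per-batch losses and dividing by the sample count keeps the result correct even when the last batch is smaller than the rest.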

Training Visualization (TensorBoard)

Below are the training and testing curves visualized via TensorBoard.

1. Training Loss

(Recorded every epoch)

2. Test Loss & Accuracy

(Recorded every epoch)

Quick Start

Dependencies

Python 3.x
PyTorch
Torchvision
Gradio (for demo)
Datasets

Inference / Web Demo

Run the Gradio demo to draw digits and see predictions in real time:

python demo.py
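demo.py itself is not reproduced here. As a hedged sketch of the inference step behind such a demo, the hypothetical `predict_digit` helper below takes a trained model and one 28x28 grayscale tensor, assumed to be scaled to [0, 1] with a light digit on a dark background like MNIST, and returns the predicted digit with its softmax probabilities. Loading Mini-Vision-V2.pth via `model.load_state_dict(torch.load(...))` would only work if the file stores a state_dict, which the source does not confirm.

```python
import torch
from torch import nn


@torch.no_grad()
def predict_digit(model: nn.Module, image: torch.Tensor):
    """Hypothetical helper: return (predicted digit, class probabilities) for one image."""
    model.eval()  # disable Dropout and use BatchNorm running statistics
    if image.dim() == 2:
        image = image.unsqueeze(0)  # (28, 28) -> (1, 28, 28): add channel dim
    batch = image.unsqueeze(0)      # (1, 28, 28) -> (1, 1, 28, 28): add batch dim
    probs = torch.softmax(model(batch), dim=1)[0]  # logits -> probabilities
    return int(probs.argmax()), probs
```

Any input preprocessing used during training (for example normalization) would have to be applied to `image` first; the source does not specify any.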

File Structure

.
├── model.py               # Model architecture definition
├── train.py               # Training script
├── demo.py                # Gradio Web Interface
├── Mini-Vision-V2.pth     # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png     # Visualized train loss graph
    └── test_loss.png      # Visualized test loss graph

License

This project is licensed under the MIT License.


Collection including LWWZH/Mini-Vision-V2: Mini-Vision-Series
