Mini-Vision-V2: MNIST Handwritten Digit Classifier


Welcome to Mini-Vision-V2, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model targets the classic MNIST handwritten digit recognition task. It uses a lightweight CNN architecture designed for grayscale images and reaches 99.3% test accuracy with only 0.82M parameters.

Model Description

Mini-Vision-V2 is a custom CNN with two convolutional blocks, tailored for 28x28 grayscale images. With only 0.82M parameters (significantly smaller than V1), it achieves 99.3% accuracy on the MNIST test set. The project illustrates how little capacity a CNN needs on a simple, well-structured dataset.

  • Dataset: MNIST (28x28 grayscale images, 10 classes)
  • Framework: PyTorch
  • Total Parameters: 0.82M
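The 0.82M figure can be checked by hand from the layer sizes listed in the Model Architecture table. The sketch below assumes every convolutional and linear layer has a bias term and that each BatchNorm layer has a learnable scale and shift per channel; neither detail is stated in this README, and the helper names are illustrative only.

```python
# Back-of-the-envelope parameter count for the architecture table.
# Assumptions (not confirmed by the README): all conv/linear layers use a
# bias, and BatchNorm has one learnable scale and shift per channel.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + c_out

def bn_params(channels: int) -> int:
    """One scale and one shift per channel."""
    return 2 * channels

def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

counts = {
    "conv1": conv_params(1, 32, 3),
    "bn1": bn_params(32),
    "conv2": conv_params(32, 64, 3),
    "bn2": bn_params(64),
    "linear1": linear_params(64 * 7 * 7, 256),  # 3136 flattened features
    "linear2": linear_params(256, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:8s} {n:>8,d}")
print(f"total    {total:>8,d}")  # 824,650, i.e. about 0.82M
```

Under these assumptions, roughly 97% of the parameters sit in the first linear layer (3136 x 256), so the convolutional blocks themselves are very small.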

Model Architecture

The network uses a compact structure with two convolutional blocks followed by a fully connected classifier. Batch Normalization and Dropout are used to improve generalization.

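| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
|---|---|---|---|---|---|---|---|
| Conv Block 1 | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| Conv Block 2 | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| Flatten | - | - | - | - | - | - | Output: 3136 |
| Linear 1 | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| Linear 2 | 256 | 10 | - | - | - | - | - |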
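For readers who want to see the table as code, here is a minimal PyTorch sketch of a network with these layer sizes. It is not the contents of model.py: the class name `MiniVisionV2Sketch` is hypothetical, and the order of operations inside each block (Conv, BatchNorm, ReLU, MaxPool) is an assumption, since the table lists the components but not their order.

```python
import torch
from torch import nn


class MiniVisionV2Sketch(nn.Module):
    """Hypothetical re-creation of the architecture table, not model.py."""

    def __init__(self, num_classes: int = 10, dropout: float = 0.3):
        super().__init__()
        # Assumed order inside each block: Conv -> BatchNorm -> ReLU -> MaxPool.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # 64 * 7 * 7 = 3136 features
            nn.Linear(64 * 7 * 7, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = MiniVisionV2Sketch()
logits = model(torch.randn(4, 1, 28, 28))  # a batch of 4 fake digits
print(logits.shape)
```

The two 2x2 max-pooling steps halve the spatial size twice (28 to 14 to 7), which is where the flattened size of 3136 = 64 x 7 x 7 comes from.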

Training Strategy

The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.

  • Optimizer: SGD (Momentum=0.8)
  • Initial Learning Rate: 0.01
  • Scheduler: StepLR (Step size=3, Gamma=0.5)
  • Loss Function: CrossEntropyLoss
  • Batch Size: 256
  • Epochs: 40 (Best model)
  • Data Augmentation:
    • Random Crop (28x28 with padding=2)
    • Random Rotation (10 degrees)

Performance

The best model achieved the following results:

| Metric | Value |
|---|---|
| Test Accuracy | 99.3% |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |
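Metrics like the ones above are typically computed with a loop of the following shape. This is a generic sketch, not the evaluation code from train.py: the function name `evaluate` is hypothetical, and it assumes the reported loss is the cross-entropy averaged per sample, which the README does not state.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model: nn.Module, loader: DataLoader, device: str = "cpu"):
    """Return (mean cross-entropy per sample, accuracy) over a data loader."""
    model.eval()  # disables Dropout and uses BatchNorm running statistics
    criterion = nn.CrossEntropyLoss(reduction="sum")
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Tiny synthetic check: the inputs already are confident logits for their
# labels, so an identity "model" should classify every sample correctly.
labels = torch.arange(10).repeat(3)
inputs = 10.0 * nn.functional.one_hot(labels, 10).float()
loader = DataLoader(TensorDataset(inputs, labels), batch_size=8)
loss, acc = evaluate(nn.Identity(), loader)
print(f"loss={loss:.4f} acc={acc:.3f}")
```

On the 10,000-image MNIST test set, an accuracy of 99.3% corresponds to about 70 misclassified digits.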

Training Visualization (TensorBoard)

Below are the training and testing curves visualized via TensorBoard.

1. Training Loss

[Figure: Training Loss (recorded every epoch)]

2. Test Loss & Accuracy

[Figure: Test Loss (recorded every epoch)]

Quick Start

Dependencies

  • Python 3.x
  • PyTorch
  • Torchvision
  • Gradio (for demo)
  • Datasets

Inference / Web Demo

Run the Gradio demo to draw digits and see predictions in real time:

python demo.py
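The contents of demo.py are not shown in this README. As a rough illustration of what single-image inference involves, the sketch below assumes the input is a 28x28 grayscale array scaled to [0, 1] with no further normalization; the function name `predict_digit` and the stand-in model are hypothetical. With the real network, the weights in Mini-Vision-V2.pth would be loaded first, and how to do that depends on how the file was saved.

```python
import torch
from torch import nn


@torch.no_grad()
def predict_digit(model: nn.Module, image: torch.Tensor):
    """Classify one 28x28 image; returns (predicted digit, class probabilities).

    Assumes `image` holds grayscale values in [0, 1] (an assumption, since
    the preprocessing used by the actual demo is not documented here).
    """
    model.eval()
    batch = image.reshape(1, 1, 28, 28).float()  # add batch and channel dims
    probs = torch.softmax(model(batch), dim=1).squeeze(0)
    return int(probs.argmax()), probs


# Stand-in model with random weights so the snippet runs on its own;
# its predictions are meaningless.
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
digit, probs = predict_digit(stand_in, torch.rand(28, 28))
print(digit, [round(p, 3) for p in probs.tolist()])
```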

File Structure

.
├── model.py               # Model architecture definition
├── train.py               # Training script
├── demo.py                # Gradio Web Interface
├── Mini-Vision-V2.pth     # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png     # Visualized train loss graph
    └── test_loss.png      # Visualized test loss graph

License

This project is licensed under the MIT License.
