anbekalanNet (QuartzNet 15x5 char CTC Series) — [LEGACY]


anbekalanNet is the final domain-specific release of the convolutional QuartzNet framework adapted for Bambara children's reading materials. It is a fine-tuned version of RobotsMali/stt-bm-quartznet15x5-v2. Like its predecessors, the model was fine-tuned with NVIDIA NeMo and trained with Connectionist Temporal Classification (CTC) loss.

🚨 Obsolescence Notice

This architecture is officially retired. Field testing and benchmark evaluations demonstrate that this convolutional foundation exhibits unstable alignment paths under tight, low-resource constraints compared to hybrid attention-transducer systems.

NVIDIA NeMo: Installation

To load or run evaluations on this legacy checkpoint, install the standard NVIDIA NeMo package:

pip install "nemo-toolkit[asr]"

How to Use This Model

Load Model with NeMo

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="RobotsMali/anbekalanNet")

Transcribe Audio

# Pass a list of audio file paths; features are extracted by the model's internal preprocessor
asr_model.transcribe(['sample_audio.wav'])

Input / Output

  • Input: Accepts 16 kHz mono-channel audio (wav files).
  • Output: Returns a hypothesis object whose .text attribute holds the lowercase, character-level transcript. The model does not output punctuation or capitalization.
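The input and output expectations above can be checked with two small helpers. This is a minimal sketch, not part of NeMo: `is_16k_mono` and `hypothesis_text` are hypothetical names, and the file check relies on Python's standard `wave` module, which only reads PCM WAV files.

```python
import wave


def is_16k_mono(path):
    """Return True if the PCM WAV file at `path` is 16 kHz and single-channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1


def hypothesis_text(result):
    """Return the transcript from either a plain string or an object with `.text`.

    Assumption: depending on the NeMo version, transcribe() may return strings
    or hypothesis objects, so both cases are handled here.
    """
    return result if isinstance(result, str) else result.text
```

A typical use would be to filter a list of paths with `is_16k_mono` before calling `asr_model.transcribe`, then map `hypothesis_text` over the returned list.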

Model Architecture

QuartzNet is a convolutional ASR model consisting of 1D time-channel separable convolutions designed to minimize parameter count while maintaining acoustic representations. This specific variant utilizes a 15x5 block structure with roughly 18 million parameters.
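To see why time-channel separable convolutions reduce the parameter count, the weight counts of a standard 1D convolution and its separable counterpart can be compared directly. The sketch below ignores biases and normalization layers, and the channel width (256) and kernel size (33) are illustrative assumptions rather than values taken from this checkpoint's configuration.

```python
def standard_conv1d_weights(in_ch, out_ch, kernel):
    """Weights in a regular 1D convolution: every output channel sees every input channel at every tap."""
    return in_ch * out_ch * kernel


def separable_conv1d_weights(in_ch, out_ch, kernel):
    """Weights in a time-channel separable convolution.

    Depthwise step: one length-`kernel` filter per input channel (time axis).
    Pointwise step: a 1x1 convolution mixing channels.
    """
    depthwise = in_ch * kernel
    pointwise = in_ch * out_ch
    return depthwise + pointwise


# Illustrative layer shape (assumed, not read from the model config).
standard = standard_conv1d_weights(256, 256, 33)
separable = separable_conv1d_weights(256, 256, 33)
print(f"standard: {standard:,}  separable: {separable:,}  ratio: {standard / separable:.1f}x")
```

For this assumed layer shape the standard convolution needs 2,162,688 weights against 73,984 for the separable version, roughly a 29x reduction.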

Training & Fine-Tuning Configurations

Four experimental setups were designed to test vocabulary limits and regularization effects. This final artifact (anbekalanNet) used the following strict parameters:

* Optimization Window: Regulated with an Early Stopping mechanism set to a 15-epoch patience window monitored against validation metrics.

* Convergence Behavior: Training-batch WER fell below 4% while validation metrics flatlined early, a sign that the model was memorizing the training text. Training was forcibly stopped at epoch 30 to protect the encoder from losing its ability to generalize.
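The patience mechanism described above can be illustrated in plain Python. This is a sketch of the general early-stopping technique, not the callback used in the actual training run: the `EarlyStopper` class and its `min_delta` parameter are hypothetical, and a lower metric (such as validation WER) is assumed to be better.

```python
class EarlyStopper:
    """Stop training when a monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")      # best (lowest) metric seen so far
        self.stale_epochs = 0         # consecutive epochs without improvement

    def step(self, metric):
        """Record one validation result; return True when training should stop."""
        if metric < self.best - self.min_delta:
            self.best = metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, `stopper.step(val_wer)` would be called once per epoch after validation, and the loop would break as soon as it returns True. With a 15-epoch patience, a metric that plateaus early keeps training running for 15 more epochs before the stop is triggered.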

Dataset

The model was fine-tuned on the combined Main + Duplicate expanded subsets (45.6 hours total) of the RobotsMali/an-be-kalan-bench educational children's book corpus.

* Main Split (1.6h): Pristine recordings of unique readings across 22 GAIFE books by 8 distinct speakers.

* Duplicate Split (44h): High-density, redundant multi-speaker tracks reading identical textual literature to introduce physical vocal variance (pitch, child vocal acoustics, and regional accents).

Performance

The performance metrics below illustrate how expanding data volume rescued the QuartzNet framework from catastrophic lexical overfitting.

Overall Evaluation Metrics

| Experimental Pass | Dataset Baseline Configuration | SpecAugment | Training Volume | Test WER (%) ↓ | Test CER (%) ↓ |
|---|---|---|---|---|---|
| anbekalanNet-exp3 (this release) | Main + Duplicate | None | 45.6 Hours | 40.0 | 15.0 |
| anbekalanNet-exp1 | Main Only | None | 1.6 Hours | 93.0 | 80.0 |
| anbekalanNet-exp2 | Main Only | Active | 1.6 Hours | 64.0 | 23.0 |
| anbekalanNet-exp4 | Main + Duplicate | Active | 45.6 Hours | 42.0 | 16.0 |

All results indicate greedy decoding performance without external Language Models (LMs).
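For readers unfamiliar with the two metrics, WER and CER are both edit distances normalized by the reference length, computed over words and characters respectively. The following is a minimal sketch of that definition; it is not the scoring script used to produce the table, the function names are hypothetical, and it assumes a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because one wrong character makes a whole word count as an error, WER is usually well above CER on the same output, which matches the gap between the two columns in the table.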

License

This legacy checkpoint is archived and released under the CC-BY-4.0 license.


Repository & Issues: Technical tracking for this legacy series can be referenced at RobotsMali-AI/bambara-asr. No further architectural expansions or fine-tuning updates are planned for this model card sequence.
