anbekalanNet (QuartzNet 15x5 char CTC Series) — [LEGACY]

anbekalanNet is the final domain-specific release of the convolutional QuartzNet series adapted for Bambara children's reading materials. It is a fine-tuned version of RobotsMali/stt-bm-quartznet15x5-v2. Like its predecessors, the model was fine-tuned with NVIDIA NeMo using CTC (Connectionist Temporal Classification) loss.

🚨 Obsolescence Notice

This architecture is officially retired. In field testing and benchmark evaluations, this convolutional model produced less stable alignments in low-resource settings than hybrid attention-transducer systems.

NVIDIA NeMo: Installation

To load or run evaluations on this legacy checkpoint, install the standard NVIDIA NeMo package:

pip install "nemo-toolkit[asr]"

How to Use This Model

Load Model with NeMo

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="RobotsMali/anbekalanNet")

Transcribe Audio

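# The model's internal preprocessor handles feature extraction (and any resampling) for the input files.
# transcribe() takes a list of audio paths and returns one result per file.
hypotheses = asr_model.transcribe(['sample_audio.wav'])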
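
The element type returned by transcribe may differ between NeMo releases: some versions return plain strings, while others return hypothesis objects that carry the transcript in a .text attribute. The helper below is a minimal sketch for reading the transcript in either case. The names hypothesis_text and transcripts are hypothetical and are not part of NeMo.

```python
def hypothesis_text(item):
    """Return the transcript from one element of a transcribe() result.

    Assumption: each element is either a plain string or an object
    exposing a .text attribute, as described in this card.
    """
    if isinstance(item, str):
        return item
    return item.text


def transcripts(results):
    """Map a list of transcribe() results to a list of plain strings."""
    return [hypothesis_text(item) for item in results]


# Hypothetical usage with the model loaded above:
# texts = transcripts(asr_model.transcribe(['sample_audio.wav']))
```

Keeping this normalization in one place means downstream scoring code only ever deals with plain strings.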

Input / Output

Input: Accepts 16 kHz mono-channel audio (wav files).
Output: Returns a hypothesis object for each input file; its .text attribute holds the lowercase, character-level transcript. The model does not produce punctuation or capitalization.
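
Before transcribing a batch of files, it can help to confirm that each one matches the expected input format. The sketch below uses only the Python standard library and assumes uncompressed PCM WAV files, which is all the wave module can read. The function name check_wav_format is hypothetical.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"found {channels} channel(s), expected {expected_channels}")
    return problems


# Hypothetical usage:
# for issue in check_wav_format("sample_audio.wav"):
#     print("sample_audio.wav:", issue)
```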

Model Architecture

QuartzNet is a convolutional ASR model built from 1D time-channel separable convolutions, which keep the parameter count low while preserving acoustic modeling capacity. This variant uses the 15x5 layout (15 blocks of 5 convolutional sub-blocks each) with roughly 18 million parameters.
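
To see why time-channel separable convolutions reduce the parameter count, compare the weights in a standard 1D convolution with those in its separable counterpart (a depthwise convolution over time followed by a 1x1 pointwise convolution across channels). The sketch below is plain arithmetic; the channel width of 256 and kernel size of 33 are illustrative values, not read from this checkpoint, and bias terms are ignored.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights in a standard 1D convolution (bias omitted)."""
    return in_channels * out_channels * kernel_size


def separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Weights in a time-channel separable 1D convolution (bias omitted)."""
    depthwise = in_channels * kernel_size   # one temporal filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution mixing channels
    return depthwise + pointwise


# Illustrative layer: 256 channels in and out, kernel size 33 (assumed values).
standard = conv1d_params(256, 256, 33)             # 2,162,688 weights
separable = separable_conv1d_params(256, 256, 33)  # 73,984 weights
reduction = standard / separable                   # roughly 29x fewer weights
```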

Training & Fine-Tuning Configurations

Four experimental setups were designed to test vocabulary limits and regularization effects. This final artifact (anbekalanNet) used the following strict parameters:

* Optimization Window: Training used early stopping with a patience of 15 epochs, monitored on validation metrics.

* Convergence Behavior: Training-batch WER fell below 4% while validation metrics plateaued early, a sign of overfitting. Training was stopped at epoch 30 to keep the encoder from losing its ability to generalize.
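
The early-stopping rule described above can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the training code used for this model: the EarlyStopper class is hypothetical, and it assumes the monitored value is a validation metric where lower is better (such as validation WER), which the card does not specify.

```python
class EarlyStopper:
    """Stop training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lower is better (assumed, e.g. validation WER)
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's validation value; return True when training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical usage inside a training loop:
# stopper = EarlyStopper(patience=15)
# for epoch in range(max_epochs):
#     val_metric = validate(model)
#     if stopper.step(val_metric):
#         break
```

With a patience of 15, a run whose validation metric stops improving early still continues for 15 further epochs before it halts.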

Dataset

The model was fine-tuned on the combined Main + Duplicate expanded subsets (45.6 hours total) of the RobotsMali/an-be-kalan-bench educational children's book corpus.

* Main Split (1.6h): Pristine recordings of unique readings across 22 GAIFE books by 8 distinct speakers.

* Duplicate Split (44h): Redundant multi-speaker recordings of the same texts, included to add vocal variation (pitch, children's voices, and regional accents).

Performance

The metrics below show how expanding the training data from 1.6 to 45.6 hours rescued the QuartzNet model from severe overfitting: without SpecAugment, test WER fell from 93.0% to 40.0%.

Overall Evaluation Metrics

Experiment	Training Data	SpecAugment	Training Volume (hours)	Test WER (%) ↓	Test CER (%) ↓
anbekalanNet-exp3 (this release)	Main + Duplicate	None	45.6	40.0	15.0
anbekalanNet-exp1	Main Only	None	1.6	93.0	80.0
anbekalanNet-exp2	Main Only	Active	1.6	64.0	23.0
anbekalanNet-exp4	Main + Duplicate	Active	45.6	42.0	16.0
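
WER and CER are both edit-distance rates: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below is a minimal reference implementation for a single utterance; it is not the evaluation script used to produce the table, and it assumes a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one utterance (reference must be non-empty)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.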

All results were obtained with greedy decoding and no external language model (LM).
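
Greedy CTC decoding takes the most likely label at each frame, collapses consecutive repeats, and then drops the blank symbol. The sketch below illustrates the technique on a list of per-frame label indices; the function name, the toy vocabulary, and the choice of blank index are assumptions for illustration and are not taken from this checkpoint.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id):
    """Collapse repeated labels, then remove blanks, and join the characters.

    frame_ids: per-frame argmax label indices from the CTC output.
    vocab:     list mapping label index to character.
    blank_id:  index of the CTC blank symbol (assumed here to sit outside vocab).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy example: a blank between the two 'a' frames keeps both characters.
toy_vocab = ["a", "b", " "]
toy_frames = [0, 0, 3, 0, 1, 1, 3, 3, 2, 0]
decoded = ctc_greedy_decode(toy_frames, toy_vocab, blank_id=3)  # "aab a"
```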

License

This legacy checkpoint is archived and released under the CC-BY-4.0 license.

Repository & Issues: Issues for this legacy series are tracked at RobotsMali-AI/bambara-asr. No further architectural changes or fine-tuning updates are planned for this model series.

Model tree for RobotsMali/anbekalanNet

Base model: RobotsMali/stt-bm-quartznet15x5-v0
Fine-tuned: RobotsMali/stt-bm-quartznet15x5-v2
Fine-tuned: this model
