Massimo Roberto Scamarcia PRO

mrs83

AI & ML interests

Natural Language Processing, Text Generation, Question Answering, Data Augmentation, Knowledge Transfer, Chain-of-Thought, ResearchOps, MLOps

Recent Activity

posted an update about 2 hours ago

In 2017, my RNNs were babbling. Today, they are hallucinating beautifully.

10 years ago, getting an LSTM to output coherent English was a struggle. 10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating. We trained this model on ~10B tokens on a single AMD GPU (ROCm).

It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention Is All You Need" monopoly on the edge. The ambitious goal is to build a small instruct model with RAG and tool-usage capabilities (https://huggingface.co/ethicalabs/Kurtis-EON1).

📊 The Benchmarks (Size: 400M)

For a model this size (trained on <10B tokens), the specialized performance is surprising:

*SciQ*: 73.8% 🦄 (this rivals billion-parameter models in pure fact retrieval).
*PIQA*: 62.3% (solid physical intuition for a sub-1B model).

The Reality Check: HellaSwag (29.3%) and Winogrande (50.2%), both close to their random baselines of 25% (four options) and 50% (two options), show the limits of 400M parameters and ~10B training tokens. We are hitting the "Reasoning Wall", which confirms that we need to scale to (hopefully) unlock deeper common sense.

As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong: the model is convinced it is in a classroom ("In this course, we explore...").

The Instruct Model is not ready yet; we are currently using curriculum learning to test model plasticity. Source code and weights will not be released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.

🤝 Call for Collaboration: I am looking for peer reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let's connect!

Training diary: https://huggingface.co/ethicalabs/Kurtis-EON1

liked a dataset about 21 hours ago

allenai/tulu-3-sft-mixture

upvoted a collection about 24 hours ago

Teacher Logits



upvoted a collection about 24 hours ago

Teacher Logits

Collection

Logits captured from large models to act as the teacher for distillation • 3 items • Updated Dec 15, 2025

upvoted a collection about 1 month ago

Granite 4.0 Language Models

Collection

13 items • Updated Nov 17, 2025

upvoted an article about 2 months ago

Article

FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages

Jul 8, 2025


upvoted an article 3 months ago

Article

Voice Cloning with Consent

Oct 28, 2025


upvoted an article 4 months ago

Article

Granite 4.0 Nano: Just how small can you go?

Oct 28, 2025


upvoted 2 articles 5 months ago

Article

Make your ZeroGPU Spaces go brrr with ahead-of-time compilation

Sep 2, 2025


Article

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

Nov 3, 2022


upvoted an article 6 months ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

Aug 5, 2025


upvoted a collection 12 months ago

Tower

Collection

Model weights and SFT data for Tower. • 11 items • Updated Nov 15, 2024
