So you basically still want ASR-style transcription before the LLM kicks in (perhaps to reduce hallucination, or for some other purpose?), but would like the representation to be richer, so a downstream LLM can still reason about pronunciation, pauses, and so on?
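For concreteness, here is a minimal sketch of what such a "richer than plain text" ASR output could look like: a transcript that keeps per-word timing, phonemes, and pauses, then flattens them into a prompt a downstream LLM can consume. The `RichToken` structure and `to_prompt` serialization are hypothetical names of my own, just one possible way to carry pronunciation and prosody into text.

```python
# Hypothetical sketch of a "rich" ASR transcript (not a real library API):
# each word keeps its timing, phoneme sequence, and the pause that follows,
# so a downstream LLM can reason about pronunciation and prosody.
from dataclasses import dataclass

@dataclass
class RichToken:
    word: str
    start: float         # seconds from utterance start
    end: float
    phonemes: list[str]  # e.g. ARPAbet symbols from a forced aligner
    pause_after: float   # silence before the next word, in seconds

def to_prompt(tokens: list[RichToken]) -> str:
    """Flatten the rich transcript into plain text for an LLM prompt."""
    parts = []
    for t in tokens:
        parts.append(f"{t.word} [/{' '.join(t.phonemes)}/]")
        if t.pause_after >= 0.3:  # mark perceptually salient pauses only
            parts.append(f"<pause {t.pause_after:.1f}s>")
    return " ".join(parts)

transcript = [
    RichToken("well", 0.00, 0.30, ["W", "EH", "L"], 0.8),
    RichToken("maybe", 1.10, 1.55, ["M", "EY", "B", "IY"], 0.1),
]
print(to_prompt(transcript))
# well [/W EH L/] <pause 0.8s> maybe [/M EY B IY/]
```

The threshold and the inline markup are arbitrary; the point is only that timing and phoneme information survives into the text the LLM sees.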
Omar Kamali (omarkamali) replied to their post:
I just might have cracked tokenizer-free LLMs. No vocab, no softmax.
I'm training a 22M-parameter LLM right now to test this "thing", and it's able to formulate coherent sentences 🤯
Bear in mind, this is a completely new, tokenizer-free LLM architecture with built-in language universality.
Check the explainer video to understand what's happening. Feedback welcome on this approach!
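The post doesn't describe the architecture, so the following is purely a speculative reading of "no vocab, no softmax", not Omar's design: operate on raw UTF-8 bytes (which also gives language universality for free, since every script is just bytes) and, instead of a softmax over a vocabulary, regress the next byte's embedding and decode by nearest neighbour. Everything here (`ByteRegressionLM`, the regression head) is my assumption for illustration.

```python
# Speculative sketch of one tokenizer-free, softmax-free design (NOT the
# architecture from the post): bytes in, predicted embeddings out, decoded
# by nearest-neighbour lookup instead of a softmax over a vocabulary.
import torch
import torch.nn as nn

class ByteRegressionLM(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)  # fixed byte alphabet, not a learned vocab
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, d_model)   # emits an embedding, not vocab logits

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        causal = nn.Transformer.generate_square_subsequent_mask(byte_ids.size(1))
        h = self.backbone(self.embed(byte_ids), mask=causal)
        return self.out(h)  # (batch, seq, d_model): predicted next-byte embeddings

    def decode(self, pred: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour match against the 256 byte embeddings: no softmax anywhere.
        dists = (pred.unsqueeze(-2) - self.embed.weight).pow(2).sum(-1)
        return dists.argmin(-1)

model = ByteRegressionLM()
x = torch.tensor([list(b"hello")])   # raw UTF-8 bytes as input, any language
next_bytes = model.decode(model(x))  # predicted next byte at each position
```

Training such a model would presumably use an MSE or cosine loss between the predicted embedding and the target byte's embedding instead of cross-entropy; that detail is also an assumption, not something stated in the post.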


