andito (Andres Marafioti)

I Let a Lobster Run My Jetson: What OpenClaw Taught Me About the Future of Computing
andito • Feb 19 • 16

Streaming datasets: 100x More Efficient
andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 86

Supercharge your OCR Pipelines with Open Models
merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 312

TimeScope: How Long Can Your Video Large Multimodal Model Go?
orrzohar, ruili0, andito, nicholswang • Jul 23, 2025 • 48

Efficient MultiModal Data Pipeline
ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 70

KV Cache from scratch in nanoVLM
ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data
danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene • Jun 3, 2025 • 346

nanoVLM: The simplest repository to train your VLM in pure PyTorch
ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 258

Vision Language Models (Better, faster, stronger)
merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 611

SmolVLM2: Bringing Video Understanding to Every Device
orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 337

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!
andito, mfarre, merve • Jan 23, 2025 • 192

SmolVLM - small yet mighty Vision Language Model
andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 417

Deploying Speech-to-Speech on Hugging Face
andito, derek-thomas, dmaniloff, eustlb • Oct 22, 2024 • 45

FineVideo: behind the scenes
mfarre, andito, lewtun, lvwerra, pcuenq, thomwolf • Sep 23, 2024 • 35

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?
danaaubakirova, andito • Jul 25, 2024 • 17

Docmatix - a huge dataset for Document Visual Question Answering
andito, HugoLaurencon • Jul 18, 2024 • 78

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models
andito, merve, SkalskiP • Jun 24, 2024 • 207
