ZeroGPU Explorers

community

Activity Feed


Recent Activity

ymoslem authored a paper 1 day ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

ymoslem submitted a paper 1 day ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

mapooon authored a paper 15 days ago

Wild3R: Feed-Forward 3D Gaussian Splatting from Unconstrained Sparse Photo Collection

Paper • 2606.11894 • Published 21 days ago

hiyouga

posted an update 27 days ago

Post

640

Follow my X account — I'll be sharing thoughts and findings on building open-source AI Agent projects, Agent Memory, and Observability.

Thanks for connecting!

https://x.com/code_hiyouga

Lakonik

authored a paper about 2 months ago

Asymmetric Flow Models

Paper • 2605.12964 • Published May 13 • 22

Lakonik

submitted a paper to Daily Papers about 2 months ago

Asymmetric Flow Models

Paper • 2605.12964 • Published May 13 • 22

anakin87

posted an update 2 months ago

Post

3399

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
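
Steps 4 and 5 rely on opponents whose skill is set by how often they move at random. Here is a minimal sketch of such an opponent, assuming a 9-cell list board with "X", "O", and " " entries and a minimax fallback for the non-random moves. The names `opponent_move` and `random_rate` are hypothetical and are not taken from the course code.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player, me):
    """Score the position for `me` (+1 win, 0 draw, -1 loss) with `player` to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == me else -1
    free = [i for i, cell in enumerate(board) if cell == " "]
    if not free:
        return 0
    other = "O" if player == "X" else "X"
    scores = []
    for i in free:
        board[i] = player
        scores.append(minimax(board, other, me))
        board[i] = " "  # undo the trial move
    return max(scores) if player == me else min(scores)


def opponent_move(board, mark, random_rate, rng=random):
    """With probability `random_rate` play a random legal move, else a minimax-optimal one."""
    free = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < random_rate:
        return rng.choice(free)
    other = "O" if mark == "X" else "X"
    best, best_score = free[0], -2
    for i in free:
        board[i] = mark
        score = minimax(board, other, mark)
        board[i] = " "
        if score > best_score:
            best, best_score = i, score
    return best
```

Under these assumptions, the first RL stage would sample `random_rate` between 0.2 and 0.7 for each game, and the second stage between 0.0 and 0.25.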

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

anakin87

posted an update 2 months ago

Post

107

Local Gemma 4 agent 💎🕵️🗺️
drop in a mysterious map, get the location, live weather, and top spots to visit

I've been exploring what google/gemma-4-E4B-it can do in a local agentic setup and put together a 📓 𝙣𝙤𝙩𝙚𝙗𝙤𝙤𝙠 with Gemma + Haystack AI Framework covering 4 demos.

📓 https://t.ly/04Ty5

Another interesting one is the 𝗚𝗶𝘁𝗛𝘂𝗯 𝗔𝗴𝗲𝗻𝘁.

I initially tried to load all tools from the GitHub MCP server, quickly filling the context available on Colab -> unusable, forgetful agent ❌

Then I used the 𝗦𝗲𝗮𝗿𝗰𝗵𝗮𝗯𝗹𝗲 𝗧𝗼𝗼𝗹𝘀𝗲𝘁 🔎 🧰
It dynamically discovers the right tools from the GitHub MCP server on the fly, loading only what it actually needs for the task at hand, keeping context lean.

Now it actually works.
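
To make the idea concrete, here is a toy sketch of on-demand tool discovery. It is not Haystack's implementation: it assumes a plain dictionary of tool names and descriptions (the entries below are illustrative) and ranks them by simple keyword overlap, so only the top matches would be handed to the model.

```python
def search_tools(query, catalog, top_k=2):
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split())
    scored = []
    for name, description in catalog.items():
        text = set((name.replace("_", " ") + " " + description).lower().split())
        overlap = len(words & text)
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Illustrative catalog; a real MCP server exposes many more tools.
CATALOG = {
    "create_issue": "Open a new issue in a repository",
    "list_pull_requests": "List pull requests in a repository",
    "get_file_contents": "Read the contents of a file in a repository",
    "search_code": "Search code across repositories",
}

selected = search_tools("list open pull requests", CATALOG)
```

The point of the design is that the prompt carries only the selected tool schemas rather than the full catalog, which is what keeps the context small.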

The notebook also contains
💎 Multimodal weather agent: the mystery map demo above
💎 Visual Question Answering from a paper
💎 RAG on Rock music

anakin87

posted an update 2 months ago

Post

10412

How does LLM training with RL Environments work?

It all starts with 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗩𝗲𝗿𝗶𝗳𝗶𝗮𝗯𝗹𝗲 𝗥𝗲𝘄𝗮𝗿𝗱𝘀
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)
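
A minimal sketch of such verifiable rewards, assuming the model is asked to put its final answer inside `<answer>` tags. The tag format and the function names are assumptions for illustration, not part of any specific library.

```python
import re


def extract_answer(completion):
    """Return the text inside the last <answer>...</answer> tag, or None if absent."""
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def correctness_reward(completion, ground_truth):
    """1.0 if the extracted answer matches the ground truth exactly, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth.strip() else 0.0


def format_reward(completion):
    """1.0 if the completion contains a parsable answer tag, else 0.0."""
    return 1.0 if extract_answer(completion) is not None else 0.0
```

Both rewards are deterministic, so the same completion always receives the same score.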

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions

(envs can also include tools)

---

What happens at training?

We use 𝗚𝗿𝗼𝘂𝗽 𝗥𝗲𝗹𝗮𝘁𝗶𝘃𝗲 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 with a tic-tac-toe env

No critic model needed: the group is the baseline
Simpler than PPO

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline

🔁 Repeat
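
Steps 3 and 4 can be sketched in a few lines. The function name is hypothetical. The optional `normalize` flag divides by the group standard deviation, as the original GRPO formulation does; the post itself only describes subtracting the mean.

```python
import statistics


def group_advantages(rewards, normalize=False):
    """Advantage of each rollout = its reward minus the mean reward of its group."""
    mean = sum(rewards) / len(rewards)
    advantages = [r - mean for r in rewards]
    if normalize:
        std = statistics.pstdev(rewards)
        if std > 0:  # all-equal groups carry no learning signal; leave them at zero
            advantages = [a / std for a in advantages]
    return advantages


# Four games played from the same board: two wins (1.0) and two losses (0.0).
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.5, -0.5, -0.5, 0.5]
```

Rollouts with a positive advantage are reinforced and those with a negative one are discouraged. When every rollout in a group gets the same reward, all advantages are zero and the group contributes no update.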

For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs

2 replies

mrfakename

in zero-gpu-explorers/README 2 months ago

Why doesn't anyone host llms in zerogpu spaces?

#172 opened 3 months ago by

Reality123b

nroggendorff

in zero-gpu-explorers/README 2 months ago

Why doesn't anyone host llms in zerogpu spaces?

#172 opened 3 months ago by

Reality123b

anakin87

posted an update 3 months ago

Post

1625

Your RL environment is an SFT data factory 🏭

In LLM post-training it's common to do Supervised Fine-Tuning warm-up before Reinforcement Learning.

When teaching a new task, RL needs some signal to amplify and SFT builds a good initial basis, for example by teaching format.

If you've built an RL env, generating SFT synthetic data is basically free.

An env already has: task data, rollout logic, rewards.

1️⃣ pick a strong model
2️⃣ run it through the env
3️⃣ filter rollouts by reward

works out of the box with Verifiers (Prime Intellect) and Atropos (Nous Research)
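
A minimal sketch of step 3, assuming each rollout is a dictionary with `prompt`, `completion`, and `reward` keys. This structure and the sample data are made up for illustration and do not reflect the output format of Verifiers or Atropos.

```python
def build_sft_dataset(rollouts, min_reward=1.0):
    """Keep rollouts at or above the reward threshold as prompt/completion pairs."""
    return [
        {"prompt": r["prompt"], "completion": r["completion"]}
        for r in rollouts
        if r["reward"] >= min_reward
    ]


# Illustrative rollouts from a strong model playing in the env.
rollouts = [
    {"prompt": "Board A, your move", "completion": "<move>4</move>", "reward": 1.0},
    {"prompt": "Board B, your move", "completion": "I pick the middle", "reward": 0.0},
    {"prompt": "Board C, your move", "completion": "<move>2</move>", "reward": 1.0},
]

sft_data = build_sft_dataset(rollouts)
```

Because the env already scores every rollout, the filter is the only extra code needed; the threshold controls the trade-off between dataset size and quality.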

🧑‍💻 Example: https://github.com/anakin87/llm-rl-environments-lil-course/blob/main/chapters/05.md

anakin87

posted an update 3 months ago

Post

4193

🌀 Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds
where models can act, get rewards, and learn.

I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video.

You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂

🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---

🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

anakin87

posted an update 3 months ago

Post

3314

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.

Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments

🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

1 reply

manchery

authored a paper 3 months ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 204

Bils

posted an update 3 months ago

Post

2775

Avatars are everywhere, but here is the reality behind full-system marketing automation. 🚀
Many see "Madame AI" simply as an AI news presenter. She is far deeper than that. Madame AI is a Real-time Agentic AI Assistant we developed to orchestrate entire workflows for marketing and professional media. She manages UGC (User-Generated Content), understands marketing system automation intuitively, and handles complex media tasks.
We have solved the character consistency and high production cost bottlenecks that traditionally required immense training and time. By precisely orchestrating every computational step behind videos and branded designs, we have fully automated the pipeline and significantly reduced costs.
This capability is built on our extensive experience managing large-scale automation projects with complex product requirements documents (PRDs).
Grabclip is our public portal and the practical result of that journey. It is the interface where "Madame AI" acts as the intelligent engine.
We have spent three years building this pipeline with a clear goal: a 100% local, end-to-end solution that operates despite external restrictions.
See the live example on YouTube (our fast-paced AI news podcast with Madame AI) and try the automation portal yourself👇
📺 The Playlist: https://www.youtube.com/playlist?list=PLwEbW4bdYBSCVSziFfJYq4zXop_cyHquO
🌐 Our Portal (Grabclip) — The first practical step in our pipeline: https://grabclip.bilsimaging.com/
#AgenticAI #VirtualInfluencer #FutureOfWork #GenerativeAI #TunisiaTech #MarketingAutomation #100PercentLocal #OSMedia #Grabclip #RealTimeAssistant #UGC #ProfessionalMedia #TunisiaAI