Not yet. I'm still experimenting.
Once I get something that I'm pleased with I'm going to write a blog.
Daniel Fox PRO
FlameF0X
AI & ML interests
Pre-training text generator.
(Brother, I'm 18)
Please don't try to contact me.
Recent Activity
- updated a model about 5 hours ago: FlameF0X/FWKV-TinyStories
- updated a collection about 5 hours ago: FWKV
- updated a collection about 5 hours ago: FWKV
- replied to their post about 5 hours ago
- posted an update about 5 hours ago
Post
21
Greetings Hugging Face!
I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), an RWKV-style LM that uses FFNNs (feed-forward neural networks) instead of RNNs, built around floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV. So far I have:
- FlameF0X/FWKV-29M — this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.
The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories — trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.
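For intuition, here is a minimal sketch of one plausible reading of the floor(W·K·V) operation; the shapes, the elementwise composition, and the function name are all assumptions, not the actual FWKV implementation:

```python
import numpy as np

def fwkv_mix(w, k, v):
    # Elementwise weight * key * value, floored to integers -- one plausible
    # reading of floor(W.K.V); shapes and composition are guesses, not the
    # project's real code.
    return np.floor(w * k * v)

out = fwkv_mix(np.array([0.5, 1.5]), np.array([2.0, 2.0]), np.array([3.0, 1.0]))
print(out)  # [3. 3.]
```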
reacted to ArtelTaleb's post with 🔥 7 days ago
Post
2497
✈️ World Flight Arcade - Can you land in 60 seconds?
I just dropped a new browser game built entirely with Three.js: World Flight Arcade
The concept is brutally simple:
- 🕐 60 seconds of flight above a neon wireframe city
- ✈️ One single attempt to land on the runway
- 💀 No second chances. No respawn. Just you, the controls, and the clock.
The camera system is fully dynamic - it stays locked behind the plane within a ±45° pitch/yaw envelope, giving you that cinematic flight feel while keeping full spatial awareness.
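An envelope like that usually comes down to clamping the camera's pitch/yaw offset relative to the plane; a minimal sketch in Python rather than Three.js, and not the game's actual code:

```python
def clamp_envelope(offset_deg: float, limit_deg: float = 45.0) -> float:
    # Keep the camera's pitch or yaw offset inside [-limit, +limit] so it
    # stays locked behind the plane; names here are illustrative.
    return max(-limit_deg, min(limit_deg, offset_deg))

print(clamp_envelope(60.0))   # 45.0 -- pushed back inside the envelope
print(clamp_envelope(-10.0))  # -10.0 -- already inside, unchanged
```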
Can you nail the landing on your first try?
👉 Play here: ArtelTaleb/world-flight-arcade
Built by Artel3D - handcrafted in Three.js, zero dependencies, runs directly in your browser.
Drop your score in the comments 👇
#gamedev #threejs #browserGame #webgl #artel3d #indiegame
reacted to HannesVonEssen's post with ❤️ 10 days ago
Post
187
📣 I made a visualizer for Hugging Face models: https://hfviewer.com
✨ Simply paste a Hugging Face URL to get an interactive visualization of the architecture!
🔗 The recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B
Feel free to try it out and give me feedback on how it can be improved! ❤️
reacted to Crownelius's post with 🔥 15 days ago
Post
3806
[DAY ONE] PROJECT CROWFEATHER 4/30/2026
...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.
Crowfeather/Crowfeather-50m
54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.
Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global) + DeepSeek-V4 Muon optimizer + WSD scheduler + Gemma-2 logit soft-cap + PaLM z-loss. Recipe in the model card.
What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.
What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.
The series:
- Every additional training run becomes another model card here
- Every model card gets a matching post on this profile
- Continuation goes to Colab next, picking up from step 17500 out of 100k
Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow @Crownelius and @Crowfeather if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away.
Graphs will be available on my NEXT model lol
-Shane
reacted to anakin87's post with ❤️ 21 days ago
Post
3301
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe
I took LiquidAI/LFM2-2.6B and trained it through play.
🧑🍳 Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies
Done! Beats GPT-5-mini 🏆
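The "X% random moves" opponents in steps 4-5 can be sketched as an epsilon-random policy; this is a guess at the shape of it, not the author's code, and `best_move` stands in for whatever non-random policy was actually used:

```python
import random

def opponent_move(legal_moves, best_move, p_random):
    # With probability p_random play a uniformly random legal move,
    # otherwise play the fixed stronger move. Annealing p_random from
    # ~0.7 down toward 0.0 recreates the curriculum described above.
    if random.random() < p_random:
        return random.choice(legal_moves)
    return best_move

print(opponent_move([0, 1, 2], best_move=1, p_random=0.0))  # 1
```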
---
🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
reacted to SeaWolf-AI's post with 🔥 2 months ago
Post
5067
ALL Bench — Global AI Model Unified Leaderboard
FINAL-Bench/all-bench-leaderboard
If you've ever tried to compare GPT-5.2 and Claude Opus 4.6 side by side, you've probably hit the same wall: the official Hugging Face leaderboard only tracks open-source models, so the most widely used AI systems simply aren't there. ALL Bench fixes that by bringing closed-source models, open-weight models, and — uniquely — all four teams under South Korea's national sovereign AI program into a single leaderboard. Thirty-one frontier models, one consistent scoring scale.
Scoring works differently here too. Most leaderboards skip benchmarks a model hasn't submitted, which lets models game their ranking by withholding results. ALL Bench treats every missing entry as zero and divides by ten, so there's no advantage in hiding your weak spots.
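That missing-entry rule can be sketched as follows (illustrative names only, not the leaderboard's actual code):

```python
def composite(scores: dict, benchmarks: list) -> float:
    # Missing benchmarks count as zero, and the divisor is always the full
    # benchmark count (ten), so withholding weak results cannot raise a score.
    return sum(scores.get(b, 0.0) for b in benchmarks) / len(benchmarks)

bench = [f"bench_{i}" for i in range(10)]  # ten placeholder benchmark names
print(composite({"bench_0": 80.0, "bench_1": 60.0}, bench))  # 14.0
```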
The ten core benchmarks span reasoning (GPQA Diamond, AIME 2025, HLE, ARC-AGI-2), coding (SWE-bench Verified, LiveCodeBench), and instruction-following (IFEval, BFCL). The standout is FINAL Bench — the world's only benchmark measuring whether a model can catch and correct its own mistakes. It reached rank five in global dataset popularity on Hugging Face in February 2026 and has been covered by Seoul Shinmun, Asia Economy, IT Chosun, and Behind.
Nine interactive charts let you explore everything from composite score rankings and a full heatmap to an open-vs-closed scatter plot. Operational metrics like context window, output speed, and pricing are included alongside benchmark scores.
All data is sourced from Artificial Analysis Intelligence Index v4.0, arXiv technical reports, Chatbot Arena ELO ratings, and the Korean Ministry of Science and ICT's official evaluation results. Updates monthly.
reacted to marksverdhei's post with 🤗 3 months ago
Post
2691
Dear Hugging Face team, can we please have a way to archive HF repositories / Spaces? I have a bunch of Spaces that used to work but no longer do because the HF Space implementation changed, and I think it would be good if I could archive them like on GitHub.
React to this post if you want to see this feature! 💡
reacted to IlyasMoutawwakil's post with 🔥 4 months ago
Post
2465
After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥
Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single Dynamo-traced graph!
Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.
This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.
We are doubling down on Transformers' commitment to be a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.
There are definitely some edge-cases that we still haven't addressed so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.
PR in the comments! More updates coming soon!
reacted to danielhanchen's post with 🔥 4 months ago
Post
2909
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.
Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.
Blog: https://unsloth.ai/docs/new/grpo-long-context
reacted to sergiopaniego's post with 🔥 4 months ago
Post
2324
New GRPO + TRL free Colab notebook out! 🔥
Fine-tune 7B+ models on T4 GPUs thanks to a ton of memory optimizations for GRPO
7B model uses only 9.2 GB VRAM (~7× reduction) 🤯
Try the notebook here 👉 https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_trl_lora_qlora.ipynb
reacted to mlabonne's post with 🚀 4 months ago
Post
10353
New family of 1B models just dropped!
> LiquidAI/LFM2.5-1.2B-Base: 10T → 28T tokens
> LiquidAI/LFM2.5-1.2B-Instruct: new large-scale multi-stage RL
> LiquidAI/LFM2.5-1.2B-JP: our most polite model
> LiquidAI/LFM2.5-VL-1.6B: multi-image multilingual
> LiquidAI/LFM2.5-Audio-1.5B: 8× faster, no quality loss
Super proud of this release 🤗
reacted to davidquicast's post with 🔥 5 months ago
Post
4284
replied to davidquicast's post 5 months ago
Yo, this is neat
reacted to davidquicast's post with 🤗 5 months ago
Post
4284
replied to their post 5 months ago
Hi bro, gosh, I couldn't find any way to contact you XD. I want to ask you a few things about the i3 models you created. Email me as soon as possible: moviesrecommender.app@gmail.com
Hi there,
I don't feel comfortable speaking with people privately since I'm only 17 (soon 18).
But if you have questions about the models, I have two repos on GitHub for them:
All the technical details and code are available there.
reacted to mitkox's post with 🔥 7 months ago
Post
2861
Say hello to my little friends! I just unboxed this trio of HP Z2 G1a!
Three is always better than one!
3x AMD Ryzen AI Max+ Pro 395
384GB RAM
24TB of RAID storage
Ubuntu 24.04
ROCm 7.0.2
llama cpp, vLLM and Aibrix
Small, cheap GPUs are about to become the Raspberry Pi of edge AI inference. Sprinkle some kubectl fairy dust on top, and suddenly it's a high-availability, self-healing, cloud-native, enterprise-grade AI cluster camping in a closet.
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to andito's post with ❤️ 7 months ago
Post
2641
Finally, our new paper is out! "𝗙𝗶𝗻𝗲𝗩𝗶𝘀𝗶𝗼𝗻: 𝗢𝗽𝗲𝗻 𝗗𝗮𝘁𝗮 𝗜𝘀 𝗔𝗹𝗹 𝗬𝗼𝘂 𝗡𝗲𝗲𝗱"! 🥳
FineVision: Open Data Is All You Need (2510.17269)
If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.
We wanted to change that.
FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.
In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks
My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets.
NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!
🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉
HuggingFaceM4/FineVision_full_shuffled
It's ready to stream, so you can start training your own models right away:
from datasets import load_dataset
d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
print(next(iter(d)))
A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!
reacted to SelmaNajih001's post with 👍 7 months ago
Post
2305
Finally, I uploaded the model I developed for my master’s thesis! Given a financial event, it provides explained predictions based on a dataset of past news and central bank speeches.
Try it out here:
SelmaNajih001/StockPredictionExplanation
(Just restart the space and wait a minute)
The dataset used for RAG can be found here:
SelmaNajih001/FinancialNewsAndCentralBanksSpeeches-Summary-Rag
While the dataset used for the training is:
SelmaNajih001/FinancialClassification
I also wrote an article explaining how I did the training. You can find it here: https://huggingface.co/blog/SelmaNajih001/explainable-financial-predictions