Hariprasad Sundaresan PRO

Hari5115

AI & ML interests

LLMs, Fine-tuning, Agentic AI, RAG, Multilingual NLP, Transformers

Recent Activity

updated a Space 2 days ago

Hari5115/neon-pop

Organizations

None yet

reacted to dippatel1994's post with 🔥 2 days ago

To make revising LLM architectures and training methods faster, I created a deck of 180 visual flashcards. It started as a personal hobby, but slowly became a cheat code for reviewing LLM concepts before technical interviews. People love it!

Swipe through these samples, and if you want to grab the full set or follow the project, the repo is here: https://github.com/llmsresearch/llm-flashcards.

posted an update 2 days ago

Can you predict what something smells like just from its chemical structure?

Turns out yes — and a model can learn it.

Smell is molecular. Specific shapes bind to specific receptors in your nose. That pattern is encodable.

Feed it a molecule, get odor descriptors back:
Ethanol → alcoholic (87%) + ethereal (62%)
Isoamyl alcohol → floral (71%) + fruity (58%); its ester, isoamyl acetate, is the compound best known for making bananas smell like bananas
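
A minimal sketch of how per-descriptor percentages like these can arise, assuming the model is a multi-label classifier that applies an independent sigmoid to each descriptor logit (so the percentages need not sum to 100%). The logit values, the 0.5 threshold, and the `top_descriptors` helper are hypothetical and not taken from the model card.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def top_descriptors(logits: dict[str, float], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (descriptor, probability) pairs at or above the threshold, highest first."""
    probs = {name: sigmoid(z) for name, z in logits.items()}
    kept = [(name, p) for name, p in probs.items() if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical logits, chosen only to illustrate the shape of the output.
example_logits = {"alcoholic": 1.9, "ethereal": 0.5, "fruity": -1.2}

for name, prob in top_descriptors(example_logits):
    print(f"{name}: {prob:.0%}")
```

Each descriptor is scored on its own, which is why one molecule can carry several labels at once.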

🌸 https://huggingface.co/Hari5115/molecular-odor-predictor
📦 https://huggingface.co/datasets/Hari5115/molecular-odor-dataset
🚀 https://huggingface.co/spaces/Hari5115/molecular-odor-demo

Working in fragrance, food science, or sensory AI? Would love to connect 🙏

#MolecularAI #FoodScience #ChemInformatics #OpenSource #HuggingFace

reacted to mmhamdy's post with 🚀 2 days ago

Human brains don't recreate every pixel to understand the world!

Most current models in genomics, proteomics, and single-cell transcriptomics rely on generative objectives like masked language modeling or next token prediction. While effective, these architectures waste significant capacity reconstructing raw, noisy sequence details that may not carry functional biological meaning.

But a promising, more efficient alternative is emerging: Joint-Embedding Predictive Architecture (JEPA)

Originally introduced by Yann LeCun for computer vision, JEPA is a non-generative, self-supervised learning (SSL) framework. Instead of predicting raw inputs, it operates as a world model that predicts abstract semantic embeddings in latent space.

Recently, the JEPA framework (and its more efficient LeJEPA variant) has been adapted to the biological sciences, both to develop high-performing foundation models and to improve existing ones.

It's interesting how each adaptation modified and tailored JEPA to suit its specific biological domain, whether by experimenting with different backbones or complementing the objective with other loss terms.

For example, JEPA-DNA and ProteinJEPA used JEPA as a continual pre-training framework to enhance existing foundation models without training from scratch, while Cell-JEPA and JEPA-DNA employed a hybrid objective that combines the JEPA loss with a traditional language modeling loss.
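
To make the hybrid objective concrete, here is a rough sketch of combining a JEPA-style loss with a language modeling loss. It assumes the JEPA term is a mean squared error between predicted and target embeddings and that the two terms are mixed with a single weight `lm_weight`; both choices, and all function names, are illustrative assumptions rather than the formulation used by Cell-JEPA or JEPA-DNA.

```python
import math


def jepa_loss(pred_embedding: list[float], target_embedding: list[float]) -> float:
    """Mean squared error between predicted and target latent embeddings."""
    diffs = [(p - t) ** 2 for p, t in zip(pred_embedding, target_embedding)]
    return sum(diffs) / len(diffs)


def lm_loss(token_probs: list[float], target_index: int) -> float:
    """Cross-entropy for one token: negative log-probability of the true token."""
    return -math.log(token_probs[target_index])


def hybrid_loss(pred_embedding, target_embedding, token_probs, target_index, lm_weight=0.5):
    """Weighted sum of the latent-space loss and the token-level loss."""
    latent_term = jepa_loss(pred_embedding, target_embedding)
    token_term = lm_loss(token_probs, target_index)
    return latent_term + lm_weight * token_term


# Toy values: a 2-dimensional embedding and a 2-token vocabulary.
print(hybrid_loss([1.0, 2.0], [1.0, 0.0], [0.5, 0.5], target_index=0, lm_weight=1.0))
```

The latent term rewards predicting abstract representations, while the token term keeps the model tied to the raw sequence.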

The article below provides an overview of these implementations, along with others that came out this year. As always, your thoughts and feedback are welcome and highly appreciated!

Link to the article is in the first comment 👇

posted an update 6 days ago

How do scientists know if a new chemical is toxic before testing it on anything alive?

For decades: animal studies. Slow, expensive, ethically complicated.

Tox21 changed that — a government-backed initiative screening 10,000 compounds across 12 biochemical assays, testing whether molecules activate receptors linked to cancer, hormonal disruption, and organ damage.

I trained a model on this data and published everything under MIT — free for research, education, and building.

🔬 Hari5115/molecular-toxicity-predictor
📦 Hari5115/MoleculeIQ
🚀 Hari5115/molecular-toxicity-demo

⚠️ Research and educational use only. Not a substitute for certified toxicological testing. Model has known limitations on novel chemical classes — see model card for details.

What would you build with this? 👇

#OpenSource #DrugDiscovery #ChemInformatics #Tox21 #HuggingFace

reacted to pankajpandey-dev's post with 👍 16 days ago

🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users!
@HuggingFaceBio released the original GGUF in bf16 only — so I added the full quant ladder for CPU/edge inference:
• Q2_K → 1.4 GB
• Q3_K_M → 1.8 GB
• Q4_K_M → 2.1 GB ⭐
• Q5_K_M → 2.4 GB
• Q6_K → 2.7 GB
• Q8_0 → 3.5 GB
🔗 pankajpandey-dev/Carbon-3B-GGUF
Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support).
Huge thanks to the HuggingFaceBio team for the original model 🙏
#GGUF #llamacpp #genomics #DNA
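
As a rough way to compare the quant levels listed above, the file sizes can be converted to approximate bits per weight. This sketch assumes "3B" means exactly 3 billion parameters and that 1 GB is 10^9 bytes; GGUF files also carry metadata and may keep some tensors at higher precision, so the figures are an upper-bound estimate rather than the nominal bit width of each format.

```python
PARAMS = 3e9  # assumption: "3B" read as exactly 3 billion weights

# File sizes in GB, as listed in the post.
SIZES_GB = {
    "Q2_K": 1.4,
    "Q3_K_M": 1.8,
    "Q4_K_M": 2.1,
    "Q5_K_M": 2.4,
    "Q6_K": 2.7,
    "Q8_0": 3.5,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Approximate bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params


for quant, size in SIZES_GB.items():
    print(f"{quant}: ~{bits_per_weight(size):.1f} bits/weight")
```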

reacted to fffiloni's post with 🔥 16 days ago

I built HF Radio on Hugging Face Spaces 📻
fffiloni/HF-Radio

A live community radio for AI-generated songs, powered by tracks created with ACE-Step.

You can tune in, discover community-made songs in many languages, vote on what sounds good, and mark your real favorites as Bangers.

The more people listen, vote, and create, the better the station gets.

Under the hood, it connects a few Hugging Face pieces together:

Spaces for the live app, HF buckets for community tracks, OAuth for signed-in listeners, server-side streaming with ffmpeg, hourly playlist refreshes, moderation, jingles, and community feedback loops.

It’s not just a playlist.

It’s a shared taste experiment:
new songs get a shot every hour, and the community helps decide what deserves another spin.

Come listen.
Find weird gems.
Support the Bangers.
Shape the radio.

—> fffiloni/HF-Radio

posted an update 16 days ago

Spanglish. Hinglish. Franglais. Real users don't speak textbook languages 🌍

500M+ Indians type messages like the examples below. Most NLP pipelines fail silently on code-switched text, so I built something to start closing that gap for regional languages.

Fine-tuned MuRIL on 3,000 synthetic Hinglish examples. 97.6% F1.
Not perfect, but open, working, and hopefully useful.

"bhai mera refund kab aayega" → refund_status ✓
"wrong item aaya hai" → exchange_product ✓
"payment cut ho gaya order nahi hua" → payment_issue ✓

🤖 Hari5115/hinglish-retail-intent-classifier
📦 Hari5115/hinglish-retail-intent-dataset
🚀 Hari5115/hinglish-retail-intent-demo

Building for multilingual markets? Have real code-switched data? Would love to connect 🙏

#Hindi #IndianAI #RegionalNLP #NLP #MultilingualAI #OpenSource
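
For readers unfamiliar with the metric, here is a small sketch of how an F1 score can be computed for an intent classifier. The post does not say which averaging was used for the 97.6% figure, so macro-averaging is an assumption here, and the toy labels and predictions are made up for illustration (only the intent names come from the examples above).

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four messages.
y_true = ["refund_status", "refund_status", "exchange_product", "payment_issue"]
y_pred = ["refund_status", "exchange_product", "exchange_product", "payment_issue"]

print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```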
