Jeremy Haschal

JermemyHaschal

AI & ML interests

None yet

Recent Activity

reacted to albertvillanova's post with 🤗 about 4 hours ago

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill. This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:

- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this. If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.

🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

reacted to OzTianlu's post with 🤗 8 days ago

O(1) inference is the foundational design of Spartacus-1B-Instruct 🛡️ !

https://huggingface.co/NoesisLab/Spartacus-1B-Instruct

We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, , the entire prefix is lossily compressed into a fixed-size state matrix per head.

The technical core of this architecture relies on the associativity of the monoid operator:

- Training: parallel prefix scan using Triton-accelerated JIT kernels to compute all prefix states simultaneously.
- Inference: True sequential updates. Memory and time complexity per token are decoupled from sequence length.
- Explicit Causality: We discard RoPE and attention masks. Causality is a first-class citizen, explicitly modeled through learned, content-dependent decay gates.

Current zero-shot benchmarks demonstrate that Spartacus-1B-Instruct (1.3B) is already outperforming established sub-quadratic models like Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%.

The "Spartacus" era is about scaling intelligence, not the memory wall ♾️.
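The post says associativity is what allows the same recurrence to be trained with a parallel prefix scan and run sequentially at inference. The minimal sketch below illustrates that idea. It is not Spartacus's actual recurrence, because the post's formula did not survive in this capture. Instead it assumes a hypothetical scalar gated linear update `s_t = a_t * s_{t-1} + b_t`, where `a_t` stands in for a decay gate. All function names are illustrative. The real model is described as keeping a state matrix per head and using Triton kernels, whereas this example uses plain Python floats.

```python
import math
from typing import List, Tuple

# Hypothetical element: (a, b) encodes the affine update s -> a * s + b.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two updates, applying `left` first and then `right`.

    right(left(s)) = a_r * (a_l * s + b_l) + b_r
                   = (a_r * a_l) * s + (a_r * b_l + b_r)
    This operator is associative with identity (1, 0), so the pairs form a monoid.
    """
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def sequential_states(pairs: List[Pair], s0: float = 0.0) -> List[float]:
    """Inference-style evaluation: one constant-size update per token."""
    states, s = [], s0
    for a, b in pairs:
        s = a * s + b
        states.append(s)
    return states


def parallel_prefix_scan(pairs: List[Pair]) -> List[Pair]:
    """Training-style evaluation: Hillis-Steele inclusive scan.

    There are ceil(log2(n)) rounds. Within a round, every position reads only
    the previous round's values, so all updates in that round are independent
    and could run in parallel (here they are simply computed in a loop).
    """
    prefix = list(pairs)
    n, step = len(prefix), 1
    while step < n:
        prefix = [
            prefix[i] if i < step else combine(prefix[i - step], prefix[i])
            for i in range(n)
        ]
        step *= 2
    return prefix


def states_from_scan(pairs: List[Pair], s0: float = 0.0) -> List[float]:
    """Apply each composed prefix update to the initial state."""
    return [a * s0 + b for a, b in parallel_prefix_scan(pairs)]


if __name__ == "__main__":
    demo = [(0.5, 1.0), (0.9, 2.0), (0.8, -1.0), (1.0, 0.5)]
    seq = sequential_states(demo)
    par = states_from_scan(demo)
    assert all(math.isclose(x, y) for x, y in zip(seq, par))
    print(seq)
```

Both paths produce the same states. The reason is that the scan only regroups compositions of updates, and associativity permits that regrouping. The sequential path keeps a single state value no matter how long the sequence grows. That is the property the post describes as decoupling per-token cost from sequence length.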

new activity 14 days ago

TheDrummer/Rocinante-X-12B-v1-GGUF: Comparison with Rivermind-Lux-12B-v1b?


Organizations

None yet

New activity in TheDrummer/Rocinante-X-12B-v1-GGUF 14 days ago

Comparison with Rivermind-Lux-12B-v1b?

#1 opened 15 days ago by JermemyHaschal

New activity in fancyfeast/joy-caption-beta-one 4 months ago

Joytag no longer works

#14 opened 4 months ago by bro123123

New activity in lodestones/Chroma1-Radiance 5 months ago

ERROR: Could not detect model type of: D:\...\Chroma1-Radiance-v0.1.safetensors

#2 opened 6 months ago by Viennar

New activity in openbmb/MiniCPM-V-4 7 months ago

HF Space?

#4 opened 7 months ago by JermemyHaschal

New activity in bosonai/higgs-audio-v2-generation-3B-base 7 months ago

Using tags?

#5 opened 7 months ago by JermemyHaschal

New activity in unsloth/dots.llm1.inst-GGUF 7 months ago

Please use `--jinja` or else gibberish!

🚀 ❤️ 2

#2 opened 8 months ago by danielhanchen

New activity in concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf 8 months ago

Use with other model?

#3 opened 8 months ago by JermemyHaschal

New activity in silveroxides/Chroma-GGUF 8 months ago

Difference between normal and 'detail-calibrated'?

👍 3

#13 opened 8 months ago by JermemyHaschal

New activity in alamios/Mistral-Small-3.1-DRAFT-0.5B-GGUF 11 months ago

Mistral-Nemo-Instruct-2407 compatibility?

#1 opened 11 months ago by JermemyHaschal

New activity in TheDrummer/Gemmasutra-9B-v1.1-GGUF about 1 year ago

Request?

#1 opened about 1 year ago by BlueNipples

New activity in fishaudio/fish-speech-1.5 about 1 year ago

How to run this?

#4 opened about 1 year ago by JermemyHaschal

New activity in MaziyarPanahi/calme-2.1-qwen2-72b-GGUF over 1 year ago

Perplexity loss?

#11 opened over 1 year ago by JermemyHaschal
