Umar Butler

umarbutler

https://umarbutler.com/

AI & ML interests

Law, technology, AI and everything in between.

Recent Activity

upvoted an article 1 day ago

Introducing Legal RAG Bench

published an article 1 day ago

Introducing Legal RAG Bench

posted an update 1 day ago

@abdurrahmanbutler and I just dropped Legal RAG Bench, the first benchmark for legal RAG systems to simultaneously evaluate hallucinations, retrieval failures, and reasoning errors.

Our key takeaways are:

1. Embedding models, not generative models, are the primary driver of RAG accuracy. Switching from a general-purpose embedder like OpenAI's Text Embedding 3 Large to a legal domain embedder like Isaacus' Kanon 2 Embedder can raise accuracy by ~19 points.
2. Hallucinations are often triggered by retrieval failures. Fix your retrieval stack, and, in most cases, you end up fixing hallucinations.
3. Once you have a solid legal retrieval engine like Kanon 2 Embedder, it doesn’t matter as much what generative model you use; GPT-5.2 and Gemini 3.1 Pro perform relatively similarly, with Gemini 3.1 Pro achieving slightly better accuracy at the cost of more hallucinations.
4. Google's latest LLM, Gemini 3.1 Pro, is actually a bit worse than its predecessor at legal RAG, achieving 79.3% accuracy instead of 80.3%.

These findings confirm what we already knew at Isaacus: that information retrieval sets the ceiling on the accuracy of legal RAG systems. It doesn’t matter how smart you are; you aren’t going to magically know what the penalty is for speeding in California without access to an up-to-date copy of the California Vehicle Code. Even still, to our knowledge, we’re the first to actually show this empirically.

Unfortunately, as we highlight in our write-up, high-quality open legal benchmarks like Legal RAG Bench and our earlier MLEB are few and far between. In the interests of transparency, we have not only detailed exactly how we built Legal RAG Bench, but we’ve also released all of our data openly on Hugging Face. You can read our write-up [here](https://isaacus.com/blog/legal-rag-bench), noting that we’ll soon be publishing it as a paper.

Kudos to my brother @abdurrahmanbutler for serving as the lead author on this monumental release.

New activity in answerdotai/ModernBERT-base 3 months ago

Is this model meant for full bfloat16, AMP bfloat16 or no bfloat16?

#7 opened about 1 year ago by

commented 3 papers 4 months ago

The Massive Legal Embedding Benchmark (MLEB)

Paper • 2510.19365 • Published Oct 22, 2025 • 18

New activity in Prarabdha/indian-legal-supervised-fine-tuning-data 6 months ago

License

#2 opened 6 months ago by

New activity in nguha/legalbench 6 months ago

LegalBench no longer loads on the latest version of datasets

#33 opened 6 months ago by

New activity in allenai/gooaq 7 months ago

Many answers are stored as literal string representations of arrays

#4 opened 7 months ago by

New activity in jhu-clsp/CLERC 8 months ago

License?

#7 opened 8 months ago by

New activity in answerdotai/ModernBERT-large-training-checkpoints 8 months ago

Last final stable checkpoint

#1 opened 8 months ago by

New activity in pietrolesci/nli_fever over 1 year ago

Premise and hypothesis wrong way around?

#2 opened almost 2 years ago by

New activity in nguha/legalbench over 1 year ago

Significant train/test imbalance makes this more tailored to GenAI rather than LLMs in general

#31 opened over 1 year ago by

New activity in Xenova/gpt-4 over 1 year ago

Conversion to tiktoken

#4 opened over 1 year ago by

New activity in isaacus/open-australian-legal-embeddings over 1 year ago

Dataset Viewer issue

#3 opened over 2 years ago by

New activity in isaacus/open-australian-legal-qa over 1 year ago

Fix typo in the dataset name

#20 opened over 1 year ago by

New activity in umarbutler/better-cuad over 1 year ago

[bot] Conversion to Parquet

#1 opened over 1 year ago by parquet-converter

New activity in isaacus/open-australian-legal-corpus almost 2 years ago

Releasing v5.0.0.

#4 opened almost 2 years ago by

BuilderConfig 'train' not found

#3 opened almost 2 years ago by

New activity in yunconglong/Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B about 2 years ago

any contamination results?

#4 opened about 2 years ago by

New activity in isaacus/open-australian-legal-corpus about 2 years ago

Victoria?

#2 opened over 2 years ago by

New activity in TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ about 2 years ago

always getting 0 in output

#3 opened about 2 years ago by