A new model is coming! It's going to take a long time to train on my 5070 Ti, so expect a release in ~1 month. We think this model is going to be SOTA for its size. Our Mini version will have 25M parameters and the Pro version 140M. The Pro version has a 3072-token context window (extensible up to 6K with RoPE), and the Mini version has a 4096-token context window (up to 8K with RoPE). Meanwhile, we are working on an Instruct version of our BananaMind 1.5 Base.
🧠 Does your LLM know when it's about to be wrong?
Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.
The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, i.e. ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, no better than random). A tiny adapter on the frozen base recovers that signal.
Two independent axes (never compared across a row): ① trap_rate: does it fall for tempting trap options? (lower = stronger) ② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
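To make the two axes concrete, here is a minimal sketch in plain Python. The function names and the toy data are made up for illustration. Treating Δ as the difference between the adapter's AUROC and the model's self-confidence AUROC is an assumption, since the post does not spell out the formula.

```python
def trap_rate(chosen, trap):
    """Fraction of multiple-choice items where the model picked the trap option."""
    if len(chosen) != len(trap):
        raise ValueError("chosen and trap must have the same length")
    return sum(c == t for c, t in zip(chosen, trap)) / len(chosen)


def auroc(scores, labels):
    """Probability that a random positive scores above a random negative.

    labels: 1 = the answer was wrong, 0 = the answer was right.
    scores: predicted P(wrong). Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one right answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Axis 1: 2 trap picks out of 400 items gives the 0.005 quoted above.
chosen = ["A"] * 400
trap = ["B"] * 398 + ["A"] * 2
rate = trap_rate(chosen, trap)

# Axis 2 (toy data, NOT benchmark results): a model whose self-reported
# P(wrong) is the same everywhere carries no signal, so its AUROC is 0.5.
labels = [1, 0, 1, 0]
self_scores = [0.1, 0.1, 0.1, 0.1]
adapter_scores = [0.9, 0.2, 0.7, 0.4]

# Assumed definition of the gain: AUROC(adapter) - AUROC(self-confidence).
delta = auroc(adapter_scores, labels) - auroc(self_scores, labels)
print(rate, delta)
```

The two numbers answer different questions (resistance to traps vs. how much error signal the adapter adds), which is why they are reported as separate axes.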
What's open: 300+100 trap problems (each with a hidden trap + TICOS type); a 24-model leaderboard; 11 per-model adapters. These are adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong)).
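As a rough illustration of "reads the hidden state → P(wrong)", the sketch below trains a tiny logistic probe on fixed vectors. This is only one plausible shape for such an adapter; the released adapters may be built differently. The names `p_wrong` and `train_probe` and the 2-dimensional toy "hidden states" are hypothetical. The base model never appears in the training loop, which is the sense in which it stays frozen: only the probe's weights change.

```python
import math


def p_wrong(hidden_state, weights, bias):
    """Logistic probe: map a (frozen) hidden state to a probability of error."""
    z = sum(w * h for w, h in zip(weights, hidden_state)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit probe weights with plain SGD on the log loss.

    states: hidden-state vectors extracted once from the base model.
    labels: 1 if the model's answer was wrong, 0 if it was right.
    """
    weights = [0.0] * len(states[0])
    bias = 0.0
    for _ in range(epochs):
        for h, y in zip(states, labels):
            err = p_wrong(h, weights, bias) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * x for w, x in zip(weights, h)]
            bias -= lr * err
    return weights, bias


# Toy stand-ins for hidden states (real ones have thousands of dimensions).
states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]

weights, bias = train_probe(states, labels)
risky = p_wrong([1.0, 0.0], weights, bias)
safe = p_wrong([0.0, 1.0], weights, bias)
print(round(risky, 3), round(safe, 3))
```

Because the probe only consumes vectors the base model already produces, it can be trained and shipped per model without touching the base weights.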
Submit any HF model: it is auto-scored daily at 09:00 KST and added to the board.