Kurtis-EON1

Kurtis-EON1 is an experimental 486M-parameter instruction-tuned language model built on the custom Echo-DSRN (Dual State Recurrent Neural Network) architecture.

This repository will host the Supervised Fine-Tuned (SFT) and aligned iteration of the model.

The foundational pre-trained weights will be hosted separately at ethicalabs/Echo-DSRN-486M.

Work in Progress: This model is currently under active development.

The Architectural Philosophy: Transformers vs. Echo-DSRN

O(1) Memory & "Infinite" Context

Kurtis-EON1 replaces the traditional $O(N^2)$ Transformer KV-Cache with a continuously evolving Recurrent State. It can process input streams of unbounded length by compressing history into a dense, bounded vector, ensuring constant per-token inference cost and a fixed memory footprint.

  • Transformer: Acts as a photographic memory. It stores every single token perfectly in a massive cache, but becomes computationally expensive as the context window grows.
  • Kurtis-EON1 (Echo-DSRN): Mimics human memory and Predictive Coding. It compresses the past into a semantic "feeling" (State) rather than a raw recording (Cache). You remember the gist of your life, not every single word spoken to you. The model operates on the same principle, saving immense hardware resources.

Think of the model like human memory: you can live for 80 years (infinite context), but you don't remember exactly what you ate for breakfast in Berlin on February 2, 2016. You remember the gist of your life. The model works the same way, compressing the past into a feeling (State) rather than a recording (Cache).
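The contrast between a bounded recurrent state and a growing KV-cache can be sketched numerically. This is an illustrative toy, not the actual Echo-DSRN recurrence: the leaky-integrator update, the decay rate, and the dimensions are all stand-in assumptions.

```python
import numpy as np

def recurrent_state_update(state, token_embedding, decay=0.9):
    """Fold a new token into a fixed-size state vector (O(1) memory)."""
    return decay * state + (1.0 - decay) * token_embedding

d_model = 64
state = np.zeros(d_model)   # bounded memory, regardless of stream length
kv_cache = []               # Transformer-style cache grows with every token

rng = np.random.default_rng(0)
for _ in range(10_000):     # arbitrarily long input stream
    tok = rng.standard_normal(d_model)
    state = recurrent_state_update(state, tok)  # still d_model floats
    kv_cache.append(tok)                        # now 10,000 * d_model floats

print(state.shape)    # stays (64,) no matter how long the stream is
print(len(kv_cache))  # grows linearly with the stream
```

However long the loop runs, the state stays at `d_model` floats, while the cache grows linearly with the number of tokens seen.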

Scaling Strategy: The 114M Prototyping Sandbox

Before committing massive compute budgets to half-billion- or billion-parameter runs, the Echo-DSRN architecture relies on a strict prototyping scale.

The 114M-parameter version (hosted at ethicalabs/Echo-DSRN-114M-v0.1.2-Base) acts as our architectural wind tunnel. It allows rapid iteration on the mechanics governing the continuous memory state (testing the softplus stability of the surprise gates, the Test-Time Training (TTT) meta-learning loops, and custom Stage 1/Stage 2 SFT loss masking) in hours instead of weeks on single-node hardware.

Once the mathematics are proven and stabilized at the 114M scale, the exact same architecture is deterministically upscaled to the ~0.5B (486M), ~1B, and ~3B parameter classes to absorb enterprise-grade latent knowledge.

Overview: The "Surprise-Gated" Mechanism

Unlike standard recurrent models or hybrid SSMs that use opaque learned gates, the Echo-DSRN architecture mathematically anchors its memory to Information Entropy:

  • Internal Prediction: The model constantly attempts to predict the next token representation based on its hidden state.
  • Surprise $\lambda$ (Lambda): It calculates the quadratic error between its prediction and reality. If a word is highly predictable (filler words), the memory gate stays shut. If the word is highly novel or complex (the "Surprise"), the gate flies open, explicitly prioritizing the $O(1)$ state capacity for high-value information.
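The two bullets above can be sketched in a few lines. This is an illustrative toy, not the actual Echo-DSRN update: the model card mentions softplus-stabilized gates, while this sketch uses a simple exponential saturation (`1 - exp(-lambda)`) as a stand-in, and `surprise_gated_update` is a hypothetical name.

```python
import numpy as np

def surprise_gated_update(state, pred, actual):
    """Blend a new observation into the state in proportion to the 'surprise'."""
    lam = np.sum((pred - actual) ** 2)  # quadratic prediction error (lambda)
    gate = 1.0 - np.exp(-lam)           # saturating gate in [0, 1): shut when predictable
    new_state = (1.0 - gate) * state + gate * actual
    return new_state, lam, gate

d = 8
state = np.zeros(d)
pred = np.ones(d)

# Predictable token: tiny error, so the gate stays nearly shut.
state, lam_low, gate_low = surprise_gated_update(state, pred, pred + 0.01)

# Novel token: large error, so the gate flies open and the state absorbs it.
state, lam_high, gate_high = surprise_gated_update(state, pred, pred + 3.0)

print(gate_low, gate_high)  # near-zero gate vs. a gate near 1.0
```

The key property is monotonicity: the larger the prediction error, the wider the gate, so the bounded state capacity is spent preferentially on novel information.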

Interactive Demos

Experience the architecture in real time through our public Gradio Spaces.

Data & Status

  • Architecture: Hybrid Echo-DSRN (Surprise-Gated Slow State + RoPE Sliding Window Fast State)
  • Base Pre-training: Trained from scratch on FineWeb-EDU and Smoltalk2.
  • Instruct Alignment: Fine-tuned on multiple datasets using the Muon optimizer.
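The "fast state" in the architecture bullet above refers to local sliding-window attention. A minimal sketch of such a mask follows; the window size and sequence length are illustrative, and the actual Echo-DSRN window is not specified here.

```python
import numpy as np

def sliding_window_causal_mask(seq_len, window):
    """Boolean mask: position i may attend to j iff i - window < j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=6, window=3)
print(mask.astype(int))
# Each row has at most `window` ones, so attention cost stays O(N * window)
# instead of the full causal O(N^2).
```

In the hybrid design, this bounded local window handles recent tokens exactly, while the surprise-gated slow state carries the compressed long-range history.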

Instruct Model (Kurtis-EON1-Echo-DSRN-486M)

Work in progress. Stand by for updated telemetry and safety evaluations following the integration of the Muon optimizer SFT and DPO sweeps, completion-only loss masking, and hybrid attention patches.

