Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
Abstract
Timer-S1 is a scalable Mixture-of-Experts time series model with 8.3B parameters that uses serial scaling and novel TimeMoE blocks to improve long-term forecasting accuracy.
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B parameters activated per token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling along three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computation to improve long-term predictions while avoiding the costly rolling-style inference and pronounced error accumulation of standard next-token prediction. Pursuing a high-quality, unbiased training dataset, we curate TimeBench, a corpus of one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, comprising continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores among pre-trained models. Timer-S1 will be released to facilitate further research.
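To make the contrast in the abstract concrete, below is a minimal PyTorch sketch of the two inference regimes: standard next-token prediction, which rolls each prediction back into the context and so accumulates error over the horizon, versus a serial multi-token head that emits the whole horizon in a single forward pass. This is an illustrative toy, not the Timer-S1 architecture; the backbone, patch length, and all names (`PatchForecaster`, `PATCH`, `HORIZON_TOKENS`) are assumptions for exposition.

```python
# Minimal sketch (not the official implementation) contrasting rolling
# next-token inference with a serial multi-token prediction head.
# All names and sizes here are hypothetical illustration choices.
import torch
import torch.nn as nn

PATCH = 32          # hypothetical patch (token) length in time points
HORIZON_TOKENS = 4  # number of future tokens to forecast

class PatchForecaster(nn.Module):
    """Toy backbone: embeds patch tokens and summarizes the context."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(PATCH, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Next-token head: one future patch per step (rolling inference).
        self.next_head = nn.Linear(d_model, PATCH)
        # Serial head: all HORIZON_TOKENS patches in one forward pass,
        # so no predictions are ever fed back into the context.
        self.serial_head = nn.Linear(d_model, PATCH * HORIZON_TOKENS)

    def forward(self, x):                        # x: (B, T) raw series
        tokens = x.unfold(1, PATCH, PATCH)       # (B, N, PATCH) patch tokens
        h, _ = self.encoder(self.embed(tokens))  # (B, N, d_model)
        return h[:, -1]                          # last hidden state

def rolling_forecast(model, x):
    """Standard next-token inference: predict, append, repeat."""
    preds = []
    for _ in range(HORIZON_TOKENS):
        nxt = model.next_head(model(x))          # (B, PATCH)
        preds.append(nxt)
        x = torch.cat([x, nxt], dim=1)           # feed prediction back in
    return torch.cat(preds, dim=1)               # (B, PATCH * HORIZON_TOKENS)

def serial_forecast(model, x):
    """STP-style inference: one pass emits the whole horizon."""
    return model.serial_head(model(x))           # (B, PATCH * HORIZON_TOKENS)

model = PatchForecaster()
context = torch.randn(2, 8 * PATCH)              # batch of 2, 8 context tokens
print(rolling_forecast(model, context).shape)    # torch.Size([2, 128])
print(serial_forecast(model, context).shape)     # torch.Size([2, 128])
```

Both calls return a tensor of the same shape; the difference is that the rolling loop must re-encode a growing context and feed its own (possibly erroneous) outputs back in `HORIZON_TOKENS` times, whereas the serial head reads the observed context exactly once.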
Community
Timer-S1 introduces a billion-parameter MoE time-series foundation model with serial scaling, long-context capabilities, TimeMoE/TimeSTP blocks, TimeBench data, and post-training for enhanced forecasting.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API.
- MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts (2026)
- Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers (2026)
- EIDOS: Latent-Space Predictive Learning for Time Series Foundation Models (2026)
- Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models (2026)
- Deep TPC: Temporal-Prior Conditioning for Time Series Forecasting (2026)
- Enhancing few-shot time series forecasting with LLM-guided diffusion (2026)
- T-LLM: Teaching Large Language Models to Forecast Time Series via Temporal Distillation (2026)