Metadata Conditioned LLMs (Collection) • Pretraining data: English NOW corpus (english-corpora.org/now). Paper: arxiv.org/abs/2601.15236. Code: github.com/iamshnoo/metadata_localization • 92 items • Updated 7 days ago
iamshnoo/combined_no_europe_without_metadata_1b_step8k Text Generation • 1B • Updated 16 days ago • 916
iamshnoo/combined_no_europe_without_metadata_1b_step4k Text Generation • 1B • Updated 16 days ago • 909
iamshnoo/combined_no_europe_without_metadata_1b_step2k Text Generation • 1B • Updated 16 days ago • 892
iamshnoo/combined_no_asia_without_metadata_1b_step8k Text Generation • 1B • Updated 16 days ago • 867
iamshnoo/combined_no_asia_without_metadata_1b_step4k Text Generation • 1B • Updated 16 days ago • 864
iamshnoo/combined_no_asia_without_metadata_1b_step2k Text Generation • 1B • Updated 16 days ago • 844
iamshnoo/combined_no_america_without_metadata_1b_step8k Text Generation • 1B • Updated 16 days ago • 823
iamshnoo/combined_no_america_without_metadata_1b_step4k Text Generation • 1B • Updated 16 days ago • 823
iamshnoo/combined_no_america_without_metadata_1b_step2k Text Generation • 1B • Updated 16 days ago • 816
iamshnoo/combined_no_africa_without_metadata_1b_step8k Text Generation • 1B • Updated 16 days ago • 810