Citaman's Collections

LLM From Scratch - Datasets

updated Mar 14

  • Skylion007/openwebtext

    Updated Dec 26, 2025 • 8.01M rows • 72.7k downloads • 503 likes

  • JeanKaddour/minipile

    Updated Jun 20, 2023 • 1.01M rows • 23.2k downloads • 143 likes

  • Locutusque/TM-DATA

    Updated Oct 15, 2024 • 2.77M rows • 102 downloads • 11 likes

  • PleIAs/French-PD-Newspapers

    Updated Mar 19, 2024 • 2.25M rows • 609 downloads • 68 likes

  • euclaise/MiniCoT

    Updated Jan 23, 2024 • 129k rows • 16 downloads • 7 likes

  • euirim/goodwiki

    Updated Sep 11, 2023 • 44.8k rows • 169 downloads • 54 likes

  • euclaise/mathoverflow-accepted

    Updated Oct 20, 2023 • 62.6k rows • 99 downloads • 4 likes

  • Locutusque/UltraTextbooks

    Updated Feb 2, 2024 • 5.52M rows • 301 downloads • 198 likes

  • TempoFunk/webvid-10M

    Updated Aug 19, 2023 • 10.7M rows • 8.4k downloads • 90 likes

  • HuggingFaceTB/cosmopedia

    Updated Aug 12, 2024 • 31.1M rows • 14.1k downloads • 683 likes

  • HuggingFaceGECLM/REDDIT_submissions

    Updated Mar 17, 2023 • 47.2M rows • 1.16k downloads • 11 likes

  • togethercomputer/RedPajama-Data-V2

    Updated Nov 21, 2024 • 3.11k downloads • 401 likes

  • stepfun-ai/Step-3.5-Flash-SFT

    Updated Mar 14 • 1.62M rows • 38.1k downloads • 320 likes