Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧
BEE-spoke-data/smol_llama-101M-GQA
Text Generation • 0.1B • Updated • 1.78k • 33
BEE-spoke-data/smol_llama-81M-tied
Text Generation • 81.3M • Updated • 856 • 10
BEE-spoke-data/smol_llama-220M-GQA
Text Generation • 0.2B • Updated • 2.3k • 13
BEE-spoke-data/verysmol_llama-v11-KIx2
Text Generation • 58.1M • Updated • 870 • 4
Models (58)
BEE-spoke-data/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text • 0.9B • Updated • 5
BEE-spoke-data/neobert-100k-test
Fill-Mask • 0.1B • Updated • 1
BEE-spoke-data/tiny-random-MPNetForMaskedLM
Fill-Mask • 237k • Updated • 1
BEE-spoke-data/bpe-tokenizer-32k-smolNeoX
Updated
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig
Updated
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp
Updated
BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
Summarization • 0.3B • Updated • 45 • 2
BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2
Text Generation • 0.7B • Updated • 3
BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
0.7B • Updated
BEE-spoke-data/tFINE-900m-instruct-orpo
0.9B • Updated • 1
Datasets (83)
BEE-spoke-data/awesome-python-apps
Viewer • Updated • 61.1k • 1
BEE-spoke-data/SurvivorLib-Nanonets-OCR-s
Viewer • Updated • 14.4k • 17 • 2
BEE-spoke-data/SurvivorLib-rolmOCR
Viewer • Updated • 14.6k • 51 • 1
BEE-spoke-data/govdocs1-pdf-source
Viewer • Updated • 235k • 779 • 4
BEE-spoke-data/napierone-pdf-nanonets-s
Viewer • Updated • 9.96k • 10
BEE-spoke-data/napierone-pdf-olmOCR
Viewer • Updated • 19k • 16
BEE-spoke-data/LONGCOT-merged-1M
Viewer • Updated • 1.7M • 39 • 2
BEE-spoke-data/cosmopedia-v2-mincols
Viewer • Updated • 39.1M • 21 • 1
BEE-spoke-data/reddit-title-body-hf
Viewer • Updated • 251M • 137 • 4
BEE-spoke-data/bigpatent-all
Viewer • Updated • 2.43M • 221