Hub Local Cache
This document describes the on-disk layout of the HF Hub local cache. It is intended as a reference for reimplementing the cache system in any language.
Here is a partial list of applications and libraries that use this cache layout. Please open a PR to add your own.
| Library or Application | Language | Notes |
|---|---|---|
| huggingface_hub | Python | And any library that depends on it (e.g. transformers, diffusers, datasets, mlx, vllm, …) |
| hf-hub | Rust | |
| swift-huggingface | Swift | |
| @huggingface/hub | JavaScript | Node.js only |
| HuggingFaceModelDownloader | Go | |
| llama.cpp | C++ | Work in progress |
Cache location
The default cache directory is:
~/.cache/huggingface/hub

This can be overridden with environment variables:
- HF_HUB_CACHE: direct path to the cache directory (takes priority)
- HF_HOME: path to the Hugging Face home directory; if set, the cache lives at $HF_HOME/hub
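The precedence between the two variables and the default can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `resolve_cache_dir` is hypothetical, treating an empty variable as unset is an assumption, and only the variables named in this document are considered.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return the cache directory using the precedence described in this section.

    `env` is a mapping of environment variables (defaults to os.environ).
    Assumption: an empty value is treated the same as an unset variable.
    """
    env = os.environ if env is None else env

    # HF_HUB_CACHE is a direct path to the cache and takes priority.
    hub_cache = env.get("HF_HUB_CACHE")
    if hub_cache:
        return Path(hub_cache).expanduser()

    # HF_HOME is the Hugging Face home directory; the cache lives in its hub/ subfolder.
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser() / "hub"

    # Neither variable is set: fall back to the default location.
    return Path.home() / ".cache" / "huggingface" / "hub"
```

Passing the environment as a mapping keeps the function easy to test without mutating the process environment.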
Overview
<CACHE_DIR>/
├── .locks/                     # Lock files for concurrent download safety
├── models--<org>--<repo>/      # Cached model repositories
├── datasets--<org>--<repo>/    # Cached dataset repositories
└── spaces--<org>--<repo>/      # Cached space repositories

Each downloaded repository gets a single flat folder. Inside each repo folder, files are stored once in a content-addressed blobs/ directory and accessed through snapshots/ symlinks. Named references (branches, tags) are tracked in refs/.
Schema
             ┌──────────────────────────────────────────┐
             │            Repository folder             │
             │    models--julien-c--EsperBERTo-small    │
             └──────────────┬───────────────────────────┘
                            │
      ┌─────────────┬───────┴───────┬──────────────┐
      │             │               │              │
      v             v               v              v
  ┌───────┐    ┌────────┐    ┌────────────┐   ┌──────────┐
  │ refs/ │    │ blobs/ │    │ snapshots/ │   │.no_exist/│
  └───┬───┘    └────┬───┘    └──────┬─────┘   └────┬─────┘
      │             │               │              │
      │             │               │              │
"main" contains  Files stored  One folder     Empty marker
commit hash      by content    per commit     files for
e.g. "aaaaaa"    hash (SHA-1   hash, e.g.     files known
                 or SHA-256)   aaaaaa/        not to exist
Resolves a                     bbbbbb/
branch/tag to                       │
a snapshot ──────────────────► Contains symlinks
                               to ../../blobs/{hash}

Repository folder naming
Repositories are stored as flat directories at the cache root. The folder name encodes the repo type and repo ID:
{type}s--{repo_id_with_slashes_replaced_by_--}

Rules:
- The repo type is pluralized: models, datasets, spaces
- Forward slashes (/) in the repo ID are replaced with --
- The separator between all parts is --
- Casing is preserved
Examples:
| Hub repo ID | Repo type | Cache folder name |
|---|---|---|
| julien-c/EsperBERTo-small | model | models--julien-c--EsperBERTo-small |
| huggingface/DataMeasurementsFiles | dataset | datasets--huggingface--DataMeasurementsFiles |
| dalle-mini/dalle-mini | space | spaces--dalle-mini--dalle-mini |
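The naming rules above translate into a one-line join. The sketch below is illustrative only; the function name `repo_folder_name` and the choice to reject unknown repo types with a `ValueError` are assumptions, not part of the layout specification.

```python
def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Build the cache folder name for a Hub repo ID and repo type."""
    if repo_type not in ("model", "dataset", "space"):
        # Assumption: only the three repo types listed in this document are valid.
        raise ValueError(f"unsupported repo type: {repo_type!r}")
    # Pluralize the type, split the repo ID on "/", and join every part with "--".
    # Casing is left untouched.
    return "--".join([f"{repo_type}s", *repo_id.split("/")])
```

For example, `repo_folder_name("julien-c/EsperBERTo-small")` yields the first folder name in the table above.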
Buckets are not handled by this cache as they are not git-backed. Use the dedicated hf buckets sync command instead.
Inside a repository folder
Every cached repository has the same internal structure:
<repo_folder>/
├── blobs/
├── refs/
├── snapshots/
└── .no_exist/    # may not always be present

blobs/ : content-addressed file storage
The blobs/ directory stores the actual file contents. Each file is named after its file etag on the Hub:
- Git-tracked files: named by their SHA-1 hash (40 hexadecimal characters)
- Git LFS files: named by their SHA-256 hash (64 hexadecimal characters)
This is a flat directory with no subdirectories. Identical files across different revisions are stored only once.
blobs/
├── 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd    # SHA-256 (LFS)
├── 7cb18dc9bafbfcf74629a4b760af1b160957a83e                            # SHA-1 (git)
└── d7edf6bd2a681fb0175f7735299831ee1b22b812                            # SHA-1 (git)

refs/ : branch and tag references
The refs/ directory maps human-readable references (branch names, tags, PR numbers) to commit hashes.
Each reference is a plain text file containing a single line: the full commit hash (40 hexadecimal characters).
refs/
├── main          # contains e.g. "bbc77c8132af1cc5cf678da3f1ddf2de43606d48"
├── 2.4.0         # a tag
└── refs/
    └── pr/
        └── 1     # pull request reference

When a file is downloaded using a branch or tag name, the corresponding ref file is created or updated with the latest commit hash.
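Reading and updating a ref file could look like the sketch below. The helper names `read_ref` and `write_ref` are hypothetical. Stripping surrounding whitespace on read and writing the hash without a trailing newline are assumptions made for tolerance; the text above only states that the file holds the full commit hash on a single line.

```python
import re
from pathlib import Path

# A full commit hash is 40 hexadecimal characters.
COMMIT_HASH_RE = re.compile(r"[0-9a-fA-F]{40}")


def read_ref(repo_folder, revision):
    """Return the commit hash stored in refs/{revision}, or None if there is no such ref."""
    ref_path = Path(repo_folder) / "refs" / revision
    if not ref_path.is_file():
        return None
    commit_hash = ref_path.read_text().strip()
    # Ignore files that do not contain a well-formed commit hash.
    return commit_hash if COMMIT_HASH_RE.fullmatch(commit_hash) else None


def write_ref(repo_folder, revision, commit_hash):
    """Create or update refs/{revision} so that it points at commit_hash."""
    if not COMMIT_HASH_RE.fullmatch(commit_hash):
        raise ValueError(f"not a full commit hash: {commit_hash!r}")
    ref_path = Path(repo_folder) / "refs" / revision
    # A revision such as "refs/pr/1" needs its intermediate directories.
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(commit_hash)
```

With this layout, a pull request revision such as `refs/pr/1` ends up at `refs/refs/pr/1` inside the repo folder, matching the tree shown above.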
snapshots/ : revision views
The snapshots/ directory contains one subdirectory per cached revision (commit hash). Each revision directory mirrors the file structure of the repository on the Hub, but files are symlinks pointing into ../../blobs/{hash}.
snapshots/
├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
│   ├── README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
│   └── pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
└── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/
    ├── README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
    └── pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd

Key properties:
- Symlinks use relative paths: ../../blobs/{hash}
- If a file is unchanged between two revisions, both symlinks point to the same blob with no data duplication
- Files in subdirectories on the Hub are represented as subdirectories in the snapshot (the full relative path is preserved)
Switching between snapshots is similar to using git checkout in a local git repository.
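Creating a snapshot entry for an already-downloaded blob might look like the following sketch. The function name `link_snapshot_file` is hypothetical. The relative target is computed from the symlink's own directory, which gives `../../blobs/{hash}` for top-level files; for a file in a subdirectory the computed target has one extra `../` per directory level, which is what the filesystem requires for the link to resolve, although this document only shows the top-level form.

```python
import os
from pathlib import Path


def link_snapshot_file(repo_folder, commit_hash, relative_path, blob_hash):
    """Point snapshots/{commit_hash}/{relative_path} at blobs/{blob_hash}."""
    repo_folder = Path(repo_folder)
    blob_path = repo_folder / "blobs" / blob_hash
    pointer_path = repo_folder / "snapshots" / commit_hash / relative_path

    # Mirror the repository's directory structure inside the snapshot.
    pointer_path.parent.mkdir(parents=True, exist_ok=True)

    # Relative target, computed from the directory that holds the symlink.
    target = os.path.relpath(blob_path, start=pointer_path.parent)

    # Replace an existing entry (including a dangling symlink) if there is one.
    if pointer_path.is_symlink() or pointer_path.exists():
        pointer_path.unlink()
    os.symlink(target, pointer_path)
    return pointer_path
```

Using relative targets keeps the links valid if the whole cache directory is moved or mounted at a different path.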
.no_exist/ : non-existence cache
The .no_exist/ directory tracks files that were requested but do not exist on the Hub. This avoids repeated HTTP requests for optional files.
Structure mirrors snapshots/: one subdirectory per commit hash, containing empty files (not symlinks) named after the missing file.
.no_exist/
└── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
    └── config_that_does_not_exist.json    # empty file

Disk usage is negligible since these are only empty marker files.
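Recording and checking a non-existence marker can be sketched in a few lines. The helper names `mark_missing` and `is_known_missing` are hypothetical, and the sketch assumes the caller has already resolved the revision to a commit hash.

```python
from pathlib import Path


def mark_missing(repo_folder, commit_hash, relative_path):
    """Create an empty marker recording that relative_path does not exist at commit_hash."""
    marker = Path(repo_folder) / ".no_exist" / commit_hash / relative_path
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # empty regular file, not a symlink
    return marker


def is_known_missing(repo_folder, commit_hash, relative_path):
    """Return True if a marker says the file is absent from the Hub for this commit."""
    return (Path(repo_folder) / ".no_exist" / commit_hash / relative_path).is_file()
```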
Lock files
Lock files prevent concurrent processes from downloading the same blob simultaneously. They are stored in a .locks/ directory at the cache root (not inside the repo folder):
<CACHE_DIR>/.locks/<repo_folder_name>/<blob_hash>.lock

Example:
<CACHE_DIR>/.locks/models--julien-c--EsperBERTo-small/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd.lock

Full example
~/.cache/huggingface/hub/
├── .locks/
│   └── models--julien-c--EsperBERTo-small/
│       └── 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd.lock
│
└── models--julien-c--EsperBERTo-small/
    ├── blobs/
    │   ├── [321M] 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
    │   ├── [ 398] 7cb18dc9bafbfcf74629a4b760af1b160957a83e
    │   └── [1.4K] d7edf6bd2a681fb0175f7735299831ee1b22b812
    │
    ├── refs/
    │   └── main    # contains "bbc77c8132af1cc5cf678da3f1ddf2de43606d48"
    │
    ├── snapshots/
    │   ├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
    │   │   ├── README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
    │   │   └── pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
    │   │
    │   └── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/
    │       ├── README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
    │       └── pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
    │
    └── .no_exist/
        └── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
            └── optional_config.json    # empty file

Note how pytorch_model.bin points to the same blob in both revisions. The 321 MB file is stored only once on disk.
File resolution logic
To locate a cached file on disk:
1. Resolve the revision to a commit hash
   - If the revision is already a 40-character hex string, use it directly
   - Otherwise, read the file at refs/{revision} to get the commit hash
2. Check the snapshot
   - Look for snapshots/{commit_hash}/{relative_path}
   - If it exists (as a symlink or file), the file is cached. Follow the symlink to get the content
3. Check non-existence
   - Look for .no_exist/{commit_hash}/{relative_path}
   - If it exists, the file is known not to exist on the Hub for this revision
4. Cache miss
   - If neither path exists, the file has not been cached yet
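The lookup steps above can be sketched as a single function. This is an illustration under stated assumptions rather than the reference implementation: the name `locate_cached_file` and the three status strings are hypothetical, a revision with no ref file is reported as a cache miss, and a dangling symlink in the snapshot is treated as not cached.

```python
import re
from pathlib import Path

# Hypothetical status values used only by this sketch.
CACHED = "cached"
KNOWN_MISSING = "known_missing"
NOT_CACHED = "not_cached"


def locate_cached_file(repo_folder, revision, relative_path):
    """Return (status, path) for a file in a cached repo folder; path is None unless cached."""
    repo_folder = Path(repo_folder)

    # Resolve the revision to a commit hash.
    if re.fullmatch(r"[0-9a-fA-F]{40}", revision):
        commit_hash = revision
    else:
        ref_path = repo_folder / "refs" / revision
        if not ref_path.is_file():
            return NOT_CACHED, None  # assumption: an unknown ref counts as a miss
        commit_hash = ref_path.read_text().strip()

    # Check the snapshot. is_file() follows symlinks, so this accepts both a
    # symlink to a blob and a regular file copy.
    snapshot_path = repo_folder / "snapshots" / commit_hash / relative_path
    if snapshot_path.is_file():
        return CACHED, snapshot_path

    # Check the non-existence marker for this commit.
    if (repo_folder / ".no_exist" / commit_hash / relative_path).is_file():
        return KNOWN_MISSING, None

    # Neither path exists: the file has not been cached yet.
    return NOT_CACHED, None
```

Returning a status alongside the path lets a caller distinguish "download needed" from "do not bother asking the Hub again", which is the purpose of the .no_exist/ directory.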
Windows behavior
The cache relies on symbolic links. On Windows systems where symlinks are not available, the cache operates in a degraded mode: actual file copies are placed directly in snapshots/ instead of symlinks. The blobs/ directory is not used in this mode.
This means the same file content may be duplicated across revisions, increasing disk usage. To enable symlink support on Windows, activate Developer Mode or run as administrator.
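One way a reimplementation might decide between the two modes is to probe the cache directory by attempting to create a symlink. The sketch below is an assumption about how such a check could be written, not a description of any particular library; the function name `symlinks_supported` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def symlinks_supported(directory):
    """Return True if a relative symlink can be created inside `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Probe inside a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = Path(tmp) / "probe_src"
        dst = Path(tmp) / "probe_dst"
        src.touch()
        try:
            # Relative target, like the links used in snapshots/.
            os.symlink(src.name, dst)
        except (OSError, NotImplementedError):
            # Typically raised on Windows without Developer Mode or admin rights.
            return False
        return dst.is_symlink()
```

Probing the actual cache directory rather than checking the platform name accounts for filesystems and permission settings that differ from the operating system default.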