Hugging Face
g023dev (g023)
1 follower · 6 following
GitHub: https://github.com/g023
AI & ML interests: AI datasets, AI training, AI software
Recent Activity
Updated a model about 2 hours ago: g023/Qwen3-1.77B-g023
Upvoted a paper 1 day ago: GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Reacted to akhaliq's post with 🤯 1 day ago:
GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection
https://huggingface.co/papers/2403.03507

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.
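The abstract describes GaLore only in outline. As a rough illustration of the core idea for a single weight matrix, here is a minimal NumPy sketch, assuming: the full gradient G (m x n) is projected on the left by P (m x r), taken as the top-r left singular vectors of the current gradient; P is refreshed every `update_gap` steps; and Adam's moment estimates are kept only for the r x n projected gradient, while the update is projected back and applied to all weights. The class name `GaLoreAdamSketch`, the parameters `rank` and `update_gap`, and the toy `demo` problem are hypothetical and are not the authors' released implementation; 8-bit states and the other refinements in the paper are left out.

```python
import numpy as np


class GaLoreAdamSketch:
    """Hypothetical GaLore-style Adam for one m x n weight matrix (illustration only).

    Assumptions: left projection P (m x r) from the top-r left singular vectors
    of the gradient, refreshed every `update_gap` steps; Adam moments are
    stored for the r x n projected gradient instead of the full m x n one.
    """

    def __init__(self, shape, rank=4, update_gap=200, lr=1e-3,
                 betas=(0.9, 0.999), eps=1e-8):
        _, n = shape
        self.rank = rank
        self.update_gap = update_gap
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.P = None                   # m x r projection, set on the first step
        self.m = np.zeros((rank, n))    # first moment: r x n instead of m x n
        self.v = np.zeros((rank, n))    # second moment: r x n instead of m x n
        self.t = 0

    def step(self, W, G):
        """Return updated weights given current weights W and full gradient G."""
        if self.t % self.update_gap == 0:
            # Refresh the low-rank subspace from the current gradient via SVD.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, : self.rank]
        self.t += 1

        R = self.P.T @ G                # projected gradient, r x n
        self.m = self.beta1 * self.m + (1 - self.beta1) * R
        self.v = self.beta2 * self.v + (1 - self.beta2) * R ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        N = m_hat / (np.sqrt(v_hat) + self.eps)

        # Project the normalized update back to m x n; every weight can change.
        return W - self.lr * (self.P @ N)


def demo(steps=300):
    """Toy run: minimize 0.5 * ||W - target||^2 for a random 8 x 16 target."""
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, 16))
    W = np.zeros_like(target)
    opt = GaLoreAdamSketch(W.shape, rank=4, update_gap=50, lr=0.01)
    initial = 0.5 * np.sum((W - target) ** 2)
    for _ in range(steps):
        W = opt.step(W, W - target)     # gradient of the toy loss is W - target
    final = 0.5 * np.sum((W - target) ** 2)
    return initial, final


if __name__ == "__main__":
    initial, final = demo()
    print(f"toy loss: {initial:.3f} -> {final:.3f}")
    print("Adam state shape per moment:", GaLoreAdamSketch((8, 16), rank=4).m.shape)
```

The design point to notice is the state shape: the two Adam moments hold 2·r·n values rather than 2·m·n, which is the mechanism behind the optimizer-state savings the abstract describes in this simplified form. The actual savings depend on layer shapes and the chosen rank.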
Organizations: None yet
g023's models (7)
g023/Qwen3-1.77B-g023 • Text Generation • 2B • Updated about 2 hours ago • 302 downloads
g023/Qwen3-1.77B-g023-GGUF • Text Generation • 2B • Updated 2 days ago • 937 downloads
g023/Qwen3-1.77B-g023-NF4 • Text Generation • 2B • Updated 3 days ago • 106 downloads
g023/qwen3-tiny-v2-finetuned • Text Generation • 2B • Updated 4 days ago • 49 downloads
g023/qwen3-tiny-v2 • Text Generation • 2B • Updated 7 days ago • 60 downloads
g023/qwen3-tiny-v1 • Text Generation • 2B • Updated 7 days ago • 41 downloads
g023/Qwen3-8B-DMS-8x-4bit-NF4 • Text Generation • 8B • Updated Jan 31 • 126 downloads • 1 like