Text Generation
PEFT
Safetensors
English
instruction-tuning
qlora
code-llama
conversational

Model Card for codellama-7b-matplotlib-assistant

This model is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf designed to enhance instruction-following capabilities. It was developed as part of a Master's thesis project.

Model Details

Model Description

The codellama-7b-matplotlib-assistant model is a large language model fine-tuned using the QLoRA (4-bit Quantization + LoRA) technique. The goal of this model was to adapt the base CodeLlama model to better follow user instructions while maintaining its coding and reasoning capabilities.

  • Developed by: mingyue0101
  • Model type: Causal Language Model (Fine-tuned with PEFT/LoRA)
  • Language(s) (NLP): English, Chinese
  • License: Apache-2.0 (inherited from CodeLlama)
  • Finetuned from model: codellama/CodeLlama-7b-Instruct-hf

Model Sources

Uses

Direct Use

The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited for tasks where a balance of technical coding knowledge and conversational instruction following is required.

Out-of-Scope Use

The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.

Bias, Risks, and Limitations

This model may inherit biases present in the training data or the base model. Since it was fine-tuned on a specific dataset (parquet02), it might exhibit limitations when handling domains outside of its training distribution. Users should expect potential hallucinations in complex reasoning tasks.

Recommendations

Users are encouraged to use safety filters when deploying this model in production and to perform domain-specific evaluation before use.

How to Get Started with the Model

Use the code below to load the model in 4-bit precision:

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

# ==========================================
# 1. Global Parameter Configuration
# ==========================================
base_model = "codeparrot/codeparrot"          # Base model ID on Hugging Face
new_dataset = "mingyue0101/prompts_modi"      # Fine-tuning dataset ID
new_model = "codeparrot_ming03"              # Directory name for saving the fine-tuned model

# ==========================================
# 2. Dataset Loading
# ==========================================
dataset = load_dataset(new_dataset, split="train")

# ==========================================
# 3. QLoRA 4-bit Quantization Configuration
# ==========================================
compute_dtype = getattr(torch, "float16")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # Enable 4-bit quantization storage
    bnb_4bit_quant_type="nf4",                # Use NormalFloat4 for better precision than FP4
    bnb_4bit_compute_dtype=compute_dtype,     # Cast to Float16 during matrix multiplication
    bnb_4bit_use_double_quant=False,          # Disable double quantization
)

# ==========================================
# 4. Load Base Model with Optimizations
# ==========================================
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0}                       # Force load the model onto the first GPU (GPU 0)
)
model.config.use_cache = False                # Must disable KV cache during training to avoid backprop conflicts
model.config.pretraining_tp = 1               # Set tensor parallelism to 1 for single-GPU training

# ==========================================
# 5. Tokenizer Configuration & Alignment
# ==========================================
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token     # Causal LMs usually have no pad_token; reuse eos_token
tokenizer.padding_side = "right"              # Pad on the right to maintain proper causal attention masks

# ==========================================
# 6. PEFT (Lora) Adapter Hyperparameters
# ==========================================
peft_params = LoraConfig(
    r=64,                                     # LoRA rank, controlling the number of trainable parameters
    lora_alpha=16,                            # Scaling factor for LoRA weights
    lora_dropout=0.1,                         # Dropout probability to prevent overfitting in the adapter
    bias="none",                              # Do not train bias parameters
    task_type="CAUSAL_LM",                    # Explicitly declare the task type as Causal LM
    fan_in_fan_out="True"
)

# ==========================================
# 7. Training Arguments
# ==========================================
training_params = TrainingArguments(
    output_dir="./results",                   # Output directory for checkpoints and logs
    num_train_epochs=1,                       # Number of training epochs
    per_device_train_batch_size=4,            # Batch size per device during training
    gradient_accumulation_steps=1,            # Number of updates steps to accumulate gradients
    optim="paged_adamw_32bit",                # Use QLoRA paged optimizer to prevent Out-Of-Memory (OOM)
    save_steps=25,                            # Save checkpoint every 25 steps
    logging_steps=25,                         # Log training metrics every 25 steps
    learning_rate=2e-4,                       # Initial learning rate
    weight_decay=0.001,                       # Weight decay coefficient
    fp16=False,                               # Disable standard fp16 (handled by the quantization kernel)
    bf16=False,
    max_grad_norm=0.3,                        # Max gradient norm for gradient clipping
    max_steps=-1,                             # Rely on epochs instead of max_steps to control training length
    warmup_ratio=0.03,                        # Linear warmup ratio over training steps
    group_by_length=True,                     # Group sequences of similar lengths into batches to speed up training
    lr_scheduler_type="constant",             # Learning rate schedule type
    report_to="tensorboard"                   # Use TensorBoard to log training progress
)

# ==========================================
# 8. Start Supervised Fine-Tuning (SFT) & Save
# ==========================================
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="column0",             # Name of the column containing text data in the dataset
    max_seq_length=None,                      # Use default maximum sequence length
    tokenizer=tokenizer,
    args=training_params,
    packing=False,                            # Disable sample packing (combining multiple examples into one sequence)
)

# Launch the training process
trainer.train()

# Save the trained LoRA adapter weights and tokenizer files
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
print(f"Training complete! Finetuned weights successfully saved to: {new_model}")

Training Details

Training Data

The model was trained on the mingyue0101/parquet02 dataset. This dataset contains instruction-response pairs formatted for Supervised Fine-Tuning (SFT).

Training Procedure

Training Hyperparameters

  • Training regime: QLoRA 4-bit (NF4) mixed precision (fp16)
  • Learning rate: 2e-4
  • Optimizer: paged_adamw_32bit
  • Batch size: 4
  • Epochs: 1
  • LoRA Rank (r): 64
  • LoRA Alpha: 16
  • LoRA Dropout: 0.1
  • LR Scheduler: constant
  • Warmup Ratio: 0.03

Technical Specifications

Model Architecture and Objective

Based on the Llama 2 architecture, this model utilizes grouped-query attention (GQA) and rotary positional embeddings (RoPE), fine-tuned with a causal language modeling objective.

Compute Infrastructure

Software

  • PEFT 0.10.0
  • Transformers
  • Bitsandbytes
  • TRL (SFTTrainer)
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mingyue0101/codellama-7b-matplotlib-assistant

Adapter
(410)
this model

Datasets used to train mingyue0101/codellama-7b-matplotlib-assistant

Space using mingyue0101/codellama-7b-matplotlib-assistant 1