SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation

Wei Tang¹, Xuejing Liu✉², Yanpeng Sun³, Zechao Li✉¹
¹Nanjing University of Science and Technology; ²Institute of Computing Technology, Chinese Academy of Sciences; ³NExT++ Lab, National University of Singapore
✉ Corresponding authors

Overview

This repository provides the codebase of SSP-SAM, a referring expression segmentation framework built on top of SAM with semantic-spatial prompts.

Current repo status:

  • Training/testing/data processing scripts are available.
  • Multiple dataset configs are provided under configs/.

💥 News

  • 17 Mar, 2026: Open-source codebase organized and released.
  • 4 Dec, 2025: SSP-SAM paper accepted by IEEE TCSVT.

📌 ToDo

  • Release final model checkpoints on Hugging Face
  • Release processed training/evaluation metadata
  • Release arXiv version

🔗 Model Zoo & Links

  • Paper: https://arxiv.org/abs/xxxx.xxxxx
  • Hugging Face checkpoints/datasets: https://huggingface.co/wayneicloud/SSP-SAM

๐Ÿ“ Project Structure

.
├── configs/                 # training/evaluation configs
├── data_seg/                # data preprocessing scripts and generated anns/masks
├── datasets/                # dataloader and transforms
├── models/                  # SSP_SAM model definitions
├── segment-anything/        # modified SAM dependency (editable install)
├── train.py                 # training entry
├── test.py                  # evaluation entry
├── submit_train.sh          # train launcher (with examples)
└── submit_test.sh           # test launcher (with examples)

โš™๏ธ Environment Setup

Recommended: a conda environment on Linux or macOS (note that the CUDA wheels below do not support macOS; use the CPU build of PyTorch there).

conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip

# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# 2) install modified segment-anything first
cd segment-anything
pip install -e .
cd ..

# 3) install remaining dependencies
pip install -r requirements.txt

Note: the segment-anything code in this repository has been modified based on the original SAM implementation.
Please install the local segment-anything in editable mode (pip install -e .) as shown above.
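
To confirm the environment before training, a quick stdlib-only check can be run. Package names here follow the install steps above; `segment_anything` is the assumed import name of the local editable package:

```python
# Sanity-check that required packages are importable, without actually
# importing heavy dependencies; find_spec only locates them on sys.path.
import importlib.util

def check_packages(names):
    """Return the subset of `names` that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = check_packages(["torch", "torchvision", "segment_anything"])
print("Missing packages:", missing if missing else "none")
```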

🧩 Data Preparation

Please check:

  • data_seg/README.md
  • data_seg/run.sh

You have two options:

  1. Use our provided annotations and generate masks locally (recommended)

    • Download data_seg/anns/*.json and the other prepared data_seg files from Hugging Face:
      https://huggingface.co/wayneicloud/SSP-SAM
    • The provided data_seg/anns/*.json files can be used directly.
    • Masks must be generated locally by running:
      bash data_seg/run.sh

  2. Regenerate annotations and masks yourself
    See the section below.

Generate Annotations/Masks by Yourself

References:

  • data_seg/README.md
  • data_seg/run.sh
  • legacy_data_prep_simrec.md (legacy reference for raw data preparation and sources)

Required raw annotation folders/files for generation include (examples):

  • data_seg/refcoco/
  • data_seg/refcoco+/
  • data_seg/refcocog/
  • data_seg/refclef/

Each folder should contain raw files such as instances.json and refs(...).p.

Minimal expected layout (example):

data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
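
The refs(...).p files follow the refer-toolkit convention (a pickled list of ref dicts) and instances.json is COCO-style JSON. A minimal loader sketch, with paths assumed to match the layout above:

```python
import json
import pickle
from pathlib import Path

def load_refs(data_root, dataset="refcoco", split_by="unc"):
    """Load raw referring-expression data for one dataset.

    Assumes the refer-toolkit layout shown above: a pickled list of
    ref dicts in refs(<split_by>).p plus COCO-style instances.json.
    """
    root = Path(data_root) / dataset
    with open(root / f"refs({split_by}).p", "rb") as f:
        refs = pickle.load(f)          # list of ref dicts
    with open(root / "instances.json") as f:
        instances = json.load(f)       # COCO-style instances
    return refs, instances
```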

Example preprocessing command:

python ./data_seg/data_process.py \
  --data_root ./data_seg \
  --output_dir ./data_seg \
  --dataset refcoco \
  --split unc \
  --generate_mask

Detailed dataset path/config settings are defined in the corresponding preprocessing scripts/config files in data_seg/.
Please modify them according to your local environment before running. Also check dataset/image path settings in:

  • datasets/dataset.py

Important: in datasets/dataset.py, class VGDataset, you should update local paths for images/annotations/masks according to your machine.

Example local data organization:

your_project_root/
├── data/                                        # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/                           # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/                              # ReferIt images
│   ├── VG/                                      # Visual Genome images (merge pretrain path)
│   └── vg/                                      # Visual Genome images (phrase_cut path, if used)
└── data_seg/                                    # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
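
A small helper can flag missing pieces of this layout before training; the path list below is an assumption based on the example layout above and should be extended for the datasets you actually use:

```python
from pathlib import Path

# Relative paths assumed from the example layout above; extend as needed.
EXPECTED = [
    "data/coco/train2014",
    "data_seg/anns",
    "data_seg/masks",
]

def missing_paths(project_root):
    """Return the expected data paths (relative strings) that do not exist yet."""
    root = Path(project_root)
    return [p for p in EXPECTED if not (root / p).exists()]
```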

For training/testing, use:

  • data_seg/anns/*.json (provided)
  • data_seg/masks/* (generated locally via bash data_seg/run.sh)
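
Before launching training, a consistency check between annotations and generated masks can save a failed run. The `mask_name` field below is hypothetical; adjust it to the actual JSON schema used in data_seg/anns/:

```python
import json
from pathlib import Path

def count_missing_masks(ann_file, mask_dir, mask_key="mask_name"):
    """Count annotation entries whose mask file is absent on disk.

    `mask_key` is an assumed field name, not necessarily the real schema.
    """
    with open(ann_file) as f:
        anns = json.load(f)
    mask_dir = Path(mask_dir)
    return sum(1 for a in anns if not (mask_dir / a[mask_key]).exists())
```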

Required Images and Raw Data Sources

For training/evaluation, you need the corresponding image files locally (COCO/Flickr/ReferIt/VG depending on dataset split and config).
Common sources: the official COCO, Flickr30K, ReferIt (SAIAPR TC-12), and Visual Genome download pages.

🚀 Training

Default training launcher:

bash submit_train.sh

submit_train.sh already includes commented examples for multiple datasets, e.g.:

  • refcoco
  • refcoco+
  • refcocog_umd
  • referit
  • grefcoco

You can also run directly:

torchrun --nproc_per_node=8 train.py \
  --config configs/SSP_SAM_CLIP_B_FT_unc.py \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt

Resume Modes

train.py supports two resume modes:

  • --resume <ckpt>: continue an interrupted training run from the previous checkpoint.
  • --resume_from_pretrain <ckpt>: load pretrained weights before fine-tuning or training from scratch.
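
The two flags typically differ in how much state they restore. A sketch of the assumed logic, not the actual train.py implementation:

```python
def load_checkpoint(model_state, optimizer_state, ckpt, resume=False):
    """Restore state from a checkpoint dict.

    resume=True  -> restore model + optimizer, continue at the next epoch
    resume=False -> load model weights only, start a fresh schedule
    (Assumed behavior; see train.py for the authoritative logic.)
    """
    model_state.update(ckpt["model"])
    if resume:
        optimizer_state.update(ckpt.get("optimizer", {}))
        return ckpt.get("epoch", 0) + 1  # next epoch to run
    return 0
```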

📊 Evaluation

Default testing launcher:

bash submit_test.sh

Example direct command:

torchrun --nproc_per_node=1 --master_port=29590 test.py \
  --config configs/SSP_SAM_CLIP_L_FT_unc.py \
  --test_split testB \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
  --checkpoint output/your_save_folder/checkpoint_best_miou.pth

๐Ÿ“ Notes

  • Visualization looks for COCO images under data/coco/train2014 first.
  • Mask prediction and evaluation currently operate in a 512×512 mask space.
  • Config files in configs/ default to:
    • output_dir='outputs/your_save_folder'
    • batch_size=8
    • freeze_epochs=20
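
Since predictions live in a 512×512 mask space, they must be resized back to each image's original resolution before comparing against raw ground-truth masks. A dependency-free nearest-neighbor sketch (the real pipeline presumably uses tensor interpolation instead):

```python
def resize_mask_nn(mask, out_h, out_w):
    """Nearest-neighbor resize of a binary mask given as a list of lists."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]
```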

🌈 Acknowledgements

This repository benefits from ideas and codebases of prior open-source projects, including Segment Anything (SAM) and CLIP. Thanks to the authors for their valuable contributions.

📚 Citation

If you find this repository useful, please cite our SSP-SAM paper.

@article{ssp_sam_tcsvt,
  title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
  author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025}
}