
# VLM

A codebase for VLM projects.

## Evaluation

Currently, the codebase supports evaluation on several benchmarks, including HallusionBench, ai2d, docvqa, mmbench, mme, mmstar, ocrvqa, pope, seed_bench, sqa, textvqa, and vqav2. You can enable evaluation by modifying the config file.

### Config

Please refer to `llava_test.py` or `omg_llava_test.py`.

1. First, download the evaluation benchmarks from here and put them under `./data/`. The sketch after this step checks that the expected files are in place.
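To make the layout concrete, here is a small sanity check; the paths are copied from the dataset entries in the config shown in step 4:

```python
# Sanity check: confirm the benchmark files referenced by the config
# in step 4 are in place before launching evaluation.
from pathlib import Path

expected = [
    './data/eval/mmbench/MMBench_DEV_EN.tsv',
    './data/eval/mme/MME.tsv',
    './data/eval/seed_bench/SEEDBench_IMG.tsv',
    './data/eval/sqa/ScienceQA_VAL.tsv',
    './data/eval/ai2d/AI2D_TEST.tsv',
    './data/eval/mmstar/MMStar.tsv',
    './data/eval/HallusionBench/HallusionBench.tsv',
    './data/eval/pope/coco_pope_random.json',
]
missing = [p for p in expected if not Path(p).exists()]
if missing:
    print('missing benchmark files:', *missing, sep='\n  ')
else:
    print('all benchmark files found')
```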

2. Copy the train config of your model and delete the `custom_hooks`:

```python
# remove custom_hooks
custom_hooks = []
```
3. Implement `preparing_for_generation` and `predict_forward` for your model. Please refer to llava or omg_llava.

`preparing_for_generation` sets the generation settings for the model, such as the prompt template. `predict_forward` is the prediction function of your method: its input is an item from the test dataset (e.g. `pixel_values` and `text_prompts`), and its output is a response dict. A minimal skeleton is sketched below.
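The following is a rough sketch only, not the actual llava or omg_llava implementation; `self.tokenizer`, `self.llm`, the `metainfo` keys, and the `prediction` key of the returned dict are all assumed names for illustration:

```python
# Minimal sketch -- NOT the actual llava/omg_llava code.
# `self.tokenizer`, `self.llm`, the metainfo keys, and the 'prediction'
# key of the response dict are assumptions for illustration.

class MyVLMModel:  # stands in for your model class
    def preparing_for_generation(self, metainfo=None, **kwargs):
        # Record generation-time settings once, before the test loop
        # starts feeding dataset items to predict_forward.
        metainfo = metainfo or {}
        self.template = metainfo.get('template', '{input}')
        self.max_new_tokens = metainfo.get('max_new_tokens', 128)

    def predict_forward(self, pixel_values, text_prompts, **kwargs):
        # One test item in, one response dict out.
        prompt = self.template.format(input=text_prompts)
        input_ids = self.tokenizer(prompt, return_tensors='pt').input_ids
        output_ids = self.llm.generate(
            input_ids.to(self.llm.device),
            pixel_values=pixel_values,  # image tensor from the dataset item
            max_new_tokens=self.max_new_tokens,
        )
        # Strip the prompt tokens; keep only the generated answer.
        answer = self.tokenizer.decode(
            output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
        return {'prediction': answer}
```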

4. Add the following items to your config:

```python
test_dataset = [
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmbench/MMBench_DEV_EN.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmbench/MMBench_TEST_EN.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MMEDataset,
        data_file='./data/eval/mme/MME.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/seed_bench/SEEDBench_IMG.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/sqa/ScienceQA_VAL.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/sqa/ScienceQA_TEST.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/ai2d/AI2D_TEST.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmstar/MMStar.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=HallusionDataset,
        data_file='./data/eval/HallusionBench/HallusionBench.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=POPEDataset,
        data_file=[
            './data/eval/pope/coco_pope_adversarial.json',
            './data/eval/pope/coco_pope_popular.json',
            './data/eval/pope/coco_pope_random.json',
        ],
        coco_val_path='./data/eval/val2014/',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
]

test_dataloader = dict(
    batch_size=1,
    num_workers=0,
    drop_last=False,
    sampler=dict(type=DefaultSampler, shuffle=False),
    dataset=dict(type=ConcatDataset, datasets=test_dataset),
)
test_evaluator = dict()
test_cfg = dict(type=TestLoop, select_metric='first')
```
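While bringing up a new model, it can help to evaluate a single benchmark first. The snippet below is simply the MMBench-DEV entry from the list above kept on its own; `test_dataloader`, `test_evaluator`, and `test_cfg` stay unchanged:

```python
# Smoke-test variant: evaluate MMBench-DEV only.
test_dataset = [
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmbench/MMBench_DEV_EN.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
]
```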
5. Run the test:

```bash
# example
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=. bash tools/dist.sh test projects/omg_llava/configs/test/omg_llava_7b_finetune_8gpus.py 8 --checkpoint ./pretrained/omg_llava/omg_llava_fintune_8gpus.pth
```
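Here `tools/dist.sh test <config> <num_gpus>` launches distributed evaluation, and `--checkpoint` points at the weights to evaluate; adjust `CUDA_VISIBLE_DEVICES` and the GPU count to match your machine.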
| Model | MMBench-DEV-EN | SEEDBench | MME | ScienceQA_VAL | ScienceQA_TEST | AI2D | MMStar |
|---|---|---|---|---|---|---|---|
| llava-vicuna-7b | 68.5 | 65.9 | 1689 | 67.6 | 68.9 | 56.7 | 34.8 |
| omg-llava-internlm2-7b | 45.7 | 54.2 | 1255 | 53.5 | 55.6 | 42.3 | 34.8 |