# VLM

Codebase of VLM projects.
## Evaluation
Currently, the codebase supports evaluation on several benchmarks, including HallusionBench, ai2d, docvqa, mmbench, mme, mmstar, ocrvqa, pope, seed_bench, sqa, textvqa, and vqav2. To enable evaluation, modify your config file as described below.
### Config
Please refer to llava_test.py or omg_llava_test.py.

- First, download the evaluation benchmarks from here and put them under `./data/`. Then copy the train config of your model and delete the `custom_hooks`:
```python
# remove custom_hooks
custom_hooks = []
```
- Implement `preparing_for_generation` and `predict_forward` for your model. Please refer to llava or omg_llava. `preparing_for_generation` sets up the generation settings for the model, such as the prompt template. `predict_forward` is the prediction function of your method: its input is an item from the test dataset (such as `pixel_values` and `text_prompts`), and its output is a response dict. A minimal sketch follows this item.
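For orientation, here is a minimal sketch of the two methods. Only the method names, their roles, and the input/output contract come from the description above; the class name, the `metainfo`/`template` handling, the `_generate` helper, and the `'prediction'` key are hypothetical illustrations, not the actual llava or omg_llava implementation:

```python
from transformers import GenerationConfig


class MyVLM:  # hypothetical model class, for illustration only
    def preparing_for_generation(self, metainfo, **kwargs):
        # Store generation-time settings, e.g. the prompt template.
        # `metainfo` and its 'template' entry are assumptions here;
        # see llava_test.py / omg_llava_test.py for the real settings.
        self.template = metainfo['template']
        self.gen_config = GenerationConfig(max_new_tokens=100, do_sample=False)

    def predict_forward(self, pixel_values, text_prompts, **kwargs):
        # Input: one item from the test dataset.
        prompt = self.template['INSTRUCTION'].format(input=text_prompts)
        response = self._generate(prompt, pixel_values)
        # Output: a response dict; 'prediction' is assumed to be the
        # key the evaluation datasets read.
        return {'prediction': response}

    def _generate(self, prompt, pixel_values):
        # Placeholder for the model-specific step: encode the prompt and
        # image, call generate() with self.gen_config, decode the output.
        return ''
```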
- Add the following items to your config:
```python
test_dataset = [
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmbench/MMBench_DEV_EN.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmbench/MMBench_TEST_EN.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MMEDataset,
        data_file='./data/eval/mme/MME.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/seed_bench/SEEDBench_IMG.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/sqa/ScienceQA_VAL.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/sqa/ScienceQA_TEST.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/ai2d/AI2D_TEST.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=MultipleChoiceDataset,
        data_file='./data/eval/mmstar/MMStar.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=HallusionDataset,
        data_file='./data/eval/HallusionBench/HallusionBench.tsv',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
    dict(
        type=POPEDataset,
        data_file=[
            './data/eval/pope/coco_pope_adversarial.json',
            './data/eval/pope/coco_pope_popular.json',
            './data/eval/pope/coco_pope_random.json',
        ],
        coco_val_path='./data/eval/val2014/',
        image_processor=image_processor,
        pad_image_to_square=True,
    ),
]

test_dataloader = dict(
    batch_size=1,
    num_workers=0,
    drop_last=False,
    sampler=dict(type=DefaultSampler, shuffle=False),
    dataset=dict(type=ConcatDataset, datasets=test_dataset),
)

test_evaluator = dict()
test_cfg = dict(type=TestLoop, select_metric='first')
```
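The config above assumes the dataset, sampler, and loop classes have already been imported. The import paths below are an assumption based on typical XTuner-style configs; verify them against the reference test configs (llava_test.py / omg_llava_test.py):

```python
# Assumed import paths; check the reference configs for the exact modules.
from mmengine.dataset import DefaultSampler
from xtuner.dataset import ConcatDataset
from xtuner.dataset.evaluation import (HallusionDataset, MMEDataset,
                                       MultipleChoiceDataset, POPEDataset)
from xtuner.engine.runner import TestLoop
```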
- Run the test:
```shell
# example: distributed test on 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=. bash tools/dist.sh test projects/omg_llava/configs/test/omg_llava_7b_finetune_8gpus.py 8 --checkpoint ./pretrained/omg_llava/omg_llava_fintune_8gpus.pth
```
Reference results:

| model | MMBench-DEV-EN | SEEDBench | MME | ScienceQA_VAL | ScienceQA_TEST | AI2D | MMStar |
|---|---|---|---|---|---|---|---|
| llava-vicuna-7b | 68.5 | 65.9 | 1689 | 67.6 | 68.9 | 56.7 | 34.8 |
| omg-llava-internlm2-7b | 45.7 | 54.2 | 1255 | 53.5 | 55.6 | 42.3 | 34.8 |