Input
image · raw · path
question · raw
options · raw
question_with_extra (final prompt to model) · raw
extra_info (auxiliary info: bbox coords, etc.) · raw
system suffix (assistant_prompt) · from test_qwen.py
Output
ground-truth answer · raw
expected response format · conventional
Single letter choice (A/B/C/D) corresponding to one of the options above.
evaluation · from paper
Multiple-choice accuracy · correct iff predicted letter == ground-truth letter
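
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own evaluation script: the helper names `extract_letter` and `accuracy` are hypothetical. The normalization is also an assumption: it accepts a bare letter, optionally parenthesized or followed by a period or colon, and counts anything else as incorrect.

```python
import re
from typing import Optional, Sequence


def extract_letter(response: str) -> Optional[str]:
    """Return the A-D choice if the response is a single letter, else None.

    Assumption: forms such as "B", "b.", "(C)" are accepted; free-form
    sentences are rejected rather than searched for a letter.
    """
    match = re.fullmatch(r"\(?([A-D])\)?[.:]?", response.strip().upper())
    return match.group(1) if match else None


def accuracy(predictions: Sequence[str], ground_truths: Sequence[str]) -> float:
    """Fraction of items where the predicted letter equals the ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        raise ValueError("cannot compute accuracy over zero items")
    correct = sum(
        extract_letter(pred) == truth.strip().upper()
        for pred, truth in zip(predictions, ground_truths)
    )
    return correct / len(predictions)
```

A response that does not reduce to a single letter maps to `None`, which never equals a ground-truth letter, so it is scored as wrong. A more lenient parser would change the reported accuracy, so whichever rule is used should be stated alongside the number.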
Raw record from SpatialScore_benchmark.ndjson · entry as-is
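
For readers who want to inspect a raw record themselves, the sketch below parses NDJSON (one JSON object per line). It makes no assumption about the key names inside each record. The helper name `iter_records` is hypothetical, and skipping blank lines is a convenience choice rather than something the file format requires.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty NDJSON line, leaving fields as-is."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)
```

Because an open text file is an iterable of lines, passing the file handle for `SpatialScore_benchmark.ndjson` to `iter_records` and calling `next(...)` on the result would return the first entry without loading the whole file.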