Input
image
prompt (referring expression)
suffix (answer-format instruction)
Output
ground-truth mask raw
expected model output derived from mask
evaluation from paper
point-in-mask · prediction is correct if (x, y) ∈ ground-truth mask
Raw record from question.json / question-2.json · entry as-is