Assessing Spatial Comprehension on Maps in Pre-trained AI Models

Map reasoning is intuitive for humans and is a fundamental skill with important applications across many domains. In this blog post, we evaluate how well contemporary state-of-the-art Large Vision-Language Models (LVLMs) reason about maps and compare their performance with that of human participants on the coregistration task. We also propose and release a novel dataset to serve as an initial benchmark for map reasoning. An extensive analysis of open-source LVLMs shows that they struggle to perform well on our dataset. Finally, we show that coregistration is intuitive for human participants, who reached close to perfect accuracy under time constraints.