Token Warping Helps MLLMs Look from Nearby Viewpoints
Token warping, which rearranges ViT image tokens rather than pixels, lets MLLMs reason from nearby viewpoints without fine-tuning, and it consistently outperforms all baselines on the new ViewBench benchmark.
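
To make "rearranging tokens rather than pixels" concrete, here is a minimal sketch of one way a patch-token grid could be re-indexed. It is an illustration under stated assumptions, not the authors' implementation. The helper names (`warp_tokens`, `horizontal_shift_map`), the backward-lookup convention, and the fill token for uncovered cells are all hypothetical. The uniform horizontal shift is a toy stand-in for whatever viewpoint-dependent mapping the method actually uses, which the summary above does not specify.

```python
from typing import List, Sequence, Tuple

Token = Sequence[float]      # one ViT patch embedding (toy: 1-D)
Grid = List[List[Token]]     # tokens laid out as rows x cols of patches


def warp_tokens(tokens: Grid,
                src_index: List[List[Tuple[int, int]]],
                fill: Token) -> Grid:
    """Backward-warp a token grid.

    Target cell (r, c) copies the source token at src_index[r][c].
    Cells whose source falls outside the grid receive `fill`
    (an assumption; the source does not say how such cells are handled).
    """
    rows, cols = len(tokens), len(tokens[0])
    warped: Grid = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sr, sc = src_index[r][c]
            inside = 0 <= sr < rows and 0 <= sc < cols
            out_row.append(tokens[sr][sc] if inside else fill)
        warped.append(out_row)
    return warped


def horizontal_shift_map(rows: int, cols: int,
                         shift: int) -> List[List[Tuple[int, int]]]:
    """Hypothetical mapping: target (r, c) reads from source (r, c + shift)."""
    return [[(r, c + shift) for c in range(cols)] for r in range(rows)]


# Toy 2x3 grid of 1-D "embeddings".
tokens = [[[1.0], [2.0], [3.0]],
          [[4.0], [5.0], [6.0]]]
warped = warp_tokens(tokens, horizontal_shift_map(2, 3, 1), fill=[0.0])
print(warped)
```

The point of the sketch is that the embeddings themselves are left untouched and only their grid positions change, whereas a pixel-level warp would resample the image before it is encoded.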