Achieving robust navigation and open-world understanding remains one of the central challenges in deploying autonomous systems in complex, dynamic, and unstructured environments. While individual sensing modalities such as cameras, LiDAR, and inertial sensors have enabled significant advances in SLAM and perception, each faces inherent limitations under adverse conditions and struggles to generalize across the diversity of real-world scenarios. To move beyond these constraints, the robotics community is increasingly exploring the integration of multiple sensing modalities (including, but not limited to, vision, LiDAR, inertial, radar, audio, and tactile signals) together with emerging advances in learning-based estimation, neural implicit mapping, and foundation models. This workshop will bring together researchers from robotics, computer vision, and AI to discuss new methods for multi-modal fusion, scalable and robust SLAM, open-vocabulary 3D scene understanding, and human-robot interaction in open-world contexts. By bridging classical state estimation with modern learning approaches, the workshop aims to identify the critical hurdles to robust navigation, advance the frontiers of open-world spatial intelligence, and chart a roadmap toward generalizable autonomy across sensing modalities, tasks, and domains.
We invite researchers and practitioners from robotics, computer vision, and AI to share their advances in spatial intelligence: preliminary results, bold ideas, and lessons learned from industry. Accepted works will be featured in spotlight talks and poster sessions, and all papers will be published on the workshop website. Authors retain full intellectual property rights to their work. We may also organize a journal special issue for a selection of the top papers.