O3D-SIM creation starts with capturing posed RGB-D images and camera parameters.

3D Mapping Initialization: Using RGB-D Images and Camera Parameters

2025/12/15 04:00

Abstract and 1 Introduction

  2. Related Works

    2.1. Vision-and-Language Navigation

    2.2. Semantic Scene Understanding and Instance Segmentation

    2.3. 3D Scene Reconstruction

  3. Methodology

    3.1. Data Collection

    3.2. Open-set Semantic Information from Images

    3.3. Creating the Open-set 3D Representation

    3.4. Language-Guided Navigation

  4. Experiments

    4.1. Quantitative Evaluation

    4.2. Qualitative Results

  5. Conclusion and Future Work, Disclosure statement, and References

3.1. Data Collection

Creating the O3D-SIM begins by capturing a sequence of RGB-D images of the environment to be mapped with a posed camera, together with estimates of the camera's extrinsic and intrinsic parameters. The pose associated with each image is used to transform its point cloud into a world coordinate frame. In simulation, we use the ground-truth pose associated with each image, whereas in the real world we use RTAB-Map [30] with G2O optimization [31] to generate these poses.
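The step of moving each frame's point cloud into the world frame can be illustrated with a minimal sketch. It is not the authors' code: it assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth image in metres where zero marks missing depth, and a 4x4 camera-to-world pose matrix. The function names `backproject_depth` and `to_world` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points.

    Assumes a pinhole model; returns an (N, 3) array with one point
    per pixel that has valid depth.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v runs along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # assumption: zero marks missing depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def to_world(points_cam, T_world_cam):
    """Apply a 4x4 camera-to-world pose to (N, 3) camera-frame points."""
    R = T_world_cam[:3, :3]  # rotation part of the pose
    t = T_world_cam[:3, 3]   # translation part of the pose
    return points_cam @ R.T + t


# Toy example: a 2x2 depth image with one invalid pixel and a pose
# that only translates the camera by 1 m along the world x-axis.
depth = np.full((2, 2), 2.0)
depth[0, 0] = 0.0
T = np.eye(4)
T[:3, 3] = [1.0, 0.0, 0.0]

points_world = to_world(
    backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5), T
)
```

In this sketch the pose would come from the simulator's ground truth or from a SLAM system such as RTAB-Map; if a pose source reports world-to-camera transforms instead, the matrix would need to be inverted before use.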

Figure 2. An overview of the proposed 3D mapping pipeline. Labels generated by the RAM model are input into Grounding DINO to generate bounding boxes for the detected labels. Subsequently, instance masks are created using the SAM model, while CLIP and DINOv2 embeddings are extracted in parallel. These masks, along with the semantic embeddings, are back-projected into 3D space to identify 3D instances. These instances are then refined using a density-based clustering algorithm to produce the O3D-SIM.


:::info Authors:

(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;

(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;

(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;

(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.

:::


:::info This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
