ST-3DView: Multi-Scale Contrast-Enhanced 3D Point Cloud Reconstruction of Single-View Objects From Video Scene Transition
Journal article
Authors/Editors
Strategic research area
Publication details
Authors: Dipanita Chakraborty, Werapon Chiracharit, Kosin Chamnongthai
Publisher: Institute of Electrical and Electronics Engineers
Publication year (CE): 2025
Journal: IEEE Access (2169-3536)
Volume: 13
First page: 69596
Last page: 69618
Number of pages: 23
ISSN: 2169-3536
eISSN: 2169-3536
Language: English-United States (EN-US)
Abstract
3D object tracking in monocular video relies on understanding scene content to maintain a continuous tracking signal. Reconstructing the 3D shapes of single-view objects is essential for capturing object depth, orientation, and position within the scene. While existing deep-learning-based methods excel at 3D reconstruction and tracking, they focus primarily on object feature semantics in normal frames and neglect scene-transition (ST) frames, which leads to object information loss and tracking discontinuity. This paper proposes a novel method for 3D reconstruction of single-view objects in monocular video scenes, focusing on fade scene transitions. First, large video datasets are pre-processed and segmented into sequences using cut-transition detection via adaptive histogram equalization (AHE) and Euclidean distance estimation (EDE). Second, fade-transition sequences are detected and classified into fade-in, fade-out, and mixed-fade scene transitions using a pixel-intensity-based adaptive threshold. Third, contrast-limited adaptive histogram equalization (CLAHE) is applied to fade-transition frames to improve object feature extraction. Fourth, a modified DeepLabv3+ network generates multi-scale features for semantic segmentation of foreground objects and background. Finally, the segmented objects are passed to the proposed point-wise multilayer perceptron (MLP) network, which reconstructs 3D object point clouds from segmented 2D single-view object pixels. Experimental evaluations on the object categories “Chair,” “Car,” and “Airplane” from the benchmark TRECVID, Pix3D, ShapeNet, and Multimedia datasets achieved an accuracy improvement of 6.52% for fade-transition detection and satisfactory results in 3D point cloud reconstruction.
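To make the first two pipeline stages concrete, the sketch below illustrates (a) a histogram-based Euclidean distance between consecutive frames, as used for cut-transition detection, and (b) a simple fade classifier driven by per-frame mean intensity. The threshold fraction `low_frac` and both function names are illustrative assumptions, not the paper's actual parameters or implementation.

```python
import numpy as np

def hist_distance(frame_a, frame_b, bins=32):
    """Euclidean distance between normalized gray-level histograms of two
    frames; a large jump between consecutive frames suggests a hard cut."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.linalg.norm(ha - hb))

def classify_fade(frames, low_frac=0.15):
    """Label a frame sequence as 'fade-in', 'fade-out', 'mixed-fade', or
    'none' from per-frame mean intensity (illustrative heuristic only).

    The low threshold adapts to the sequence: it sits `low_frac` of the
    way between the darkest and brightest frame means.
    """
    means = np.array([f.mean() for f in frames])
    lo = means.min() + low_frac * (means.max() - means.min())
    dark_start = means[0] < lo   # sequence begins near black
    dark_end = means[-1] < lo    # sequence ends near black
    if dark_start and dark_end:
        return "mixed-fade"
    if dark_start:
        return "fade-in"
    if dark_end:
        return "fade-out"
    return "none"
```

A monotone ramp from black to bright frames would classify as "fade-in", the reverse as "fade-out", and a black-bright-black sequence as "mixed-fade"; in practice the paper's adaptive threshold would operate on real pixel statistics rather than this toy heuristic.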
Keywords
3-dimensional model, image segmentation