ST-3DView: Multi-Scale Contrast-Enhanced 3D Point Cloud Reconstruction of Single-View Objects From Video Scene Transition

Journal article


Publication Details

Author list: Dipanita Chakraborty, Werapon Chiracharit, Kosin Chamnongthai

Publisher: Institute of Electrical and Electronics Engineers

Publication year: 2025

Journal: IEEE Access (2169-3536)

Volume number: 13

Start page: 69596

End page: 69618

Number of pages: 23

ISSN: 2169-3536

eISSN: 2169-3536

Languages: English-United States (EN-US)



Abstract

3D object tracking in monocular video relies on understanding the scene content to improve the continuity of the tracking signal. Reconstructing 3D shapes of single-view objects is essential for capturing object depth, orientation, and position within the scene. While existing deep learning-based methods excel in 3D reconstruction and tracking, they primarily focus on object feature semantics in normal frames, neglecting scene transition (ST) frames. This limitation leads to object information loss and discontinuity during tracking. This paper proposes a novel method for 3D reconstruction of single-view objects in monocular video scenes, focusing on fade scene transitions. First, large video datasets are pre-processed and segmented into sequences using cut transition detection via adaptive histogram equalization (AHE) and Euclidean distance estimation (EDE). Second, fade transition sequences are detected and classified into fade-in, fade-out, and mixed-fade scene transitions using a pixel-intensity-based adaptive threshold. Third, contrast enhancement is applied to fade transition frames using contrast-limited adaptive histogram equalization (CLAHE) to improve object feature extraction. Fourth, a modified DeepLabv3+ network is employed to generate multi-scale features for semantic foreground object and background segmentation. Finally, the segmented objects are processed through the proposed point-wise multilayer perceptron (MLP) network, which reconstructs 3D object point clouds from segmented 2D single-view object pixels. Experimental evaluations on object categories “Chair,” “Car,” and “Airplane” from the benchmark TRECVID, Pix3D, ShapeNet, and Multimedia datasets achieved an accuracy improvement of 6.52% for fade transition detection and satisfactory results in 3D point cloud reconstruction.
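The first two stages of the pipeline (cut detection via inter-frame histogram distance, and fade classification from pixel intensity) can be sketched roughly as follows. This is a minimal plain-Python illustration, not the paper's implementation: frames are flat lists of 0-255 grayscale values, and the bin count, distance threshold, and darkness threshold are hypothetical placeholders rather than values from the paper.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a grayscale frame (flat list of 0-255 values)."""
    hist = [0.0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(frame)
    return [h / n for h in hist]

def euclidean_distance(h1, h2):
    """Euclidean distance between two histograms (EDE step)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def detect_cuts(frames, threshold=0.5):
    """Flag frame indices where the histogram distance to the previous frame
    exceeds a (hypothetical) threshold, marking a hard cut."""
    hists = [gray_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if euclidean_distance(hists[i - 1], hists[i]) > threshold]

def classify_fade(frames, dark=30):
    """Classify a sequence as fade-in, fade-out, or mixed-fade from the
    mean-intensity trend; `dark` is an illustrative darkness threshold."""
    means = [sum(f) / len(f) for f in frames]
    starts_dark, ends_dark = means[0] < dark, means[-1] < dark
    if starts_dark and not ends_dark:
        return "fade-in"
    if ends_dark and not starts_dark:
        return "fade-out"
    if starts_dark and ends_dark:
        return "mixed-fade"
    return "none"
```

A real system would operate on decoded video frames (e.g. NumPy arrays) and adapt the thresholds per sequence, as the paper's adaptive-threshold step implies; the sketch only shows the shape of the computation.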


Keywords

3-Dimension model, image segmentation


Last updated on 2025-04-29 at 00:00