ST-3DView: Multi-Scale Contrast-Enhanced 3D Point Cloud Reconstruction of Single-View Objects From Video Scene Transition

Journal article


Publication Details

Author list: Dipanita Chakraborty, Werapon Chiracharit, Kosin Chamnongthai

Publisher: Institute of Electrical and Electronics Engineers

Publication year: 2025

Journal: IEEE Access (2169-3536)

Volume number: 13

Start page: 69596

End page: 69618

Number of pages: 23

ISSN: 2169-3536

eISSN: 2169-3536

Languages: English-United States (EN-US)



Abstract

3D object tracking in monocular video relies on understanding the scene content to improve the continuity of the tracking signal. Reconstructing 3D shapes of single-view objects is essential for capturing object depth, orientation, and position within the scene. While existing deep learning-based methods excel in 3D reconstruction and tracking, they primarily focus on object feature semantics in normal frames, neglecting scene transition (ST) frames. This limitation leads to object information loss and discontinuity during tracking. This paper proposes a novel method for 3D reconstruction of single-view objects in monocular video scenes, focusing on fade scene transitions. First, large video datasets are pre-processed and segmented into sequences using cut transition detection via adaptive histogram equalization (AHE) and Euclidean distance estimation (EDE). Second, fade transition sequences are detected and classified into fade-in, fade-out, and mixed-fade scene transitions using a pixel-intensity-based adaptive threshold. Third, contrast enhancement is applied to fade transition frames using contrast-limited adaptive histogram equalization (CLAHE) to improve object feature extraction. Fourth, a modified DeepLabv3+ network is employed to generate multi-scale features for semantic foreground object and background segmentation. Finally, the segmented objects are processed through the proposed point-wise multilayer perceptron (MLP) network, which reconstructs 3D object point clouds from segmented 2D single-view object pixels. Experimental evaluations on object categories “Chair,” “Car,” and “Airplane” from the benchmark TRECVID, Pix3D, ShapeNet, and Multimedia datasets achieved an accuracy improvement of 6.52% for fade transition detection and satisfactory results in 3D point cloud reconstruction.
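The first two stages of the pipeline (cut detection via inter-frame histogram distance, and fade classification from pixel intensity) can be sketched roughly as follows. This is a minimal plain-Python illustration, not the paper's implementation: frames are flat lists of 0-255 grayscale values, and the bin count, distance threshold, and darkness threshold are hypothetical placeholders rather than values from the paper.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a grayscale frame (flat list of 0-255 values)."""
    hist = [0.0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(frame)
    return [h / n for h in hist]

def euclidean_distance(h1, h2):
    """Euclidean distance between two histograms (EDE step)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def detect_cuts(frames, threshold=0.5):
    """Flag frame indices where the histogram distance to the previous frame
    exceeds a (hypothetical) threshold, marking a hard cut."""
    hists = [gray_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if euclidean_distance(hists[i - 1], hists[i]) > threshold]

def classify_fade(frames, dark=30):
    """Classify a sequence as fade-in, fade-out, or mixed-fade from the
    mean-intensity trend; `dark` is an illustrative darkness threshold."""
    means = [sum(f) / len(f) for f in frames]
    starts_dark, ends_dark = means[0] < dark, means[-1] < dark
    if starts_dark and not ends_dark:
        return "fade-in"
    if ends_dark and not starts_dark:
        return "fade-out"
    if starts_dark and ends_dark:
        return "mixed-fade"
    return "none"
```

A real system would operate on decoded video frames (e.g. NumPy arrays) and adapt the thresholds per sequence, as the paper's adaptive-threshold step implies; the sketch only shows the shape of the computation.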


Keywords

3-Dimension model, image segmentation


Last updated on 2025-04-29 at 00:00