Camera Focal Length Prediction for Neural Novel View Synthesis from Monocular Video

Conference proceedings article


Publication Details

Author list: Dipanita Chakraborty, Werapon Chiracharit, Kosin Chamnongthai, and Minoru Okada

Publication year: 2024

Title of series: The 16th APSIPA Annual Summit and Conference 2024 (APSIPA ASC 2024)

Start page: 1

End page: 5

Number of pages: 5

Languages: English-United States (EN-US)


Abstract

Novel view synthesis is a challenging task that generates multi-view images from a single-view object by reconstructing the depth and spatial relationship between the camera and the object. In particular, it enables the rendering of 2D objects into 3D representations from monocular video scenes. Existing methods suffer from depth information loss between the camera and the object when given only limited single-view input images, resulting in poor reconstruction accuracy in 3D space. Moreover, they lack a high-level depth feature-map representation of scene information. Therefore, we propose a multilayer encoder-decoder network that efficiently predicts the focal length between the object and the camera from a mono-view image. Additionally, we employ a combined feature extraction strategy that draws on both the estimated depth feature map and the RGB input image to synthesize novel views. While the encoder network extracts semantic high-level features at multiple scales, the decoder network refines these combined features for synthesis. Our method effectively improves depth information retention while achieving strong reconstruction performance. Experimental results on a benchmark dataset demonstrate the efficacy of the proposed method.
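
To make the described pipeline concrete, the sketch below shows one way such an architecture could be organized in PyTorch: a shared multi-scale encoder, a regression head that predicts a (positive) focal length from the coarsest features, and a decoder with skip connections that produces a depth feature map, which is then fused with the RGB input for a downstream view-synthesis stage. All module names, layer sizes, and the fusion-by-concatenation choice are illustrative assumptions, not the authors' implementation.

# Minimal sketch of an encoder-decoder with a focal-length head (assumed design).
import torch
import torch.nn as nn

class FocalLengthEncoderDecoder(nn.Module):
    """Hypothetical network: regresses a focal length and emits a depth
    feature map from a single RGB frame, then fuses it with the RGB input."""
    def __init__(self, base_ch: int = 32):
        super().__init__()
        # Encoder: semantic features at multiple scales.
        self.enc1 = self._block(3, base_ch)                # full resolution
        self.enc2 = self._block(base_ch, base_ch * 2)      # 1/2 resolution
        self.enc3 = self._block(base_ch * 2, base_ch * 4)  # 1/4 resolution
        self.pool = nn.MaxPool2d(2)
        # Focal-length head: global pooling + linear regression.
        self.focal_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(base_ch * 4, 1), nn.Softplus(),      # keeps focal length > 0
        )
        # Decoder: refine multi-scale features into a depth feature map.
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec2 = self._block(base_ch * 4 + base_ch * 2, base_ch * 2)
        self.dec1 = self._block(base_ch * 2 + base_ch, base_ch)
        self.depth_out = nn.Conv2d(base_ch, 1, kernel_size=1)

    @staticmethod
    def _block(cin: int, cout: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb: torch.Tensor):
        e1 = self.enc1(rgb)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        focal = self.focal_head(e3)                        # (B, 1)
        d2 = self.dec2(torch.cat([self.up(e3), e2], 1))    # skip connection
        d1 = self.dec1(torch.cat([self.up(d2), e1], 1))    # skip connection
        depth_feat = self.depth_out(d1)                    # (B, 1, H, W)
        # Fuse RGB and depth features for a downstream synthesis stage.
        fused = torch.cat([rgb, depth_feat], dim=1)        # (B, 4, H, W)
        return focal, depth_feat, fused

if __name__ == "__main__":
    model = FocalLengthEncoderDecoder()
    frame = torch.randn(1, 3, 128, 128)        # dummy monocular frame
    f, depth, fused = model(frame)
    print(f.shape, depth.shape, fused.shape)   # (1, 1), (1, 1, 128, 128), (1, 4, 128, 128)

In this sketch the focal length is read off the coarsest encoder features, since it is a global scene property, while the per-pixel depth feature map comes from the skip-connected decoder; how the actual paper couples these two outputs is not specified here.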


Keywords

Artificial Intelligence

