Haruka Takahashi, Yoshihiro Kanamori and Yuki Endo
University of Tsukuba
Computer Animation and Virtual Worlds (Computer Graphics International 2022)
This paper presents the first technique to estimate a 3D terrain model from a single landscape image. Although monocular depth estimation also offers single-image 3D reconstruction, it assigns depth only to pixels visible in the input image, resulting in an incomplete 3D terrain output. Our method generates a complete 3D terrain model as a textured height map via a three-stage framework using deep neural networks. First, to exploit the performance of pixel-aligned estimation, we estimate the terrain's per-pixel depth and its color, free from shadows and lighting effects, in the perspective view. Second, we triangulate the RGB-D data generated in the first stage and rasterize the triangular mesh from the top view to obtain an incomplete textured height map. Finally, we inpaint the depth and color in the missing regions. Because there are many possible ways to complete the missing regions, we synthesize diverse shapes and textures during inpainting using a variational autoencoder. Qualitative and quantitative experiments reveal that our method outperforms existing methods that apply a direct perspective-to-top-view transform as image-to-image translation.
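To make the second stage more concrete, the following is a minimal sketch of how a perspective depth map can be turned into an incomplete top-view height map. It is not the authors' implementation: the paper triangulates the RGB-D data and rasterizes the mesh, whereas this sketch uses simpler per-pixel point splatting and ignores color. The pinhole camera model (looking along +z, principal point at the image center), the function name `depth_to_height_map`, and all of its parameters are assumptions made for illustration.

```python
import math


def depth_to_height_map(depth, focal, cell_size, grid_w, grid_d):
    """Unproject a depth map and splat it into a top-view height map.

    Assumptions (hypothetical, not from the paper): a pinhole camera
    looking along +z, image rows pointing down, and depth values given
    as z-depth. Returns a grid_d x grid_w list of lists; None marks
    cells that no pixel landed in (the regions left for inpainting).
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # principal point at image center
    height_map = [[None] * grid_w for _ in range(grid_d)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z <= 0:
                continue  # skip invalid depth
            x = (u - cx) * z / focal   # lateral position
            y = -(v - cy) * z / focal  # height (image v points down)
            col = int(math.floor(x / cell_size)) + grid_w // 2
            row = int(math.floor(z / cell_size))
            if 0 <= col < grid_w and 0 <= row < grid_d:
                cur = height_map[row][col]
                # Keep the highest point that falls into each cell.
                if cur is None or y > cur:
                    height_map[row][col] = y
    return height_map


# Tiny 2x2 depth map: far pixels in the top row, near pixels in the bottom row.
hm = depth_to_height_map([[2.0, 2.0], [1.0, 1.0]], focal=1.0,
                         cell_size=1.0, grid_w=4, grid_d=4)
missing = sum(cell is None for row in hm for cell in row)
```

Cells that remain `None` correspond to terrain not visible in the input image; in the method described above, these are the regions the third stage fills in by inpainting.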
Last modified: July 2022