TY - CHAP
U1 - Conference publication
A1 - Schneider, Lukas
A1 - Jasch, Manuel
A1 - Fröhlich, Björn
A1 - Weber, Thomas
A1 - Franke, Uwe
A1 - Pollefeys, Marc
A1 - Rätsch, Matthias
ED - Sharma, Puneet
T1 - Multimodal neural networks: RGB-D for semantic segmentation and object detection
T2 - Image analysis : 20th Scandinavian conference, SCIA 2017, Tromsø, Norway, June 12-14, 2017, proceedings, part I. - (Lecture notes in computer science ; 10269)
N2 - This paper presents a novel multimodal CNN architecture that exploits complementary input cues in addition to sole color information. The joint model implements a mid-level fusion that allows the network to exploit cross-modal interdependencies already at a medium feature level. The benefit of the presented architecture is shown for the RGB-D image understanding task. So far, state-of-the-art RGB-D CNNs have used network weights trained on color data. In contrast, a superior initialization scheme is proposed to pre-train the depth branch of the multimodal CNN independently. In end-to-end training, the network parameters are optimized jointly using the challenging Cityscapes dataset. Thorough experiments demonstrate the effectiveness of the proposed model. Both the RGB GoogLeNet and further RGB-D baselines are outperformed by a significant margin on two different tasks: semantic segmentation and object detection. For the latter, this paper shows how to extract object-level ground truth from the instance-level annotations in Cityscapes in order to train a powerful object detector.
Y1 - 2017
SN - 978-3-319-59126-1
DO - https://doi.org/10.1007/978-3-319-59126-1_9
SP - 98
EP - 109
S1 - 12
PB - Springer
CY - Cham
ER -