OPUS 4 | 006 Special computer methods

Efficient and robust 3D object reconstruction based on monocular SLAM and CNN semantic segmentation (2019)

Weber, Thomas ; Triputen, Sergey ; Gopal, Atmaraaj ; Eißler, Steffen ; Höfert, Christian ; Schreve, Kristiaan ; Rätsch, Matthias

SLAM technology is used in various applications, especially in the field of robot navigation. We show the advantages of SLAM technology for independent 3D object reconstruction. To obtain a point cloud of every object of interest, free of its surrounding environment, we leverage deep learning: we use recent CNN research for accurate semantic segmentation of objects. In this work, we propose two methods of fusing CNN-based semantic segmentation with SLAM for the 3D reconstruction of objects of interest, in order to obtain greater robustness and efficiency. As a major novelty, we introduce CNN-based masking that focuses SLAM only on the feature points belonging to each individual object. Noisy, complex or even non-rigid features in the background are filtered out, improving the estimation of both the camera pose and the 3D point cloud of each object. Our experiments are restricted to the reconstruction of industrial objects. We present an analysis of the accuracy and performance of each method and compare the two, describing their pros and cons.
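The central idea of the CNN-based masking, letting SLAM keep only the feature points that fall on a segmented object, can be sketched as a simple filter. This is a minimal illustration, assuming the CNN outputs a per-pixel label map (`mask[row][col]`) and that keypoints are sub-pixel `(x, y)` coordinates; the function name and data layout are hypothetical and not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) in pixel coordinates


def filter_keypoints_by_mask(
    keypoints: Sequence[Keypoint],
    mask: List[List[int]],
    object_label: int,
) -> List[Keypoint]:
    """Keep only keypoints whose pixel in the segmentation mask has object_label.

    Assumes pixel (col, row) covers [col, col + 1) x [row, row + 1), so the
    containing pixel is found with floor(). Keypoints outside the image are dropped.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y in keypoints:
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < height and 0 <= col < width and mask[row][col] == object_label:
            kept.append((x, y))
    return kept
```

In a real pipeline the surviving keypoints would be handed to the SLAM tracker for pose estimation; background points never reach it, which is the effect the abstract describes.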

Follow Me: real-time in the wild person tracking application for autonomous robotics (2018)

Weber, Thomas ; Triputen, Sergey ; Danner, Michael ; Braun, Sascha ; Schreve, Kristiaan ; Rätsch, Matthias

In the last 20 years there have been major advances in autonomous robotics. In the context of the Internet of Things (IoT) and Industry 4.0, mobile robots require more intuitive ways of interacting with humans in order to expand their range of applications. This paper describes a user-friendly setup that enables a person to lead the robot through an unknown environment, which has to be perceived by means of sensory input. To realize a cost- and resource-efficient Follow Me application, we use a single monocular camera as a low-cost sensor. To scale our Simultaneous Localization and Mapping (SLAM) algorithm efficiently, we additionally integrate an inertial measurement unit (IMU). Using the camera input, we detect and track a person. We propose combining state-of-the-art deep learning, in the form of a Convolutional Neural Network (CNN), with SLAM, both operating on the same input camera image. Their combined output makes robot navigation possible. This work presents the specification of, and a workflow for, the efficient development of the Follow Me application. The point clouds delivered by our application are also used for surface reconstruction. For demonstration, we use our SCITOS G5 platform equipped with the aforementioned sensors. Preliminary tests show that the system works robustly in the wild.
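A monocular SLAM trajectory is only defined up to an unknown scale factor, which is why an IMU helps. One common way to recover that factor, shown here as an assumption rather than the method used in the paper, is to fit a single scale `s` in the least-squares sense between SLAM displacements `b_i` and metric displacements `a_i` derived from the IMU, which gives `s = Σ a_i·b_i / Σ b_i·b_i`. The function names below are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def estimate_scale(slam_disp: Sequence[Vec3], imu_disp: Sequence[Vec3]) -> float:
    """Least-squares scale s minimizing sum ||imu_i - s * slam_i||^2.

    slam_disp: keyframe-to-keyframe displacements from monocular SLAM (arbitrary units).
    imu_disp: the corresponding displacements in metres, e.g. integrated from the IMU.
    """
    if len(slam_disp) != len(imu_disp):
        raise ValueError("displacement sequences must have equal length")
    num = sum(dot(a, b) for a, b in zip(imu_disp, slam_disp))
    den = sum(dot(b, b) for b in slam_disp)
    if den == 0.0:
        raise ValueError("SLAM displacements are all zero; scale is unobservable")
    return num / den
```

Multiplying the SLAM map and trajectory by `s` expresses them in metres. A real system would also have to handle IMU bias and drift, which this sketch ignores.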

Clustering of Human gait with Parkinson’s disease by using Dynamic Time Warping (2018)

Steinmetzer, Tobias ; Bönninger, Ingrid ; Priwitzer, Barbara ; Reinhardt, Fritjof ; Reckhardt, Markus Christoph ; Erk, Dorela

We present a new method for detecting gait disorders according to their stage, using clustering methods on sensor data. Twenty-one healthy subjects and 18 subjects with Parkinson’s disease performed the Timed Up and Go test. The time series were segmented into individual steps, and the horizontal acceleration measured by a mobile sensor system was used for the analysis. We used dynamic time warping and hierarchical clustering to distinguish the stages. A specificity of 92% was achieved.
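The two building blocks named above, dynamic time warping (DTW) and hierarchical clustering, can be sketched in plain Python, with each series standing in for one step's horizontal acceleration. This is an illustrative sketch only: the absolute-difference step cost, average linkage and a fixed number of clusters `k` are assumptions made here, not details taken from the study.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with absolute-difference step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_linkage_clusters(series: List[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage on pairwise DTW distances.

    Starts with one cluster per series and repeatedly merges the closest pair
    of clusters until k clusters remain. Returns lists of series indices.
    """
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

DTW lets two steps with the same shape but slightly different timing end up close together, which a point-by-point Euclidean distance would not allow.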

Multimodal neural networks: RGB-D for semantic segmentation and object detection (2017)

Schneider, Lukas ; Jasch, Manuel ; Fröhlich, Björn ; Weber, Thomas ; Franke, Uwe ; Pollefeys, Marc ; Rätsch, Matthias

This paper presents a novel multi-modal CNN architecture that exploits complementary input cues in addition to color information alone. The joint model implements a mid-level fusion that allows the network to exploit cross-modal interdependencies already at a medium feature level. The benefit of the presented architecture is shown for the RGB-D image understanding task. So far, state-of-the-art RGB-D CNNs have used network weights trained on color data. In contrast, we propose a superior initialization scheme that pre-trains the depth branch of the multi-modal CNN independently. In end-to-end training, the network parameters are then optimized jointly on the challenging Cityscapes dataset. Thorough experiments show the effectiveness of the proposed model: both the RGB GoogLeNet and further RGB-D baselines are outperformed by a significant margin on two different tasks, semantic segmentation and object detection. For the latter, this paper shows how to extract object-level ground truth from the instance-level annotations in Cityscapes in order to train a powerful object detector.

A 3D face modelling approach for pose-invariant face recognition in a human-robot environment (2017)

Grupp, Michael ; Kopp, Philipp ; Huber, Patrik ; Rätsch, Matthias

Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable to robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in-the-wild images from the PaSC database, the most challenging face recognition database to date, and show improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis.

Fitting 3D morphable face models using local features (2015)

Huber, Patrik ; Feng, Zhen-Hua ; Christmas, William ; Kittler, Josef ; Rätsch, Matthias

In this paper, we propose a novel fitting method that uses local image features to fit a 3D morphable face model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method makes it possible to solve for shape and pose parameters simultaneously. Our method is thoroughly evaluated on data generated by the morphable model, and first results on real data are presented. Local features have been shown to be much more robust against variations in imaging conditions than the simple raw features, such as pixel colour or edge maps, used by traditional fitting methods. Our approach is unique in that we are the first to use local features to fit a 3D morphable model. Because of its speed, our method is suitable for real-time applications. Our cascaded regression framework is available as an open source library at github.com/patrikhuber/superviseddescent.
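The key idea of cascaded regression, learning the update direction from training data instead of differentiating the feature extractor, can be shown on a one-parameter toy problem. This is a hypothetical sketch in the spirit of supervised descent, assuming a scalar parameter and a clipped "feature" that is non-differentiable at ±1; it is not the API of the superviseddescent library and does not fit a 3D morphable model.

```python
import random
from typing import List, Tuple


def feature(p: float, p_true: float) -> float:
    """Toy feature observed at estimate p on an 'image' whose true parameter is
    p_true. Clipping makes it non-differentiable at +/-1, standing in for a
    local descriptor. Fitting only ever sees this value, never p_true itself."""
    return max(-1.0, min(1.0, p - p_true))


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ r * x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = cov / var if var > 0 else 0.0
    return r, my - r * mx


def train_cascade(p_true: List[float], p_init: List[float], stages: int) -> List[Tuple[float, float]]:
    """Learn one linear regressor (r_k, b_k) per stage mapping features to updates."""
    p = list(p_init)
    cascade = []
    for _ in range(stages):
        phi = [feature(pi, ti) for pi, ti in zip(p, p_true)]
        delta = [ti - pi for pi, ti in zip(p, p_true)]  # desired parameter update
        r, b = fit_linear(phi, delta)
        cascade.append((r, b))
        # Apply this stage so the next regressor trains on refined estimates.
        p = [pi + r * f + b for pi, f in zip(p, phi)]
    return cascade


def apply_cascade(cascade: List[Tuple[float, float]], p0: float, p_true: float) -> float:
    """Run the learned cascade from initial estimate p0 on a test 'image'."""
    p = p0
    for r, b in cascade:
        p = p + r * feature(p, p_true) + b
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.uniform(-5, 5) for _ in range(500)]
    init = [t + rng.uniform(-2, 2) for t in truth]
    cascade = train_cascade(truth, init, stages=3)
    print(cascade)
```

Each stage learns a fixed linear map from features to parameter updates; later stages see smaller errors and therefore learn more precise corrections, which is what lets the cascade converge without gradients of the feature operator.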

