Informatik
Refine
Year of publication
- 2023 (2) (remove)
Document Type
- Doctoral Thesis (2)
Language
- English (2) (remove)
Has full text
- yes (2)
Is part of the Bibliography
- yes (2)
Institute
- Informatik (2)
Publisher
- Universität Tübingen (2) (remove)
Human recognition is an important part of perception systems, such as those used in autonomous vehicles or robots. These systems often use deep neural networks for this purpose, which rely on large amounts of data that ideally cover various situations, movements, visual appearances, and interactions. However, obtaining such data is typically complex and expensive. In addition to raw data, labels are required to create training data for supervised learning. Thus, manual annotation of bounding boxes, keypoints, orientations, or actions performed is frequently necessary. This work addresses whether the laborious acquisition and creation of data can be simplified through targeted simulation. If data are generated in a simulation, information such as positions, dimensions, orientations, surfaces, and occlusions are already known, and appropriate labels can be generated automatically. A key question is whether deep neural networks, trained with simulated data, can be applied to real data. This work explores the use of simulated training data using examples from the field of pedestrian detection for autonomous vehicles. On the one hand, it is shown how existing systems can be improved by targeted retraining with simulation data, for example to better recognize corner cases. On the other hand, the work focuses on the generation of data that hardly or not occur at all in real standard datasets. It will be demonstrated how training data can be generated by targeted acquisition and combination of motion data and 3D models, which contain finely graded action labels to recognize even complex pedestrian situations. Through the diverse annotation data that simulations provide, it becomes possible to train deep neural networks for a wide variety of tasks with one dataset. In this work, such simulated data is used to train a novel deep multitask network that brings together diverse, previously mostly independently considered but related, tasks such as 2D and 3D human pose recognition and body and orientation estimation.
In modern collaborative production environments where industrial robots and humans are supposed to work hand in hand, it is mandatory to observe the robot’s workspace at all times. Such observation is even more crucial when the robot’s main position is also dynamic e.g. because the system is mounted on a movable platform. As current solutions like physically secured areas in which a robot can perform actions potentially dangerous for humans, become unfeasible in such scenarios, novel, more dynamic, and situation aware safety solutions need to be developed and deployed.
This thesis mainly contributes to the bigger picture of such a collaborative scenario by presenting a data-driven convolutional neural network-based approach to estimate the two-dimensional kinematic-chain configuration of industrial robot-arms within raw camera images. This thesis also provides the information needed to generate and organize the mandatory data basis and presents frameworks that were used to realize all involved subsystems. The robot-arm’s extracted kinematic-chain can also be used to estimate the extrinsic camera parameters relative to the robot’s three-dimensional origin. Further a tracking system, based on a two-dimensional kinematic chain descriptor is presented to allow for an accumulation of a proper movement history which enables the prediction of future target positions within the given image plane. The combination of the extracted robot’s pose with a simultaneous human pose estimation system delivers a consistent data flow that can be used in higher-level applications.
This thesis also provides a detailed evaluation of all involved subsystems and provides a broad overview of their particular performance, based on novel generated, semi automatically annotated, real datasets.