Refine
Document Type
- Conference proceeding (11) (remove)
Language
- English (11)
Has full text
- yes (11)
Is part of the Bibliography
- yes (11)
Institute
- Informatik (11)
Publisher
- IEEE (11) (remove)
On the way to achieving higher degrees of autonomy for vehicles in complicated, ever changing scenarios, the localization problem poses a very important role. Especially the Simultaneous Localization and Mapping (SLAM) problem has been studied greatly in the past. For an autonomous system in the real world, we present a very cost-efficient, robust and very precise localization approach based on GraphSLAM and graph optimization using radar sensors. We are able to prove on a dynamically changing parking lot layout that both mapping and localization accuracy are very high. To evaluate the performance of the mapping algorithm, a highly accurate ground truth map generated from a total station was used. Localization results are compared to a high precision DGPS/INS system. Utilizing these methods, we can show the strong performance of our algorithm.
Reliable and accurate car driver head pose estimation is an important function for the next generation of advanced driver assistance systems that need to consider the driver state in their analysis. For optimal performance, head pose estimation needs to be non-invasive, calibration-free and accurate for varying driving and illumination conditions. In this pilot study we investigate a 3D head pose estimation system that automatically fits a statistical 3D face model to measurements of a driver’s face, acquired with a low-cost depth sensor on challenging real-world data. We evaluate the results of our sensor-independent, driver-adaptive approach to those of a state-of-the-art camera-based 2D face tracking system as well as a non-adaptive 3D model relative to own ground-truth data, and compare to other 3D benchmarks. We find large accuracy benefits of the adaptive 3D approach.
Significant advances have been achieved in mobile robot localization and mapping in dynamic environments, however these are mostly incapable of dealing with the physical properties of automotive radar sensors. In this paper we present an accurate and robust solution to this problem, by introducing a memory efficient cluster map representation. Our approach is validated by experiments that took place on a public parking space with pedestrians, moving cars, as well as different parking configurations to provide a challenging dynamic environment. The results prove its ability to reproducibly localize our vehicle within an error margin of below 1% with respect to ground truth using only point based radar targets. A decay process enables our map representation to support local updates.
In any autonomous driving system, the map for localization plays a vital part that is often underestimated. The map describes the world around the vehicle outside of the sensor view and is a main input into the decision making process in highly complicated scenarios. Thus there are strict requirements towards the accuracy and timeliness of the map. We present a robust and reliable approach towards crowd based mapping using a GraphSLAM framework based on radar sensors. We show on a parking lot that even in dynamically changing environments, the localization results are very accurate and reliable even in unexplored terrain without any map data. This can be achieved by collaborative map updates from multiple vehicles. To show these claims experimentally, the Joint Graph Optimization is compared to the ground truth on an industrial parking space. Mapping performance is evaluated using a dense map from a total station as reference and localization results are compared with a deeply coupled DGPS/INS system.
Recognizing actions of humans, reliably inferring their meaning and being able to potentially exchange mutual social information are core challenges for autonomous systems when they directly share the same space with humans. Today’s technical perception solutions have been developed and tested mostly on standard vision benchmark datasets where manual labeling of sensory ground truth is a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in such data leading to algorithms not recognizing such activities. For this purpose, we introduce a modular simulation framework which offers to train and validate algorithms on various environmental conditions. For this paper we created a dataset, containing rare human activities in urban areas, on which a current state of the art algorithm for pose estimation fails and demonstrate how to train such rare poses with simulated data only.
As production workspaces become more mobile and dynamic it becomes increasingly important to reliably monitor the overall state of the environment. Therein manipulators or other robotic systems likely have to be able to act autonomously together with humans and other systems within a joint workspace. Such interactions require that all components in non-stationary environments are able to perceive the state relative to each other. As vision-sensors provide a rich source of information to accomplish this, we present RoPose, a convolutional neural network (CNN) based approach, to estimate the two dimensional joint configuration of a simulated industrial manipulator from a camera image. This pose information can further be used by a novel targetless calibration setup to estimate the pose of the camera relative to the manipulator’s space. We present a pipeline to automatically generate synthetic training data and conclude with a discussion of the potential usage of the same pipeline to acquire real image datasets of physically existent robots.
Learning to translate between real world and simulated 3D sensors while transferring task models
(2019)
Learning-based vision tasks are usually specialized on the sensor technology for which data has been labeled. The knowledge of a learned model is simply useless when it comes to data which differs from the data on which the model has been initially trained or if the model should be applied to a totally different imaging or sensor source. New labeled data has to be acquired on which a new model can be trained. Depending on the sensor, this can even get more complicated when the sensor data becomes more abstract and hard to be interpreted and labeled by humans. To enable reuse of models trained for a specific task across different sensors minimizes the data acquisition effort. Therefore, this work focuses on learning sensor models and translating between them, thus aiming for sensor interoperability. We show that even for the complex task of human pose estimation from 3D depth data recorded with different sensors, i.e. a simulated and a Kinect 2TM depth sensor, human pose estimation can greatly improve by translating between sensor models without modifying the original task model. This process especially benefits sensors and applications for which labels and models are difficult if at all possible to retrieve from raw sensor data.
RoPose-Real: real world dataset acquisition for data-driven industrial robot arm pose estimation
(2019)
It is necessary to employ smart sensory systems in dynamic and mobile workspaces where industrial robots are mounted on mobile platforms. Such systems should be aware of flexible and non-stationary workspaces and able to react autonomously to changing situations. Building upon our previously presented RoPose-system, which employs a convolutional neural network architecture that has been trained on pure synthetic data to estimate the kinematic chain of an industrial robot arm system, we now present RoPose-Real. RoPose-Real extends the prior system with a comfortable and targetless extrinsic calibration tool, to allow for the production of automatically annotated datasets for real robot systems. Furthermore, we use the novel datasets to train the estimation network with real world data. The extracted pose information is used to automatically estimate the observing sensor pose relative to the robot system. Finally we evaluate the performance of the presented subsystems in a real world robotic scenario.
Recognizing human actions is a core challenge for autonomous systems as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real-time. To train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time and recognize their actions in real-time with standard monocular camera sensors. For action recognition, we transform noisy human pose estimates in an image like format we call Encoded Human Pose Image (EHPI). This encoded information can further be classified using standard methods from the computer vision community. With this simple procedure, we achieve competitive state-of-the-art performance in pose based action detection and can ensure real-time performance. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
We present a multitask network that supports various deep neural network based pedestrian detection functions. Besides 2D and 3D human pose, it also supports body and head orientation estimation based on full body bounding box input. This eliminates the need for explicit face recognition. We show that the performance of 3D human pose estimation and orientation estimation is comparable to the state-of-the-art. Since very few data sets exist for 3D human pose and in particular body and head orientation estimation based on full body data, we further show the benefit of particular simulation data to train the network. The network architecture is relatively simple, yet powerful, and easily adaptable for further research and applications.
Human pose estimation (HPE) is integral to scene understanding in numerous safety-critical domains involving human-machine interaction, such as autonomous driving or semi-automated work environments. Avoiding costly mistakes is synonymous with anticipating failure in model predictions, which necessitates meta-judgments on the accuracy of the applied models. Here, we propose a straightforward human pose regression framework to examine the behavior of two established methods for simultaneous aleatoric and epistemic uncertainty estimation: maximum a-posteriori (MAP) estimation with Monte-Carlo variational inference and deep evidential regression (DER). First, we evaluate both approaches on the quality of their predicted variances and whether these truly capture the expected model error. The initial assessment indicates that both methods exhibit the overconfidence issue common in deep probabilistic models. This observation motivates our implementation of an additional recalibration step to extract reliable confidence intervals. We then take a closer look at deep evidential regression, which, to our knowledge, is applied comprehensively for the first time to the HPE problem. Experimental results indicate that DER behaves as expected in challenging and adverse conditions commonly occurring in HPE and that the predicted uncertainties match their purported aleatoric and epistemic sources. Notably, DER achieves smooth uncertainty estimates without the need for a costly sampling step, making it an attractive candidate for uncertainty estimation on resource-limited platforms.