Learning to translate between real world and simulated 3D sensors while transferring task models
(2019)
Learning-based vision tasks are usually specialized to the sensor technology for which data has been labeled. The knowledge of a learned model is of little use when it is applied to data that differs from the data on which the model was initially trained, or to a completely different imaging or sensor source. New labeled data has to be acquired on which a new model can be trained. Depending on the sensor, this becomes even more complicated when the sensor data is abstract and hard for humans to interpret and label. Enabling the reuse of models trained for a specific task across different sensors minimizes the data acquisition effort. This work therefore focuses on learning sensor models and translating between them, thus aiming for sensor interoperability. We show that even for the complex task of human pose estimation from 3D depth data recorded with different sensors, i.e. a simulated and a Kinect 2™ depth sensor, performance can be greatly improved by translating between sensor models without modifying the original task model. This process especially benefits sensors and applications for which labels and models are difficult, if at all possible, to derive from raw sensor data.
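A minimal sketch of the translation idea, assuming a PyTorch setup: a small encoder-decoder (here called DepthTranslator, a hypothetical name) maps depth frames from the real sensor's domain into the domain the frozen task model was trained on, while the task model itself stays untouched. This is an illustration of the concept, not the authors' architecture.

```python
# Sketch: translate real-sensor depth frames into the simulated domain before
# feeding them to a frozen task model. All names and layer sizes are assumptions.
import torch
import torch.nn as nn

class DepthTranslator(nn.Module):
    """Small encoder-decoder mapping 1-channel depth maps between sensor domains."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, depth):
        return self.net(depth)

translator = DepthTranslator()            # would be trained to map real -> simulated depth
task_model = nn.Identity()                # placeholder for the frozen pose-estimation model
for p in task_model.parameters():
    p.requires_grad_(False)               # the original task model is not modified

real_depth = torch.rand(1, 1, 128, 128)   # dummy depth frame from the real sensor
pose = task_model(translator(real_depth)) # task inference on translated data
```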
RoPose-Real: real world dataset acquisition for data-driven industrial robot arm pose estimation
(2019)
It is necessary to employ smart sensory systems in dynamic and mobile workspaces where industrial robots are mounted on mobile platforms. Such systems should be aware of flexible and non-stationary workspaces and able to react autonomously to changing situations. Building upon our previously presented RoPose system, which employs a convolutional neural network architecture trained purely on synthetic data to estimate the kinematic chain of an industrial robot arm system, we now present RoPose-Real. RoPose-Real extends the prior system with a convenient, targetless extrinsic calibration tool that allows automatically annotated datasets to be produced for real robot systems. Furthermore, we use the novel datasets to train the estimation network with real-world data. The extracted pose information is used to automatically estimate the pose of the observing sensor relative to the robot system. Finally, we evaluate the performance of the presented subsystems in a real-world robotic scenario.
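The extrinsic calibration step described above amounts to solving a perspective-n-point problem from 2D-3D joint correspondences. Below is a minimal sketch of that step, assuming OpenCV's solvePnP and made-up joint coordinates; the actual RoPose-Real implementation may differ.

```python
# Sketch: estimate the camera pose relative to the robot base from 2D joint
# detections and the known 3D joint positions (forward kinematics).
import numpy as np
import cv2

# Known 3D joint positions in the robot base frame (illustrative values, metres).
joints_3d = np.array([[0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.4],
                      [0.3, 0.0, 0.4],
                      [0.3, 0.0, 0.7],
                      [0.5, 0.0, 0.7],
                      [0.5, 0.0, 0.8]], dtype=np.float64)

# Corresponding 2D joint detections from the pose-estimation network (pixels).
joints_2d = np.array([[320.0, 400.0], [318.0, 300.0], [400.0, 295.0],
                      [402.0, 210.0], [460.0, 205.0], [462.0, 170.0]],
                     dtype=np.float64)

K = np.array([[600.0,   0.0, 320.0],   # intrinsic camera matrix (assumed known)
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)                     # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(joints_3d, joints_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)             # rotation robot base -> camera
print("pose found:", ok, "\ntranslation:\n", tvec)
```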
Recognizing human actions is a core challenge for autonomous systems as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real-time. To train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time and recognize their actions in real-time with standard monocular camera sensors. For action recognition, we transform noisy human pose estimates into an image-like format we call Encoded Human Pose Image (EHPI). This encoded information can further be classified using standard methods from the computer vision community. With this simple procedure, we achieve performance competitive with the state of the art in pose-based action detection while ensuring real-time operation. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
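A minimal sketch of such an image-like pose encoding, loosely in the spirit of the EHPI: normalized x/y joint coordinates over a fixed time window are written into the channels of a small image that any standard image classifier can consume. Window size, joint count and channel layout are illustrative assumptions, not the paper's exact format.

```python
# Sketch: encode a sequence of 2D pose keypoints as a (joints x time x 3) image.
import numpy as np

def encode_pose_sequence(keypoints, width=32):
    """keypoints: array of shape (frames, joints, 2) with pixel coordinates."""
    frames, joints, _ = keypoints.shape
    ehpi = np.zeros((joints, width, 3), dtype=np.float32)  # rows=joints, cols=time
    window = keypoints[-width:]                             # last `width` frames
    # Normalize x and y independently into [0, 1] over the window.
    mins = window.reshape(-1, 2).min(axis=0)
    maxs = window.reshape(-1, 2).max(axis=0)
    norm = (window - mins) / np.maximum(maxs - mins, 1e-6)
    for t, frame in enumerate(norm):
        ehpi[:, t, 0] = frame[:, 0]   # channel 0: normalized x
        ehpi[:, t, 1] = frame[:, 1]   # channel 1: normalized y
    return ehpi                        # can be fed to a standard image classifier

dummy = np.random.rand(40, 15, 2) * 480.0    # 40 frames, 15 joints (dummy data)
print(encode_pose_sequence(dummy).shape)     # (15, 32, 3)
```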
Enhancing data-driven algorithms for human pose estimation and action recognition through simulation
(2020)
Recognizing human actions, reliably inferring their meaning and being able to potentially exchange mutual social information are core challenges for autonomous systems when they directly share the same space with humans. Intelligent transport systems in particular face this challenge, as interactions with people are often required. The development and testing of technical perception solutions is done mostly on standard vision benchmark datasets, for which manual labelling of sensory ground truth has been a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in these datasets, leading to algorithms that fail to recognize such activities. For this purpose, we introduce a modular simulation framework that allows algorithms to be trained and validated on various human-centred scenarios. We describe the use of simulation data to train a state-of-the-art human pose estimation algorithm to recognize unusual human activities in urban areas. Since the recognition of human actions can be an important component of intelligent transport systems, we investigated how simulations can be applied for this purpose. Laboratory experiments show that we can train a recurrent neural network with only simulated data based on motion capture data and 3D avatars, and that it achieves almost perfect performance when classifying those human actions on real data.
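A minimal sketch of a recurrent classifier of the kind described, assuming PyTorch and illustrative layer sizes; the input sequences would come from the simulated, motion-capture-driven avatars.

```python
# Sketch: LSTM over flattened 2D pose sequences, classifying the whole sequence.
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    def __init__(self, num_joints=15, hidden=64, num_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * 2, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, poses):                 # poses: (batch, frames, joints*2)
        _, (h_n, _) = self.lstm(poses)
        return self.head(h_n[-1])             # class logits per sequence

model = ActionLSTM()
simulated_batch = torch.rand(8, 40, 15 * 2)   # 8 simulated sequences of 40 frames
logits = model(simulated_batch)
print(logits.shape)                           # torch.Size([8, 5])
```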
The invention relates to a method for the extrinsic calibration of at least one imaging sensor, in which a pose of the at least one imaging sensor relative to the origin (U) of a three-dimensional coordinate system of a handling device is determined by means of a computing device, wherein known three-dimensional coordinates relating to the position of at least one joint of the handling device are taken into account by the computing device, wherein two-dimensional coordinates relating to the position of the at least one joint are determined from raw data of the at least one imaging sensor, and wherein the computing device determines the pose of the at least one imaging sensor from the correspondence between the two-dimensional and the three-dimensional coordinates.
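In essence, the claimed correspondence is the standard pinhole projection relating the known 3D joint coordinates to their detected 2D image coordinates, with the extrinsic pose (R, t) as the unknown; the notation below is illustrative and not taken from the patent text.

```latex
% Pinhole projection of the i-th joint; K is the intrinsic matrix, (R, t) the
% sought extrinsic pose of the imaging sensor relative to the handling device.
s_i \begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix}
  = K \, \bigl[\, R \mid t \,\bigr]
    \begin{pmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{pmatrix},
\qquad i = 1, \dots, N
```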
Recognizing the actions of humans, reliably inferring their meaning and being able to potentially exchange mutual social information are core challenges for autonomous systems when they directly share the same space with humans. Today's technical perception solutions have been developed and tested mostly on standard vision benchmark datasets, for which manual labeling of sensory ground truth is a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in such data, leading to algorithms that fail to recognize these activities. For this purpose, we introduce a modular simulation framework that allows algorithms to be trained and validated under various environmental conditions. For this paper we created a dataset containing rare human activities in urban areas, on which a current state-of-the-art algorithm for pose estimation fails, and demonstrate how to train such rare poses with simulated data only.
The basic idea behind a wearable robotic grasp assistance system is to support people who suffer from severe motor impairments in daily activities. Such a system needs to act mostly autonomously and according to the user's intent. Vision-based hand pose estimation could be an integral part of a larger control and assistance framework. In this paper we evaluate the performance of egocentric monocular hand pose estimation for a robot-controlled hand exoskeleton in a simulation. For hand pose estimation we adopt a Convolutional Neural Network (CNN). We train and evaluate this network with computer graphics created by our own data generator. In order to guide further design decisions, our experiments focus on two egocentric camera viewpoints, tested on synthetic data with the help of a 3D-scanned hand model with and without an exoskeleton attached to it. We observe that hand pose estimation with a wrist-mounted camera performs more accurately than with a head-mounted camera in the context of our simulation. Further, a grasp assistance system attached to the hand alters the visual appearance and can improve hand pose estimation. Our experiment provides useful insights for the integration of sensors into a context-sensitive analysis framework for intelligent assistance.
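A minimal sketch of a CNN regressing hand keypoints from an egocentric frame, assuming PyTorch; the backbone, the 21-keypoint layout and the input size are illustrative assumptions rather than the network evaluated in the paper.

```python
# Sketch: tiny CNN that regresses (u, v) coordinates for each hand keypoint.
import torch
import torch.nn as nn

class HandPoseCNN(nn.Module):
    def __init__(self, num_keypoints=21):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.regressor = nn.Linear(128, num_keypoints * 2)  # (u, v) per keypoint

    def forward(self, image):
        x = self.features(image).flatten(1)
        return self.regressor(x).view(-1, self.num_keypoints, 2)

# Synthetic renders (wrist- or head-mounted viewpoint) would be fed in here.
frame = torch.rand(1, 3, 128, 128)
print(HandPoseCNN()(frame).shape)   # torch.Size([1, 21, 2])
```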
As production workspaces become more mobile and dynamic, it becomes increasingly important to reliably monitor the overall state of the environment. Manipulators or other robotic systems will likely have to act autonomously alongside humans and other systems within a joint workspace. Such interactions require that all components in non-stationary environments are able to perceive their state relative to each other. As vision sensors provide a rich source of information to accomplish this, we present RoPose, a convolutional neural network (CNN) based approach to estimate the two-dimensional joint configuration of a simulated industrial manipulator from a camera image. This pose information can further be used by a novel targetless calibration setup to estimate the pose of the camera relative to the manipulator's space. We present a pipeline to automatically generate synthetic training data and conclude with a discussion of how the same pipeline could be used to acquire real image datasets of physically existing robots.
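Keypoint CNNs of this kind are commonly trained against per-joint Gaussian heatmaps, which a synthetic pipeline can generate automatically from the known 2D joint positions. Below is a minimal sketch under that assumption; heatmap size and sigma are illustrative, and the actual RoPose training targets may differ.

```python
# Sketch: render one Gaussian heatmap per robot joint as a CNN training target.
import numpy as np

def joint_heatmaps(joints_2d, height=96, width=96, sigma=2.0):
    """joints_2d: array of shape (num_joints, 2) in heatmap pixel coordinates."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(joints_2d), height, width), dtype=np.float32)
    for i, (u, v) in enumerate(joints_2d):
        maps[i] = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2.0 * sigma ** 2))
    return maps   # one peak per joint, used as the network's regression target

synthetic_joints = np.array([[48.0, 80.0], [50.0, 60.0], [70.0, 58.0]])
print(joint_heatmaps(synthetic_joints).shape)   # (3, 96, 96)
```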
Based on well-established robotic concepts of autonomous localization and navigation, we present a system prototype, implemented in the Robot Operating System (ROS), to assist camera-based indoor navigation for humans. Our prototype takes advantage of state-of-the-art computer vision and robotic methods and is designed for assistive indoor guidance. We employ a vibro-tactile belt as a guiding device that renders derived motion suggestions to the user via vibration patterns. We evaluated the effectiveness of a variety of vibro-tactile feedback patterns for guiding blindfolded users. Our prototype demonstrates that a vision-based system can support human navigation, and may also assist the visually impaired in a human-centered way.
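A minimal sketch of how a motion suggestion could be rendered on such a belt: the desired heading is mapped to the nearest of N equally spaced motors. Motor count and mapping are illustrative assumptions, not the pattern set evaluated in the study.

```python
# Sketch: map a desired heading (degrees, 0 = straight ahead) to a belt pattern.
def heading_to_pattern(heading_deg, num_motors=8, intensity=0.8):
    spacing = 360.0 / num_motors
    motor = int(round((heading_deg % 360.0) / spacing)) % num_motors
    pattern = [0.0] * num_motors
    pattern[motor] = intensity          # vibrate the motor pointing towards the goal
    return pattern

print(heading_to_pattern(45.0))   # second motor (clockwise) active
```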
In modern collaborative production environments, where industrial robots and humans are supposed to work hand in hand, it is mandatory to observe the robot's workspace at all times. Such observation is even more crucial when the robot's main position is itself dynamic, e.g. because the system is mounted on a movable platform. As current solutions, such as physically secured areas in which a robot can perform actions that are potentially dangerous for humans, become unfeasible in such scenarios, novel, more dynamic and situation-aware safety solutions need to be developed and deployed.
This thesis contributes to the bigger picture of such a collaborative scenario mainly by presenting a data-driven, convolutional neural network-based approach to estimate the two-dimensional kinematic-chain configuration of industrial robot arms within raw camera images. It also provides the information needed to generate and organize the required data basis and presents the frameworks that were used to realize all involved subsystems. The robot arm's extracted kinematic chain can also be used to estimate the extrinsic camera parameters relative to the robot's three-dimensional origin. Furthermore, a tracking system based on a two-dimensional kinematic-chain descriptor is presented, which allows a proper movement history to be accumulated and thus enables the prediction of future target positions within the given image plane. The combination of the extracted robot pose with a simultaneous human pose estimation system delivers a consistent data flow that can be used in higher-level applications.
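A minimal sketch of the kind of prediction such a movement history enables, assuming a simple constant-velocity extrapolation of 2D joint positions; the thesis' actual tracking and prediction scheme may differ.

```python
# Sketch: extrapolate future 2D joint positions from the accumulated history.
import numpy as np

def predict_next(history, steps_ahead=1):
    """history: array (frames, joints, 2) of past 2D joint positions."""
    velocity = history[-1] - history[-2]            # per-frame displacement
    return history[-1] + steps_ahead * velocity     # extrapolated joint positions

track = np.array([[[100.0, 50.0]], [[104.0, 52.0]], [[108.0, 54.0]]])  # 3 frames, 1 joint
print(predict_next(track, steps_ahead=2))   # [[116. 58.]]
```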
This thesis also provides a detailed evaluation of all involved subsystems and a broad overview of their respective performance, based on newly generated, semi-automatically annotated, real datasets.