Human pose estimation (HPE) is integral to scene understanding in numerous safety-critical domains involving human-machine interaction, such as autonomous driving or semi-automated work environments. Avoiding costly mistakes is synonymous with anticipating failure in model predictions, which necessitates meta-judgments on the accuracy of the applied models. Here, we propose a straightforward human pose regression framework to examine the behavior of two established methods for simultaneous aleatoric and epistemic uncertainty estimation: maximum a-posteriori (MAP) estimation with Monte-Carlo variational inference and deep evidential regression (DER). First, we evaluate both approaches on the quality of their predicted variances and whether these truly capture the expected model error. The initial assessment indicates that both methods exhibit the overconfidence issue common in deep probabilistic models. This observation motivates our implementation of an additional recalibration step to extract reliable confidence intervals. We then take a closer look at deep evidential regression, which, to our knowledge, is applied comprehensively for the first time to the HPE problem. Experimental results indicate that DER behaves as expected in challenging and adverse conditions commonly occurring in HPE and that the predicted uncertainties match their purported aleatoric and epistemic sources. Notably, DER achieves smooth uncertainty estimates without the need for a costly sampling step, making it an attractive candidate for uncertainty estimation on resource-limited platforms.
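To illustrate why DER needs no sampling step, here is a minimal sketch. It assumes the commonly used deep evidential regression formulation, in which the network outputs the four parameters of a Normal-Inverse-Gamma distribution (here called `gamma`, `nu`, `alpha`, `beta`) and both uncertainties follow in closed form. The function name and the example values are hypothetical and are not taken from the paper.

```python
def der_uncertainties(gamma, nu, alpha, beta):
    """Closed-form prediction and uncertainties from Normal-Inverse-Gamma parameters.

    Assumption: the standard deep evidential regression parameterisation,
    valid for nu > 0, alpha > 1 and beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1 and beta > 0")
    prediction = gamma                        # expected mean, E[mu]
    aleatoric = beta / (alpha - 1.0)          # expected data noise, E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # variance of the mean, Var[mu]
    return prediction, aleatoric, epistemic


# Illustrative values for a single regressed joint coordinate.
pred, aleatoric, epistemic = der_uncertainties(gamma=0.5, nu=2.0, alpha=3.0, beta=4.0)
```

A single forward pass yields all three quantities, whereas a Monte-Carlo approach has to average over several stochastic passes to estimate the epistemic part.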
We present a multitask network that supports various deep neural network based pedestrian detection functions. Besides 2D and 3D human pose, it also supports body and head orientation estimation based on full body bounding box input. This eliminates the need for explicit face recognition. We show that the performance of 3D human pose estimation and orientation estimation is comparable to the state-of-the-art. Since very few data sets exist for 3D human pose and in particular body and head orientation estimation based on full body data, we further show the benefit of particular simulation data to train the network. The network architecture is relatively simple, yet powerful, and easily adaptable for further research and applications.
Because of its complexity and scale, the sizing phase of analog integrated circuit design still relies heavily on human expert knowledge, and its level of automation remains very low. Current research shows that reinforcement learning is a promising approach for addressing this issue. Similarly, it has been shown that the convergence of conventional optimization approaches can be improved by transforming the design space from the geometrical domain into the electrical domain. Here, this design space transformation is employed as an alternative action space for deep reinforcement learning agents. The presented approach is based entirely on reinforcement learning, whereby agents are trained in the craft of analog circuit sizing without explicit expert guidance. After training and evaluating agents on circuits of varying complexity, their behavior when confronted with a different technology is examined, showing the applicability, feasibility, and transferability of this approach.
Avatars are used for interaction in virtual environments in different contexts: in collaborative work, in gaming, and in virtual meetings with friends. Therefore, it is important to understand how the relationship between user and avatar works. In this study, an online survey is used to determine how the perception of an avatar changes in different contexts by relating it to existing avatar relationship typologies. Additionally, it is determined whether participants prefer a realistic, abstract, or comic-like representation in each context. One result was a preference for low-poly representations in the work context, which are associated with the perception of the avatar as a tool. In the context of meeting friends, a realistic representation is perceived as more appropriate and is seen as an accurate self-representation. In the gaming context, the results are less clear, which can be attributed to different gaming preferences. Here, unlike in the other contexts, a comic-like representation is also perceived as appropriate, which is associated with the perception of the avatar as a friend. A symbiotic user-avatar relationship is not directly related to any form of representation but always lies in the midfield, which is attributed to the fact that it represents a whole spectrum between the other categories.
This paper presents a machine-learning-powered, procedural sizing methodology based on pre-computed look-up tables containing operating point characteristics of primitive devices. Several neural networks are trained for 90 nm and 45 nm technologies, mapping different electrical parameters to the corresponding dimensions of a primitive device. This transforms the geometric sizing problem into the domain of circuit design experts, where the desired electrical characteristics are now inputs to the model. Analog building blocks or entire circuits are expressed as a sequence of model evaluations, capturing the sizing strategy and intention of the designer in a procedure that is reusable across different technology nodes. The methodology is employed for the sizing of two operational amplifiers and evaluated for two technology nodes, showing the versatility and efficiency of this approach.
Facial expressions play a dominant role in facilitating social interactions. We endeavor to develop tactile displays to reinstate communication modulated by facial expressions. The high spatial and temporal dimensionality of facial movements poses a unique challenge when designing tactile encodings of them. A further challenge is developing encodings that are attuned to the perceptual characteristics of our skin. A caveat of using vibrotactile displays is that tactile stimuli have been shown to induce perceptual tactile aftereffects when used on the fingers, arm, and face. However, despite the prevalence of waist-worn tactile displays, no investigations of tactile aftereffects at the waist region exist in the literature at present, though they are warranted by the unique sensory and perceptual signalling characteristics of this area. Using an adaptation paradigm, we investigated the presence of perceptual tactile aftereffects induced by continuous and burst vibrotactile stimuli delivered at the navel, side, and spinal regions of the waist. We report evidence that the tactile perception topology of the waist is non-uniform, and specifically that the navel and spine regions are resistant to adaptive aftereffects, while the side regions are more prone to perceptual adaptation to continuous but not burst stimulation. The results of our current investigations highlight the unique set of challenges posed by designing waist-worn tactile displays. These and future perceptual studies can directly inform more realistic and effective implementations of complex high-dimensional spatiotemporal social cues.
Learning to translate between real world and simulated 3D sensors while transferring task models
(2019)
Learning-based vision tasks are usually specialized for the sensor technology for which data has been labeled. The knowledge of a learned model is of little use when it is applied to data that differs from the data on which the model was initially trained, or to a totally different imaging or sensor source. New labeled data then has to be acquired, on which a new model can be trained. Depending on the sensor, this becomes even more complicated when the sensor data is more abstract and hard for humans to interpret and label. Enabling the reuse of models trained for a specific task across different sensors minimizes the data acquisition effort. Therefore, this work focuses on learning sensor models and translating between them, thus aiming for sensor interoperability. We show that even for the complex task of human pose estimation from 3D depth data recorded with different sensors, i.e. a simulated and a Kinect 2™ depth sensor, human pose estimation can be greatly improved by translating between sensor models without modifying the original task model. This process especially benefits sensors and applications for which labels and models are difficult, if at all possible, to retrieve from raw sensor data.
RoPose-Real: real world dataset acquisition for data-driven industrial robot arm pose estimation
(2019)
It is necessary to employ smart sensory systems in dynamic and mobile workspaces where industrial robots are mounted on mobile platforms. Such systems should be aware of flexible and non-stationary workspaces and able to react autonomously to changing situations. Building upon our previously presented RoPose system, which employs a convolutional neural network architecture trained on purely synthetic data to estimate the kinematic chain of an industrial robot arm system, we now present RoPose-Real. RoPose-Real extends the prior system with a convenient, targetless extrinsic calibration tool that allows automatically annotated datasets to be produced for real robot systems. Furthermore, we use the novel datasets to train the estimation network with real-world data. The extracted pose information is used to automatically estimate the pose of the observing sensor relative to the robot system. Finally, we evaluate the performance of the presented subsystems in a real-world robotic scenario.
Recognizing human actions is a core challenge for autonomous systems, as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. To train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time, and recognize their actions in real time with standard monocular camera sensors. For action recognition, we transform noisy human pose estimates into an image-like format that we call Encoded Human Pose Image (EHPI). This encoded information can then be classified using standard methods from the computer vision community. With this simple procedure, we achieve competitive state-of-the-art performance in pose-based action detection and can ensure real-time performance. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
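The idea of turning a pose sequence into an image-like array can be sketched as follows. This is only an illustration under stated assumptions: the layout (one row per joint, one column per frame), the normalisation by image size, and the use of the first two colour channels for x and y are choices made for this example, and the function name `encode_pose_sequence` is hypothetical. The abstract does not specify the actual EHPI layout.

```python
def encode_pose_sequence(poses, width, height):
    """Encode a pose sequence as an image-like grid (illustrative layout).

    poses: list of frames; each frame is a list of (x, y) joint positions
           in pixels, with the same number of joints in every frame.
    Returns a grid with one row per joint and one column per frame, where
    each cell is an (r, g, b) tuple holding the normalised x, normalised y
    and an unused third channel.
    """
    n_frames = len(poses)
    n_joints = len(poses[0]) if poses else 0
    image = [[(0.0, 0.0, 0.0)] * n_frames for _ in range(n_joints)]
    for t, frame in enumerate(poses):
        for j, (x, y) in enumerate(frame):
            # Clip to [0, 1] so noisy estimates outside the frame stay valid.
            r = min(max(x / width, 0.0), 1.0)
            g = min(max(y / height, 0.0), 1.0)
            image[j][t] = (r, g, 0.0)
    return image


# Two frames with two joints each, from a hypothetical 640x480 camera.
sequence = [[(0, 0), (320, 240)], [(640, 480), (960, 120)]]
ehpi_like = encode_pose_sequence(sequence, width=640, height=480)
```

Once the sequence has this fixed-size, image-like shape, it can be passed to an ordinary image classifier, which is the property the abstract relies on.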
The segmentation and tracking of minimally invasive, robot-guided instruments is an essential component of various computer-assisted interventions. However, in minimally invasive surgery, which is the field of application for the approach described here, difficulties frequently arise from reflections, shadows, or visual occlusions caused by smoke and organs, and these complicate the segmentation and tracking of the instruments.
This contribution presents a deep learning approach for markerless tracking of minimally invasive instruments, which is tested on both simulated and real data. A simulated and a real dataset with ground truth labels for the binary segmentation of instrument and background are created. For the simulated dataset, images are composed of a simulated instrument and a real background. For the real dataset, the images are composed of a real instrument and a real background. Overall, a pixel accuracy of 94.70 percent is achieved on the simulated data and 87.30 percent on the real data.
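For reference, pixel accuracy in binary segmentation is usually understood as the fraction of pixels whose predicted label (instrument or background) matches the ground truth. The sketch below assumes that common definition; the function name and the tiny example masks are hypothetical and unrelated to the datasets described here.

```python
def pixel_accuracy(predicted, target):
    """Fraction of pixels where the predicted label equals the ground truth.

    predicted, target: 2D lists of equal shape containing 0 (background)
    or 1 (instrument).
    """
    if len(predicted) != len(target):
        raise ValueError("masks must have the same number of rows")
    correct = 0
    total = 0
    for pred_row, target_row in zip(predicted, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    if total == 0:
        raise ValueError("masks must not be empty")
    return correct / total


# Hypothetical 2x2 masks: three of the four pixels agree.
accuracy = pixel_accuracy([[1, 0], [0, 0]], [[1, 0], [1, 0]])
```

Because background pixels typically dominate such images, this metric can look high even when the instrument itself is segmented poorly, so it is often read alongside overlap-based measures.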