Refine
Document Type
- Conference proceeding (19)
- Journal article (9)
Has full text
- yes (28)
Is part of the Bibliography
- yes (28)
Institute
- Informatik (24)
- Technik (3)
- Life Sciences (1)
Publisher
- IEEE (11)
- De Gruyter (3)
- ACM (2)
- Springer (2)
- ARVO (1)
- Deutsche Gesellschaft für Computer- und Roboterassistierte Chirurgie e.V. (1)
- Inst. of Electrical and Electronics Engineers (1)
- MDPI (1)
- PLOS (1)
- Routledge, Taylor & Francis Group (1)
There is still a great reliance on human expert knowledge during the analog integrated circuit sizing design phase due to its complexity and scale, with the result that there is a very low level of automation associated with it. Current research shows that reinforcement learning is a promising approach for addressing this issue. Similarly, it has been shown that the convergence of conventional optimization approaches can be improved by transforming the design space from the geometrical domain into the electrical domain. Here, this design space transformation is employed as an alternative action space for deep reinforcement learning agents. The presented approach is based entirely on reinforcement learning, whereby agents are trained in the craft of analog circuit sizing without explicit expert guidance. After training and evaluating agents on circuits of varying complexity, their behavior when confronted with a different technology, is examined, showing the applicability, feasibility as well as transferability of this approach.
Prominent theories of action recognition suggest that during the recognition of actions the physical patterns of the action is associated with only one action interpretation (e.g., a person waving his arm is recognized as waving). In contrast to this view, studies examining the visual categorization of objects show that objects are recognized in multiple ways (e.g., a VW Beetle can be recognized as a car or a beetle) and that categorization performance is based on the visual and motor movement similarity between objects. Here, we studied whether we find evidence for multiple levels of categorization for social interactions (physical interactions with another person, e.g., handshakes). To do so, we compared visual categorization of objects and social interactions (Experiments 1 and 2) in a grouping task and assessed the usefulness of motor and visual cues (Experiments 3, 4, and 5) for object and social interaction categorization. Additionally, we measured recognition performance associated with recognizing objects and social interactions at different categorization levels (Experiment 6). We found that basic level object categories were associated with a clear recognition advantage compared to subordinate recognition but basic level social interaction categories provided only a little recognition advantage. Moreover, basic level object categories were more strongly associated with similar visual and motor cues than basic level social interaction categories. The results suggest that cognitive categories underlying the recognition of objects and social interactions are associated with different performances. These results are in line with the idea that the same action can be associated with several action interpretations (e.g., a person waving his arm can be recognized as waving or greeting).
Motor-based theories of facial expression recognition propose that the visual perception of facial expression is aided by sensorimotor processes that are also used for the production of the same expression. Accordingly, sensorimotor and visual processes should provide congruent emotional information about a facial expression. Here, we report evidence that challenges this view. Specifically, the repeated execution of facial expressions has the opposite effect on the recognition of a subsequent facial expression than the repeated viewing of facial expressions. Moreover, the findings of the motor condition, but not of the visual condition, were correlated with a nonsensory condition in which participants imagined an emotional situation. These results can be well accounted for by the idea that facial expression recognition is not always mediated by motor processes but can also be recognized on visual information alone.
In any autonomous driving system, the map for localization plays a vital part that is often underestimated. The map describes the world around the vehicle outside of the sensor view and is a main input into the decision making process in highly complicated scenarios. Thus there are strict requirements towards the accuracy and timeliness of the map. We present a robust and reliable approach towards crowd based mapping using a GraphSLAM framework based on radar sensors. We show on a parking lot that even in dynamically changing environments, the localization results are very accurate and reliable even in unexplored terrain without any map data. This can be achieved by collaborative map updates from multiple vehicles. To show these claims experimentally, the Joint Graph Optimization is compared to the ground truth on an industrial parking space. Mapping performance is evaluated using a dense map from a total station as reference and localization results are compared with a deeply coupled DGPS/INS system.
Perceptual integration of kinematic components in the recognition of emotional facial expressions
(2018)
According to a long-standing hypothesis in motor control, complex body motion is organized in terms of movement primitives, reducing massively the dimensionality of the underlying control problems. For body movements, this low dimensional organization has been convincingly demonstrated by the learning of low-dimensional representations from kinematic and EMG data. In contrast, the effective dimensionality of dynamic facial expressions is unknown, and dominant analysis approaches have been based on heuristically defined facial ‘‘action units,’’ which reflect contributions of individual face muscles. We determined the effective dimensionality of dynamic facial expressions by learning of a low dimensional model from 11 facial expressions. We found an amazingly low dimensionality with only two movement primitives being sufficient to simulate these dynamic expressions with high accuracy. This low dimensionality is confirmed statistically, by Bayesian model comparison of models with different numbers of primitives, and by a psychophysical experiment that demonstrates that expressions, simulated with only two primitives, are indistinguishable from natural ones.
In addition, we find statistically optimal integration of the emotion information specified by these primitives in visual perception. Taken together, our results indicate that facial expressions might be controlled by a very small number of independent control units, permitting very low dimensional parametrization of the associated facial expression.
Recognizing actions of humans, reliably inferring their meaning and being able to potentially exchange mutual social information are core challenges for autonomous systems when they directly share the same space with humans. Today’s technical perception solutions have been developed and tested mostly on standard vision benchmark datasets where manual labeling of sensory ground truth is a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in such data leading to algorithms not recognizing such activities. For this purpose, we introduce a modular simulation framework which offers to train and validate algorithms on various environmental conditions. For this paper we created a dataset, containing rare human activities in urban areas, on which a current state of the art algorithm for pose estimation fails and demonstrate how to train such rare poses with simulated data only.
The basic idea behind a wearable robotic grasp assistancesystem is to support people that suffer from severe motor impairments in daily activities. Such a system needs to act mostly autonomously and according to the user’s intent. Vision-based hand pose estimation could be an integral part of a larger control and assistance framework. In this paper we evaluate the performance of egocentric monocular hand pose estimation for a robot-controlled hand exoskeleton in a simulation. For hand pose estimation we adopt a Convolutional Neural Network (CNN). We train and evaluate this network with computer graphics, created by our own data generator. In order to guide further design decisions we focus in our experiments on two egocentric camera viewpoints tested on synthetic data with the help of a 3D-scanned hand model, with and without an exoskeleton attached to it.We observe that hand pose estimation with a wrist-mounted camera performs more accurate than with a head-mounted camera in the context of our simulation. Further, a grasp assistance system attached to the hand alters visual appearance and can improve hand pose estimation. Our experiment provides useful insights for the integration of sensors into a context sensitive analysis framework for intelligent assistance.
As production workspaces become more mobile and dynamic it becomes increasingly important to reliably monitor the overall state of the environment. Therein manipulators or other robotic systems likely have to be able to act autonomously together with humans and other systems within a joint workspace. Such interactions require that all components in non-stationary environments are able to perceive the state relative to each other. As vision-sensors provide a rich source of information to accomplish this, we present RoPose, a convolutional neural network (CNN) based approach, to estimate the two dimensional joint configuration of a simulated industrial manipulator from a camera image. This pose information can further be used by a novel targetless calibration setup to estimate the pose of the camera relative to the manipulator’s space. We present a pipeline to automatically generate synthetic training data and conclude with a discussion of the potential usage of the same pipeline to acquire real image datasets of physically existent robots.
Significant advances have been achieved in mobile robot localization and mapping in dynamic environments, however these are mostly incapable of dealing with the physical properties of automotive radar sensors. In this paper we present an accurate and robust solution to this problem, by introducing a memory efficient cluster map representation. Our approach is validated by experiments that took place on a public parking space with pedestrians, moving cars, as well as different parking configurations to provide a challenging dynamic environment. The results prove its ability to reproducibly localize our vehicle within an error margin of below 1% with respect to ground truth using only point based radar targets. A decay process enables our map representation to support local updates.
Putting actions in context: visual action adaptation aftereffects are modulated by social contexts
(2014)
The social context in which an action is embedded provides important information for the interpretation of an action. Is this social context integrated during the visual recognition of an action? We used a behavioural visual adaptation paradigm to address this question and measured participants’ perceptual bias of a test action after they were adapted to one of two adaptors (adaptation after-effect). The action adaptation after effect was measured for the same set of adaptors in two different social contexts. Our results indicate that the size of the adaptation effect varied with social context (social context modulation) although the physical appearance of the adaptors remained unchanged. Three additional experiments provided evidence that the observed social context modulation of the adaptation effect are owed to the adaptation of visual action recognition processes. We found that adaptation is critical for the social context modulation (experiment 2). Moreover, the effect is not mediated by emotional content of the action alone (experiment 3) and visual information about the action seems to be critical for the emergence of action adaptation effects (experiment 4). Taken together these results suggest that processes underlying visual action recognition are sensitive to the social context of an action.