Informatik
Prominent theories of action recognition suggest that during the recognition of actions the physical pattern of an action is associated with only one action interpretation (e.g., a person waving his arm is recognized as waving). In contrast to this view, studies examining the visual categorization of objects show that objects are recognized in multiple ways (e.g., a VW Beetle can be recognized as a car or a beetle) and that categorization performance is based on the visual and motor movement similarity between objects. Here, we studied whether there is evidence for multiple levels of categorization for social interactions (physical interactions with another person, e.g., handshakes). To do so, we compared visual categorization of objects and social interactions (Experiments 1 and 2) in a grouping task and assessed the usefulness of motor and visual cues (Experiments 3, 4, and 5) for object and social interaction categorization. Additionally, we measured recognition performance for objects and social interactions at different categorization levels (Experiment 6). We found that basic level object categories were associated with a clear recognition advantage compared to subordinate recognition, but basic level social interaction categories provided only a small recognition advantage. Moreover, basic level object categories were more strongly associated with similar visual and motor cues than basic level social interaction categories. The results suggest that cognitive categories underlying the recognition of objects and social interactions are associated with different performances. These results are in line with the idea that the same action can be associated with several action interpretations (e.g., a person waving his arm can be recognized as waving or greeting).
Motor-based theories of facial expression recognition propose that the visual perception of facial expression is aided by sensorimotor processes that are also used for the production of the same expression. Accordingly, sensorimotor and visual processes should provide congruent emotional information about a facial expression. Here, we report evidence that challenges this view. Specifically, the repeated execution of facial expressions has the opposite effect on the recognition of a subsequent facial expression than the repeated viewing of facial expressions. Moreover, the findings of the motor condition, but not of the visual condition, were correlated with a nonsensory condition in which participants imagined an emotional situation. These results can be well accounted for by the idea that facial expression recognition is not always mediated by motor processes but can also rely on visual information alone.
With the progress of technology in modern hospitals, intelligent perioperative situation recognition will gain more relevance due to its potential to substantially improve surgical workflows by providing situation knowledge in real time. Such knowledge can be extracted from image data by machine learning techniques, but this poses a privacy threat to the staff's and patients' personal data. De-identification is a possible solution for removing sensitive visual information. In this work, we developed a YOLO v3 based prototype to detect sensitive areas in the image in real time. These areas are then de-identified using common image obfuscation techniques. Our results show that the approach is in principle suitable for de-identifying sensitive data in OR images and contributes to a privacy-respectful way of processing in the context of situation recognition in the OR.
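The second stage of such a pipeline, obfuscating regions once a detector has returned bounding boxes, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which obfuscation technique was used, so pixelation is assumed here, and the function name `pixelate_regions`, the box format `(x0, y0, x1, y1)` and the nested-list grayscale image are hypothetical choices made to keep the example self-contained.

```python
def pixelate_regions(image, boxes, block=4):
    """Return a copy of a grayscale image with each box pixelated.

    image: list of rows, each a list of integer intensities (assumed format).
    boxes: iterable of (x0, y0, x1, y1) with exclusive upper bounds,
           standing in for the output of an object detector.
    block: side length of the square tiles that are averaged.
    """
    out = [row[:] for row in image]  # leave the input untouched
    height = len(out)
    width = len(out[0]) if height else 0
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the image bounds.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        for ty in range(y0, y1, block):
            for tx in range(x0, x1, block):
                ys = range(ty, min(ty + block, y1))
                xs = range(tx, min(tx + block, x1))
                # Replace every pixel in the tile with the tile's mean.
                mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
                for y in ys:
                    for x in xs:
                        out[y][x] = mean
    return out
```

Larger values of `block` remove more detail. Whether a given block size is sufficient for de-identification depends on the image resolution and has to be checked empirically.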
Putting actions in context: visual action adaptation aftereffects are modulated by social contexts
(2014)
The social context in which an action is embedded provides important information for the interpretation of an action. Is this social context integrated during the visual recognition of an action? We used a behavioural visual adaptation paradigm to address this question and measured participants' perceptual bias of a test action after they were adapted to one of two adaptors (adaptation after-effect). The action adaptation after-effect was measured for the same set of adaptors in two different social contexts. Our results indicate that the size of the adaptation effect varied with social context (social context modulation) although the physical appearance of the adaptors remained unchanged. Three additional experiments provided evidence that the observed social context modulation of the adaptation effect is owed to the adaptation of visual action recognition processes. We found that adaptation is critical for the social context modulation (experiment 2). Moreover, the effect is not mediated by the emotional content of the action alone (experiment 3), and visual information about the action seems to be critical for the emergence of action adaptation effects (experiment 4). Taken together, these results suggest that processes underlying visual action recognition are sensitive to the social context of an action.
Perceptual integration of kinematic components in the recognition of emotional facial expressions
(2018)
According to a long-standing hypothesis in motor control, complex body motion is organized in terms of movement primitives, massively reducing the dimensionality of the underlying control problems. For body movements, this low-dimensional organization has been convincingly demonstrated by the learning of low-dimensional representations from kinematic and EMG data. In contrast, the effective dimensionality of dynamic facial expressions is unknown, and dominant analysis approaches have been based on heuristically defined facial "action units," which reflect contributions of individual face muscles. We determined the effective dimensionality of dynamic facial expressions by learning a low-dimensional model from 11 facial expressions. We found an amazingly low dimensionality, with only two movement primitives being sufficient to simulate these dynamic expressions with high accuracy. This low dimensionality is confirmed statistically, by Bayesian model comparison of models with different numbers of primitives, and by a psychophysical experiment that demonstrates that expressions simulated with only two primitives are indistinguishable from natural ones.
In addition, we found statistically optimal integration of the emotion information specified by these primitives in visual perception. Taken together, our results indicate that facial expressions might be controlled by a very small number of independent control units, permitting a very low-dimensional parametrization of the associated facial expression.
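"Statistically optimal integration" is commonly formalized as maximum-likelihood cue combination, in which independent estimates are averaged with weights inversely proportional to their variances. The sketch below illustrates that standard model only. The abstract does not state which model the study used, and the function name `integrate_cues` and its inputs are hypothetical.

```python
def integrate_cues(estimates, variances):
    """Combine independent Gaussian cue estimates by inverse-variance weighting.

    estimates: per-cue estimates of the same quantity (e.g. one per primitive).
    variances: the corresponding noise variances, all strictly positive.
    Returns the combined estimate and its variance.
    """
    weights = [1.0 / v for v in variances]  # reliability of each cue
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    combined_variance = 1.0 / total  # never larger than the smallest input variance
    return combined, combined_variance
```

Under this model the combined estimate is pulled towards the more reliable cue, and its variance is lower than that of either cue alone. That is the signature typically tested in psychophysical experiments on cue integration.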
Enhancing data-driven algorithms for human pose estimation and action recognition through simulation
(2020)
Recognizing human actions, reliably inferring their meaning and being able to potentially exchange mutual social information are core challenges for autonomous systems when they directly share the same space with humans. Intelligent transport systems in particular face this challenge, as interactions with people are often required. The development and testing of technical perception solutions is done mostly on standard vision benchmark datasets, for which manual labelling of sensory ground truth has been a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in these datasets, leading to algorithms not recognizing such activities. To address this, we introduce a modular simulation framework that makes it possible to train and validate algorithms on various human-centred scenarios. We describe the usage of simulation data to train a state-of-the-art human pose estimation algorithm to recognize unusual human activities in urban areas. Since the recognition of human actions can be an important component of intelligent transport systems, we investigated how simulations can be applied for this purpose. Laboratory experiments show that we can train a recurrent neural network with only simulated data based on motion capture data and 3D avatars, which achieves an almost perfect performance in the classification of those human actions on real data.
We present an approach for segmenting individual cells and lamellipodia in epithelial cell clusters using fully convolutional neural networks. The method will set the basis for measuring cell cluster dynamics and expansion to improve the investigation of collective cell migration phenomena. The fully learning-based front-end avoids classical feature engineering, yet the network architecture needs to be designed carefully. Our network predicts how likely each pixel is to belong to each of the classes and is thus able to segment the image. Besides characterizing segmentation performance, we discuss how the network will be further employed.
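The step from per-pixel class likelihoods to a segmentation can be sketched as a softmax over the class scores of each pixel followed by an argmax. This is a generic illustration under stated assumptions, not the network's actual output stage: the function names, the nested-list score map and the class indices (0 = background, 1 = cell, 2 = lamellipodium) are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment(score_map):
    """Assign each pixel the index of its most probable class.

    score_map[y][x] is a list with one raw score per class (assumed layout).
    Ties are resolved in favour of the lowest class index.
    """
    labels = []
    for row in score_map:
        label_row = []
        for scores in row:
            probs = softmax(scores)
            label_row.append(max(range(len(probs)), key=probs.__getitem__))
        labels.append(label_row)
    return labels
```

Because softmax preserves the ordering of the scores, the argmax could be taken on the raw scores directly. The probabilities are computed here because they express how confident the assignment of each pixel is.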