Prominent theories of action recognition suggest that, during the recognition of actions, the physical pattern of an action is associated with only one action interpretation (e.g., a person waving his arm is recognized as waving). In contrast to this view, studies examining the visual categorization of objects show that objects are recognized in multiple ways (e.g., a VW Beetle can be recognized as a car or a beetle) and that categorization performance is based on the visual and motor movement similarity between objects. Here, we examined whether there is evidence for multiple levels of categorization for social interactions (physical interactions with another person, e.g., handshakes). To do so, we compared the visual categorization of objects and social interactions (Experiments 1 and 2) in a grouping task and assessed the usefulness of motor and visual cues (Experiments 3, 4, and 5) for object and social interaction categorization. Additionally, we measured recognition performance associated with recognizing objects and social interactions at different categorization levels (Experiment 6). We found that basic-level object categories were associated with a clear recognition advantage over subordinate recognition, whereas basic-level social interaction categories provided only a small recognition advantage. Moreover, basic-level object categories were more strongly associated with similar visual and motor cues than basic-level social interaction categories. The results suggest that the cognitive categories underlying the recognition of objects and social interactions are associated with different recognition performance. These results are in line with the idea that the same action can be associated with several action interpretations (e.g., a person waving his arm can be recognized as waving or greeting).
Distraction of the driver is one of the most frequent causes of car accidents. We aim for a computational cognitive model that predicts the driver’s degree of distraction while driving and performing a secondary task, such as talking with co-passengers. The secondary task might cognitively involve the driver to differing degrees, depending on the topic of the conversation or the number of co-passengers. In order to detect these subtle differences in everyday driving situations, we aim to analyse in-car audio signals and combine this information with head-pose and face-tracking information. In the first step, we will assess driving, video, and audio parameters that reliably predict cognitive distraction of the driver. These parameters will be used to train the cognitive model to estimate the degree of the driver’s distraction. In the second step, we will train and test the cognitive model during conversations between the driver and co-passengers during active driving. This paper describes the work in progress of our first experiment, with preliminary results concerning driving parameters that correspond to the driver’s degree of distraction. In addition, we present the technical implementation of our experiment, combining driving, video, and audio data, together with first methodological results concerning the auditory analysis. The overall aim of the cognitive distraction model is the development of a mobile user profile that computes the individual degree of distraction and is also applicable to other systems.
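A minimal sketch of the kind of multimodal fusion described above, combining driving, head-pose, and audio descriptors in a single classifier per time window. The feature names, labels, and the logistic-regression choice are assumptions for illustration, not the model used in the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 200  # number of analysis windows

# Per-window descriptors standing in for real recordings:
# steering entropy and lane deviation (driving), head yaw variance (video),
# and speech activity ratio (audio).
X = np.column_stack([
    rng.normal(size=n),         # steering entropy
    rng.normal(size=n),         # lane deviation
    rng.normal(size=n),         # head yaw variance
    rng.uniform(0, 1, size=n),  # speech activity ratio
])

# Toy labels: 1 = distracted window, 0 = attentive window.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```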
Motor-based theories of facial expression recognition propose that the visual perception of facial expressions is aided by sensorimotor processes that are also used for the production of the same expressions. Accordingly, sensorimotor and visual processes should provide congruent emotional information about a facial expression. Here, we report evidence that challenges this view. Specifically, the repeated execution of facial expressions has the opposite effect on the recognition of a subsequent facial expression compared to the repeated viewing of facial expressions. Moreover, the findings of the motor condition, but not of the visual condition, were correlated with a nonsensory condition in which participants imagined an emotional situation. These results can be well accounted for by the idea that facial expression recognition is not always mediated by motor processes but can also rely on visual information alone.
Digital light microscopy techniques are among the most widely used methods in cell biology and medical research. Despite this, the automated classification of objects such as cells or specific parts of tissue in images remains difficult. We present an approach to classify confluent cell layers in microscopy images by learned deep correlation features using deep neural networks. These deep correlation features are generated through the use of Gram-based correlation features and are input to a neural network that learns the correlation between them. In this work, we wanted to test whether such a representation of cell data is suitable for classification, as has been done for artworks with respect to their artistic period. The method generates images that contain recognizable characteristics of a specific cell type, for example, the average size and the ordered pattern.
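A minimal sketch of the Gram-based correlation idea, assuming a feature map has already been extracted from some convolutional layer; the layer choice, tensor shapes, and toy data are illustrative, not the authors' exact setup:

```python
import numpy as np

def gram_matrix(feature_map: np.ndarray) -> np.ndarray:
    """Channel-wise correlations (Gram matrix) of a CNN feature map.

    feature_map: array of shape (channels, height, width), e.g. the activations
    of one convolutional layer for a single microscopy image.
    Returns an array of shape (channels, channels).
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)  # each row: one channel's responses
    gram = flat @ flat.T                  # pairwise channel correlations
    return gram / (c * h * w)             # normalize by the feature map size

# Illustrative usage with random values standing in for real activations.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 32, 32)).astype(np.float32)
features = gram_matrix(fmap)
print(features.shape)  # (64, 64); flattened, this vector can feed a classifier
```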
The segmentation and tracking of minimally invasive, robot-guided instruments is an essential component of various computer-assisted interventions. However, in minimally invasive surgery, which is the field of application for the approach described here, difficulties frequently arise from reflections, shadows, or visual occlusions caused by smoke and organs, complicating the segmentation and tracking of the instruments.
This paper presents a deep learning approach for markerless tracking of minimally invasive instruments, which is tested on both simulated and real data. Both a simulated and a real dataset with ground-truth labels for the binary segmentation of instrument and background are created. For the simulated dataset, images are composed of a simulated instrument and a real background. In the case of the real dataset, the images are composed of a real instrument and a real background. Overall, a pixel accuracy of 94.70 percent is achieved on the simulated data and of 87.30 percent on the real data.
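The reported numbers are pixel accuracies for binary instrument/background masks; a minimal sketch of that metric, with array shapes and the toy example chosen for illustration:

```python
import numpy as np

def pixel_accuracy(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the ground truth.

    Both masks are {0, 1} arrays of identical shape, where 1 marks
    instrument pixels and 0 marks background.
    """
    assert pred_mask.shape == gt_mask.shape
    correct = (pred_mask.astype(bool) == gt_mask.astype(bool)).sum()
    return correct / pred_mask.size

# Toy example: a 4x4 prediction that gets 14 of 16 pixels right.
gt = np.array([[0, 0, 1, 1]] * 4)
pred = gt.copy()
pred[0, 0] = 1
pred[3, 3] = 0
print(pixel_accuracy(pred, gt))  # 0.875
```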
We present an approach for segmenting individual cells and lamellipodia in epithelial cell clusters using fully convolutional neural networks. The method will set the basis for measuring cell cluster dynamics and expansion in order to improve the investigation of collective cell migration phenomena. The fully learning-based front-end avoids classical feature engineering, yet the network architecture needs to be designed carefully. Our network predicts, for each pixel, how likely it belongs to each of the classes and is thus able to segment the image. Besides characterizing segmentation performance, we discuss how the network will be employed further.
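A minimal PyTorch-style sketch of the per-pixel prediction idea: a fully convolutional stack ending in a 1x1 convolution that yields one score per class at every pixel. The depth, channel counts, and three-class split (background / cell body / lamellipodium) are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Toy fully convolutional network: per-pixel class scores, no dense layers."""

    def __init__(self, in_channels: int = 1, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolution maps features to one score per class at every pixel.
        self.classifier = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        # Softmax over the class dimension gives per-pixel class probabilities.
        return torch.softmax(logits, dim=1)

# One grayscale 64x64 image -> probabilities of shape (1, 3, 64, 64).
model = TinyFCN()
probs = model(torch.rand(1, 1, 64, 64))
print(probs.shape)
```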
The basic idea behind a wearable robotic grasp assistance system is to support people who suffer from severe motor impairments in daily activities. Such a system needs to act mostly autonomously and according to the user’s intent. Vision-based hand pose estimation could be an integral part of a larger control and assistance framework. In this paper we evaluate the performance of egocentric monocular hand pose estimation for a robot-controlled hand exoskeleton in a simulation. For hand pose estimation we adopt a Convolutional Neural Network (CNN). We train and evaluate this network with computer graphics created by our own data generator. In order to guide further design decisions, our experiments focus on two egocentric camera viewpoints tested on synthetic data with the help of a 3D-scanned hand model, with and without an exoskeleton attached to it. We observe that hand pose estimation with a wrist-mounted camera performs more accurately than with a head-mounted camera in the context of our simulation. Further, a grasp assistance system attached to the hand alters the visual appearance and can improve hand pose estimation. Our experiment provides useful insights for the integration of sensors into a context-sensitive analysis framework for intelligent assistance.
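A small numpy sketch of how the two camera viewpoints could be compared on synthetic ground truth, using a mean per-joint position error; the metric, joint count, and noise levels are assumptions made for illustration rather than the paper's exact evaluation:

```python
import numpy as np

def mean_joint_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth hand joints.

    pred, gt: arrays of shape (num_frames, num_joints, 3) in millimetres.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

rng = np.random.default_rng(1)
gt = rng.uniform(-100, 100, size=(500, 21, 3))  # 21 hand joints, 500 frames

# Stand-in predictions: the wrist-mounted view is assumed less noisy here,
# mirroring the reported finding, purely for illustration.
pred_wrist = gt + rng.normal(scale=5.0, size=gt.shape)
pred_head = gt + rng.normal(scale=12.0, size=gt.shape)

print("wrist camera error:", mean_joint_error(pred_wrist, gt))
print("head camera error: ", mean_joint_error(pred_head, gt))
```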
As production workspaces become more mobile and dynamic, it becomes increasingly important to reliably monitor the overall state of the environment. In such settings, manipulators or other robotic systems will likely have to act autonomously together with humans and other systems within a joint workspace. Such interactions require that all components in non-stationary environments are able to perceive their state relative to each other. As vision sensors provide a rich source of information to accomplish this, we present RoPose, a convolutional neural network (CNN) based approach to estimate the two-dimensional joint configuration of a simulated industrial manipulator from a camera image. This pose information can further be used by a novel targetless calibration setup to estimate the pose of the camera relative to the manipulator’s space. We present a pipeline to automatically generate synthetic training data and conclude with a discussion of the potential use of the same pipeline to acquire real image datasets of physically existing robots.
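A minimal sketch of how 2D joint annotations could be generated automatically from known 3D joint positions with a pinhole camera model, in the spirit of a synthetic-data pipeline; the intrinsics, extrinsics, and joint coordinates below are made up for illustration:

```python
import numpy as np

def project_points(points_3d: np.ndarray, K: np.ndarray,
                   R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project 3D world points (N, 3) to 2D pixel coordinates (N, 2)."""
    cam = (R @ points_3d.T + t.reshape(3, 1)).T  # world -> camera frame
    uvw = (K @ cam.T).T                          # camera frame -> image plane
    return uvw[:, :2] / uvw[:, 2:3]              # perspective division

# Illustrative intrinsics and extrinsics.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])  # camera two metres in front of the robot base

# Made-up 3D joint positions of a manipulator (base, shoulder, elbow, wrist).
joints_3d = np.array([[0.0, 0.0, 0.0],
                      [0.0, 0.3, 0.0],
                      [0.2, 0.6, 0.1],
                      [0.4, 0.7, 0.1]])

joints_2d = project_points(joints_3d, K, R, t)  # pixel-space training labels
print(joints_2d)
```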
Recognizing human actions, reliably inferring their meaning, and being able to potentially exchange mutual social information are core challenges for autonomous systems that directly share the same space with humans. Today’s technical perception solutions have been developed and tested mostly on standard vision benchmark datasets, where the manual labeling of sensory ground truth is a tedious but necessary task. Furthermore, rarely occurring human activities are underrepresented in such data, leading to algorithms that fail to recognize these activities. For this purpose, we introduce a modular simulation framework that allows algorithms to be trained and validated under various environmental conditions. For this paper, we created a dataset containing rare human activities in urban areas, on which a current state-of-the-art algorithm for pose estimation fails, and demonstrate how to train such rare poses with simulated data only.
In any autonomous driving system, the map used for localization plays a vital part that is often underestimated. The map describes the world around the vehicle beyond the sensor view and is a main input to the decision-making process in highly complicated scenarios. Thus, there are strict requirements on the accuracy and timeliness of the map. We present a robust and reliable approach to crowd-based mapping using a GraphSLAM framework based on radar sensors. We show on a parking lot that the localization results are accurate and reliable even in dynamically changing environments and in unexplored terrain without any map data. This is achieved by collaborative map updates from multiple vehicles. To support these claims experimentally, the Joint Graph Optimization is compared to the ground truth on an industrial parking space. Mapping performance is evaluated using a dense map from a total station as reference, and localization results are compared with a deeply coupled DGPS/INS system.
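A compact sketch of the pose-graph idea behind GraphSLAM: vehicle poses as SE(2) nodes, relative-pose constraints as edges, and a nonlinear least-squares solve over the residuals. This toy uses scipy and made-up, slightly noisy measurements, not the paper's radar front-end or optimizer:

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pa: np.ndarray, pb: np.ndarray) -> np.ndarray:
    """Pose of b expressed in the frame of a; poses are (x, y, theta)."""
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    c, s = np.cos(pa[2]), np.sin(pa[2])
    ang = np.arctan2(np.sin(pb[2] - pa[2]), np.cos(pb[2] - pa[2]))
    return np.array([c * dx + s * dy, -s * dx + c * dy, ang])

# Edges: (i, j, measured relative pose of node j seen from node i).
edges = [
    (0, 1, np.array([1.0, 0.0, 0.0])),
    (1, 2, np.array([1.0, 0.0, np.pi / 2])),
    (2, 0, np.array([0.05, 1.95, -np.pi / 2])),  # noisy loop closure to the start
]

def residuals(flat_poses: np.ndarray) -> np.ndarray:
    poses = flat_poses.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[j]) - meas for i, j, meas in edges]
    res.append(poses[0])  # anchor the first pose at the origin
    return np.concatenate(res)

x0 = np.zeros(9)  # three poses, all initialised at the origin
solution = least_squares(residuals, x0).x.reshape(-1, 3)
print(solution)   # optimised (x, y, theta) for each node
```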