In visual adaptive tracking, the tracker adapts to the target, background, and conditions of the image sequence. Each update introduces some error, so the tracker may drift away from the target over time. To increase robustness against drifting, we present three ideas on top of a particle filter framework: an optical-flow-based motion estimation, a learning strategy that prevents bad updates while staying adaptive, and a sliding-window detector for detecting failures and finding the best training examples. We experimentally evaluate these ideas on the BoBoT dataset. The code of our tracker is available online.
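As an illustration of the framework these ideas build on, the following is a minimal sketch of one particle filter update with an optical-flow motion prior. All names (`appearance_score`, the noise parameter) are placeholders for this sketch, not identifiers from the released tracker code.

```python
import numpy as np

def particle_filter_step(particles, weights, frame, flow, appearance_score,
                         motion_noise=5.0):
    """One particle filter update: propagate particles with an optical-flow
    motion prior, reweight by appearance, and resample.

    particles: (n, 2) candidate target positions
    flow:      (2,) estimated target motion from optical flow
    appearance_score(frame, p): positive similarity of position p to the model
    """
    n = len(particles)
    # Propagate: shift each particle by the flow estimate plus diffusion noise.
    particles = particles + flow + np.random.randn(n, 2) * motion_noise
    # Reweight each hypothesis by the adaptive appearance model.
    weights = np.array([appearance_score(frame, p) for p in particles])
    weights /= weights.sum()
    # Resample to concentrate particles on likely states.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```

The paper's contributions then attach to this loop: the flow term supplies the motion prior, the learning strategy decides when the appearance model may be updated, and the sliding-window detector reinitialises the tracker after a detected failure.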
In this paper, we propose a novel fitting method that uses local image features to fit a 3D morphable face model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method solves for shape and pose parameters simultaneously. We evaluate it thoroughly on data generated with the morphable model and present first results on real data. Compared to traditional fitting methods, which use simple raw features like pixel colour or edge maps, local features have been shown to be much more robust against variations in imaging conditions. Our approach is unique in that we are the first to use local features to fit a 3D morphable model. Because of its speed, our method is suitable for real-time applications. Our cascaded regression framework is available as an open-source library at github.com/patrikhuber/superviseddescent.
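To make the cascaded regression idea concrete, here is a minimal sketch of training such a cascade with ridge regression, assuming a generic, possibly non-differentiable feature extractor; the function names and shapes are illustrative and do not mirror the C++ library's API.

```python
import numpy as np

def train_cascade(features, x_init, x_true, n_stages=5, reg=1e-3):
    """Learn a cascade of linear regressors (supervised-descent style).

    features(x): feature vector extracted at parameters x; it only needs to
                 be evaluable, never differentiated analytically.
    x_init:      (n_samples, n_params) perturbed starting parameters
    x_true:      (n_samples, n_params) ground-truth parameters
    """
    cascade, x = [], x_init.copy()
    for _ in range(n_stages):
        # Features at the current estimates, plus a bias column.
        F = np.stack([features(xi) for xi in x])
        F = np.hstack([F, np.ones((len(F), 1))])
        delta = x_true - x                      # desired parameter updates
        # Ridge regression: each stage learns an average descent direction.
        R = np.linalg.solve(F.T @ F + reg * np.eye(F.shape[1]), F.T @ delta)
        cascade.append(R)
        x = x + F @ R                           # apply the learned step
    return cascade
```

At test time, the same feature extraction and the stored `R` matrices are applied stage by stage, which is what makes the approach fast enough for real-time use.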
Socially interactive robots with human-like speech synthesis and recognition, coupled with humanoid appearance, are an important subject of robotics and artificial intelligence research. Modern solutions have matured enough to provide simple services to human users. To make the interaction with them as fast and intuitive as possible, researchers strive to create transparent interfaces close to human-human interaction. Because facial expressions play a central role in human-human communication, robot faces have been implemented with varying degrees of human-likeness and expressiveness. We propose a way to implement a program that believably animates changing facial expressions and allows them to be influenced via inter-process communication based on an emotion model. This can be used to create a screen-based virtual face for a robotic system with an inviting appearance that stimulates users to seek interaction with the robot.
We present our robot framework and our efforts to make face analysis more robust to self-occlusion caused by head pose. Using a lightweight linear fitting algorithm, we are able to obtain 3D models of human faces in real time. The combination of adaptive tracking and 3D face modelling for the analysis of human faces serves as a basis for further research on human-machine interaction on our SCITOS robot platform.
3D morphable face models are a powerful tool in computer vision. They consist of a PCA model of face shape and colour information and make it possible to reconstruct a 3D face from a single 2D image. 3D morphable face models are used for 3D head pose estimation, face analysis, face recognition, and, more recently, facial landmark detection and tracking. However, they are not as widely used as 2D methods, since the process of building and using a 3D model is much more involved.
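The generative core of such a model is a linear equation: a face shape is the mean shape plus a weighted sum of principal components, s = s̄ + Σ_i α_i √λ_i u_i. Below is a sketch under these conventions (some models bake the √λ_i scaling into the basis instead).

```python
import numpy as np

def sample_shape(mean, basis, eigenvalues, alphas):
    """Instantiate a face shape from a PCA model.

    mean:        (3N,) mean shape, xyz coordinates of N vertices stacked
    basis:       (3N, K) orthonormal PCA basis (principal components)
    eigenvalues: (K,) per-component variances
    alphas:      (K,) coefficients, in units of standard deviations
    """
    return mean + basis @ (alphas * np.sqrt(eigenvalues))
```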
In this paper, we present the Surrey Face Model, a multi-resolution 3D morphable model that we make available to the public for non-commercial purposes. The model contains different mesh resolution levels and landmark point annotations as well as metadata for texture remapping. Accompanying the model is a lightweight open-source C++ library designed with simplicity and ease of integration as its foremost goals. In addition to basic functionality, it contains pose estimation and face frontalisation algorithms. With the tools presented in this paper, we aim to close two gaps. First, by offering different model resolution levels and fast fitting functionality, we enable the use of a 3D morphable model in time-critical applications like tracking. Second, the software library makes it easy for the community to adopt the 3D morphable face model in their research, and it offers a public place for collaboration.
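Of the pose estimation functionality mentioned above, one standard linear building block is estimating an affine camera matrix from 2D-3D landmark correspondences by least squares. The sketch below shows that idea in isolation; it omits the data normalisation a robust implementation would add, and it is not the library's actual interface.

```python
import numpy as np

def estimate_affine_camera(points_2d, points_3d):
    """Least-squares affine camera from 2D-3D landmark correspondences.

    Solves x = C @ [X; 1] for a 2x4 matrix C, given >= 4 correspondences.
    points_2d: (n, 2) detected image landmarks
    points_3d: (n, 3) corresponding model vertices
    """
    X_h = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # (n, 4)
    # One least-squares problem per image coordinate (x and y rows of C).
    C, *_ = np.linalg.lstsq(X_h, points_2d, rcond=None)         # (4, 2)
    return C.T                                                  # (2, 4)
```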
We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. We use a 3D morphable face model to obtain a semi-dense shape and combine it with a fast median-based super-resolution technique to obtain a high-fidelity textured 3D face model. Our system does not need prior training and is designed to work in uncontrolled scenarios.
We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. Using cascaded-regressor-based face tracking and 3D morphable face model shape fitting, we obtain a semi-dense 3D face shape. We further use the texture information from multiple frames to build a holistic 3D face representation from the video footage. Our system is able to capture facial expressions and does not require any person-specific training. We demonstrate the robustness of our approach on the challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting framework is available as an open-source library at http://4dface.org.
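One way to picture the multi-frame texture step: once each frame's pixels are remapped into a common UV texture space via the fitted model, a per-texel median over frames suppresses outliers caused by occlusion, specularities, and tracking jitter. This sketch shows only that median-fusion core (array shapes are assumptions), not the full super-resolution pipeline.

```python
import numpy as np

def fuse_textures(texture_stack, visibility_stack):
    """Median-based texture fusion across video frames.

    texture_stack:    (F, H, W, 3) per-frame textures remapped into a
                      common UV space via the fitted 3D model
    visibility_stack: (F, H, W) boolean mask of texels visible per frame
    """
    stack = texture_stack.astype(float)
    stack[~visibility_stack] = np.nan        # ignore occluded/unseen texels
    fused = np.nanmedian(stack, axis=0)      # robust per-texel median
    return np.nan_to_num(fused)              # never-seen texels become black
```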
For autonomously driving cars and intelligent vehicles, it is crucial to understand the scene context, including the objects in the surroundings. A fundamental technique for accomplishing this is scene labeling: assigning a semantic class to each pixel of a scene image. This task is commonly tackled quite well by fully convolutional neural networks (FCNs), where small model size and low execution time are crucial factors. This work presents the first method that exploits depth cues together with confidence estimates in a CNN. To this end, a novel, experimentally grounded network architecture is proposed that performs robust scene labeling without costly extra stages such as the CRFs or LSTMs commonly used in related work. The effectiveness of this approach is demonstrated in an extensive evaluation on a challenging real-world dataset. The new architecture is highly optimized for high accuracy and low execution time.
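A toy sketch of the general idea of fusing depth cues into an FCN is given below, written in PyTorch with two small encoder branches and late fusion; the layer counts, channel widths, and fusion point are placeholders and do not reproduce the paper's architecture (the confidence estimation is likewise omitted).

```python
import torch
import torch.nn as nn

class RGBDFusionFCN(nn.Module):
    """Toy fully convolutional net fusing an RGB branch with a depth branch."""
    def __init__(self, n_classes):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rgb, self.depth = branch(3), branch(1)
        self.head = nn.Sequential(
            nn.Conv2d(128, n_classes, 1),                  # fuse and classify
            nn.Upsample(scale_factor=4, mode='bilinear',   # back to input size
                        align_corners=False))

    def forward(self, rgb, depth):
        # Concatenate the two downsampled feature maps, then predict
        # per-pixel class logits at the original resolution.
        fused = torch.cat([self.rgb(rgb), self.depth(depth)], dim=1)
        return self.head(fused)
```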
Understanding the factors that influence the accuracy of visual SLAM algorithms is very important for their future development, yet so far few studies have addressed this. In this paper, a simulation model is presented and used to investigate the effect of the number of tracked scene points, the effect of the baseline length in triangulation, and the influence of image point location uncertainty. It is shown that the latter is critical, while the others all play important roles. Experiments with a well-known semi-dense visual SLAM approach, used in a monocular visual odometry mode, are also presented. They show that omitting sensor bias and scale factor uncertainty is very detrimental to the accuracy of the simulation results.
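The kind of question such a simulation model answers can be illustrated with a rectified stereo pair, where triangulated depth is Z = f·b/d and pixel noise on the disparity d propagates into depth error; all numbers below are placeholders, not values from the paper.

```python
import numpy as np

def depth_rmse(baseline, depth=10.0, focal=500.0, pixel_sigma=0.5, trials=2000):
    """Monte Carlo depth error for rectified stereo: Z = f * b / d."""
    d_true = focal * baseline / depth
    # Disparity is the difference of two noisy measurements: sigma * sqrt(2).
    d_noisy = d_true + np.random.randn(trials) * pixel_sigma * np.sqrt(2)
    z = focal * baseline / d_noisy
    return np.sqrt(np.mean((z - depth) ** 2))

for b in [0.1, 0.3, 1.0, 3.0]:
    print(f"baseline {b:4.1f} m -> depth RMSE {depth_rmse(b):.3f} m")
```

Even this toy model shows the qualitative behaviour: depth error explodes for short baselines, shrinks as the baseline grows, and scales with the image point uncertainty.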
In recent years, robotic systems have matured enough to perform simple home or office tasks, guide visitors in environments such as museums or stores, and aid people in their daily life. To make the interaction with service and even industrial robots as fast and intuitive as possible, researchers strive to create transparent interfaces close to human-human interaction. As facial expressions play a central role in human-human communication, robot faces have been implemented with varying degrees of human-likeness and expressiveness. We propose an emotion model to parameterize a screen-based facial animation via inter-process communication. The software animates transitions and adds further animations to make the digital face appear “alive”, equipping a robotic system with a virtual face. The result is an inviting appearance that motivates potential users to seek interaction with the robot.
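A minimal sketch of how such an emotion model could drive the animation: incoming inter-process messages set a target expression, and the renderer eases the current blendshape weights towards it every frame. All emotion names, blendshape names, and values here are invented placeholders, not the proposed model.

```python
import numpy as np

# Hypothetical mapping from discrete emotions to blendshape weights.
EMOTION_TARGETS = {
    "neutral": {"smile": 0.0, "brow_raise": 0.0, "frown": 0.0},
    "happy":   {"smile": 0.9, "brow_raise": 0.3, "frown": 0.0},
    "sad":     {"smile": 0.0, "brow_raise": 0.1, "frown": 0.8},
}

class FaceAnimator:
    """Eases the current expression towards the target set via IPC messages."""
    def __init__(self, rate=3.0):
        self.weights = dict(EMOTION_TARGETS["neutral"])
        self.target = dict(self.weights)
        self.rate = rate                       # responsiveness (1/s)

    def set_emotion(self, name):               # called on an incoming message
        self.target = dict(EMOTION_TARGETS[name])

    def tick(self, dt):                        # called once per rendered frame
        a = 1.0 - np.exp(-self.rate * dt)      # frame-rate-independent easing
        for k in self.weights:
            self.weights[k] += a * (self.target[k] - self.weights[k])
        return self.weights
```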