In this paper, we propose a novel fitting method that uses local image features to fit a 3D morphable face model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method makes it possible to solve for shape and pose parameters simultaneously. Our method is thoroughly evaluated on morphable model generated data and first results on real data are presented. Compared to traditional fitting methods, which use simple raw features like pixel colour or edge maps, local features have been shown to be much more robust against variations in imaging conditions. Our approach is unique in that we are the first to use local features to fit a 3D morphable model. Because of the speed of our method, it is applicable to real-time applications. Our cascaded regression framework is available as an open source library at github.com/patrikhuber/superviseddescent.
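The cascaded regression idea in the abstract above can be illustrated with a one-dimensional toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation or the API of the superviseddescent library: the feature operator `h` (a quantised sine), the scalar parameter, the fixed starting value and the helper names are all hypothetical. Each stage learns a linear map from the feature residual to a parameter update, so no derivative of `h` is needed.

```python
import math
import random


def h(x):
    # Hypothetical non-differentiable "feature extractor": a quantised sine.
    return math.floor(10 * math.sin(x)) / 10


def fit_line(features, targets):
    # Ordinary least squares for targets ~ r * features + b (scalar case).
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    r = cov / var if var > 0 else 0.0
    return r, mean_t - r * mean_f


def train_cascade(true_params, n_stages=3, x0=0.0):
    # Learn one (r, b) regressor per stage from known ground-truth parameters.
    observations = [h(x) for x in true_params]
    estimates = [x0] * len(true_params)
    stages = []
    for _ in range(n_stages):
        feats = [h(x) - y for x, y in zip(estimates, observations)]
        deltas = [t - x for t, x in zip(true_params, estimates)]
        r, b = fit_line(feats, deltas)
        stages.append((r, b))
        estimates = [x + r * f + b for x, f in zip(estimates, feats)]
    return stages


def predict(stages, y, x0=0.0):
    # Apply the learned updates in sequence, starting from the same x0.
    x = x0
    for r, b in stages:
        x = x + r * (h(x) - y) + b
    return x


random.seed(0)
train = [random.uniform(-1, 1) for _ in range(500)]
stages = train_cascade(train)
test = [random.uniform(-1, 1) for _ in range(200)]
initial_mae = sum(abs(t) for t in test) / len(test)
final_mae = sum(abs(predict(stages, h(t)) - t) for t in test) / len(test)
print(f"mean absolute error: {initial_mae:.3f} -> {final_mae:.3f}")
```

In a real fitting problem the scalar would be a vector of shape and pose parameters and each stage would be a multivariate regressor, but the structure of the loop stays the same.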
Socially interactive robots with human-like speech synthesis and recognition, coupled with humanoid appearance, are an important subject of robotics and artificial intelligence research. Modern solutions have matured enough to provide simple services to human users. To make the interaction with them as fast and intuitive as possible, researchers strive to create transparent interfaces close to human-human interaction. Because facial expressions play a central role in human-human communication, robot faces were implemented with varying degrees of human-likeness and expressiveness. We propose a way to implement a program that believably animates changing facial expressions and allows them to be influenced via inter-process communication based on an emotion model. This can be used to create a screen-based virtual face for a robotic system with an inviting appearance to stimulate users to seek interaction with the robot.
We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. We use a 3D morphable face model to obtain a semi-dense shape and combine it with a fast median-based super-resolution technique to obtain a high-fidelity textured 3D face model. Our system does not need prior training and is designed to work in uncontrolled scenarios.
3D morphable face models are a powerful tool in computer vision. They consist of a PCA model of face shape and colour information and allow a 3D face to be reconstructed from a single 2D image. 3D morphable face models are used for 3D head pose estimation, face analysis, face recognition, and, more recently, facial landmark detection and tracking. However, they are not as widely used as 2D methods, because the process of building and using a 3D model is much more involved.
In this paper, we present the Surrey Face Model, a multi-resolution 3D morphable model that we make available to the public for non-commercial purposes. The model contains different mesh resolution levels and landmark point annotations as well as metadata for texture remapping. Accompanying the model is a lightweight open-source C++ library designed with simplicity and ease of integration as its foremost goals. In addition to basic functionality, it contains pose estimation and face frontalisation algorithms. With the tools presented in this paper, we aim to close two gaps. First, by offering different model resolution levels and fast fitting functionality, we enable the use of a 3D Morphable Model in time-critical applications like tracking. Second, the software library makes it easy for the community to adopt the 3D morphable face model in their research, and it offers a public place for collaboration.
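The PCA shape model described in the abstract above generates a face as the mean shape plus a weighted sum of basis vectors. The following is a minimal sketch of that linear model only, assuming an orthonormal basis and a toy mesh of two vertices; the function names and data are hypothetical and do not reflect the API of the accompanying C++ library. Real models usually also scale each component by its standard deviation, which is omitted here.

```python
def reconstruct_shape(mean, basis, coefficients):
    """Return mean + sum_i coefficients[i] * basis[i] as a flat coordinate list."""
    if len(basis) != len(coefficients):
        raise ValueError("one coefficient per basis vector is required")
    shape = list(mean)
    for alpha, component in zip(coefficients, basis):
        for j, value in enumerate(component):
            shape[j] += alpha * value
    return shape


def project_shape(shape, mean, basis):
    """Recover the coefficients, assuming the basis vectors are orthonormal."""
    centred = [s - m for s, m in zip(shape, mean)]
    return [sum(c * v for c, v in zip(centred, component)) for component in basis]


# Toy model: two vertices stored as (x, y, z, x, y, z) and two basis vectors.
mean = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
basis = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # moves the first vertex along x
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # moves the second vertex along y
]
shape = reconstruct_shape(mean, basis, [0.5, -2.0])
coefficients = project_shape(shape, mean, basis)
```

Fitting a model to an image then amounts to searching for the coefficients (and a camera pose) whose reconstructed shape best explains the observed 2D data.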
A 3D face modelling approach for pose-invariant face recognition in a human-robot environment
(2017)
Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable to robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in-the-wild images from the PaSC database, the most challenging face recognition database to date, showing an improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis.
This paper presents a novel multi-modal CNN architecture that exploits complementary input cues in addition to sole color information. The joint model implements a mid-level fusion that allows the network to exploit cross-modal interdependencies already on a medium feature level. The benefit of the presented architecture is shown for the RGB-D image understanding task. So far, state-of-the-art RGB-D CNNs have used network weights trained on color data. In contrast, a superior initialization scheme is proposed to pre-train the depth branch of the multi-modal CNN independently. In an end-to-end training the network parameters are optimized jointly using the challenging Cityscapes dataset. In thorough experiments, the effectiveness of the proposed model is shown. Both the RGB GoogLeNet and further RGB-D baselines are outperformed by a significant margin on two different tasks: semantic segmentation and object detection. For the latter, this paper shows how to extract object-level ground truth from the instance-level annotations in Cityscapes in order to train a powerful object detector.
Facial beauty prediction (FBP) aims to develop a machine that automatically assesses facial attractiveness. In the past those results were highly correlated with human ratings, and therefore also with the annotators' bias. As artificial intelligence can have racist and discriminatory tendencies, the cause of skews in the data must be identified. Development of training data and AI algorithms that are robust against biased information is a new challenge for scientists. As aesthetic judgement usually is biased, we want to take it one step further and propose an Unbiased Convolutional Neural Network for FBP. While it is possible to create network models that can rate attractiveness of faces on a high level, from an ethical point of view, it is equally important to make sure the model is unbiased. In this work, we introduce AestheticNet, a state-of-the-art attractiveness prediction network, which significantly outperforms competitors with a Pearson Correlation of 0.9601. Additionally, we propose a new approach for generating a bias-free CNN to improve fairness in machine learning.
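The abstract above reports model quality as a Pearson correlation between predicted scores and human ratings. As a reminder of what that figure measures, here is a minimal sketch of the coefficient itself; the function name and the sample values are illustrative and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Illustrative values only: predictions that rise and fall with the ratings.
human_ratings = [1.0, 2.0, 3.0]
aligned = pearson(human_ratings, [2.0, 4.0, 6.0])
reversed_order = pearson(human_ratings, [3.0, 2.0, 1.0])
```

A high coefficient means the predictions track the human ratings closely, which is also why a model with a high score inherits whatever bias those ratings contain.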
Understanding the factors that influence the accuracy of visual SLAM algorithms is very important for the future development of these algorithms. So far very few studies have done this. In this paper, a simulation model is presented and used to investigate the effect of the number of scene points tracked, the effect of the baseline length in triangulation and the influence of image point location uncertainty. It is shown that the latter is very critical, while the others also play important roles. Experiments with a well-known semi-dense visual SLAM approach, used in a monocular visual odometry mode, are also presented. The experiments show that not including sensor bias and scale factor uncertainty is very detrimental to the accuracy of the simulation results.
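The interplay between baseline length and image point uncertainty mentioned in the abstract above can be illustrated with the textbook rectified two-view model, in which depth is z = f·b/d for focal length f (pixels), baseline b and disparity d. This is a simplified sketch under that assumption, not the simulation model of the paper; the function names and numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two rectified views: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_std(focal_px, baseline_m, depth_m, pixel_std):
    """First-order depth uncertainty: |dz/dd| * sigma_d = z^2 * sigma_d / (f * b)."""
    return depth_m ** 2 * pixel_std / (focal_px * baseline_m)


# Hypothetical camera: 500 px focal length, half a pixel of localisation noise.
depth = depth_from_disparity(500.0, 0.1, 10.0)
short_baseline = depth_std(500.0, 0.1, depth, 0.5)
long_baseline = depth_std(500.0, 0.2, depth, 0.5)
```

Under this model the depth error grows linearly with pixel noise and quadratically with depth, and doubling the baseline halves it.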
This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset that contains 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.
Fitting 3D Morphable Face Models (3DMM) to a 2D face image allows the separation of face shape from skin texture, as well as correction for face expression. However, the recovered 3D face representation is not readily amenable to processing by convolutional neural networks (CNN). We propose a conformal mapping from a 3D mesh to a 2D image, which makes these machine learning tools accessible by 3D face data. Experiments with a CNN based face recognition system designed using the proposed representation have been carried out to validate the advocated approach. The results obtained on standard benchmarking data sets show its promise.
SLAM systems are mainly applied for robot navigation, while research on the feasibility of motion planning with SLAM for tasks like bin-picking is scarce. Accurate 3D reconstruction of objects and environments is important for planning motion and computing the optimal gripper pose to grasp objects. In this work, we propose methods to analyze the accuracy of a 3D environment reconstructed using an LSD-SLAM system with a monocular camera mounted onto the gripper of a collaborative robot. We discuss and propose a solution to the pose space conversion problem. Finally, we present several criteria to analyze the 3D reconstruction accuracy. These could be used as guidelines to improve the accuracy of 3D reconstructions with monocular LSD-SLAM and other SLAM-based solutions.
In the last 20 years there have been major advances in autonomous robotics. In IoT (Industry 4.0), mobile robots require more intuitive interaction possibilities with humans in order to expand their field of applications. This paper describes a user-friendly setup, which enables a person to lead the robot in an unknown environment. The environment has to be perceived by means of sensory input. To realize a cost- and resource-efficient Follow Me application, we use a single monocular camera as a low-cost sensor. For efficient scaling of our Simultaneous Localization and Mapping (SLAM) algorithm, we integrate an inertial measurement unit (IMU) sensor. With the camera input we detect and track a person. We propose combining state-of-the-art deep learning with a Convolutional Neural Network (CNN) and SLAM functionality on the same input camera image. Based on the output, robot navigation is possible. This work presents the specification and workflow for an efficient development of the Follow Me application. The point clouds delivered by our application are also used for surface construction. For demonstration, we use our platform SCITOS G5 equipped with the aforementioned sensors. Preliminary tests show the system works robustly in the wild.
Efficient and robust 3D object reconstruction based on monocular SLAM and CNN semantic segmentation
(2019)
Various applications implement SLAM technology, especially in the field of robot navigation. We show the advantage of SLAM technology for independent 3D object reconstruction. To obtain a point cloud of every object of interest void of its environment, we leverage deep learning. We utilize recent CNN deep learning research for accurate semantic segmentation of objects. In this work, we propose two fusion methods for CNN-based semantic segmentation and SLAM for the 3D reconstruction of objects of interest in order to obtain greater robustness and efficiency. As a major novelty, we introduce a CNN-based masking to focus SLAM only on feature points belonging to every single object. Noisy, complex or even non-rigid features in the background are filtered out, improving the estimation of the camera pose and the 3D point cloud of each object. Our experiments are constrained to the reconstruction of industrial objects. We present an analysis of the accuracy and performance of each method and compare the two methods, describing their pros and cons.
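The masking step described in the abstract above, keeping only feature points that fall on a segmented object, can be sketched as a simple lookup into a per-pixel label image. This is an illustrative sketch only: the mask layout (a list of rows holding integer object ids, 0 for background), the keypoint format and the function name are assumptions, not the paper's interface.

```python
def filter_keypoints(keypoints, mask, object_id):
    """Keep (x, y) keypoints whose nearest pixel carries the given object id."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        if inside and mask[row][col] == object_id:
            kept.append((x, y))
    return kept


# Hypothetical 4x3 segmentation result: 0 = background, 1 and 2 = objects.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 0],
]
keypoints = [(2.2, 0.4), (0.0, 0.0), (1.0, 2.0), (5.0, 1.0)]
object_one = filter_keypoints(keypoints, mask, 1)
object_two = filter_keypoints(keypoints, mask, 2)
```

Running the filter once per object id yields a separate set of feature points, and hence a separate point cloud, for each object, while background points are never passed on to tracking.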
3D assisted 2D face recognition involves the process of reconstructing 3D faces from 2D images and solving the problem of face recognition in 3D. To facilitate the use of deep neural networks, a 3D face, normally represented as a 3D mesh of vertices and its corresponding surface texture, is remapped to image-like square isomaps by a conformal mapping. Based on previous work, we assume that face recognition benefits more from texture. In this work, we focus on the surface texture and its discriminatory information content for recognition purposes. Our approach is to prepare a 3D mesh, the corresponding surface texture and the original 2D image as triple input for the recognition network, to show that 3D data is useful for face recognition. Texture enhancement methods to control the texture fusion process are introduced and we adapt data augmentation methods. Our results show that texture-map-based face recognition can not only compete with state-of-the-art systems under the same preconditions but also outperforms standard 2D methods from recent years.
“I have never seen one who loves virtue as much as he loves beauty,” Confucius once said. If beauty is more important than goodness, it becomes clear why people invest so much effort in their first impression. The aesthetic of faces has many aspects and there is a strong correlation to all characteristics of humans, like age and gender. Often, research on aesthetics by social and ethic scientists lacks sufficient labelled data and the support of machine vision tools. In this position paper we propose the Aesthetic-Faces dataset, containing training data which is labelled by Chinese and German annotators. As a combination of three image subsets, the AF-dataset consists of European, Asian and African people. The research communities in machine learning, aesthetics and social ethics can benefit from our dataset and our toolbox. The toolbox provides many functions for machine learning with state-of-the-art CNNs and an Extreme-Gradient-Boosting regressor, but also 3D Morphable Model technologies for face shape evaluation, and we discuss how to train an aesthetic estimator considering culture and ethics.
This paper presents a permanent magnet tubular linear generator system for powering passive sensors using energy harvested from vertical vibration. The system consists of a permanent magnet tubular linear vibration generator and electric circuits. By using the design of mechanical resonant movers, the generator is capable of converting low-frequency, small-amplitude vertical vibration energy into more regular sinusoidal electrical energy. The distribution of the magnetic field and electromotive force are calculated by Finite Element Analysis. The characteristics of the linear vibration generator system are observed. The experimental results show the generator can produce about 0.4 W to 1.6 W of electrical power when the vibration source's amplitude is fixed at 2 mm and the frequencies are between 13 Hz and 22 Hz.
Deep learning-based EEG detection of mental alertness states from drivers under ethical aspects
(2021)
One of the most critical factors for a successful road trip is a high degree of alertness while driving. Even a split second of inattention or sleepiness at a crucial moment can make the difference between life and death. Several prestigious car manufacturers are currently pursuing the aim of automated drowsiness identification to resolve this problem. The path between neuro-scientific research in connection with artificial intelligence and the preservation of the dignity of the human individual and its inviolability is very narrow. The key contribution of this work is a system of data analysis for EEGs during a driving session, which draws on previous studies analyzing heart rate (ECG), brain waves (EEG), and eye function (EOG). The gathered data is treated as sensitively as possible, taking ethical regulations into consideration. Obtaining evaluable signs of evolving exhaustion includes techniques that obtain sleeping stage frequencies; the correlated interferences in the signal are problematic here. This research focuses on a processing chain for EEG band splitting that involves band-pass filtering, principal component analysis (PCA), independent component analysis (ICA) with automatic artefact severance, and fast Fourier transform (FFT). The classification is based on a step-by-step adaptive deep learning analysis that detects theta rhythms as a drowsiness predictor in the pre-processed data. It was possible to obtain an offline detection rate of 89% and an online detection rate of 73%. The method is linked to the simulated driving scenario for which it was developed. This leaves space for more optimization on laboratory methods and data collection during wakefulness-dependent operations.
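The frequency-domain step of the processing chain above can be illustrated by measuring how much of a signal's power lies in the theta band. This is a minimal sketch, not the paper's pipeline: it uses a naive discrete Fourier transform instead of an FFT, skips the filtering, PCA and ICA stages, and assumes the conventional 4 to 8 Hz theta range and a 1 to 30 Hz reference band. The sampling rate, band limits and function names are assumptions.

```python
import cmath
import math


def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes for frequencies in [low_hz, high_hz)."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the DC offset
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(
                c * cmath.exp(-2j * math.pi * k * t / n)
                for t, c in enumerate(centred)
            )
            power += abs(coeff) ** 2
    return power


def theta_ratio(signal, sample_rate):
    """Share of 1-30 Hz power that falls into the assumed 4-8 Hz theta band."""
    theta = band_power(signal, sample_rate, 4.0, 8.0)
    total = band_power(signal, sample_rate, 1.0, 30.0)
    return theta / total if total > 0 else 0.0


# Synthetic two-second test signals sampled at a hypothetical 128 Hz.
fs = 128
theta_wave = [math.sin(2 * math.pi * 6 * t / fs) for t in range(256)]
beta_wave = [math.sin(2 * math.pi * 20 * t / fs) for t in range(256)]
theta_share = theta_ratio(theta_wave, fs)
beta_share = theta_ratio(beta_wave, fs)
```

A classifier such as the one described in the abstract would consume features of this kind per time window; any decision threshold on the ratio would have to be learned from data.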
For collision and obstacle avoidance as well as trajectory planning, robots usually generate and use a simple 2D costmap without any semantic information about the detected obstacles. Thus a robot's path planning will simply adhere to an arbitrarily large safety margin around obstacles. A better approach is to adjust this safety margin according to the class of an obstacle. For class prediction, an image processing convolutional neural network can be trained. One of the problems in the development and training of any neural network is the creation of a training dataset. The first part of this work describes methods and free open source software that allow fast generation of annotated datasets. Our pipeline can be applied to various objects and environment settings and is easy for anyone to use for synthesising training data from 3D source data. We create a fully synthetic industrial environment dataset with 10k physically-based rendered images and annotations. Our dataset and sources are publicly available at https://github.com/LJMP/synthetic-industrial-dataset. Subsequently, we train a convolutional neural network with our dataset for costmap safety class prediction. We analyse different class combinations and show that learning the safety classes end-to-end directly with a small dataset, instead of using a class lookup table, improves the quantity and precision of the predictions.
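The idea of adjusting the safety margin to the class of an obstacle, as described in the abstract above, can be sketched on a small occupancy grid. This is a hypothetical illustration: the class names, the margin values, the square inflation shape and the function name are assumptions and are not taken from the paper or its dataset.

```python
# Hypothetical safety classes and their margins, measured in grid cells.
SAFETY_MARGIN_CELLS = {"pallet": 1, "human": 2}


def inflate(width, height, obstacles, margins):
    """Mark every cell within an obstacle's class-specific margin as blocked."""
    cost = [[0] * width for _ in range(height)]
    for ox, oy, obstacle_class in obstacles:
        margin = margins[obstacle_class]
        for y in range(max(0, oy - margin), min(height, oy + margin + 1)):
            for x in range(max(0, ox - margin), min(width, ox + margin + 1)):
                cost[y][x] = 1
    return cost


# One static object near the corner and one person in the middle of a 10x10 map.
obstacles = [(1, 1, "pallet"), (5, 5, "human")]
costmap = inflate(10, 10, obstacles, SAFETY_MARGIN_CELLS)
blocked_cells = sum(sum(row) for row in costmap)
```

With a single worst-case margin for every obstacle the blocked area would be larger than necessary around static objects; the per-class table keeps wide clearances only where they are needed.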
We address the problem of 3D face recognition based on either 3D sensor data, or on a 3D face reconstructed from a 2D face image. We focus on 3D shape representation in terms of a mesh of surface normal vectors. The first contribution of this work is an evaluation of eight different 3D face representations and their multiple combinations. An important contribution of the study is the proposed implementation, which allows these representations to be computed directly from 3D meshes, instead of point clouds. This enhances their computational efficiency. Motivated by the results of the comparative evaluation, we propose a 3D face shape descriptor, named Evolutional Normal Maps, that assimilates and optimises a subset of six of these approaches. The proposed shape descriptor can be modified and tuned to suit different tasks. It is used as input for a deep convolutional network for 3D face recognition. An extensive experimental evaluation using the Bosphorus 3D Face, CASIA 3D Face and JNU-3D Face datasets shows that, compared to the state of the art methods, the proposed approach is better in terms of both computational cost and recognition accuracy.
AI-based prediction and recommender systems are widely used in various industry sectors. However, general acceptance of AI-enabled systems is still largely uninvestigated. Therefore, we first conducted a survey with 559 respondents. Findings suggested that AI-enabled systems should be fair, transparent, consider personality traits and perform tasks efficiently. Secondly, we developed a system for the Facial Beauty Prediction (FBP) benchmark that automatically evaluates facial attractiveness. As our previous experiments have shown, these results are usually highly correlated with human ratings. Consequently they also reflect human bias in annotations. An upcoming challenge for scientists is to provide training data and AI algorithms that can withstand distorted information. In this work, we introduce AntiDiscriminationNet (ADN), a superior attractiveness prediction network. We propose a new method to generate an unbiased convolutional neural network (CNN) to improve the fairness of machine learning on facial datasets. To train unbiased networks we generate synthetic images and weight training data for anti-discrimination assessments towards different ethnicities. Additionally, we introduce an approach with entropy penalty terms to reduce the bias of our CNN. Our research provides insights into how to train and build fair machine learning models for facial image analysis by minimising implicit biases. Our AntiDiscriminationNet finally outperforms all competitors in the FBP benchmark by achieving a Pearson correlation coefficient of PCC = 0.9601.