Refine
Document Type
- Conference proceeding (26)
- Journal article (15)
Language
- English (41)
Is part of the Bibliography
- yes (41)
Institute
- Technik (40)
- Informatik (2)
- Texoversum (2)
- Zentrale Einrichtungen (1)
Publisher
- IEEE (9)
- Springer (7)
- SCITEPRESS (4)
- ACM (2)
- International Academy Publishing (2)
- SciTePress (2)
- Cornell University (1)
- De Gruyter Oldenbourg (1)
- Elsevier (1)
- Hometrica Consulting (1)
A 3D face modelling approach for pose-invariant face recognition in a human-robot environment
(2017)
Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head-pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable for robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in the wild images from the PaSC database, the most challenging face recognition database to date, showing an improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis.
3D morphable face models are a powerful tool in computer vision. They consist of a PCA model of face shape and colour information and allow to reconstruct a 3D face from a single 2D image. 3D morphable face models are used for 3D head pose estimation, face analysis, face recognition, and, more recently, facial landmark detection and tracking. However, they are not as widely used as 2D methods - the process of building and using a 3D model is much more involved.
In this paper, we present the Surrey Face Model, a multi resolution 3D morphable model that we make available to the public for non-commercial purposes. The model contains different mesh resolution levels and landmark point annotations as well as metadata for texture remapping. Accompanying the model is a lightweight open-source C++ library designed with simplicity and ease of integration as its foremost goals. In addition to basic functionality, it contains pose estimation and face frontalisation algorithms. With the tools presented in this paper, we aim to close two gaps. First, by offering different model resolution levels and fast fitting functionality, we enable the use of a 3D Morphable Model in time-critical applications like tracking. Second, the software library makes it easy for the community to adopt the 3D morphable face model in their research, and it offers a public place for collaboration.
Advancing mental health diagnostics: AI-based method for depression detection in patient interviews
(2023)
In this paper, we present a novel artificial intelligence (AI) application for depression detection, using advanced transformer networks to analyse clinical interviews. By incorporating simulated data to enhance traditional datasets, we overcome limitations in data protection and privacy, consequently improving the model’s performance. Our methodology employs BERT-based models, GPT-3.5, and ChatGPT-4, demonstrating state-of-the-art results in detecting depression from linguistic patterns and contextual information that significantly outperform previous approaches. Utilising the DAIC-WOZ and Extended-DAIC datasets, our study showcases the potential of the proposed application in revolutionising mental health care through early depression detection and intervention. Empirical results from various experiments highlight the efficacy of our approach and its suitability for real-world implementation. Furthermore, we acknowledge the ethical, legal, and social implications of AI in mental health diagnostics. Ultimately, our study underscores the transformative potential of AI in mental health diagnostics, paving the way for innovative solutions that can facilitate early intervention and improve patient outcomes.
Aimed at the problem that the accuracy of face image classification in complex environment is not high, a network model F-Net suitable for aesthetic classification of face images is proposed. Based on LeNet-5, the model uses convolutional layers to extract facial image features in complex backgrounds, optimized parameters in the network model, and changes the number of convolutional layers and fully connected layer feature elements in the model. The experimental results show that the F-Net network model proposed in this paper has a face image classifation accuracy of 73% in complex environment background, which is better than other classical convolutional neural network classification models.
An interactive clothing design and a personalized virtual display with user’s own face are presented in this paper to meet the requirement of personalized clothing customization. A customer interactive clothing design approach based on genetic engineering ideas is analyzed by taking suit as an example. Thus, customers could rearrange the clothing style elements, chose available color, fabric and come up with their own personalized suit style. A web 3D customization prototype system of personalized clothing is developed based on the Unity3D and VR technology. The layout of the structure and functions combined with the flow of the system are given. Practical issues such as 3D face scanning, suit style design, fabric selection, and accessory choices are addressed also. Tests to the prototype system indicate that it could show realistic clothing and fabric effect and offer effective visual and customization experience to users.
Patterns are virtually simulated in 3D CAD programs before production to check the fit. However, achieving lifelike representations of human avatars, especially regarding soft tissue dynamics, remains challenging. This is mainly since conventional avatars in garment CAD programs are simulated with a continuous hard surface and not corresponding to the human physical and mechanical body properties of soft tissue. In the real world, the human body’s natural shape is affected by the contact pressure of tight-fitting textiles. To verify the fit of a simulated garment, the interactions between the individual body shape and the garment must be considered. This paper introduces an innovative approach to digitising the softness of human tissue using 4D scanning technology. The primary objective of this research is to explore the interactions between tissue softness and different compression levels of apparel, exerting pressure on the tissue to capture the changes in the natural shape. Therefore, to generate data and model an avatar with soft body physics, it is essential to capture the deform ability and elasticity of the soft tissue and map it into the modification options for a simulation. To aim this, various methods from different fields were researched and compared to evaluate 4D scanning as the most suitable method for capturing tissue deformability in vivo. In particular, it should be considered that the human body has different deformation capabilities depending on age, the amount of muscle and body fat. In addition, different tissue zones have different mechanical properties, so it is essential to identify and classify them to back up these properties for the simulation. It has been shown that by digitising the obtained data of the different defined applied pressure levels, a prediction of the deformation of the tissue of the exact person becomes possible. As technology advances and data sets grow, this approach has the potential to reshape how we verify fit digitally with soft avatars and leverage their realistic soft tissue properties for various practical purposes.
Annotations of character IDs in news images are critical as ground truth for news retrieval and recommendation system. Universality and accuracy optimization of deep neural network models constitutes the key technology to improve the precision and computing efficiency of automatic news character identification, which is attracting increased attention globally. This paper explores the optimized deep neural network model for automatic focus personage identification in multi-lingual news. First, the face model of the focus personage is trained by using the corresponding face images from German news as positive samples. Next, the scheme of Recurrent Convolutional Neural Network (RCNN) + Bi-directional Long-Short Term Memory (Bi-LSTM) + Conditional Random Field (CRF) is utilized to label the focus name, and the RCNN-RCNN encoder–decoder is applied to translate names of people into multiple languages. Third, face features are described by combining the advantages of Local Gabor Binary Pattern Histogram Sequence (LGBPHS) and RCNN, and iterative quantization (ITQ) is used to binarize codes. Finally, a name semantic network is built for different domains. Experiments are performed on a dataset which comprises approximately 100,000 news images. The experimental results demonstrate that the proposed method achieves a significant improvement over other algorithms.
Fitting 3D Morphable Face Models (3DMM) to a 2D face image allows the separation of face shape from skin texture, as well as correction for face expression. However, the recovered 3D face representation is not readily amenable to processing by convolutional neural networks (CNN). We propose a conformal mapping from a 3D mesh to a 2D image, which makes these machine learning tools accessible by 3D face data. Experiments with a CNN based face recognition system designed using the proposed representation have been carried out to validate the advocated approach. The results obtained on standard benchmarking data sets show its promise.
Fault diagnosis of rolling bearings is an essential process for improving the reliability and safety of the rotating machinery. It is always a major challenge to ensure fault diag- nosis accuracy in particular under severe working conditions. In this article, a deep adversarial domain adaptation (DADA) model is proposed for rolling bearing fault diagnosis. This model con- structs an adversarial adaptation network to solve the commonly encountered problem in numerous real applications: the source domain and the target domain are inconsistent in their distribution. First, a deep stack autoencoder (DSAE) is combined with representative feature learning for dimensionality reduction, and such a combination provides an unsupervised learning method to effectively acquire fault features. Meanwhile, domain adaptation and recognition classification are implemented using a Softmax classifier to augment classification accuracy. Second, the effects of the number of hidden layers in the stack autoencoder network, the number of neurons in each hidden layer, and the hyperparameters of the proposed fault diagnosis algorithm are analyzed. Third, comprehensive analysis is performed on real data to vali- date the performance of the proposed method; the experimental results demonstrate that the new method outperforms the existing machine learning and deep learning methods, in terms of classification accuracy and generalization ability.
Deep learning-based EEG detection of mental alertness states from drivers under ethical aspects
(2021)
One of the most critical factors for a successful road trip is a high degree of alertness while driving. Even a split second of inattention or sleepiness in a crucial moment, will make the difference between life and death. Several prestigious car manufacturers are currently pursuing the aim of automated drowsiness identification to resolve this problem. The path between neuro-scientific research in connection with artificial intelligence and the preservation of the dignity of human individual’s and its inviolability, is very narrow. The key contribution of this work is a system of data analysis for EEGs during a driving session, which draws on previous studies analyzing heart rate (ECG), brain waves (EEG), and eye function (EOG). The gathered data is hereby treated as sensitive as possible, taking ethical regulations into consideration. Obtaining evaluable signs of evolving exhaustion includes techniques that obtain sleeping stage frequencies, problematic are hereby the correlated interference’s in the signal. This research focuses on a processing chain for EEG band splitting that involves band-pass filtering, principal component analysis (PCA), independent component analysis (ICA) with automatic artefact severance, and fast fourier transformation (FFT). The classification is based on a step-by-step adaptive deep learning analysis that detects theta rhythms as a drowsiness predictor in the pre-processed data. It was possible to obtain an offline detection rate of 89% and an online detection rate of 73%. The method is linked to the simulated driving scenario for which it was developed. This leaves space for more optimization on laboratory methods and data collection during wakefulness-dependent operations.
This paper presents a permanent magnet tubular linear generator system for powering passive sensors using vertical vibration harvesting energy. The system consists of a permanent magnet tubular linear vibration generator and electric circuits. By using the design of mechanical resonant movers, the generator is capable of converting low frequencies small amplitude vertical vibration energy into more regular sinusoidal electrical energy. The distribution of the magnetic field and electromotive force are calculated by Finite Element Analysis. The characteristics of the linear vibration generator system are observed. The experimental results show the generator can produce about 0.4W~1.6W electrical power when the vibration source's amplitude is fixed on 2mm and the frequencies are between 13Hz and 22Hz.
Efficient and robust 3D object reconstruction based on monocular SLAM and CNN semantic segmentation
(2019)
Various applications implement slam technology, especially in the field of robot navigation. We show the advantage of slam technology for independent 3d object reconstruction. To receive a point cloud of every object of interest void of its environment, we leverage deep learning. We utilize recent cnn deep learning research for accurate semantic segmentation of objects. In this work, we propose two fusion methods for cnn-based semantic segmentation and slam for the 3d reconstruction of objects of interest in order to obtain a more robustness and efficiency. As a major novelty, we introduce a cnn-based masking to focus slam only on feature points belonging to every single object. Noisy, complex or even non-rigid features in the background are filtered out, improving the estimation of the camera pose and the 3d point cloud of each object. Our experiments are constrained to the reconstruction of industrial objects. We present an analysis of the accuracy and performance of each method and compare the two methods describing their pros and cons.
In recent years robotic systems have matured enough to perform simple home or office tasks, guide visitors in environments such as museums or stores and aid people in their daily life. To make the interaction with service and even industrial robots as fast and intuitive as possible, researchers strive to create transparent interfaces close to human-human interaction. As facial expressions play a central role in human-human communication, robot faces were implemented with varying degrees of human-likeness and expressiveness. We propose an emotion model to parameterize a screen based facial animation via inter-process communication. A software will animate transitions and add additional animations to make a digital face appear “alive” and equip a robotic system with a virtual face. The result will be an inviting appearance to motivate potential users to seek interaction with the robot.
Facial beauty prediction (FBP) aims to develop a machine that automatically makes facial attractiveness assessment. In the past those results were highly correlated with human ratings, therefore also with their bias in annotating. As artificial intelligence can have racist and discriminatory tendencies, the cause of skews in the data must be identified. Development of training data and AI algorithms that are robust against biased information is a new challenge for scientists. As aesthetic judgement usually is biased, we want to take it one step further and propose an Unbiased Convolutional Neural Network for FBP. While it is possible to create network models that can rate attractiveness of faces on a high level, from an ethical point of view, it is equally important to make sure the model is unbiased. In this work, we introduce AestheticNet, a state-of-the-art attractiveness prediction network, which significantly outperforms competitors with a Pearson Correlation of 0.9601. Additionally, we propose a new approach for generating a bias-free CNN to improve fairness in machine learning.
This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset that contains 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.
In recent years, the demand for accurate and efficient 3D body scanning technologies has increased, driven by the growing interest in personalised textile development and health care. This position paper presents the implementation of a novel 3D body scanner that integrates multiple RGB cameras and image stitching techniques to generate detailed point clouds and 3D mesh models. Our system significantly enhances the scanning process, achieving higher resolution and fidelity while reducing the cost, time and effort required for data acquisition and processing. Furthermore, we evaluate the potential use cases and applications of our 3D body scanner, focusing on the textile technology and health sectors. In textile development, the 3D scanner contributes to bespoke clothing production, allowing designers to construct made-to-measure garments, thus minimising waste and enhancing customer satisfaction through fitting clothing. In mental health care, the 3D body scanner can be employed as a tool for body image analysis, providing valuable insights into the psychological and emotional aspects of self-perception. By exploring the synergy between the 3D body scanner and these fields, we aim to foster interdisciplinary collaborations that drive advancements in personalisation, sustainability, and well-being.
We address the problem of 3D face recognition based on either 3D sensor data, or on a 3D face reconstructed from a 2D face image. We focus on 3D shape representation in terms of a mesh of surface normal vectors. The first contribution of this work is an evaluation of eight different 3D face representations and their multiple combinations. An important contribution of the study is the proposed implementation, which allows these representations to be computed directly from 3D meshes, instead of point clouds. This enhances their computational efficiency. Motivated by the results of the comparative evaluation, we propose a 3D face shape descriptor, named Evolutional Normal Maps, that assimilates and optimises a subset of six of these approaches. The proposed shape descriptor can be modified and tuned to suit different tasks. It is used as input for a deep convolutional network for 3D face recognition. An extensive experimental evaluation using the Bosphorus 3D Face, CASIA 3D Face and JNU-3D Face datasets shows that, compared to the state of the art methods, the proposed approach is better in terms of both computational cost and recognition accuracy.
Annotations of subject IDs in images are very important as ground truth for face recognition applications and news retrieval systems. Face naming is becoming a significant research topic in news image indexing applications. By exploiting the uniqueness of name, face naming is transformed to the problem of multiple instance learning (MIL) with exclusive constraint, namely the eMIL problem. First, the positive bags and the negative bags are automatically annotated by a hybrid recurrent convolutional neural network and a distributed affinity propagation cluster. Next, positive instance selection and updating are used to reduce the influence of false-positive bag and to improve the performance. Finally, max exclusive density and iterative Max-ED algorithms are proposed to solve the eMIL problem. The experimental results show that the proposed algorithms achieve a significant improvement over other algorithms.
For autonomously driving cars and intelligent vehicles it is crucial to understand the scene context including objects in the surrounding. A fundamental technique accomplishing this is scene labeling. That is, assigning a semantic class to each pixel in a scene image. This task is commonly tackled quite well by fully convolutional neural networks (FCN). Crucial factors are a small model size and a low execution time. This work presents the first method that exploits depth cues together with confidence estimates in a CNN. To this end, novel experimentally grounded network architecture is proposed to perform robust scene labeling that does not require costly preprocessing like CRFs or LSTMs as commonly used in related work. The effectiveness of this approach is demonstrated in an extensive evaluation on a challenging real-world dataset. The new architecture is highly optimized for high accuracy and low execution time.
In this paper, we propose a novel fitting method that uses local image features to fit a 3D morphable face model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method allows to simultaneously solve for shape and pose parameters. Our method is thoroughly evaluated on morphable model generated data and first results on real data are presented. Compared to traditional fitting methods, which use simple raw features like pixel colour or edge maps, local features have been shown to be much more robust against variations in imaging conditions. Our approach is unique in that we are the first to use local features to fit a 3D morphable model. Because of the speed of our method, it is applicable for realtime applications. Our cascaded regression framework is available as an open source library at github.com/patrikhuber/ superviseddescent.