In recent years, fraudsters have begun to use readily accessible digital manipulation techniques to carry out face morphing attacks. If a morph image (a 50/50 average of two people’s faces) is submitted for inclusion in an official document such as a passport, both people may resemble the morph sufficiently that each is able to use the resulting genuine ID document. Limited research with low-quality morphs has shown that human detection rates are poor but that training methods can improve performance. Here, we investigate human and computer performance with high-quality morphs, comparable with those expected to be used by criminals. Over four experiments, we found that people were highly error-prone when detecting morphs and that training did not produce improvements. In a live matching task, morphs were accepted at levels suggesting they represent a significant concern for security agencies, and detection was again error-prone. Finally, we found that a simple computer model outperformed our human participants. Taken together, these results reinforce the idea that advanced computational techniques could prove more reliable than training people when fighting these types of morphing attacks. Our findings have important implications for security authorities worldwide.
Models of social evaluation aim to capture the information people use to form first impressions of unfamiliar others. However, little is currently known about the relationship between perceived traits across gender. In Study 1, we asked viewers to provide ratings of key social dimensions (dominance, trustworthiness, etc.) for multiple images of 40 unfamiliar identities. We observed clear sex differences in the perception of dominance, with negative evaluations of high dominance in unfamiliar females but not males. In Study 2, we used the social evaluation context to investigate key predictions about the importance of pictorial information in familiar and unfamiliar face processing. We compared the consistency of ratings attributed to different images of the same identities and demonstrated that ratings of images depicting the same familiar identity are more tightly clustered than those of unfamiliar identities. Such results imply a shift from image rating to person rating with increased familiarity, a finding which generalises results previously observed in studies of identification.
A paradoxical finding from recent studies of face perception is that observers are error-prone and inconsistent when judging the identity of unfamiliar faces, but nevertheless reasonably consistent when judging traits. Our aim is to understand this difference. Using everyday ambient images of faces, we show that visual image statistics can predict observers’ consensual impressions of trustworthiness, attractiveness and dominance, which represent key dimensions of evaluation in leading theoretical accounts of trait judgement. In Study 1, image statistics derived from ambient images of multiple face identities were able to account for 51% of the variance in consensual impressions of entirely novel ambient images. Shape properties were more effective predictors than surface properties, but a combination of both achieved the best results. In Study 2 and Study 3, statistics derived from multiple images of a particular face achieved the best generalisation to new images of that face, but there was nonetheless significant generalisation between images of the faces of different individuals. Hence, whereas idiosyncratic variability across different images of the same face is sufficient to cause substantial problems in judging the identities of unfamiliar faces, there are consistencies between faces which are sufficient to support (to some extent) consensual trait judgements. Furthermore, much of this consistency can be captured in simple operational models based on image statistics.
We know from previous research that unfamiliar face matching (determining whether two simultaneously presented images show the same person or not) is very error‐prone. A small number of studies in laboratory settings have shown that the use of multiple images or a face average, rather than a single image, can improve face matching performance. Here, we tested 1,999 participants using four‐image arrays and face averages in two separate live matching tasks. Matching a single image to a live person resulted in numerous errors (79.9% accuracy across both experiments), and neither multiple images (82.4% accuracy) nor face averages (76.9% accuracy) improved performance. These results are important when considering possible alterations which could be made to photo‐ID. Although multiple images and face averages have produced measurable improvements in performance in recent laboratory studies, they do not produce benefits in a real‐world live face matching context.
Matching two different images of an unfamiliar face is difficult, although we rely on this process every day when proving our identity. Although previous work with laboratory photosets has shown that performance is error-prone, few studies have focussed on how accurately people carry out this matching task using photographs taken from official forms of identification. In Experiment 1, participants matched high-resolution, colour face photos with current UK driving licence photos of the same group of people in a sorting task. Participants averaged 19 mistaken pairings out of 30, showing that this task was both difficult and error-prone. In Experiment 2, high-resolution photographs were paired with either driving licence or passport photographs in a typical pairwise matching paradigm. We found no difference in performance levels for the two types of ID image, with both producing unacceptable levels of accuracy (around 75%–79% correct). The current work benefits from increased ecological validity and provides a clear demonstration that these forms of official identification are ineffective and that alternatives should be considered.
A growing body of research has investigated how we associate colours and social traits. Specifically, studies have explored the links between red and perceptions of qualities like attractiveness and anger. Although less is known about other colours, the prevailing framework suggests that the specific context plays a significant role in determining how a particular colour might affect our perceptions of a person or item. Importantly, this factor has yet to be considered for children’s colour associations, where researchers have focused on links between colours and emotions, rather than social traits. Here, we consider whether context-specific colour associations are demonstrated by 5- to 10-year-old children and compare these associations with adult data collected on the same task. We asked participants to rank order sets of six identical images (e.g., a boy completing a test), which varied only in the colour of a single item (his T-shirt). Each question was tailored to the image set to address a specific context, for example, “Which boy do you think looks the most likely to cheat on a test?” Our findings revealed several colour associations shared by children, and many of these were also present in adults, although some had strengthened or weakened by this stage of life. Taken together, our results demonstrate the presence of both stable and changing context-specific colour associations during development, revealing a new area of study for further exploration.
Low‐quality images are problematic for face identification, for example, when the police identify faces from CCTV images. Here, we test whether face averages, comprising multiple poor‐quality images, can improve both human and computer recognition. We created averages from multiple pixelated or nonpixelated images and compared accuracy using these images and exemplars. To provide a broad assessment of the potential benefits of this method, we tested human observers (n = 88; Experiment 1), and also computer recognition, using a smartphone application (Experiment 2) and a commercial one‐to‐many face recognition system used in forensic settings (Experiment 3). The third experiment used large image databases of 900 ambient images and 7,980 passport images. In all three experiments, we found a substantial increase in performance by averaging multiple pixelated images of a person’s face. These results have implications for forensic settings in which faces are identified from poor‐quality images, such as CCTV.
Researchers have long been interested in how social evaluations are made based upon first impressions of faces. It is also important to consider the level of agreement we see in such evaluations across raters and what this may tell us. Typically, high levels of inter-rater agreement for facial judgements are reported, but the measures used may be misleading. At present, studies commonly report Cronbach’s α as a way to quantify agreement, although this measure has several problems. Most importantly, because researchers treat raters as items, Cronbach’s α is inflated by larger sample sizes even when agreement between raters is fixed. Here, we considered several alternative measures and investigated whether these better discriminate between traits that were predicted to show low (parental resemblance), intermediate (attractiveness, dominance, trustworthiness), and high (age, gender) levels of agreement. Importantly, the level of inter-rater agreement has not previously been studied for many of these traits. In addition, we investigated whether familiar faces resulted in differing levels of agreement in comparison with unfamiliar faces. Our results suggest that alternative measures may prove more informative than Cronbach’s α when determining how well raters agree in their judgements. Further, we found no apparent influence of familiarity on levels of agreement. Finally, we show that, like attractiveness, both trustworthiness and dominance show significant levels of private taste (personal or idiosyncratic rater perceptions), although shared taste (perceptions shared with other raters) explains similar levels of variance in people’s perceptions. In conclusion, we recommend that researchers investigating social judgements of faces consider alternatives to Cronbach’s α but should also be prepared to examine both the potential value and origin of private taste as these might prove informative.
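The inflation of Cronbach’s α with the number of raters can be illustrated with a minimal sketch. It assumes the standardised form of α, which depends only on the number of raters k (treated as items) and the mean inter-rater correlation. The function name and the mean correlation of 0.2 are hypothetical choices for illustration and are not taken from the study.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardised Cronbach's alpha for k raters (treated as items)
    whose ratings have a mean pairwise correlation of mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical, fixed level of agreement between any two raters.
MEAN_R = 0.2

# Only the number of raters changes; agreement between raters does not.
RATER_COUNTS = (5, 10, 20, 50, 100)
ALPHAS = {k: standardized_alpha(k, MEAN_R) for k in RATER_COUNTS}

for k, alpha in ALPHAS.items():
    print(f"{k:>3} raters: alpha = {alpha:.3f}")
```

Under these assumptions, α rises steadily as raters are added even though the agreement between any two raters stays the same, so a high α from a large sample of raters need not indicate strong agreement.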
Background: Infants and children travel using passports that are typically valid for five years (e.g. in Canada, the United Kingdom, the United States and Australia). These individuals may also need to be identified using images taken from videos and other sources in forensic situations, including child exploitation cases. However, few researchers have examined how useful these images are as a means of identification.
Methods: We investigated the effectiveness of photo identification for infants and children using a face matching task, where participants were presented with two images simultaneously and asked whether the images depicted the same child or two different children. In Experiment 1, both images showed an infant (<1 year old), whereas in Experiment 2, one image again showed an infant but the second image of the child was taken at 4–5 years of age. In Experiments 3a and 3b, we asked participants to complete shortened versions of both these tasks (selecting the most difficult trials) as well as the short version of the Glasgow Face Matching Test. Finally, in Experiment 4, we investigated whether information regarding the sex of the infants and children could be accurately perceived from the images.
Results: In Experiment 1, we found low levels of performance (72% accuracy) for matching two infant photos. For Experiment 2, performance was lower still (64% accuracy) when infant and child images were presented, given the significant changes in appearance that occur over the first five years of life. In Experiments 3a and 3b, when participants completed both these tasks, as well as a measure of adult face matching ability, we found the lowest performance for the two infant tasks, along with mixed evidence of within-person correlations in sensitivities across all three tasks. The use of only same-sex pairings on mismatch trials, in comparison with random pairings, had little effect on performance measures. In Experiment 4, accuracy when judging the sex of infants was at chance levels for one image set and above chance (although still low) for the other set. As expected, participants were able to judge the sex of children (aged 4–5) from their faces.
Discussion: Identity matching with infant and child images resulted in low levels of performance, which were significantly worse than for an adult face matching task. Taken together, the results of the experiments presented here provide evidence that child facial photographs are ineffective for use in real-world identification.
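The Results refer to sensitivities without naming the measure. One common choice for same/different matching tasks is d′ from signal detection theory, and the sketch below assumes that measure. It treats a "same" response on a match trial as a hit and a "same" response on a mismatch trial as a false alarm, and applies a log-linear correction so that perfect scores do not produce infinite values. The function name and the trial counts are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def d_prime(hits: int, n_match: int, false_alarms: int, n_mismatch: int) -> float:
    """Sensitivity (d') for a same/different face matching task.

    hits         -- "same" responses on match trials
    false_alarms -- "same" responses on mismatch trials
    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_match + 1)
    fa_rate = (false_alarms + 0.5) / (n_mismatch + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 20 match and 20 mismatch trials.
EXAMPLE = d_prime(hits=16, n_match=20, false_alarms=8, n_mismatch=20)
print(f"d' = {EXAMPLE:.2f}")
```

Unlike percentage correct, d′ separates the ability to tell matches from mismatches from any overall tendency to respond "same", which is one reason it is often preferred when correlating performance across tasks.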