Results


Introduction


LFW provides information for supervised learning under two different training paradigms: image-restricted and unrestricted. Under the image-restricted setting, only binary "matched" or "mismatched" labels are given, for pairs of images. Under the unrestricted setting, the identity information of the person appearing in each image is also available, allowing one to potentially form additional image pairs. For more information, see the readme.

Algorithms designed for LFW often also use additional, external sources of training information. This issue first arose when facial landmark detectors were used to align the images (Huang et al. [4]). These detectors were pre-trained on face part images outside of LFW, so the algorithm implicitly made use of this additional source of information. Because such outside training data can have a large impact on recognition accuracy, its use must be considered when comparing algorithm performance. We have therefore roughly divided the image-restricted results into several classes according to how much outside training data is used. There are also additional notes on this issue.

Results in red indicate methods accepted but not yet published (e.g. accepted to an upcoming conference). Results in green indicate commercial recognition systems whose algorithms have not been published and peer-reviewed. We emphasize that researchers should not be compelled to compare against either of these types of results.

Image-Restricted Training Results


Strict LFW, no outside training data used: [see notes on the use of outside training data]

Method                                    û ± SE
Eigenfaces [1], original                  0.6002 ± 0.0079
Nowak [2], original                       0.7245 ± 0.0040
Nowak [2], funneled [3]                   0.7393 ± 0.0049
Hybrid descriptor-based [5], funneled     0.7847 ± 0.0051
3x3 Multi-Region Histograms (1024) [6]    0.7295 ± 0.0055
Pixels/MKL, funneled [7]                  0.6822 ± 0.0041
V1-like/MKL, funneled [7]                 0.7935 ± 0.0055

Outside training data used for alignment or feature extraction: [notes]

(commercial system, see note at top)
MERL [4]                                                0.7052 ± 0.0060
MERL+Nowak [4], funneled                                0.7618 ± 0.0058

LDML, funneled [8]                                      0.7927 ± 0.0060
Hybrid, aligned [9]                                     0.8398 ± 0.0035
Combined b/g samples based methods, aligned [10]        0.8683 ± 0.0034
Single LE + holistic [14]                               0.8122 ± 0.0053
LBP + CSML, aligned [15]                                0.8557 ± 0.0052
CSML + SVM, aligned [15]                                0.8800 ± 0.0037
High-Throughput Brain-Inspired Features, aligned [16]   0.8813 ± 0.0058
LARK supervised [20], aligned                           0.8510 ± 0.0059
DML-eig SIFT [21], funneled                             0.8127 ± 0.0230
DML-eig combined [21], funneled & aligned               0.8565 ± 0.0056

Outside training data in recognition system (beyond alignment/feature extraction): [notes]

Attribute classifiers [11]                0.8362 ± 0.0158
Simile classifiers [11]                   0.8414 ± 0.0131
Attribute and Simile classifiers [11]     0.8529 ± 0.0123
NReLU [13]                                0.8073 ± 0.0134
Multiple LE + comp [14]                   0.8445 ± 0.0046
Associate-Predict [18]                    0.9057 ± 0.0056

Human performance, measured through Amazon Mechanical Turk:

Human, funneled [11]                      0.9920
Human, cropped [11]                       0.9753
Human, inverse mask [11]                  0.9427
Table 1: Mean classification accuracy û and standard error of the mean SE.
Fig 1a: ROC curves averaged over 10 folds of View 2, all methods*.
Fig 1b: ROC curves averaged over 10 folds of View 2, best performing*.
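
The û ± SE columns in the tables on this page summarize the per-fold accuracies over the 10 folds of View 2. A minimal Python sketch of that computation follows, assuming the standard error is the sample standard deviation (with n - 1 in the denominator) divided by the square root of the number of folds. The function name and the example accuracies are hypothetical and do not correspond to any listed method.

```python
import math


def mean_and_standard_error(fold_accuracies):
    """Return (mean accuracy, standard error of the mean) over the folds."""
    n = len(fold_accuracies)
    if n < 2:
        raise ValueError("need at least two folds to estimate a standard error")
    mean = sum(fold_accuracies) / n
    # Unbiased sample variance: divide by n - 1 (assumed convention).
    variance = sum((p - mean) ** 2 for p in fold_accuracies) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)


# Hypothetical per-fold accuracies, for illustration only.
example_folds = [0.78, 0.80, 0.79, 0.81, 0.77, 0.80, 0.79, 0.78, 0.82, 0.80]
mean_acc, std_err = mean_and_standard_error(example_folds)
print(f"{mean_acc:.4f} ± {std_err:.4f}")
```

Reporting both numbers in this form makes it easier to judge whether the gap between two methods is large relative to the fold-to-fold variation.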


Unrestricted Training Results


Method                                    û ± SE
LDML-MkNN, funneled [8]                   0.8750 ± 0.0040
Combined multishot, aligned [9]           0.8950 ± 0.0051
LBP multishot, aligned [9]                0.8517 ± 0.0061
LBP PLDA, aligned [17]                    0.8733 ± 0.0055
combined PLDA, funneled & aligned [17]    0.9007 ± 0.0051

(commercial system, see note at top)
face.com r2011b [19]                      0.9130 ± 0.0030

CMD, aligned [22]                         0.9170 ± 0.0110
SLBP, aligned [22]                        0.9000 ± 0.0133
CMD+SLBP, aligned [22]                    0.9258 ± 0.0136
Table 2: Mean classification accuracy û and standard error of the mean SE.
Fig 2a: ROC curves averaged over 10 folds of View 2, published*.

Fig 2b: ROC curves averaged over 10 folds of View 2, all*.


Unsupervised Results


Method                                    û ± SE
SD-MATCHES, 125x125 [12], aligned         0.6410 ± 0.0062
H-XS-40, 81x150 [12], aligned             0.6945 ± 0.0048
GJD-BC-100, 122x225 [12], aligned         0.6847 ± 0.0065
LARK unsupervised [20], aligned           0.7223 ± 0.0049
Table 3: Mean classification accuracy û and standard error of the mean SE.
Fig 3: ROC curves over View 2*.


Notes


* Each point on the curve represents the average over the 10 folds of (false positive rate, true positive rate) for a fixed threshold.

(u) indicates ROC curve is for the unrestricted setting.
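
The fold averaging described in the * note above can be sketched in Python as follows. This is only an illustration under stated assumptions: a pair is predicted "matched" when its similarity score is at or above the threshold, every fold contains both matched and mismatched pairs, and the function names and toy scores are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) for one fold.

    labels: True for matched pairs, False for mismatched pairs.
    A pair is predicted "matched" when score >= threshold (assumed convention).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def averaged_roc(folds, thresholds):
    """Average (FPR, TPR) across folds at each fixed threshold.

    folds: list of (scores, labels) tuples, one per fold.
    """
    curve = []
    for t in thresholds:
        points = [roc_point(scores, labels, t) for scores, labels in folds]
        mean_fpr = sum(p[0] for p in points) / len(points)
        mean_tpr = sum(p[1] for p in points) / len(points)
        curve.append((mean_fpr, mean_tpr))
    return curve


# Two tiny hypothetical folds, for illustration only.
toy_folds = [
    ([0.9, 0.8, 0.3, 0.1], [True, True, False, False]),
    ([0.7, 0.4, 0.6, 0.2], [True, True, False, False]),
]
toy_curve = averaged_roc(toy_folds, thresholds=[0.5])
print(toy_curve)
```

Averaging at a fixed threshold, rather than averaging whole per-fold curves, keeps each plotted point tied to a single operating point of the classifier.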

On the use of outside training data:

The use of training data outside of LFW can have a significant impact on recognition performance. For instance, it was shown in Wolf et al. [10] that using LFW-a, the version of LFW aligned using a trained commercial alignment system, improved the accuracy of the early Nowak and Jurie method [2] from 0.7393 on the funneled images to 0.7912, despite the fact that this method was designed to handle some misalignment.

To enable the fair comparison of different algorithms on LFW, we ask that researchers be specific about what type of outside training data was used in the experiments. We have also roughly separated the results into three categories.

The first class of results uses strictly the training data provided in LFW. The second class makes implicit use of outside training data through trained facial feature detectors, which are used either to align the images, as in LFW-a, or to determine where in the image to extract features. The third class makes explicit use of outside training data in the recognition system itself, beyond the alignment/feature extraction stage of the second class.

Notes on the type of outside training data used for specific systems can be found in the list of methods at the bottom of the page. Details regarding training data falling under the second class are marked by sections beginning with a †, and those falling under the third class by sections beginning with a ‡.

Generating ROC Curves


The following script can be used to generate ROC curves using gnuplot: create_lfw_all_roc.p (only restricted / unrestricted / unsupervised).

The script takes in one text file for each method, containing on each line a point on the ROC curve, i.e. average true positive rate, followed by average false positive rate, separated by a single space. Additional methods can be added to the script by adding on to the plot command, e.g.
plot "nowak-original-roc.txt" using 2:1 with lines title "Nowak, original", \
     "nowak-funneled-roc.txt" using 2:1 with lines title "Nowak, funneled", \
     "new-method-roc.txt" using 2:1 with lines title "New Method"
Existing ROC files can be downloaded here:

Notes: gnuplot is multi-platform and freely distributed, and can be downloaded here. create_lfw_roc.p can either be run as a shell script on Unix/Linux machines (e.g. chmod u+x create_lfw_roc.p; ./create_lfw_roc.p) or loaded through gnuplot (e.g. at the gnuplot command line gnuplot> load "create_lfw_roc.p").
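
As a hedged illustration of the input format described above, the Python sketch below writes one ROC point per line as average true positive rate, then average false positive rate, separated by a single space. The function name, the six-decimal formatting and the example points are assumptions; the file name follows the "new-method-roc.txt" example in the plot command.

```python
def write_roc_file(path, curve):
    """Write one ROC point per line as "<TPR> <FPR>".

    curve: iterable of (false positive rate, true positive rate) pairs.
    The true positive rate is written first, matching "using 2:1" in the
    plot command, which puts column 2 (FPR) on the x axis.
    """
    with open(path, "w") as f:
        for fpr, tpr in curve:
            f.write(f"{tpr:.6f} {fpr:.6f}\n")


# Hypothetical ROC points, for illustration only.
write_roc_file("new-method-roc.txt", [(0.0, 0.0), (0.25, 0.75), (1.0, 1.0)])
```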

Methods

  1. Matthew A. Turk and Alex P. Pentland.
    Face Recognition Using Eigenfaces.
    Computer Vision and Pattern Recognition (CVPR), 1991.
    [pdf]

  2. Eric Nowak and Frederic Jurie.
    Learning visual similarity measures for comparing never seen objects.
    Computer Vision and Pattern Recognition (CVPR), 2007.
    [pdf]
    [webpage]

    Results were obtained using the binary available from the paper's webpage. View 1 of the database was used to compute the cut-off threshold used in computing mean classification accuracy on View 2. For each of the 10 folds of View 2 of the database, 9 of the sets were used for training, the similarity measures were computed for the held-out test set, and the threshold value was used to classify pairs as matched or mismatched. This procedure was performed both on the original images and on the set of aligned images from the funneled parallel database.

    We used the same parameters given on the paper's webpage, with C=1 for the SVM, specifically:

    pRazSimiERCF -verbose 2 -ntrees 5 -maxleavesnb 25000 -nppL 100000 -ncondtrial 1000 -nppT 1000 -wmin 15 -wmax 100 -neirelsize 1 -svmc 1

  3. Gary B. Huang, Vidit Jain, and Erik Learned-Miller.
    Unsupervised joint alignment of complex images.
    International Conference on Computer Vision (ICCV), 2007.
    [pdf]
    [webpage]

    Face images were aligned using publicly available source code from the project webpage.

  4. Gary B. Huang, Michael J. Jones, and Erik Learned-Miller.
    LFW Results Using a Combined Nowak Plus MERL Recognizer.
    Faces in Real-Life Images Workshop in European Conference on Computer Vision (ECCV), 2008.
    [pdf]
    [commercial system, see note at top]

    Face images were aligned using a commercial system that attempts to identify nine facial landmark points through Viola-Jones type landmark detectors.

  5. Lior Wolf, Tal Hassner, and Yaniv Taigman.
    Descriptor Based Methods in the Wild.
    Faces in Real-Life Images Workshop in European Conference on Computer Vision (ECCV), 2008.
    [pdf]
    [webpage]

  6. Conrad Sanderson and Brian C. Lovell.
    Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference.
    International Conference on Biometrics (ICB), 2009.
    [pdf]

  7. Nicolas Pinto, James J. DiCarlo, and David D. Cox.
    How far can you get with a modern face recognition test set using only simple features?
    Computer Vision and Pattern Recognition (CVPR), 2009.
    [pdf]

  8. Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid.
    Is that you? Metric Learning Approaches for Face Identification.
    International Conference on Computer Vision (ICCV), 2009.
    [pdf]
    [webpage]

    SIFT features were extracted at nine facial feature points using the detector of Everingham, Sivic, and Zisserman, 'Hello! My name is... Buffy' - automatic naming of characters in TV video, BMVC, 2006.

  9. Yaniv Taigman, Lior Wolf, and Tal Hassner.
    Multiple One-Shots for Utilizing Class Label Information.
    British Machine Vision Conference (BMVC), 2009.
    [pdf]
    [webpage]

    Used LFW-a, a version of LFW aligned using a commercial, fiducial-points based alignment system.

  10. Lior Wolf, Tal Hassner, and Yaniv Taigman.
    Similarity Scores based on Background Samples.
    Asian Conference on Computer Vision (ACCV), 2009.
    [pdf]

    Used LFW-a.

  11. Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar.
    Attribute and Simile Classifiers for Face Verification.
    International Conference on Computer Vision (ICCV), 2009.
    [pdf]
    [webpage]

    A commercial face detector - Omron, OKAO vision - was used to detect fiducial point locations. These locations were used to align the images and extract features from particular face regions.

    Attribute classifiers (e.g. Brown Hair) were trained using outside data and Amazon Mechanical Turk labelings, and simile classifiers (e.g. mouth similar to Angelina Jolie) were trained using images from PubFig. The outputs of these classifiers on LFW images were used as features in the recognition system.

    The computed attributes for all images in LFW can be obtained in this file: lfw_attributes.txt. The file format and meaning are described on this page, and further information on the attributes can be found on the project website.

  12. Javier Ruiz-del-Solar, Rodrigo Verschae, and Mauricio Correa.
    Recognition of Faces in Unconstrained Environments: A Comparative Study.
    EURASIP Journal on Advances in Signal Processing (Recent Advances in Biometric Systems: A Signal Processing Perspective), Vol. 2009, Article ID 184617, 19 pages.
    [pdf]

  13. Vinod Nair and Geoffrey E. Hinton.
    Rectified Linear Units Improve Restricted Boltzmann Machines.
    International Conference on Machine Learning (ICML), 2010.
    [pdf]

    The Machine Perception Toolbox from MPLab, UCSD was used to detect eye locations; the eye coordinates were manually corrected for the worst ~2000 detections, and the coordinates were then used to rotate and scale the images.

    Used face data outside of LFW for unsupervised feature learning.

  14. Zhimin Cao, Qi Yin, Xiaoou Tang, and Jian Sun.
    Face Recognition with Learning-based Descriptor.
    Computer Vision and Pattern Recognition (CVPR), 2010.
    [pdf]

    Landmarks are detected using the fiducial point detector of Liang, Xiao, Wen, Sun, Face Alignment via Component-based Discriminative Search, ECCV, 2008, which are then used to extract face component images for feature computation.

    The "+ comp" method uses a pose-adaptive approach, where LFW images are labeled as being frontal, left facing, or right facing, using three images selected from the Multi-PIE data set.

  15. Hieu V. Nguyen and Li Bai.
    Cosine Similarity Metric Learning for Face Verification.
    Asian Conference on Computer Vision (ACCV), 2010.
    [pdf]

    Used LFW-a.

  16. Nicolas Pinto and David Cox.
    Beyond Simple Features: A Large-Scale Feature Search Approach to Unconstrained Face Recognition.
    International Conference on Automatic Face and Gesture Recognition (FG), 2011.
    [pdf]

    Used LFW-a.

  17. Peng Li, Yun Fu, Umar Mohammed, James H. Elder, and Simon J.D. Prince.
    Probabilistic Models for Inference About Identity.
    IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, pp. 144-157, Jan. 2012.
    [pdf]
    [webpage]

  18. Qi Yin, Xiaoou Tang, and Jian Sun.
    An Associate-Predict Model for Face Recognition.
    Computer Vision and Pattern Recognition (CVPR), 2011.
    [pdf]

    Four landmarks are detected using a standard facial point detector and used to determine twelve facial components.

    The recognition system makes use of 200 identities from the Multi-PIE data set, covering 7 poses and 4 illumination conditions for each identity.

  19. Yaniv Taigman and Lior Wolf.
    Leveraging Billions of Faces to Overcome Performance Barriers in Unconstrained Face Recognition.
    ArXiv e-prints, 2011.
    [pdf]
    [webpage]
    [commercial system, see note at top]

    † ‡ A commercial recognition system, making use of outside training data, is tested on LFW.

  20. Hae Jong Seo and Peyman Milanfar.
    Face Verification Using the LARK Representation.
    IEEE Transactions on Information Forensics and Security, 2011.
    [pdf]

    Used LFW-a.

  21. Yiming Ying and Peng Li.
    Distance Metric Learning with Eigenvalue Optimization.
    Journal of Machine Learning Research (Special Topics on Kernel and Metric Learning), 2012.
    [pdf]

    Used LFW-a, and features extracted from facial feature points of Guillaumin et al., 2009.

  22. Chang Huang, Shenghuo Zhu, and Kai Yu.
    Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval.
    NEC Technical Report TR115, 2011.
    [pdf]

    Used LFW-a.