bob.paper.wacv2024_dvpba
Mitigating Demographic Bias in Face Recognition via Regularized Score Calibration
This package contains the source code of the training-regularization method and the related experiments published in the following paper:
@inproceedings{kotwal_wacvw2024,
author = {Ketan Kotwal and Sebastien Marcel},
title = {Mitigating Demographic Bias in Face Recognition via Regularized Score Calibration},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops},
month = {January},
year = {2024}
}
If you use this package and/or its results, please consider citing the paper.
This package has been developed using the signal-processing and machine learning toolbox Bob and PyTorch Lightning.
Installation
The installation instructions are based on conda and work on Linux systems only. Install conda before continuing.
Download the source code of this paper, and create a conda environment with the following commands:
$ cd bob.paper.wacv2024_dvpba
$ conda env create -f environment.yml
$ conda activate env_score_reg
$ pip install .
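To verify the installation, you can check that the main dependencies import cleanly (a minimal sanity check; the complete dependency list is defined by environment.yml):
$ python -c "import torch, pytorch_lightning, bob.bio.face; print('environment OK')"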
Downloading the Datasets and Face Recognition Models
To run the experiments, you will have to download the following datasets from their respective sources:
- VGGFace2
- MORPH
- RFW
The experiments described in the paper have considered three variants of the iResNet architecture as the face recognition backbone: iResNet34, iResNet50, and iResNet100. We have used the models pretrained on the refined version of the MSCeleb1M dataset (also known as MS1MV3) using the ArcFace loss. These models can be downloaded from the InsightFace repository.
The specifications of the protocols for the VGGFace2 and MORPH datasets (which define the train and test partitions) can be downloaded from here. For the RFW dataset, the protocol has been released by the creators of the dataset. The bob.bio.face package provides interfaces and protocols for all three datasets.
Configuration of Datasets
After downloading the datasets, you need to set the paths to their locations in the configuration file. Bob supports a configuration file (~/.bobrc) in your home directory to specify where the datasets are located. You may use the following commands to set these paths:
# setup overall experiment directory (where you will save preprocessed data, features, etc.)
$ bob config set "score_reg_expt.directory" [PATH_TO_BASE_DIRECTORY]
# setup VGGFace2 directories
$ bob config set bob.db.vggface2.directory [YOUR_VGGFACE2_IMAGE_DIRECTORY]
$ bob config set bob.db.vggface2.annotation_directory [YOUR_VGGFACE2_ANNOTATION_DIRECTORY]
# setup MORPH directories
$ bob config set bob.db.morph.directory [YOUR_MORPH_IMAGE_DIRECTORY]
$ bob config set bob.db.morph.annotation_directory [YOUR_MORPH_ANNOTATION_DIRECTORY]
# setup RFW directories
$ bob config set bob.db.rfw.directory [YOUR_RFW_IMAGE_DIRECTORY]
$ bob config set bob.db.rfw.annotation_directory [YOUR_RFW_ANNOTATION_DIRECTORY]
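These commands store plain key-value pairs in the JSON file ~/.bobrc, which you can inspect with bob config show. Assuming the commands above, its content should look roughly like this (all paths are placeholders):
{
    "score_reg_expt.directory": "/path/to/experiment/directory",
    "bob.db.vggface2.directory": "/path/to/vggface2/images",
    "bob.db.vggface2.annotation_directory": "/path/to/vggface2/annotations",
    "bob.db.morph.directory": "/path/to/morph/images",
    "bob.db.morph.annotation_directory": "/path/to/morph/annotations",
    "bob.db.rfw.directory": "/path/to/rfw/images",
    "bob.db.rfw.annotation_directory": "/path/to/rfw/annotations"
}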
Preprocessing Data
To avoid repeatedly preprocessing the data during training or validation, it is recommended to preprocess the data from each of the three datasets once and reuse it across experiments. The preprocessing consists of MTCNN-based face detection, followed by alignment of the detected face using 5 keypoints (left eye, right eye, nose, left mouth corner, right mouth corner). The aligned images are resized to 112×112 pixels.
The preprocessing can be performed using any of the following options (a minimal alignment sketch is shown after the list):
- scripts provided by the MTCNN repository here, using TensorFlow.
- scripts provided by the Facenet-Pytorch repository here, using PyTorch.
- scripts provided by the Bob toolkit here. To use this code, you may have to create a separate environment due to compatibility issues.
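For reference, here is a minimal sketch of the detection-and-alignment step using the facenet-pytorch MTCNN and a similarity transform onto the 5-point template commonly used for 112×112 ArcFace-style crops; the function name and template constant are illustrative and not part of this package:

# minimal sketch: MTCNN landmark detection + 5-point similarity alignment
import numpy as np
from PIL import Image
from facenet_pytorch import MTCNN
from skimage.transform import SimilarityTransform, warp

# reference 5-point template for 112x112 ArcFace-style crops
# (left eye, right eye, nose, left mouth corner, right mouth corner)
TEMPLATE = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float64)

detector = MTCNN(select_largest=True)  # keep the most prominent face

def align_face(path):
    """Detect one face, align it to the template, return a 112x112 uint8 crop."""
    image = Image.open(path).convert("RGB")
    # landmarks has shape (num_faces, 5, 2): eyes, nose, mouth corners
    _, _, landmarks = detector.detect(image, landmarks=True)
    if landmarks is None:
        raise RuntimeError(f"no face found in {path}")
    tform = SimilarityTransform()
    tform.estimate(landmarks[0], TEMPLATE)  # map detected points onto template
    # warp expects the inverse map (output -> input coordinates)
    aligned = warp(np.asarray(image), tform.inverse, output_shape=(112, 112))
    return (aligned * 255).astype(np.uint8)  # warp returns floats in [0, 1]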
If you choose to work with preprocessed data, the preprocessed data should be located in a folder named DATASET_NAME inside "score_reg_expt.directory". Alternatively, if you prefer to save the preprocessed crops at other locations, set the DATABASE_PATH variable in the datasets/ files of the respective datasets.
Training (Finetuning) the Face Recognition Models
The training command requires the dataset, FR backbone, and training options as arguments.
$ python train/run_train.py \
    --models_directory [MODELS_DIRECTORY] \
    --fr_backbone_name [FR_BACKBONE_NAME] \
    --fr_backbone_weights [FR_BACKBONE_WEIGHTS] \
    --dataset [TRAIN_DATASET] \
    --epochs [EPOCHS] \
    --batch_size [BATCH_SIZE]
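For example, a run finetuning an iResNet50 backbone on VGGFace2 might look as follows (the backbone name, weight file, and dataset identifier below are illustrative placeholders; check the script's help for the accepted values):
$ python train/run_train.py \
    --models_directory ./models \
    --fr_backbone_name iresnet50 \
    --fr_backbone_weights ./pretrained/ms1mv3_arcface_r50.pth \
    --dataset vggface2 \
    --epochs 10 \
    --batch_size 64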
NOTE: The training parameters (batch size, number of positive pairs, and number of negative pairs) required by the training script can be adjusted to match the hardware configuration. However, these parameters should not be reduced too much; otherwise, each mini-batch may not contain sufficient samples per demographic group. During training, if no genuine or impostor pairs are found for a specific demographic group, the weights of the corresponding calibration are not updated. Also, too few samples do not provide reliable estimates, and may lead to training collapse.
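To make this constraint concrete, the sketch below (illustrative, not the package's actual training code) shows how genuine and impostor scores can be pooled per demographic group within a mini-batch, and why a group lacking either pair type has to be skipped:

# illustrative sketch of per-group pair mining within a mini-batch
import torch

def group_scores(embeddings, subject_ids, demographics):
    """Split the cosine similarities of all pairs in a batch into
    genuine/impostor sets per demographic group."""
    emb = torch.nn.functional.normalize(embeddings, dim=1)
    sims = emb @ emb.t()  # pairwise cosine similarities
    same_id = subject_ids.view(-1, 1) == subject_ids.view(1, -1)
    upper = torch.triu(torch.ones_like(sims, dtype=torch.bool), diagonal=1)
    out = {}
    for g in demographics.unique():
        in_group = demographics == g
        pair_mask = upper & in_group.view(-1, 1) & in_group.view(1, -1)
        genuine = sims[pair_mask & same_id]
        impostor = sims[pair_mask & ~same_id]
        if genuine.numel() == 0 or impostor.numel() == 0:
            continue  # no usable pairs: skip this group's calibration update
        out[g.item()] = (genuine, impostor)
    return out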
Running Inference on Regularized Face Recognition Models
The inference command requires the test dataset, regularized FR backbone, and test (storage) options as arguments. This command uses the pipeline framework from Bob, which uses Dask internally to parallelize operations.
$ python eval/run_verify.py \
    --fr_backbone_name [FR_BACKBONE_NAME] \
    --fr_backbone_weights [FR_BACKBONE_WEIGHTS] \
    --dataset [TEST_DATASET] \
    --output_directory [OUTPUT_DIRECTORY] \
    --dask_client [DASK_CLIENT]
The inference command generates a score file for each partition (e.g., dev, test). The scores are stored as a CSV file in which each line refers to a probe sample. Each line has the following fields (common to all datasets): probe_subject_id, probe_subject, bio_ref_subject_id, bio_ref_sample, score, probe_demographic_label.
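Given this layout, the per-demographic genuine and impostor score sets can be recovered with a few lines of pandas (a sketch assuming the fields above appear as the CSV header; the file name is a placeholder):

# sketch: split a score file into genuine/impostor scores per demographic group
import pandas as pd

scores = pd.read_csv("scores-dev.csv")
for group, chunk in scores.groupby("probe_demographic_label"):
    genuine_mask = chunk["probe_subject_id"] == chunk["bio_ref_subject_id"]
    genuine = chunk.loc[genuine_mask, "score"]
    impostor = chunk.loc[~genuine_mask, "score"]
    print(f"{group}: {len(genuine)} genuine, {len(impostor)} impostor scores")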
Evaluation of Experiments
To evaluate the score files, run the following command:
$ bob bio metrics -v -e [PATH_TO_DEV_SCORE_FILE] [PATH_TO_TEST_SCORE_FILE]
For any experiment, the first argument (the dev score file) should be the scores of the train partition of the dataset used to finetune the face recognition model. The second argument is the score file of the dataset and partition to be evaluated. The command above computes the score threshold on the dev scores based on the EER (Equal Error Rate); this threshold is then used to compute the performance on the test set.
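The same dev-threshold/test-evaluation logic can also be reproduced programmatically with bob.measure, which is convenient for per-demographic analyses (a sketch; the file names are placeholders and the CSV layout from the previous section is assumed):

# sketch: EER threshold on dev scores, applied to the test scores,
# mirroring what `bob bio metrics -e` does
import pandas as pd
from bob.measure import eer_threshold, farfrr

def split_scores(path):
    """Return (impostor, genuine) score arrays from a score CSV."""
    df = pd.read_csv(path)
    genuine = df["probe_subject_id"] == df["bio_ref_subject_id"]
    return df.loc[~genuine, "score"].to_numpy(), df.loc[genuine, "score"].to_numpy()

dev_neg, dev_pos = split_scores("scores-dev.csv")
test_neg, test_pos = split_scores("scores-test.csv")

threshold = eer_threshold(dev_neg, dev_pos)        # threshold at the dev EER
fmr, fnmr = farfrr(test_neg, test_pos, threshold)  # error rates on the test set
print(f"threshold={threshold:.4f}  FMR={100 * fmr:.2f}%  FNMR={100 * fnmr:.2f}%")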
Contact
For questions or to report issues with this software package, contact the first author (ketan.kotwal@idiap.ch) or our development mailing list.