
This package is part of the signal-processing and machine learning toolbox Bob. It contains source code to reproduce experiments published in the following article:

@INPROCEEDINGS{Colbois_IJCB_2021,
    author = {Colbois, Laurent and de Freitas Pereira, Tiago and Marcel, S{\'{e}}bastien},
    projects = {Idiap, Biometrics Center},
    title = {On the use of automatically generated synthetic image datasets for benchmarking face recognition},
    booktitle = {International Joint Conference on Biometrics (IJCB 2021)},
    year = {2021},
    note = {Accepted for Publication in IJCB2021},
    abstract = {The availability of large-scale face datasets has been key in the progress of face recognition. However, due to licensing issues or copyright infringement, some datasets are not available anymore (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs), to synthesize realistic face images, provide a pathway to replace real datasets by synthetic datasets, both to train and benchmark face recognition (FR) systems. The work presented in this paper provides a study on benchmarking FR systems using a synthetic dataset. First, we introduce the proposed methodology to generate a synthetic dataset, without the need for human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, which is verified on a synthetic dataset with 10K+ identities; (ii) benchmarking results on the synthetic dataset are a good substitution, often providing error rates and system ranking similar to the benchmarking on the real dataset.},
    pdf = {http://publications.idiap.ch/downloads/papers/2021/Colbois_IJCB_2021.pdf}
}

It mainly contains tools to perform the following operations:

  1. Projection of a face dataset into StyleGAN2's latent space (./bin/project_db.py)
  2. Computation of semantic editing latent directions from those projections (./bin/latent_analysis.py)
  3. Generation of a synthetic dataset using the precomputed latent directions (./bin/generate_db.py)
  4. (Upcoming) Running a face recognition benchmark experiment on the synthetic dataset (bob bio pipelines vanilla-biometrics)

If you use this package and/or its results, please consider citing the paper.

Installation

This project contains two distinct conda environments:

  • generation_env.yml This environment is based on Bob 8 and Tensorflow 1, and is used for steps 1 to 3 (dataset projection, latent analysis and database generation).
  • benchmark_env.yml This environment is based on Bob 9 and Tensorflow 2, and is used for step 4 (running the benchmark experiments).

To install everything correctly after pulling this repository from GitLab, you need to:

1. Install both environments

conda env create -f generation_env.yml
conda env create -f benchmark_env.yml
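
You can optionally check that both environments were created correctly (the generation environment is named synface, which is the name used below; the benchmark environment name is defined in benchmark_env.yml):

# Optional sanity check: both environments should appear in this list
conda env list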

2. Run buildout to extend the generation environment with the tools available in this repository

conda activate synface # Activate the generation env.
buildout -c buildout.cfg # Run buildout

This second step creates a bin folder containing, in particular:

  1. ./bin/python Custom Python executable containing the generation env. extended with bob.paper.ijcb2021_synthetic_dataset
  2. ./bin/project_db.py Dataset projection script (entry point)
  3. ./bin/latent_analysis.py Script for computing latent directions (entry point)
  4. ./bin/generate_db.py Synthetic dataset generation script (entry point)
  5. ./bin/download_models.py Utility script to download the required pretrained models (entry point)
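
As a quick sanity check that buildout succeeded, you can verify that the custom interpreter sees the package (a minimal check, assuming the import name matches the package name bob.paper.ijcb2021_synthetic_dataset):

# Should print "ok" if the package is importable from the buildout interpreter (assumed import name)
./bin/python -c "import bob.paper.ijcb2021_synthetic_dataset; print('ok')"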

How to run

Download model dependencies

The database generation in this project relies on several preexisting pretrained models: a dlib face landmark detector, a pretrained StyleGAN2 generator, a pretrained VGG16 model, and pretrained face recognition models (as reflected in the configuration keys below).

In order to download those models, one must first specify their destination paths in the ~/.bobrc file, through the following commands:

conda activate synface
bob config set sg2_morph.dlib_lmd_path </path/to/dlib/landmark/detector.dat>
bob config set sg2_morph.sg2_path </path/to/stylegan2/pretrained/model.pkl>
bob config set sg2_morph.vgg16_path </path/to/vgg16/pretrained/model.pkl>
bob config set bob.bio.face_ongoing.models_path </path/to/fr/model/folder>
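
These values are stored in the ~/.bobrc file; you can review them at any time (a sketch assuming the companion show subcommand of the same bob config CLI):

# Display the configuration values currently stored in ~/.bobrc
bob config show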

You should then be able to download the models once and for all by running:

./bin/download_models.py
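
You can then check that the downloads landed where expected, for instance for the StyleGAN2 model (a sketch assuming the companion get subcommand of bob config, which prints the stored value):

# List the configured StyleGAN2 model file to confirm it was downloaded
ls -lh "$(bob config get sg2_morph.sg2_path)"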

Download database dependencies

In order to compute latent directions by projection, you need to download the [Multi-PIE dataset](http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html) to a location of your choice, then prepare the folder configuration as explained in the following section.

Prepare folder configuration

You need to configure a few important paths that are used in the code to read input data or store results.

# Absolute path of this repository; useful when launching execution on a grid, due to some relative paths in the code
bob config set bob.paper.ijcb2021_synthetic_dataset.path <path_of_this_repo>

# Paths to preexisting databases
# Folder containing Multi-PIE images (the *data* folder in the downloaded database)
bob config set bob.db.multipie.directory <path_to_folder>
# Folder containing FFHQ images
bob config set bob.db.ffhq.directory <path_to_folder>


# Paths to generated content
# Folder to store Multi-PIE latent projections
bob config set bob.synface.multipie_projections <path_to_folder>
# Path to the Pickle **file** in which to store the computed latent directions
bob config set bob.synface.latent_directions <path_to_pickle_file.pkl>
# Folder to store generated data
bob config set bob.synface.synthetic_datasets <path_to_folder>
# Folder to store biometric evaluation results
bob config set bob.synface.scores <path_to_folder>

Run dataset projection

Database projection is performed by running the ./bin/project_db.py script. Premade configuration files are available in the repository to perform the projection of three subsets of interest of the Multi-PIE database (images from the world group for the U, E and P protocols). StyleGAN2 only runs on GPU, so the projection should be performed on a computation node with a GPU. Moreover, on the Idiap SGE grid, one can easily split the projection process between several parallel jobs by submitting them through jman. By default (unless the --force flag is used), images are not reprojected if their latent projection is already present in the output folder, which makes it possible to interrupt and restart the projection in a series of sequential sgpu jobs. Run ./bin/project_db.py --help for more info on the required configuration values.

Example commands:

jman submit -n multipie_proj -q sgpu -t 8 -r 10 -s "PYTHONUNBUFFERED=1" -- ./bin/project_db.py multipie_P --checkpoint

This command launches 8 parallel projection jobs for the Multi-PIE P protocol on the short GPU queue. Replace multipie_P by multipie_E or multipie_U for the other protocols. As the command uses the --checkpoint flag, not only the latent projections are saved, but also the cropped images and the resynthesized StyleGAN2 images for each computed latent. The launch instruction is repeated 10 times (the -r 10 flag), until there are no images left to project. Depending on the GPU, projecting a single image can take more or less time, so the optimal number of repetitions may vary.
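
While the jobs are running, jman can also be used to track them (a typical gridtk usage sketch):

# Check the status of the submitted jobs
jman list
# Inspect the logs of finished or failed jobs
jman report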

Run latent analysis

After having computed the latent projections for multipie_U, multipie_E and multipie_P, one can run the following script to compute the associated latent directions found by fitting SVMs in the latent space:

./bin/latent_analysis.py --seed 0

The results will be stored in the Pickle file pointed to by the bob.synface.latent_directions entry of the .bobrc configuration file. If you don't want to regenerate those latent directions yourself, we already provide them in this repository under precomputed/latent_directions.pkl.
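
If you want to inspect those directions, they are stored as a plain Python pickle; here is a minimal sketch (the exact structure of the stored object is defined by ./bin/latent_analysis.py):

# Load the precomputed latent directions and print the type of the stored object
./bin/python -c "import pickle; d = pickle.load(open('precomputed/latent_directions.pkl', 'rb')); print(type(d))"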

Generate a synthetic database

Finally, the script ./bin/generate_db.py can be used to generate synthetic databases with semantic face variations, using the computed latent directions. Use ./bin/generate_db.py --help to get more info on the required configuration values. We provide 3 preset configurations:

  • synmultipie, which reproduces the Syn-Multi-PIE dataset used for the benchmark experiments in the article
  • uniqueness_ict & uniqueness_no_ict, which reproduce the reference-only datasets used for the identity uniqueness experiment

This script also uses StyleGAN2 and thus should be run on a node with GPU access. The generation of the references cannot be run in parallel (the ICT constraint requires a comparison between every pair of references), but the creation of the variations for each identity can be run in parallel.

Due to its small scale, the generation of Syn-Multi-PIE can be performed in a single GPU job:

jman submit -n synmultipie -q sgpu -s "PYTHONUNBUFFERED=1" -- ./bin/generate_db.py synmultipie

However, should one want to scale up the size of the database, it is possible to split the generation into two passes. The first pass is sequential and generates all references. The second pass can be parallelized and consists of augmenting each generated reference with its variations of interest. Here is an example scaling synmultipie to 10k identities:

jman submit -n make_references -q gpu -o -s "PYTHONUNBUFFERED=1" -- ./bin/generate_db.py synmultipie -n 10000 --subtask create-identities  > create_identities_job_id
dependency=$(cat create_identities_job_id)
jman submit -n make_variations -q gpu -t 8 -x $dependency -s "PYTHONUNBUFFERED=1" -- ./bin/generate_db.py synmultipie -n 10000 --subtask populate-identities

This first launches a single job that generates all references. Once this job finishes, 8 parallel jobs are launched, between which the identities are split to generate all variations.

Run benchmark experiments

Upcoming

Contact

For questions or to report issues with this software package, contact our development team by asking your question on Stack Overflow with the tag python-bob. You can also contact the first author.