Commit dc73f8e9 authored by Laurent Colbois
Readme updates

parent 1c3c35eb
New package
=============
This package is part of the signal-processing and machine learning toolbox Bob_. It contains source code to reproduce experiments published in the following article::

    @INPROCEEDINGS{Colbois_IJCB_2021,
        author = {Colbois, Laurent and de Freitas Pereira, Tiago and Marcel, S{\'{e}}bastien},
        projects = {Idiap, Biometrics Center},
        title = {On the use of automatically generated synthetic image datasets for benchmarking face recognition},
        booktitle = {International Joint Conference on Biometrics (IJCB 2021)},
        year = {2021},
        note = {Accepted for Publication in IJCB2021},
        abstract = {The availability of large-scale face datasets has been key in the progress of face recognition. However, due to licensing issues or copyright infringement, some datasets are not available anymore (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs), to synthesize realistic face images, provide a pathway to replace real datasets by synthetic datasets, both to train and benchmark face recognition (FR) systems. The work presented in this paper provides a study on benchmarking FR systems using a synthetic dataset. First, we introduce the proposed methodology to generate a synthetic dataset, without the need for human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, which is verified on a synthetic dataset with 10K+ identities; (ii) benchmarking results on the synthetic dataset are a good substitution, often providing error rates and system ranking similar to the benchmarking on the real dataset.},
        pdf = {http://publications.idiap.ch/downloads/papers/2021/Colbois_IJCB_2021.pdf}
    }

It mainly contains tools to perform the following operations:

1. Projection of a face dataset into StyleGAN2's latent space (`./bin/project_db.py`)
2. Computation of semantic editing latent directions from those projections (`./bin/latent_analysis.py`)
3. Generation of a synthetic dataset using the precomputed latent directions (`./bin/generate_db.py`)
4. **Coming at a later date:** Running a face recognition benchmark experiment on the synthetic dataset (`bob bio pipelines vanilla-biometrics`)

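The editing principle behind steps 2 and 3 is to move an identity's latent code along precomputed semantic directions in StyleGAN2's W space. A minimal numerical sketch of that idea (the direction and magnitude here are random and purely illustrative; the real directions come from the latent analysis step):

```python
import numpy as np

def apply_latent_edit(w, direction, magnitude):
    """Move a latent code along a unit-norm semantic direction.

    Toy illustration: in StyleGAN2's W space, adding a scaled direction
    vector to an identity's latent code produces a variation (e.g. pose,
    illumination) of the same identity.
    """
    direction = direction / np.linalg.norm(direction)
    return w + magnitude * direction

rng = np.random.default_rng(0)
w_identity = rng.normal(size=512)      # latent code of one synthetic identity
pose_direction = rng.normal(size=512)  # hypothetical precomputed "pose" direction

w_variant = apply_latent_edit(w_identity, pose_direction, magnitude=3.0)
# The edit moves the code by exactly `magnitude` in W space:
print(np.linalg.norm(w_variant - w_identity))  # prints a value ~= 3.0
```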
If you use this package and/or its results, please consider citing the paper.
Installation
------------
How to run
----------
Download model dependencies
***************************
The database generation in this project relies on several preexisting pretrained models:

* **DLIB Face Landmark detector**, for cropping and aligning the projected faces exactly as in FFHQ. ([Example](http://dlib.net/face_landmark_detection.py.html))
* **StyleGAN2**, as the main face synthesis network ([Original paper](https://arxiv.org/abs/1912.04958), [Official repository](https://github.com/NVlabs/stylegan2)). We are using Config-F, trained on FFHQ at resolution 1024 x 1024.
* A pretrained **VGG16** model, used to compute a perceptual loss between projected and target images. ([Original paper](https://arxiv.org/abs/1801.03924))
* A pretrained face recognition network (Inception-ResNet v2 trained on MSCeleb), used to compute the embedding distance between identities in order to apply the ICT constraint.

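The last model enforces that each generated synthetic identity stays sufficiently far, in embedding space, from every previously accepted one. A minimal sketch of such a distinctness check, with random vectors standing in for real face embeddings and an illustrative threshold (the actual constraint and threshold are defined in the paper):

```python
import numpy as np

def is_new_identity(candidate, accepted, threshold=1.24):
    """Accept a candidate only if its face embedding is farther than
    `threshold` from every already-accepted identity.

    Toy sketch of an identity-distinctness constraint; the threshold
    value is illustrative, not the one used by the package.
    """
    for emb in accepted:
        if np.linalg.norm(candidate - emb) < threshold:
            return False
    return True

rng = np.random.default_rng(1)
accepted = [rng.normal(size=128) for _ in range(5)]

far_candidate = rng.normal(size=128)   # random 128-D vectors are far apart
near_candidate = accepted[0] + 0.01 * rng.normal(size=128)  # near-duplicate

print(is_new_identity(far_candidate, accepted))   # True
print(is_new_identity(near_candidate, accepted))  # False
```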
In order to download those models, one must first specify their destination paths in the `~/.bobrc` file, through the following commands:
::

    bob config set sg2_morph.dlib_lmd_path </path/to/dlib/landmark/detector.dat>
    bob config set sg2_morph.sg2_path </path/to/stylegan2/pretrained/model.pkl>
    bob config set sg2_morph.vgg16_path </path/to/vgg16/pretrained/model.pkl>
    bob config set bob.bio.face_ongoing.models_path </path/to/fr/model/folder>

The models can then be downloaded once and for all by running
::
Prepare folder configuration
****************************
::

    # Absolute path of this repo; can be useful when launching execution on a grid,
    # due to some relative paths in the code
    bob config set bob.paper.ijcb2021_synthetic_dataset.path <path_of_this_repo>

    # Paths to preexisting databases
    # Folder containing Multi-PIE images
    bob config set bob.db.multipie.directory <path_to_folder>
    # Folder containing Multi-PIE face annotations
    bob config set bob.db.multipie.annotations_directory <path_to_folder>
    # Folder containing FFHQ images
    bob config set bob.db.ffhq.directory <path_to_folder>

    # Paths to generated content
    # Folder to store projected Multi-PIE latent projections
    bob config set bob.synface.multipie_projections <path_to_folder>
    # Path to the Pickle file where computed latent directions will be stored
    bob config set bob.synface.latent_directions <path_to_pickle_file.pkl>
    # Folder to store generated data
    bob config set bob.synface.synthetic_datasets <path_to_folder>

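For illustration, the `~/.bobrc` file written by `bob config` is, to our understanding, a flat JSON mapping of dotted keys to values, so the commands above amount to the following sketch (the paths and the temporary file location are purely illustrative):

```python
import json
import os
import tempfile

# Stand-in for ~/.bobrc, written to a temporary directory for this demo
rcfile = os.path.join(tempfile.mkdtemp(), ".bobrc")

# Equivalent of running `bob config set <key> <value>` for two keys
config = {}
config["bob.db.multipie.directory"] = "/datasets/multipie"      # illustrative path
config["bob.synface.synthetic_datasets"] = "/scratch/syn_db"    # illustrative path

with open(rcfile, "w") as f:
    json.dump(config, f, indent=4)

# Equivalent of `bob config get bob.db.multipie.directory`
with open(rcfile) as f:
    print(json.load(f)["bob.db.multipie.directory"])  # prints /datasets/multipie
```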
Run dataset projection
**********************
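Conceptually, projecting an image consists of optimizing a latent code so that the generated image matches the target under the perceptual (VGG16) loss. A toy sketch of this idea, with a random linear map standing in for StyleGAN2 and a plain squared error standing in for the perceptual loss:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 16))   # toy linear "generator": image = G @ w
w_true = rng.normal(size=16)
target = G @ w_true             # the "photo" we want to project

# Gradient descent on the latent code, minimizing ||G @ w - target||^2
w = np.zeros(16)
lr = 0.005
for _ in range(500):
    residual = G @ w - target   # stand-in for the perceptual-loss gradient signal
    w -= lr * (G.T @ residual)

print(np.linalg.norm(G @ w - target))  # reconstruction error, close to 0
```

The real script additionally handles face cropping/alignment and uses the actual StyleGAN2 generator, but the optimization loop follows this same shape.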
Due to its small scale, the generation of Syn-Multi-PIE can be performed in a single GPU job:
::

    jman submit -n synmultipie -q sgpu -s "PYTHONUNBUFFERED=1" -- ./bin/generate_db.py synmultipie

However, should one want to scale up the database, it is possible to split the generation into two passes. The first pass is sequential and generates all references.
The second pass can be parallelized and consists of augmenting each generated reference with its variations of interest.
For example, when scaling synmultipie to 10k identities, a single job is first launched to generate all references; once it finishes, the identities are split between 8 parallel jobs that generate all variations.
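The splitting of identities across parallel jobs can be sketched as a simple contiguous partition (the function name and chunking scheme are illustrative, not the package's actual implementation):

```python
def split_identities(n_identities, n_jobs):
    """Partition identity indices into contiguous, near-equal chunks,
    one per parallel job (sketch of the second, parallel pass)."""
    base, extra = divmod(n_identities, n_jobs)
    chunks, start = [], 0
    for j in range(n_jobs):
        size = base + (1 if j < extra else 0)  # spread any remainder evenly
        chunks.append(range(start, start + size))
        start += size
    return chunks

# e.g. 10k identities over 8 jobs -> 8 chunks of 1250 identities each
chunks = split_identities(10_000, 8)
print([len(c) for c in chunks])  # [1250, 1250, 1250, 1250, 1250, 1250, 1250, 1250]
```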
Run benchmark experiments
*************************
Upcoming
Contact
-------
For questions or to report issues with this software package, contact our
development team by asking your question on `stackoverflow`_ with the tag *python-bob*.
.. Place your references here:
.. _bob: https://www.idiap.ch/software/bob
.. _installation: https://www.idiap.ch/software/bob/install
.. _mailing list: https://www.idiap.ch/software/bob/discuss
.. _stackoverflow: https://stackoverflow.com/questions/tagged/python-bob