
Improving Generalization of Deepfake Detection by Training for Attribution

This repository contains the source code and the score files for reproducing the results reported in the paper "Improving Generalization of Deepfake Detection by Training for Attribution", published at MMSP 2021. Here is the BibTeX entry for the paper:

@inproceedings{JainMMSP2021,
    author = {Anubhav Jain and Pavel Korshunov and S\'ebastien Marcel},
    title = {Improving Generalization of Deepfake Detection by Training for Attribution},
    booktitle = {IEEE MMSP},
    year = 2021,
    month = oct,
    address = {Tampere, Finland},
}

If you use this package and/or its results, please cite the paper.

Getting Started

Run the following commands to clone the repository and create the conda environment for running the code:

$ git clone https://gitlab.idiap.ch/bob/bob.paper.deepfake_attribution
$ cd bob.paper.deepfake_attribution
$ conda env create -f environment.yml
$ conda activate deepfake_attribution

Generating TFRecords

First, you need to set the paths to all the available datasets using bob config. Run the following command to set each path:

$ bob config set bob.db.<dbname>.directory /path/to/<dbname>/dataset

Run this command for each of the available datasets, using the following dbnames: celebdf, faceforensics, mobio, google, deepfakeTIMIT.
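
For example, assuming all datasets are unpacked under /path/to/datasets (a placeholder you should adapt to your setup), the five commands would be:

$ bob config set bob.db.celebdf.directory /path/to/datasets/celebdf
$ bob config set bob.db.faceforensics.directory /path/to/datasets/faceforensics
$ bob config set bob.db.mobio.directory /path/to/datasets/mobio
$ bob config set bob.db.google.directory /path/to/datasets/google
$ bob config set bob.db.deepfakeTIMIT.directory /path/to/datasets/deepfakeTIMIT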

You also need to set the path to the repository using bob config. You can do this with the following command:

$ bob config set repository.path /path/to/the/repo

We use TFRecords for our experiments. You can create TFRecords containing either binary or multi-class labels for training the attribution models. To create TFRecords for the CelebDF, FaceForensics, Google, DeepfakeTIMIT, and DF-Mobio datasets, run the following commands:

$ jgen bob/paper/deepfake_attribution/config/deepfake.yml bob/paper/deepfake_attribution/database/database.py 'bob/paper/deepfake_attribution/config/tfrecords_{{ dbname[0] }}.py'
$ python3 bob/paper/deepfake_attribution/script/db_to_tfrecord_bob.py -v /path/to/generated/python/script/created/in/previous/step -o /path/to/tfrecords/directory -t multi/or/binary -s train/or/dev/or/eval

The first command creates Python scripts that are used as input to the second command.

Please note that the dbname should always match the database name in the lists subdirectory. The available options are: celebdf, mobio, google, faceforensics, deepfakeTIMIT.
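
For example, assuming jgen renders the template into bob/paper/deepfake_attribution/config/tfrecords_celebdf.py and /path/to/tfrecords is a placeholder you should adapt, creating the multi-class TFRecords for the training set of celebdf would look like:

$ python3 bob/paper/deepfake_attribution/script/db_to_tfrecord_bob.py -v bob/paper/deepfake_attribution/config/tfrecords_celebdf.py -o /path/to/tfrecords -t multi -s train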

Training the models

You can either train the models from scratch or simply test them using pretrained weights. The training scripts use the TFRecords created in the previous step and store the validation score files and the trained models. If you use pretrained models instead of training from scratch, you can obtain the validation score files using the test scripts.

To train the "Binary" model, use the following command:

$ python bob/paper/deepfake_attribution/script/Binary/train.py --model_dir /path/to/directory/to/save/models --checkpoint_dir /path/to/directory/to/save/model/checkpoints --save_dir /path/to/directory/to/save/scores --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient --training_datasets <space-separated list of datasets to use for training>

To training the "Attribution" model, use the following command:

$ python bob/paper/deepfake_attribution/script/Attribution/train.py --model_dir /path/to/directory/to/save/models --checkpoint_dir /path/to/directory/to/save/model/checkpoints --save_dir /path/to/directory/to/save/scores --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient --training_datasets <space-separated list of datasets to use for training>

To training the "Triplet-Loss" model, use the following command:

$ python bob/paper/deepfake_attribution/script/TripletLoss/train.py --model_dir /path/to/directory/to/save/models --checkpoint_dir /path/to/directory/to/save/model/checkpoints --save_dir /path/to/directory/to/save/scores --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient --training_datasets <space-separated list of datasets to use for training>

To replicate the base experiments in the paper, the list of training datasets should be: celebdf FF-deepfakes FF-neuraltextures FF-face2face FF-faceswap FF-faceshifter FF-youtube
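
For example, a base "Attribution" experiment with the Xception backbone could be launched as follows (all output directories and the TFRecords path are placeholders you should adapt):

$ python bob/paper/deepfake_attribution/script/Attribution/train.py --model_dir /path/to/models --checkpoint_dir /path/to/checkpoints --save_dir /path/to/scores --tfrecords_dir /path/to/tfrecords --model_name Xception --training_datasets celebdf FF-deepfakes FF-neuraltextures FF-face2face FF-faceswap FF-faceshifter FF-youtube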

Pre-trained models

The pre-trained models are available at this link. Run:

$ mkdir bob/paper/deepfake_attribution/Models

and place the pre-trained models inside this directory. Note that each model consists of a .json file and an .h5 weights file; you need both of them.
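
For example, assuming a model was downloaded into ~/Downloads as a pair of files (the file names below are hypothetical), you would move both files into the Models directory:

$ mv ~/Downloads/Attribution_Xception.json ~/Downloads/Attribution_Xception.h5 bob/paper/deepfake_attribution/Models/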

Evaluating the models

The test scripts use the TFRecords created in the first step and read either the models stored during training or the pretrained models. Please note that you need the TFRecords with binary labels to run these scripts. The --val parameter only needs to be set to True when using pretrained weights, in order to save the validation scores.

To test the "Binary" model, use the following command:

$ python bob/paper/deepfake_attribution/script/Binary/test.py --test_db db_to_test --model_path /path/to/directory/containing/models --base_path /path/to/store/outputs --val True --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient

To test the "Attribution" model, use the following command:

$ python bob/paper/deepfake_attribution/script/Attribution/test.py --test_db db_to_test --model_path /path/to/directory/containing/models --base_path /path/to/store/outputs --val True --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient

To test the "Triplet-Loss" model, use the following command:

$ python bob/paper/deepfake_attribution/script/TripletLoss/test.py --test_db db_to_test --model_path /path/to/directory/containing/models --base_path /path/to/store/outputs --val True --tfrecords_dir /path/to/tfrecords/directory --model_name Xception_or_Efficient
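
For example, evaluating the "Attribution" model with the Xception backbone on the deepfakeTIMIT database (one of the available dbnames, assumed here to be a valid value for --test_db) could look like this, with all paths being placeholders you should adapt:

$ python bob/paper/deepfake_attribution/script/Attribution/test.py --test_db deepfakeTIMIT --model_path /path/to/models --base_path /path/to/outputs --val True --tfrecords_dir /path/to/tfrecords --model_name Xception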

Evaluating the results

You can either generate your own score files or use the ones we provide inside the ./scores directory. In the same directory, you can find several Jupyter notebooks that allow you to evaluate the score files. The notebooks named compute_results_<Model_name>_<# of datasets used for training>.ipynb can be used to evaluate the results of the ablation study.

The results reported in the paper "Improving Generalization of Deepfake Detection by Training for Attribution", published at MMSP 2021, can be re-computed using the compute_results_Xception_all.ipynb and compute_results_Efficient_all.ipynb notebooks. The first notebook recomputes the results reported in Table III and Table IV of the paper, and the second notebook recomputes the results of Table V and Table VI.
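
For example, assuming Jupyter is installed in the conda environment, you can open the notebook for Tables III and IV with:

$ cd scores
$ jupyter notebook compute_results_Xception_all.ipynb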

Contact

For questions or to report issues with this software package, contact our development mailing list or contact Pavel Korshunov directly (pavel.korshunov@idiap.ch).