diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..dbb85adcaf367d41ac131cc01e670a38ebb9009d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+logs/
+*.pyc
\ No newline at end of file
diff --git a/README.md b/README.md
index 9e9f6588d1dad34f092dc658c94812bacd816cb5..93592ff17fa2285560751838e6e561c2ee221b6d 100644
--- a/README.md
+++ b/README.md
@@ -1,92 +1,17 @@
-# bob.paper.ijcb2023_vuln_analysis_hyg_mask_attack
+# Can personalised hygienic masks be used to attack face recognition systems?
 
+This repository contains the code for the paper "Can personalised hygienic masks be used to attack face recognition systems?".
 
+This repository is organised as follows:
 
-## Getting started
+- `database/`: contains the code used to organize the database into pandas dataframes. These dataframes are then used by the pipeline to run the full experiment.
+- `preprocessor/`: contains the code to extract frames from the videos and the face-cropping code.
+- `pipeline_vuln.py`: the script that runs the vulnerability-analysis pipeline.
 
-To make it easy for you to get started with GitLab, here's a list of recommended next steps.
+## How to run the pipeline
 
-Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
-
-## Add your files
-
-- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
-- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
-
-```
-cd existing_repo
-git remote add origin https://gitlab.idiap.ch/bob/bob.paper.ijcb2023_vuln_analysis_hyg_mask_attack.git
-git branch -M master
-git push -uf origin master
-```
-
-## Integrate with your tools
-
-- [ ] [Set up project integrations](https://gitlab.idiap.ch/bob/bob.paper.ijcb2023_vuln_analysis_hyg_mask_attack/-/settings/integrations)
-
-## Collaborate with your team
-
-- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
-- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
-- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
-- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
-- [ ] [Automatically merge when pipeline succeeds](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
-
-## Test and Deploy
-
-Use the built-in continuous integration in GitLab.
-
-- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
-- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
-- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
-- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
-- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
-
-***
-
-# Editing this README
-
-When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
-
-## Suggestions for a good README
-Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
-
-## Name
-Choose a self-explaining name for your project.
-
-## Description
-Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
-
-## Badges
-On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
-
-## Visuals
-Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
-
-## Installation
-Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
-
-## Usage
-Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
-
-## Support
-Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
-
-## Roadmap
-If you have ideas for releases in the future, it is a good idea to list them in the README.
-
-## Contributing
-State if you are open to contributions and what your requirements are for accepting them.
-
-For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
-
-You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
-
-## Authors and acknowledgment
-Show your appreciation to those who have contributed to the project.
-
-## License
-For open source projects, say how it is licensed.
-
-## Project status
-If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.
+1. Download the database from the following link: https://www.idiap.ch/en/dataset/phymatt
+2. Create a list of the videos to be used for the experiment. The list should contain the path of each video you want to include. To use all videos, you can run: `find <path_to_database> -name "*.mp4" > <path_to_list>`
+3. Run the frame extraction code as follows: `python preprocessor/extract_frames.py -l <path_to_list> -o <path_to_output_folder>`
+4. Run the database organization code as follows: `python database/create_database_dataframe.py --frames_list <path_to_list_of_frames> --output_path <path_to_metadata_folder> --metadata_filename <metadata_name> --save_mode <batch|frame> --min_face_size <size>`
+5. Run the pipeline as follows: `python pipeline_vuln.py --root <path_to_frames> --bonafide_annot <path> --print_annot <path> --replay_annot <path> --hyg_mask_annot <path> --output_path <path>`. A full example invocation is shown below.
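+
+For reference, a full invocation of the pipeline might look like the following (a minimal sketch: the paths are illustrative placeholders and should point to your extracted frames, the metadata dataframes produced in step 4, and the desired output folder):
+
+```bash
+python pipeline_vuln.py \
+    -r /path/to/soteria_database_frames/bonafide/frames \
+    -bo /path/to/bonafide/metadata \
+    -pr /path/to/print_attack/metadata \
+    -re /path/to/replay_attack/metadata \
+    -hy /path/to/hygienic_mask_attack/metadata \
+    -o /path/to/output_folder
+```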
diff --git a/__init__.py b/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/database/__init__.py b/database/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/database/create_database_dataframe.py b/database/create_database_dataframe.py
new file mode 100644
index 0000000000000000000000000000000000000000..52d3a21dfb73ca28ddb2a64bbab95adb72cdb857
--- /dev/null
+++ b/database/create_database_dataframe.py
@@ -0,0 +1,119 @@
+## Sun's Grid Engine parameters
+# ... command
+#$ -S /idiap/temp/akomaty/conda/envs/bob_face/bin/python
+# ... queue
+#$ -l q_1day_mth -pe pe_mth 8
+# ... Alternatively you can specify the queue with the hostname: -l pytorch,sgpu,'hostname=vgn[e]*'
+# ... job name
+#$ -N face_detection
+# ... project name (use qconf -sprjl to watch project list)
+#$ -P soteria
+# ... Location for output and error files
+#$ -o /idiap/temp/akomaty/log/
+#$ -e /idiap/temp/akomaty/log/
+# ... Add current directory to include other python packages from the same directory
+#$ -cwd
+
+"""
+This code will create a pandas dataframe containing metadata about the SOTERIA database.
+Usage:
+
+$ python create_database_dataframe.py
+
+To read the pickled file use the following:
+>> import pandas as pd
+>> df = pd.read_pickle("file.pkl")
+"""
+import os
+import argparse
+import pandas as pd
+from bob.bio.face.annotator import MTCNN
+from bob.io.base import load
+
+SCENARIOS=["normal_light", "low_light", "angle_var_right_left", "angle_var_up_down"]
+
+def detect_faces(frame, min_face_size):
+    """_summary_
+    This function detects faces in frames of videos, it fills them into the list 'nb_detected_faces'.
+    """
+    annot = MTCNN(min_size=int(min_face_size), factor=0.709, thresholds=(0.1, 0.2, 0.2))
+    boxes, prob, landmarks = annot.detect(frame)
+    nb_detected_faces = len(prob)
+    probs = prob.numpy()
+    boxes = boxes.numpy()
+    landmarks = landmarks.numpy()
+    return nb_detected_faces, probs, boxes, landmarks
+
+def create_metadata_file(frames_list, output_path, metadata_filename, save_mode, min_face_size):
+    my_file = open(frames_list, "r")
+    metadata = []
+
+    for line in my_file.readlines():
+        path = os.path.split(line)[0]
+        filename = os.path.split(line)[-1].strip()
+        filename_split = filename.split("_")
+        subject_id = filename_split[0]
+        recording_device = filename_split[1]
+        camera = filename_split[2]
+        attack_type = filename_split[3]
+        scenario = filename_split[4]
+        replay_device = filename_split[5]
+        source_video = filename_split[6]
+        session = filename_split[7]
+        frame_nb = filename_split[-1].split(".")[0].split("-")[-1]
+        reference_id = '_'.join((subject_id, session)) # Should be replaced by template_id
+        frame = load(os.path.join(path, filename))
+        nb_detected_faces, probs, boxes, landmarks = detect_faces(frame, min_face_size)
+        
+        metadata.append(
+                {
+                    'path': os.path.join(path, filename),
+                    'key': os.path.join(subject_id, filename), # the key is used to save files when checkpointing is set to True
+                    'filename':  filename,
+                    'subject_id': int(subject_id),
+                    'reference_id': reference_id, # Should be replaced by template_id
+                    'camera': camera,
+                    'recording_device': recording_device,
+                    'replay_device': replay_device,
+                    'source_video': source_video,
+                    'scenario': scenario,
+                    'session': session,
+                    'attack_type': attack_type,
+                    'frame_nb': frame_nb,
+                    'no_faces': nb_detected_faces,
+                    'probs_faces': probs,
+                    'boxes': boxes,
+                    'landmarks': landmarks,
+                }
+        )
+        if save_mode == 'frame':
+            # Wrap the metadata of this frame into a one-row dataframe
+            df_frame = pd.DataFrame([metadata[-1]])
+            frame_metadata_filename = filename.split('.')[0]
+            # Save the dataframe into pickle (one file per frame)
+            df_frame.to_pickle(os.path.join(output_path, "frames", '.'.join((frame_metadata_filename, 'pkl'))))
+
+    my_file.close()
+    # Put your list in a dataframe
+    df = pd.DataFrame(metadata)
+    
+    if save_mode == 'batch':
+        batch_idx = frames_list.split('/')[-1].split('.')[0].split('_')[-1]
+        metadata_filename = metadata_filename + '_'+ str(batch_idx)
+        # Save the dataframe into pickle
+        df.to_pickle(os.path.join(output_path, '.'.join((metadata_filename, 'pkl'))))
+
+
+if __name__ == '__main__':
+    # Create an ArgumentParser object
+    parser = argparse.ArgumentParser()
+
+    # Add an argument with a default value
+    parser.add_argument('-f', '--frames_list', default='/idiap/user/akomaty/projects/soteria/scripts/data/all_print_attack_frames.txt', help='A path to a file containing the list of frames.')
+    parser.add_argument( '-o' ,'--output_path', default='/idiap/project/soteria/soteria_database_frames', help='The output path where you want to save your pickled metadata.')
+    parser.add_argument('-m' ,'--metadata_filename', default='metadata_df_frames',help='name of the pickled metadata file.')
+    parser.add_argument('-s' ,'--save_mode', default='batch', choices=['batch', 'frame'], help='Choose if you want to save one pickled file per list of frames ("batch"), or you want to save one pickled file per frame ("frame").')
+    parser.add_argument('-z', '--min_face_size', default=40, type=int, help='Minimum face size (in pixels) to be detected.')
+
+    # Parse the arguments
+    args = parser.parse_args()
+    create_metadata_file(args.frames_list, args.output_path, args.metadata_filename, args.save_mode, args.min_face_size)
diff --git a/database/soteria_db.py b/database/soteria_db.py
new file mode 100644
index 0000000000000000000000000000000000000000..1843fdd807e98cb746f1afdb0d3d093934acd571
--- /dev/null
+++ b/database/soteria_db.py
@@ -0,0 +1,105 @@
+from functools import partial
+
+from bob.pipelines import DelayedSample, SampleSet
+from bob.bio.base.pipelines.abstract_classes import Database
+import bob.io.base
+
+ROOT = "/idiap/project/soteria/soteria_database_frames/bonafide/frames"
+
+
+class SOTERIADatabase(Database):
+    
+    def __init__(self,
+                 references_eval,
+                 references_dev,
+                 probes_eval,
+                 probes_dev,
+                 matching_eval,
+                 matching_dev,
+                 original_directory=ROOT,
+                ):
+        
+        self.original_directory = original_directory
+        self.image_directory = self.original_directory
+        self.references_df_eval = references_eval
+        self.references_df_dev = references_dev
+        self.probes_df_eval = probes_eval
+        self.probes_df_dev = probes_dev
+        self.matching_eval = matching_eval
+        self.matching_dev = matching_dev
+        super().__init__(
+            name='soteria',
+            protocol='default',
+            annotation_type="eyes-center",
+            score_all_vs_all=False)
+        
+    def background_model_samples(self):
+        return []
+    
+    def references(self, group):
+        if group == 'eval':
+            return make_samplesets(self.references_df_eval, self.image_directory)
+        elif group == 'dev':
+            return make_samplesets(self.references_df_dev, self.image_directory)
+        else:
+            raise ValueError('Unknown group! Groups should be "dev" or "eval".')
+    
+    def probes(self, group):
+        if group == 'eval':
+            probes = make_samplesets(self.probes_df_eval, self.image_directory)
+            
+            # Reorganize list of matching to regroup them by probe template id
+            indexed_matching = self.matching_eval.groupby('probe_reference_id')
+            
+            for sampleset in probes:
+                # Get the list of references to which this probe template should be compared
+                compared_references = indexed_matching.get_group(sampleset.reference_id)
+                
+                # Add the list of reference template id under the `references` field of the sampleset
+                sampleset.references= compared_references['bio_ref_reference_id'].tolist()
+            return probes
+
+        elif group == 'dev':
+            probes = make_samplesets(self.probes_df_dev, self.image_directory)
+            
+            # Reorganize list of matching to regroup them by probe template id
+            indexed_matching = self.matching_dev.groupby('probe_reference_id')
+            
+            for sampleset in probes:
+                # Get the list of references to which this probe template should be compared
+                compared_references = indexed_matching.get_group(sampleset.reference_id)
+                
+                # Add the list of reference template id under the `references` field of the sampleset
+                sampleset.references= compared_references['bio_ref_reference_id'].tolist()
+            return probes
+        else:
+            raise ValueError('Unknown group! Groups should be "dev" or "eval".')
+    
+    def all_samples(self, group):
+        ## NB : this will contain duplicates as some samples appear both as probes and references
+        return self.probes(group) + self.references(group)
+    
+    
+    def groups(self):
+        return list(("dev", "eval"))
+    
+    def protocols(self):
+        return ['default']
+
+# Utility function to turn one dataframe row into a DelayedSample
+def make_sample_from_row(row, image_dir):
+    return DelayedSample(load=partial(bob.io.base.load, inputs=row['path']), **dict(row))
+
+# Utility function to turn a dataframe into a list of SampleSets (= a list of biometric templates)
+def make_samplesets(df, image_dir):
+    # We group the samples by reference to form SampleSets
+    templates = df.groupby('reference_id')
+    
+    samplesets  = []
+    for reference_id, samples in templates:
+        subject_id = samples.iloc[0]['subject_id']
+        attack_type = samples.iloc[0]['attack_type']
+        if attack_type is None:
+            attack_type = ''
+        samplesets.append(
+            SampleSet(
+                samples.apply(make_sample_from_row, axis=1, image_dir=image_dir).tolist(),
+                reference_id=reference_id,
+                subject_id=subject_id,
+                key=reference_id,
+                attack_type=attack_type,
+            )
+        )
+        
+    return samplesets
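+
+# A minimal usage sketch (mirroring pipeline_vuln.py; the dataframes are assumed to be
+# metadata dataframes with a 'reference_id' column, and the matching dataframes to have
+# 'probe_reference_id' / 'bio_ref_reference_id' columns):
+#
+#   db = SOTERIADatabase(references_eval=references_df_eval,
+#                        references_dev=references_df_dev,
+#                        probes_eval=probes_df_eval,
+#                        probes_dev=probes_df_dev,
+#                        matching_eval=matching_eval,
+#                        matching_dev=matching_dev)
+#   dev_probes = db.probes('dev')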
+
diff --git a/pipeline_vuln.py b/pipeline_vuln.py
new file mode 100644
index 0000000000000000000000000000000000000000..662375bc2f3419d6bab7923c48509fe298b35369
--- /dev/null
+++ b/pipeline_vuln.py
@@ -0,0 +1,193 @@
+import pandas as pd
+import itertools, argparse, os, sys
+
+import bob.io.base
+# Explicitly import the entry point module used at the end of run_pipeline
+import bob.bio.base.pipelines.entry_points
+
+from bob.pipelines import SampleWrapper
+from preprocessor.cropper_custom import FivePointsNormalizer, FaceNormalizer
+from bob.bio.face.embeddings.pytorch import IResnet50
+from sklearn.pipeline import Pipeline
+
+from bob.bio.base.pipelines import PipelineSimple
+from bob.bio.base.algorithm import Distance
+
+from utils import _utils
+from utils.load_annotations import load_annotations
+
+def run_pipeline(ROOT, bonafide_annotations_path, print_path, replay_path, hyg_attack_path, OUTPUT_PATH):
+
+    # Create a dataframe containing the bonafide annotations
+    df = load_annotations(bonafide_annotations_path)
+
+    # Create a references dataframe containing only files from the first session, the main camera only, and the first scenario: indoor normal light
+    session_id = "s1"
+    scene_id = 0
+    nb_of_frames = 4 # the number of frames to be selected
+
+    protocol_references_dev = '((session==@session_id) & (frame_nb < @nb_of_frames) & (scenario_id == @scene_id) & (with_mask==False) &(subject_id <= 70))'
+    protocol_references_eval = '((session==@session_id) & (frame_nb < @nb_of_frames) & (scenario_id == @scene_id) & (with_mask==False) &(subject_id > 69))'
+
+    ## Dataframes for references, dev and eval
+    references_df_dev=df.query(protocol_references_dev)
+    references_df_eval=df.query(protocol_references_eval)
+
+    # Create a probes dataframe containing only files from the second session,  the main camera only, and the first scenario: indoor normal light
+    session_id = "s2"
+
+    protocol_probes_dev = '((session==@session_id) & (frame_nb < @nb_of_frames) & (scenario_id == @scene_id) & (with_mask==False) & (subject_id <= 35))'
+    protocol_probes_eval = '((session==@session_id) & (frame_nb < @nb_of_frames) & (scenario_id == @scene_id) & (with_mask==False) & (subject_id > 35))'
+
+    probes_df_dev=df.query(protocol_probes_dev)
+    probes_df_eval=df.query(protocol_probes_eval)
+
+    probes_df_dev['reference_id'] = probes_df_dev['filename'].values
+    probes_df_eval['reference_id'] = probes_df_eval['filename'].values
+
+    probes_df_dev_bonafide = probes_df_dev
+    probes_df_eval_bonafide = probes_df_eval
+
+    ###======================================================
+    # Add the print attack frames to the probes dataframe
+    df_print = load_annotations(print_path)
+
+    protocol_probes_dev_print = '((frame_nb < @nb_of_frames) & (subject_id <= 70))'
+    protocol_probes_eval_print = '((frame_nb < @nb_of_frames) & (subject_id > 69))'
+
+    probes_df_dev_print=df_print.query(protocol_probes_dev_print)
+    probes_df_eval_print=df_print.query(protocol_probes_eval_print)
+
+    probes_df_dev_print['reference_id'] = probes_df_dev_print['filename'].values
+    probes_df_eval_print['reference_id'] = probes_df_eval_print['filename'].values
+
+    # Concatenate the print probe frames with the bonafide 
+    fr = [probes_df_dev, probes_df_dev_print]
+    probes_df_dev = pd.concat(fr)
+    fr = [probes_df_eval, probes_df_eval_print]
+    probes_df_eval = pd.concat(fr)
+
+    ###======================================================
+    # Add the replay attack frames to the probes dataframe
+    df_replay = load_annotations(replay_path)
+
+    rep_att_scene = '1'
+
+    protocol_probes_dev_replay = '((frame_nb < @nb_of_frames) & (rep_att_scenario == @rep_att_scene) & (subject_id <= 70))'
+    protocol_probes_eval_replay = '((frame_nb < @nb_of_frames) & (rep_att_scenario == @rep_att_scene) & (subject_id > 69))'
+
+    probes_df_dev_replay=df_replay.query(protocol_probes_dev_replay)
+    probes_df_eval_replay=df_replay.query(protocol_probes_eval_replay)
+
+    probes_df_dev_replay['reference_id'] = probes_df_dev_replay['filename'].values
+    probes_df_eval_replay['reference_id'] = probes_df_eval_replay['filename'].values
+
+    # Concatenate the replay probe frames with the bonafide
+    fr = [probes_df_dev, probes_df_dev_replay]
+    probes_df_dev = pd.concat(fr)
+    fr = [probes_df_eval, probes_df_eval_replay]
+    probes_df_eval = pd.concat(fr)
+
+    # ###======================================================
+    # # Add the hygienic mask attack frames to the probes dataframe
+    # frames_hyg = [ _utils.process_df(os.path.join(hyg_attack_path, f)) for f in os.listdir(hyg_attack_path)]
+    # df_hyg = pd.concat(frames_hyg)
+    # df_hyg['frame_nb'] = df_hyg['frame_nb'].astype(int)
+
+    # protocol_probes_dev_hyg = '(subject_id <= 70)'
+    # protocol_probes_eval_hyg = '(subject_id > 69)'
+
+    # probes_df_dev_hyg=df_hyg.query(protocol_probes_dev_hyg)
+    # probes_df_eval_hyg=df_hyg.query(protocol_probes_eval_hyg)
+
+    # probes_df_dev_hyg['reference_id'] = probes_df_dev_hyg['filename'].values
+    # probes_df_eval_hyg['reference_id'] = probes_df_eval_hyg['filename'].values
+
+    # # Concatenate the hygienic mask attacks probe frames with the bonafide 
+    # fr = [probes_df_dev, probes_df_dev_hyg]
+    # probes_df_dev = pd.concat(fr)
+    # fr = [probes_df_eval, probes_df_eval_hyg]
+    # probes_df_eval = pd.concat(fr)
+
+    ###======================================================
+    # Create matching lists for the probes and references
+    # Get the sets of all reference ids in both the references (templates) and the probes
+    list_references_eval = set(references_df_eval["reference_id"].tolist())
+    list_probes_eval = set(probes_df_eval_bonafide["filename"].tolist())
+    matching_eval = list(itertools.product(list_probes_eval, list_references_eval))
+    matching_eval = pd.DataFrame(matching_eval)
+    matching_eval.columns = ["probe_reference_id", "bio_ref_reference_id"]
+
+    # Get the sets of all reference ids in both the references (templates) and the probes
+    list_references_dev = set(references_df_dev["reference_id"].tolist())
+    list_probes_dev = set(probes_df_dev_bonafide["filename"].tolist())
+    matching_dev = list(itertools.product(list_probes_dev, list_references_dev))
+    matching_dev = pd.DataFrame(matching_dev)
+    matching_dev.columns = ["probe_reference_id", "bio_ref_reference_id"]
+
+    # Create matchings for attacks
+    matching_eval_replay = _utils.create_matchings_for_attacks(probes_df_eval_replay)
+    matching_eval_print = _utils.create_matchings_for_attacks(probes_df_eval_print)
+
+    matching_dev_replay = _utils.create_matchings_for_attacks(probes_df_dev_replay)
+    matching_dev_print = _utils.create_matchings_for_attacks(probes_df_dev_print)
+
+    matching_eval = pd.concat([matching_eval, matching_eval_replay, matching_eval_print])
+    matching_dev =  pd.concat([matching_dev, matching_dev_replay, matching_dev_print])
+
+
+    def iresnet_template():
+        # DEFINE CROPPING
+        cropped_image_size = (112, 112)
+
+        normalizer = FivePointsNormalizer(cropped_image_size = cropped_image_size, reference_points = 'arcface')
+        cropper = FaceNormalizer(normalizer)
+        cropper = SampleWrapper(cropper, transform_extra_arguments=[('annotations', 'annotations')])
+        
+        embedding = IResnet50()
+        embedding = SampleWrapper(embedding)
+        
+        pipeline = Pipeline([('cropper', cropper), ('embedding', embedding)])
+        algorithm = Distance()
+
+        return PipelineSimple(pipeline, algorithm)
+
+    from database.soteria_db import SOTERIADatabase
+    # Create an instance of the class SOTERIADatabase
+    soteria_db = SOTERIADatabase(original_directory=ROOT,
+                                references_eval=references_df_eval,
+                                references_dev=references_df_dev,
+                                probes_eval=probes_df_eval,
+                                probes_dev=probes_df_dev,
+                                matching_eval=matching_eval,
+                                matching_dev=matching_dev)
+
+    from bob.pipelines.config.distributed.sge_demanding import dask_client
+    from bob.extension.log import setup, set_verbosity_level
+
+    logger = setup('bob')
+    set_verbosity_level(logger, 3)
+
+    pipeline = iresnet_template()
+
+    # dask_client = "single-threaded"
+    bob.bio.base.pipelines.entry_points.execute_pipeline_simple(
+        pipeline,
+        soteria_db,
+        dask_client=dask_client,
+        groups=["dev", "eval"],
+        output=OUTPUT_PATH,
+        dask_n_partitions=None,
+        write_metadata_scores=True,
+        checkpoint=True,
+        dask_partition_size=16,
+        dask_n_workers=None,
+    )
+
+
+if __name__ == '__main__':
+    sys.path.append("/idiap/user/akomaty/projects/soteria/scripts/pipelines/vuln_analysis")
+    # Create an ArgumentParser object
+    parser = argparse.ArgumentParser()
+
+    # Add an argument with a default value
+    parser.add_argument( '-r' ,'--root', help='The root path where the data is stored.')
+    parser.add_argument( '-bo' ,'--bonafide_annot', help='The path where the bonafide annotations are located.')
+    parser.add_argument( '-pr' ,'--print_annot', help='The path where the print attacks annotations are located.')
+    parser.add_argument( '-re' ,'--replay_annot', help='The path where the replay attacks annotations are located.')
+    parser.add_argument( '-hy' ,'--hyg_mask_annot', help='The path where the hygienic mask attacks annotations are located.')
+    parser.add_argument( '-o' ,'--output_path', help='The output path where you want to save your extracted frames.')
+    
+
+    # Parse the arguments
+    args = parser.parse_args()
+    run_pipeline(args.root, args.bonafide_annot, args.print_annot, args.replay_annot, args.hyg_mask_annot, args.output_path)
+    # sample run: 
+    #       python pipeline_vuln.py -r /idiap/project/soteria/soteria_database_frames/bonafide/frames -bo /idiap/project/soteria/soteria_database_frames/bonafide/metadata/metadata_frames_600 -pr /idiap/project/soteria/soteria_database_frames/attack/print_attack/metadata/metadata_print_attack_600/ -re /idiap/project/soteria/soteria_database_frames/attack/replay_attack/metadata/metadata_replay_attack_600/ -hy /idiap/project/soteria/soteria_database_frames/attack/hygienic_mask_attack/metadata/metadata_hyg-mask_attack_600 -o /idiap/temp/akomaty/test_paper_run/
\ No newline at end of file
diff --git a/preprocessor/__init__.py b/preprocessor/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/preprocessor/cropper_custom.py b/preprocessor/cropper_custom.py
new file mode 100644
index 0000000000000000000000000000000000000000..3a76e9fe5e0418479efafa368a50cedbc118c1f8
--- /dev/null
+++ b/preprocessor/cropper_custom.py
@@ -0,0 +1,415 @@
+# coding=utf-8
+"""
+  @file: cropper_custom.py
+  @date: 20 January 2023
+  @author: Christophe Ecabert
+  @email: christophe.ecabert@idiap.ch
+"""
+import logging
+from typing import Optional, Callable, Dict, Any, Union, Iterable, Tuple
+import numpy as np
+from sklearn.base import BaseEstimator, TransformerMixin
+from bob.bio.face.color import gray_to_rgb, rgb_to_gray
+import cv2
+import bob.io.image as io
+
+logger = logging.getLogger(__name__)
+NORMALIZER_FCN = Callable[[np.ndarray, Optional[Dict[str, Any]]], np.ndarray]
+
+
+
+REF_POINTS = Union[str,
+                   Iterable[Tuple[float, float]],
+                   Dict[str, Tuple[float, float]]]
+
+
+# adapted from:
+# https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/mtcnn_pytorch/src/align_trans.py
+# https://github.com/deepinsight/insightface/blob/be3f7b3e0e635b56d903d845640b048247c41c90/common/face_align.py#L59
+REFERENCE_FACIAL_POINTS_112 = np.asarray([[38.29459953, 51.69630051],
+                                          [73.53179932, 51.50139999],
+                                          [56.02519989, 71.73660278],
+                                          [41.54930115, 92.3655014],
+                                          [70.72990036, 92.20410156]])
+
+
+POINTS = Union[np.ndarray,
+               Iterable[Tuple[float, float]],
+               Dict[str, Tuple[float, float]]]
+
+
+def _to_array(points: POINTS) -> np.ndarray:
+    """Convert inputs to numpy array. Input must be one of the following
+
+    Args:
+        - points:
+            1) Kx2 or 2xK np.array where each row or col is a pair of
+                coordinates (x, y)
+            2) List of tuple of 2 floats representing (x, y) coordinates
+            3) Dictionary of tuples (y, x) with keys: 'reye', 'leye', 'nose',
+                'mouthright', 'mouthleft'
+
+    """
+    _points = points
+    if isinstance(_points, dict):
+        pts = []
+        for key in ('reye', 'leye', 'nose', 'mouthright', 'mouthleft'):
+            pts.append(list(_points[key])[::-1])
+        _points = pts
+    if isinstance(_points, (list, tuple)):
+        _points = np.asarray(_points)
+    return _points.astype(np.float32)
+
+
+def _check_dims(array: np.ndarray, name: str) -> None:
+    if max(array.shape) < 3 or min(array.shape) != 2:
+        msg = '{}.shape must be (K,2) or (2,K) and K > 2, got `{}`'
+        raise RuntimeError(msg.format(name, array.shape))
+
+
+def _transpose_if_needed(array: np.ndarray, size: int, axis: int = -1):
+    if array.shape[axis] != size:
+        return array.T
+    return array
+
+
+def _cv2_affine(sources: np.ndarray, targets: np.ndarray) -> np.ndarray:
+    return cv2.getAffineTransform(sources[0:3], targets[0:3])
+
+
+def _affine(sources: np.ndarray, targets: np.ndarray) -> np.ndarray:
+    trsfrm = np.eye(N=2, M=3, dtype=np.float32)
+    n_pts = sources.shape[0]
+    ones = np.ones((n_pts, 1), sources.dtype)
+    src = np.hstack([sources, ones])
+    tgt = np.hstack([targets, ones])
+    A, res, rank, s = np.linalg.lstsq(src, tgt, rcond=None)
+
+    if rank == 3:
+        trsfrm = np.asarray([[A[0, 0], A[1, 0], A[2, 0]],
+                             [A[0, 1], A[1, 1], A[2, 1]]]).astype(np.float32)
+    elif rank == 2:
+        trsfrm = np.asarray([[A[0, 0], A[1, 0], 0],
+                             [A[0, 1], A[1, 1], 0]]).astype(np.float32)
+    return trsfrm
+
+
+def _apply(trsfrm: np.ndarray, points: np.ndarray) -> np.ndarray:
+    points = np.hstack([points, np.ones((points.shape[0], 1))])
+    xy = np.dot(points, trsfrm)
+    xy = xy[:, 0:-1]
+    return xy
+
+
+def _find_similarity(sources: np.ndarray, targets: np.ndarray) -> np.ndarray:
+    # Solve for trans1
+    trans1 = _find_non_relective_similarity(sources, targets)
+    # Measure estimation error
+    tgt1 = _apply(trans1, sources)
+    norm1 = np.linalg.norm(tgt1 - targets)
+
+    # Solve for trans2
+    targets_R = targets.copy()
+    targets_R[:, 0] = -1 * targets_R[:, 0]
+    trans2r = _find_non_relective_similarity(sources, targets_R)
+
+    # manually reflect the tform to undo the reflection done on xyR
+    trsfrm_reflect_y = np.array([[-1, 0, 0],
+                                 [0, 1, 0],
+                                 [0, 0, 1]])
+    trans2 = np.dot(trans2r, trsfrm_reflect_y)
+    # Measure estimation error
+    tgt2 = _apply(trans2, sources)
+    norm2 = np.linalg.norm(tgt2 - targets)
+
+    # Figure out if trans1 or trans2 is better
+    if norm1 <= norm2:
+        return trans1
+    else:
+        return trans2
+
+
+def _find_non_relective_similarity(sources: np.ndarray,
+                                   targets: np.ndarray) -> np.ndarray:
+    K = 2
+    M = sources.shape[0]
+    x = sources[:, 0:1]
+    y = sources[:, 1:2]
+
+    tmp1 = np.hstack((x, y, np.ones((M, 1)), np.zeros((M, 1))))
+    tmp2 = np.hstack((y, -x, np.zeros((M, 1)), np.ones((M, 1))))
+    X = np.vstack((tmp1, tmp2))
+
+    u = targets[:, 0:1]
+    v = targets[:, 1:2]
+    U = np.vstack((u, v))
+
+    # We know that X * r = U
+    if np.linalg.matrix_rank(X) >= 2 * K:
+        r, _, _, _ = np.linalg.lstsq(X, U, rcond=None)
+        r = np.squeeze(r)
+    else:
+        raise RuntimeError('cp2tform:twoUniquePointsReq')
+
+    sc = r[0]
+    ss = r[1]
+    tx = r[2]
+    ty = r[3]
+    trsfrm = np.array([[sc, -ss, 0],
+                       [ss,  sc, 0],
+                       [tx,  ty, 1]])
+    return trsfrm
+
+
+def _similarity(sources: np.ndarray,
+                targets: np.ndarray,
+                reflective: bool = True) -> np.ndarray:
+    if reflective:
+        trsfrm = _find_similarity(sources, targets)
+    else:
+        trsfrm = _find_non_relective_similarity(sources, targets)
+    # convert to opencv format for use with cv2.warpAffine
+    trsfrm = np.array(trsfrm[:, 0:2].T)
+    return trsfrm
+
+
+_transform = {'similarity': _similarity,
+              'affine': _affine,
+              'cv2_affine': _cv2_affine}
+
+
+def get_transform(sources: POINTS,
+                  targets: POINTS,
+                  type: str = 'similarity') -> np.ndarray:
+    # Format inputs
+    sources = _to_array(sources)
+    targets = _to_array(targets)
+    # Sanity checks
+    if type not in _transform:
+        msg = 'Unsupported type of transformation: {}, must be one of `{}`'
+        raise ValueError(msg.format(type, '`, `'.join(_transform.keys())))
+
+    _check_dims(sources, 'sources')
+    _check_dims(targets, 'targets')
+
+    sources = _transpose_if_needed(sources, size=2, axis=1)
+    targets = _transpose_if_needed(targets, size=2, axis=1)
+
+    if sources.shape != targets.shape:
+        msg = '`sources` and `targets` must have the same dimensions, got: ' \
+              '`{}` != `{}`'
+        raise ValueError(msg.format(sources.shape, targets.shape))
+
+    return _transform[type](sources, targets)
+
+
+
+class FivePointsNormalizer:
+    """ Five points (eyes, nose, mouths) points image normalizer """
+
+    _ref_pts = {'arcface': REFERENCE_FACIAL_POINTS_112}
+
+    def __init__(self,
+                 cropped_image_size: Tuple[int, int],
+                 reference_points: REF_POINTS,
+                 transform_type: str = 'similarity',
+                 interpolation=cv2.INTER_LINEAR):
+        """Constructor
+
+        Normalize image using five facial landmarks:
+
+
+        :param cropped_image_size: Final image dimensions
+        :param reference_points: List of reference points used to compute
+            transformation between input space and normalized image space. Must
+            be one of:
+                - str: Predefined configuration
+                - list of tuples (x, y)
+                - dict of tuples (y, x) with `leye`, `reye`, `nose`,
+                    `mouthright` and `mouthleft` keys
+        :param transform_type: Type of transformation to use. Must be one of
+            `similarity`, `affine` or `cv2_affine`, defaults to 'similarity'
+        :param interpolation: Interpolation method used by cv2.warpAffine,
+            defaults to cv2.INTER_LINEAR
+        """
+        annotation_keys = ('reye', 'leye', 'nose', 'mouthright', 'mouthleft')
+        self.required_annotation_keys = annotation_keys
+        # Reference
+        if isinstance(reference_points, str):
+            if reference_points not in self._ref_pts:
+                msg = 'Unknown `reference_points`, must be one of `{}`, got ' \
+                      '`{}`'
+                raise ValueError(msg.format('`, `'.join(self._ref_pts.keys()),
+                                            reference_points))
+            reference_points = self._ref_pts[reference_points]
+        self.ref_points = reference_points
+        self.cropped_image_size = cropped_image_size
+        self.transform_type = transform_type
+        self.interpolation = interpolation
+
+    def validate_annotations(self,
+                             annotations: Optional[Dict[str, Any]]) -> bool:
+        # Does the normalizer need annotations?
+        if self.required_annotation_keys is not None:
+            if (annotations is None or not
+                    (set(annotations.keys())
+                     .issuperset(self.required_annotation_keys))):
+                common_key = (set(annotations.keys())
+                              .intersection(self.required_annotation_keys)
+                              if annotations else set())
+                missing_key = set(self.required_annotation_keys) - common_key
+                msg = 'Missing annotations: `{}`'
+                logger.warning(msg.format('`, `'.join(missing_key)))
+                return False
+        return True
+        
+    def __call__(self,
+                 image: np.ndarray,
+                 annotations: Optional[Dict[str, Any]] = None
+                 ) -> Optional[np.ndarray]:
+        # Sanity check
+        im_norm = None
+        if self.validate_annotations(annotations):
+            # Got everything we need, proceed
+            image = io.bob_to_opencvbgr(image)
+            # Compute normalization transform
+            trsfrm = get_transform(annotations,
+                                         self.ref_points,
+                                         self.transform_type)
+            # Warp
+            normalized = cv2.warpAffine(image, trsfrm, self.cropped_image_size)
+            im_norm = io.opencvbgr_to_bob(normalized)
+        return im_norm
+
+
+def _identity(image: np.ndarray) -> np.ndarray:
+    # rgb -> rgb
+    return image
+
+
+def _gray2rgb(image: np.ndarray) -> np.ndarray:
+    # gray -> rgb
+    return gray_to_rgb(image)
+
+
+def _rgb2bgr(image: np.ndarray) -> np.ndarray:
+    # rgb -> bgr
+    return image[[2, 1, 0], ...]
+
+
+def _rgb2gray(image: np.ndarray) -> np.ndarray:
+    # rgb -> gray
+    return rgb_to_gray(image)
+
+
+def _rgb2red(image: np.ndarray) -> np.ndarray:
+    # rgb -> red
+    return image[0, ...]
+
+
+def _rgb2green(image: np.ndarray) -> np.ndarray:
+    # rgb -> green
+    return image[1, ...]
+
+
+def _rgb2blue(image: np.ndarray) -> np.ndarray:
+    # rgb -> blue
+    return image[2, ...]
+
+class FaceNormalizer(BaseEstimator, TransformerMixin):
+
+    _str2dtype = {'uint8': np.uint8,
+                  'uint16': np.uint16,
+                  'uint32': np.uint32,
+                  'uint64': np.uint64,
+                  'int8': np.int8,
+                  'int16': np.int16,
+                  'int32': np.int32,
+                  'int64': np.int64,
+                  'half': np.float16,
+                  'float16': np.float16,
+                  'float': np.float32,
+                  'float32': np.float32,
+                  'double': np.float64,
+                  'float64': np.float64}
+
+    _channel_mapper = {(2, 'gray'): _identity,
+                       (3, 'gray'): _rgb2gray,
+                       (2, 'rgb'): _gray2rgb,
+                       (3, 'rgb'): _identity,
+                       (3, 'bgr'): _rgb2bgr,
+                       (3, 'red'): _rgb2red,
+                       (3, 'green'): _rgb2green,
+                       (3, 'blue'): _rgb2blue}
+
+    def __init__(self,
+                 normalizer: Union[str,
+                                   NORMALIZER_FCN],
+                 normalizer_arguments: Optional[Dict[str, Any]] = None,
+                 color_channel: str = 'rgb',
+                 dtype: str = 'float64'):
+        # Sanity check
+        if dtype not in self._str2dtype:
+            msg = 'Unknown `dtype`, must be one of `{}`, got `{}`'
+            raise ValueError(msg.format('`, `'.join(self._str2dtype.keys()),
+                                        dtype))
+
+        channels = set([k[1] for k in self._channel_mapper.keys()])
+        if color_channel not in channels:
+            msg = 'Unknown `color_channel`, must be one of `{}`, got `{}`'
+            raise ValueError(msg.format('`, `'.join(channels),
+                                        color_channel))
+
+        # Create image normalizer instance if needed
+        self.normalizer_arguments = normalizer_arguments or {}
+        if isinstance(normalizer, str):
+            # Building a normalizer from a registered name is not supported here
+            raise NotImplementedError('Creating a normalizer from a string name is '
+                                      'not implemented, pass a callable instead')
+        elif not callable(normalizer):
+            msg = 'Parameter `normalizer` must be callable, got {}'
+            raise ValueError(msg.format(normalizer))
+        self.normalizer = normalizer
+
+        # Channels
+        self.color_channel = color_channel
+        self.dtype = self._str2dtype[dtype]
+
+    def select_color_channel(self, image: np.ndarray) -> np.ndarray:
+        key = (image.ndim, self.color_channel)
+        if key not in self._channel_mapper:
+            msg = 'There is no rule to convert `{}` channels into `{}`'
+            raise ValueError(msg.format(image.ndim, self.color_channel))
+        return self._channel_mapper[key](image)
+
+    def cast(self, image: np.ndarray) -> np.ndarray:
+        if image is not None:
+            image = image.astype(self.dtype)
+        return image
+
+    def transform(self,
+                  images: Iterable[np.ndarray],
+                  annotations: Optional[Dict[str, Any]] = None):
+
+        def _normalize(image: np.ndarray, annotation: Dict[str, Any]):
+            # Extract selected channels
+            image = self.select_color_channel(image)
+            # Run normalizer, might return None in case of failure
+            image = self.normalizer(image, annotation)
+            if image is None:
+                msg = 'Image normalizer `{}` failed and returned `None`'
+                logger.warning(msg.format(self.normalizer))
+            # Convert to selected data type
+            return self.cast(image)
+
+        # If annotations are missing, add them
+        if annotations is None:
+            annotations = [None] * len(images)
+
+        # Run the normalizer on every image
+        return [_normalize(img, ann) for img, ann in zip(images, annotations)]
+
+    def _more_tags(self):
+        return {'requires_fit': False}
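+
+# A minimal usage sketch (mirroring how pipeline_vuln.py wires the croppers together;
+# `images` and `annotations_list` are assumed to be lists of bob-format RGB images and
+# MTCNN-style landmark dicts with 'reye', 'leye', 'nose', 'mouthright', 'mouthleft' keys):
+#
+#   normalizer = FivePointsNormalizer(cropped_image_size=(112, 112),
+#                                     reference_points='arcface')
+#   cropper = FaceNormalizer(normalizer)
+#   cropped = cropper.transform(images, annotations=annotations_list)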
diff --git a/preprocessor/extract_frames.py b/preprocessor/extract_frames.py
new file mode 100644
index 0000000000000000000000000000000000000000..d5d29f3f4ec1bd53f74ce6f114443eea4bb446e2
--- /dev/null
+++ b/preprocessor/extract_frames.py
@@ -0,0 +1,42 @@
+"""
+This code will use bob to extract all the frames from each video in the soteria dataset.
+It takes one input containing a list of video files to be treated.
+A sample run is as follows:
+    
+      $ python extract_frames.py -l videos_1.txt -o /tmp/myDIR/ -m 20
+    
+      where videos_1.txt is a list containing a certain number of videos paths.
+"""
+
+import os
+import matplotlib.image
+import argparse
+from bob.bio.video import VideoAsArray
+from bob.io.image import to_matplotlib
+
+def extract_frames(video_list, output_path, max_number_of_frames):
+
+  with open(video_list, 'r') as fp:
+    for video_path in fp:
+      video_path = video_path.strip()
+      myvid = VideoAsArray(video_path, max_number_of_frames=max_number_of_frames)
+      subject_id = video_path.split("/")[-2]
+      filename = video_path.split("/")[-1].split(".")[0]
+      dest_dir = os.path.join(output_path, subject_id)
+      # Make sure the per-subject output directory exists before writing frames
+      os.makedirs(dest_dir, exist_ok=True)
+      index = 0
+      for frame in myvid:
+        dest_file_path = os.path.join(dest_dir, filename + "_frame-" + str(index) + '.jpg')
+        matplotlib.image.imsave(dest_file_path, to_matplotlib(frame))
+        index += 1
+
+if __name__ == '__main__':
+    # Create an ArgumentParser object
+    parser = argparse.ArgumentParser()
+
+    # Add an argument with a default value
+    parser.add_argument('-l', '--video_list', help='A path to a file containing the list of videos.')
+    parser.add_argument( '-o' ,'--output_path', help='The output path where you want to save your extracted frames.')
+    parser.add_argument('-m', '--max_frames', default=20, type=int, help='The maximum number of frames to be extracted.')
+
+    # Parse the arguments
+    args = parser.parse_args()
+    extract_frames(args.video_list, args.output_path, args.max_frames)
diff --git a/utils/__init__.py b/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/utils/_utils.py b/utils/_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee301cd6d65f7cd44a6fcdca7fadca8bc29e8ab4
--- /dev/null
+++ b/utils/_utils.py
@@ -0,0 +1,173 @@
+import math
+import pandas as pd
+import pickle
+import yaml
+
+SCENARIOS=["indoor_normal_light", "indoor_low_light", "outdoor_lateral_light"]
+
+def area_bounding_box(annotations):
+    x0 = annotations['topleft'][1]
+    y0 = annotations['topleft'][0]
+    x1 = annotations['bottomright'][1]
+    y1 = annotations['bottomright'][0]
+    return abs(y1 - y0) * abs(x1 - x0)
+
+def read_pkl(path):
+    """Reads the pkl file at a certain path and returns its content in a pandas dataframe
+
+    Args:
+        path (_type_): a path to the pkl file
+
+    Returns:
+        pandas.DataFrame
+    """
+    return pd.read_pickle(path)
+
+def drop_rows_no_faces(df):
+    """Drops, in place, all the rows of a given dataframe for which no face was detected
+
+    Args:
+        df (pandas.DataFrame)
+
+    Returns:
+        pandas.DataFrame: the same dataframe, with the rows removed
+    """
+    df.drop(df[df['no_faces'] == 0].index, inplace=True)
+    return df
+
+def get_eye_pos(row):
+    """Get eye positions given a row in a pandas dataframe
+
+    Args:
+        row (pandas.core.series.Series): the row containing the landmarks
+
+    Returns:
+        _dict_: a dictionary of the form {'leye': (LEYE_Y, LEYE_X), 'reye': (REYE_Y, REYE_X)}
+    """
+    row = row["landmarks"]
+    right_eye = float(row[0][0]), float(row[0][5])
+    left_eye = float(row[0][1]), float(row[0][6])
+    return {'reye': right_eye, 'leye': left_eye }
+
+def get_landmarks(row):
+    """Get the five landmarks for the new face cropper given a row in a pandas dataframe
+
+    Args:
+        row (pandas.core.series.Series): the row containing the landmarks
+
+    Returns:
+        _dict_: a dictionary of the form {'reye': (REYE_Y, REYE_X), 'leye': (LEYE_Y, LEYE_X), 'nose': (NOSE_Y, NOSE_X), 'mouthright': (MR_Y, MR_X), 'mouthleft': (ML_Y, ML_X)}
+    """
+    row = row["landmarks"]
+    right_eye = float(row[0][0]), float(row[0][5])
+    left_eye = float(row[0][1]), float(row[0][6])
+    nose = float(row[0][2]), float(row[0][7])
+    mouthright = float(row[0][3]), float(row[0][8])
+    mouthleft = float(row[0][4]), float(row[0][9])
+    return {'reye': right_eye, 'leye': left_eye, 'nose':nose, 'mouthright': mouthright,  'mouthleft': mouthleft}
+
+def delete_value(row):
+    return None
+    
+
+def delete_unwanted_cols(df):
+    """Deletes unwanted columns from a dataframe, in place
+
+    Args:
+        df (pandas.DataFrame): the dataframe from which the 'probs_faces' and 'boxes' columns are removed
+    """
+    del df['probs_faces']
+    del df['boxes']
+
+def get_reference_id(row):
+    """ This method defines the reference_id (should be named template_id)
+    """
+    filename = row['filename']
+    subject_id = filename.split("_")[0]
+    phone = filename.split("_")[1]
+    camera = filename.split("_")[2]
+    is_bonafide = filename.split("_")[3] == 'BONAFIDE'
+    with_mask = filename.split("_")[4] == "mask"
+    session = filename.split("_")[-3]
+    frame_nb = filename.split("_")[-1].split(".")[0].split("-")[-1]
+    if "mask" in filename:
+        scenario = "_".join(filename.split("_")[5:8])
+    else:
+        scenario = "_".join(filename.split("_")[4:7])
+    scenario_id = SCENARIOS.index(scenario)
+    reference_id = '_'.join((subject_id, session,str(scenario_id), camera[0])) # Should be replaced by template_id
+    return reference_id
+
+def process_df(path):
+    df = read_pkl(path)
+    delete_unwanted_cols(df)
+    drop_rows_no_faces(df)
+    df['annotations'] = df.apply(lambda row: get_landmarks(row), axis=1)
+    df['landmarks'] = df.apply(lambda row: delete_value(row), axis=1)
+    # df['reference_id'] = df.apply(lambda row: get_reference_id(row), axis=1)
+    return df
+
+def should_remove(row, df):
+    """Get the mask information for a given print attack, given its filename and a dataframe (df)
+    containing the information about the print attacks. The dataframe is read from the csv file
+    print_attacks_data.csv using pd.read_csv(print_attack_csv_file). Returns True if the source
+    image of the print attack was captured with a mask."""
+    filename = row["filename"]
+    filename = filename.split("_frame")[0] + ".mp4"
+    # replace print_attack by print-attack
+    filename = filename.replace("a1_print", "a1-print")
+    idx = df.index[df['path'].str.contains(filename)==True].tolist()[0]
+    source =  df['source'].iloc[idx]
+    with_mask = "mask" in source
+    return with_mask
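+
+# A minimal usage sketch (assuming, as described above, a csv with 'path' and 'source' columns
+# describing the print-attack videos, and a hypothetical `probes_df` dataframe of print-attack
+# frames with a 'filename' column):
+#
+#   attacks_info = pd.read_csv("print_attacks_data.csv")
+#   to_remove = probes_df.apply(lambda row: should_remove(row, attacks_info), axis=1)
+#   probes_df = probes_df[~to_remove]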
+
+
+def save_pkl(data, path):
+    with open(path, 'wb') as f:
+        pickle.dump(data, f)
+
+def load_pkl(path):
+    with open(path, 'rb') as f:
+        return pickle.load(f)
+
+def create_matchings_for_attacks(probes_df):
+    """Create list of tuple matchings between attacks and probes
+
+    Args:
+        probes (pandas dataframe): _description_
+        references (set of reference_ids (template_ids)): _description_
+    """
+    id = probes_df['filename'].str.split('_').str[0] + '_s1'
+    matching = probes_df['filename']
+    matching = pd.DataFrame(matching)
+    matching.columns = ["probe_reference_id"]
+    matching["bio_ref_reference_id"] = id
+    return matching
+
+def create_matchings_for_attacks_oneID(probes_df, single_id):
+    """Create list of tuple matchings between attacks and probes
+
+    Args:
+        probes (pandas dataframe): _description_
+        references (set of reference_ids (template_ids)): _description_
+    """
+    matching = probes_df['filename']
+    matching = pd.DataFrame(matching)
+    matching.columns = ["probe_reference_id"]
+    matching = matching[matching['probe_reference_id'].str.contains('set-4')]
+    matching["bio_ref_reference_id"] = single_id
+    return matching
+
+
+
+def read_config_file(config_file):
+    """This method reads a yaml config file
+
+    Args:
+        config_file (str): path to the yaml configuration file
+
+    Returns:
+        dict: the parsed configuration, or None if the file could not be parsed
+    """
+    with open(config_file, 'r') as stream:
+        try:
+            config = yaml.safe_load(stream)
+            return config
+        except yaml.YAMLError as exc:
+            print(exc)
\ No newline at end of file
diff --git a/utils/load_annotations.py b/utils/load_annotations.py
new file mode 100644
index 0000000000000000000000000000000000000000..379b6fe799a921c106bd4f23f5c1fddb0d4f8016
--- /dev/null
+++ b/utils/load_annotations.py
@@ -0,0 +1,21 @@
+import os, argparse
+from utils import _utils
+import pandas as pd
+
+def load_annotations(annotations_path):
+    frames = [ _utils.process_df(os.path.join(annotations_path, f)) for f in os.listdir(annotations_path)]
+    df = pd.concat(frames)
+    df['frame_nb'] = df['frame_nb'].astype(int)
+    return df
+
+
+if __name__ == '__main__':
+    # Create an ArgumentParser object
+    parser = argparse.ArgumentParser()
+
+    # Add an argument with a default value
+    parser.add_argument('-a', '--annotations_path', help='A path to a file containing the annotations.')
+
+    # Parse the arguments
+    args = parser.parse_args()
+    load_annotations(args.annotations_path)
\ No newline at end of file
diff --git a/utils/plotting.py b/utils/plotting.py
new file mode 100644
index 0000000000000000000000000000000000000000..db9ea2ef6e7845710d4bb66f13347162f57abed5
--- /dev/null
+++ b/utils/plotting.py
@@ -0,0 +1,27 @@
+import matplotlib.pyplot as plt
+import math
+
+def plot_figures(figures):
+    """ This function creates a grid of subplots with a dynamic number of rows and columns based on the size of the input list figures, and plots each figure in the corresponding subplot.
+
+    Args:
+        figures (list): a list of images (arrays that matplotlib's imshow can display)
+
+    Usage:
+        figures = [fig1, fig2, fig3, fig4, fig5]
+        plot_figures(figures)
+    """
+    num_figures = len(figures)
+    num_rows = math.ceil(math.sqrt(num_figures))
+    num_cols = math.ceil(num_figures / num_rows)
+
+    fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(10, 10))
+
+    for i, ax in enumerate(axes.flat):
+        if i < num_figures:
+            ax.imshow(figures[i])
+            ax.axis('off')
+        else:
+            ax.axis('off')
+
+    plt.show()