
First simple version of writing empty configuration files

Merged Manuel Günther requested to merge 45-the-usage-of-a-global-config-file-is-not-clear into master

Closes #45 (closed)

Merge request reports


Activity

  • I have implemented a short function that generates an empty configuration file with a complete list of options, including the help texts from the command line parser. This function also works for scripts of derived classes, such as verify_gmm.py.

    So far, all (non-suppressed) options are written in the same order in which they are added to the parser. There is no grouping of options, because I could not figure out how this would work using the parser itself -- if grouping is required, an alternative to this function needs to be implemented.

    @amohammadi @tiago.pereira @andre.anjos Could you please check this function (simply run ./bin/verify.py -H test.py) to see how it might be improved (e.g. by adding a more informative file header)?
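
    For illustration, a minimal sketch of how such a function could look, assuming a plain argparse.ArgumentParser; the function name write_empty_config is hypothetical, and the sketch relies on the private _actions attribute, so it is not the actual implementation:

    import argparse

    def write_empty_config(parser, filename):
        # Write all (non-suppressed) optional arguments of the given parser
        # as a commented configuration file, in the order in which they
        # were added. NOTE: relies on the private parser._actions list.
        with open(filename, 'w') as f:
            for action in parser._actions:
                # skip positional arguments and suppressed options
                if not action.option_strings or action.help == argparse.SUPPRESS:
                    continue
                # skip the automatic --help and --version actions
                if isinstance(action, (argparse._HelpAction, argparse._VersionAction)):
                    continue
                # the help text becomes a comment, the default a commented value
                if action.help:
                    f.write("# %s\n" % action.help)
                f.write("#%s = %r\n\n" % (action.dest, action.default))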

  • added 1 commit

    • c6382e00 - Implemented check that all configuration file variables are consumed (passes als…

    Compare with previous version

  • mentioned in issue #45 (closed)

  • Wow, this is great!!!!

    Looking at this file, you realize how complex it is to set up an experiment.

    One thing we can do to make a newcomer's life simpler is to output only the minimal set of options.

    For example, a minimum could be:

    # preprocessor = ...

    # extractor = ...

    # algorithm = ...

    # database = ...

    # temp-directory = ...

    # result-directory = ...

    We can use the variable metavar (with some modifications) to filter this information. What do you think?

  • Indeed, we should be able to do that. I thought about having a fixed (ordered) list of parameters that go first, and then a list of possible options that can be modified afterwards. I would like to use the groups that we define in the parser, but unfortunately this is not easily possible.

    I am not sure if metavar will make any difference here...
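
    For what it's worth, a rough sketch of the fixed (ordered) list idea mentioned above, under the same assumptions as the earlier sketch (the list contents and the function name split_actions are purely illustrative):

    MANDATORY = ['database', 'preprocessor', 'extractor', 'algorithm', 'grid']

    def split_actions(parser):
        # separate the parser's optional arguments into a fixed, ordered
        # set of mandatory options and the remaining (optional) ones
        actions = [a for a in parser._actions if a.option_strings]
        by_dest = dict((a.dest, a) for a in actions)
        mandatory = [by_dest[d] for d in MANDATORY if d in by_dest]
        optional = [a for a in actions if a.dest not in MANDATORY]
        return mandatory, optional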

    Edited by Manuel Günther
  • added 1 commit

    • 90880951 - Created 2 classes of arguments (mandatory and optional)

    Compare with previous version

  • The last commit provides the following output.

    Is it cleaner?

    # Configuration file automatically generated at 2017-02-27 for /remote/idiap.svm/user.active/tpereira/gitlab/workspace-BobBio-dev/bin/verify.py
    
    # Database and the protocol; registered databases are: ['arface', 'atnt', 'banca', 'caspeal', 'frgc', 'gbu', 'ijba', 'lfw-restricted', 'lfw-unrestricted', 'mobio-female', 'mobio-image', 'mobio-male', 'msu-mfsd-mod-licit', 'msu-mfsd-mod-spoof', 'multipie', 'multipie-pose', 'replay-img-licit', 'replay-img-spoof', 'replaymobile-img-licit', 'replaymobile-img-spoof', 'scface', 'xm2vts']
    
    #database = None
    
    
    # Data preprocessing; registered preprocessors are: ['base', 'face-crop-eyes', 'face-detect', 'filename', 'histogram', 'histogram-crop', 'histogram-landmark', 'inorm-lbp', 'inorm-lbp-crop', 'inorm-lbp-landmark', 'landmark-detect', 'self-quotient', 'self-quotient-crop', 'self-quotient-landmark', 'tan-triggs', 'tan-triggs-crop', 'tan-triggs-landmark']
    
    #preprocessor = None
    
    
    # Feature extraction; registered feature extractors are: ['dct-blocks', 'eigenface', 'grid-graph', 'lgbphs', 'linearize']
    
    #extractor = None
    
    
    # Biometric recognition; registered algorithms are: ['bic', 'bic-jets', 'distance-cosine', 'distance-euclidean', 'gabor-jet', 'histogram', 'lda', 'pca', 'pca+lda', 'pca+plda', 'plda']
    
    #algorithm = None
    
    
    # Configuration for the grid setup; if not specified, the commands are executed sequentially on the local machine; registered grid resources are ['demanding', 'gpu', 'grid', 'local-p16', 'local-p4', 'local-p8'].
    
    #grid = None
    
    
    ################################################## 
    ############### OPTIONAL ARGUMENTS ############### 
    ##################################################
    
    # If one of your configuration files is an actual command, please specify the list of required libraries (imports) to execute this command
    
    #imports = ['bob.bio.base']
    
    
    # If resources with identical names are defined in several packages, prefer the one from the given package
    
    #preferred_package = None
    
    
    # The sub-directory where the files of the current experiment should be stored. Please specify a directory name that describes your experiment
    
    #sub_directory = None
    
    
    # The groups (i.e., 'dev', 'eval') for which the models and scores should be generated; by default, only the 'dev' group is evaluated
    
    #groups = ['dev']
    
    
    # Overwrite the protocol that is stored in the database by the given one (might not be applicable for all databases).
    
    #protocol = None
    
    
    # The directory for temporary files; if --temp-directory is not specified, "/idiap/temp/tpereira/[database-name]/[sub-directory]" is used
    
    #temp_directory = None
    
    
    # The directory for resulting score files; if --result-directory is not specified, "/idiap/user/tpereira/[database-name]/[sub-directory]" is used
    
    #result_directory = None
    
    
    # Name of the file to write the feature extractor into.
    
    #extractor_file = 'Extractor.hdf5'
    
    
    # Name of the file to write the feature projector into.
    
    #projector_file = 'Projector.hdf5'
    
    
    # Name of the file to write the model enroller into.
    
    #enroller_file = 'Enroller.hdf5'
    
    
    # The database file in which the submitted jobs will be written; relative to the current directory (only valid with the --grid option).
    
    #gridtk_database_file = 'submitted.sql3'
    
    
    # The file where the configuration of all parts of the experiment is written; relative to the --result-directory.
    
    #experiment_info_file = 'Experiment.info'
    
    
    # An optional file, where database directories are stored (to avoid changing the database configurations)
    
    #database_directories_file = '/idiap/home/tpereira/.bob_bio_databases.txt'
    
    
    # Name of the directory of the preprocessed data.
    
    #preprocessed_directory = 'preprocessed'
    
    
    # Name of the directory of the extracted features.
    
    #extracted_directory = 'extracted'
    
    
    # Name of the directory where the projected data should be stored.
    
    #projected_directory = 'projected'
    
    
    # Name of the directory where the models (and T-Norm models) should be stored
    
    #model_directories = ['models', 'tmodels']
    
    
    # Name of the directory (relative to --result-directory) where the results are written
    
    #score_directories = ['nonorm', 'ztnorm']
    
    
    # Name of the directories (relative to --temp-directory) where the ZT-norm values are written; only used with --zt-norm
    
    #zt_directories = ['zt_norm_A', 'zt_norm_B', 'zt_norm_C', 'zt_norm_D', 'zt_norm_D_sameValue']
    
    
    # Name of the directory (relative to --temp-directory) where the log files are written; only used with --grid
    
    #grid_log_directory = 'gridtk_logs'
    
    
    # Increase the verbosity level from 0 (only error messages) to 1 (warnings), 2 (log messages), 3 (debug information) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).
    
    #verbose = 0
    
    
    # Only report the commands that will be executed, but do not execute them.
    
    #dry_run = False
    
    
    # Force erasing former data if it already exists
    
    #force = False
    
    
    # Writes score files which are compressed with tar.bz2.
    
    #write_compressed_score_files = False
    
    
    # Try to recursively stop the dependent jobs in the SGE grid queue when a job fails
    
    #stop_on_failure = False
    
    
    # The jobs submitted to the grid have dependencies on the given job ids.
    
    #external_dependencies = []
    
    
    # Measure and report the time required by the execution of the tool chain (only on local machine)
    
    #timer = None
    
    
    # Starts the local scheduler after submitting the jobs to the local queue (by default, local jobs must be started by hand, e.g., using ./bin/jman --local -vv run-scheduler -x)
    
    #run_local_scheduler = False
    
    
    # Runs the local scheduler with the given nice value
    
    #nice = 10
    
    
    # If selected, local scheduler jobs that finished with the given status are deleted from the --gridtk-database-file; otherwise the jobs remain in the database
    
    #delete_jobs_finished_with_status = None
    
    
    # Performs score calibration after the scores are computed.
    
    #calibrate_scores = False
    
    
    # Enable the computation of ZT norms
    
    #zt_norm = False
    
    
    # If given, missing files will not stop the processing; this is helpful if not all files of the database can be processed; missing scores will be NaN.
    
    #allow_missing_files = False
    
    
    # This flag is a shortcut for running the commands on the local machine with the given amount of parallel threads; equivalent to --grid bob.bio.base.grid.Grid("local", number_of_parallel_threads=X) --run-local-scheduler --stop-on-failure.
    
    #parallel = None
    
    
    # Passes specific environment variables to the job.
    
    #env = []
    
    
    # Skip the preprocessing step.
    
    #skip_preprocessing = False
    
    
    # Skip the extractor-training step.
    
    #skip_extractor_training = False
    
    
    # Skip the extraction step.
    
    #skip_extraction = False
    
    
    # Skip the projector-training step.
    
    #skip_projector_training = False
    
    
    # Skip the projection step.
    
    #skip_projection = False
    
    
    # Skip the enroller-training step.
    
    #skip_enroller_training = False
    
    
    # Skip the enrollment step.
    
    #skip_enrollment = False
    
    
    # Skip the score-computation step.
    
    #skip_score_computation = False
    
    
    # Skip the concatenation step.
    
    #skip_concatenation = False
    
    
    # Skip the calibration step.
    
    #skip_calibration = False
    
    
    # If specified, executes only the given parts of the tool chain.
    
    #execute_only = None
    
    
  • @tiago.pereira this looks great. It's a bit long, but that is simply because there are so many options.

    Generally, I would ask for documentation (advertising that this might be the easiest way to run experiments, and that you can also split the options into several config files) and unit testing (maybe just running verify.py -H config.py once) for this merge request too.
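
    A minimal test along these lines might look as follows; it assumes that verify.py -H writes the file and exits with status 0, and the option names checked for are only illustrative:

    import os
    import subprocess
    import tempfile

    def test_empty_config_file():
        # generate an empty configuration file via the -H option
        config = os.path.join(tempfile.mkdtemp(), 'config.py')
        subprocess.check_call(['./bin/verify.py', '-H', config])
        with open(config) as f:
            content = f.read()
        # the generated file should at least mention the mandatory options
        for option in ('database', 'preprocessor', 'extractor', 'algorithm'):
            assert option in content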

    Here is an example of how I use verify.py nowadays:

    $ bin/verify_isv.py config_base.py config_isv_eyes.py config_mobio.py --grid demanding -G isv_mobio.sql3 -s isv-male
    $ bin/verify_isv.py config_base.py config_isv_topleft.py config_replay_licit.py config_grid.py -G isv_replay_licit.sql3
    $ bin/verify_isv.py config_base.py config_isv_topleft.py config_replay_spoof.py config_grid.py -G isv_replay_spoof.sql3
    $ bin/verify_isv.py config_base.py config_isv_topleft.py config_replaymobile_licit.py config_grid.py -G isv_replaymobile_licit.sql3 --force
    $ bin/verify_isv.py config_base.py config_isv_topleft.py config_replaymobile_spoof.py config_grid.py -G isv_replaymobile_spoof.sql3 --force
    $ bin/verify_isv.py config_base.py config_isv_eyes.py config_msumfsd_licit.py config_grid.py --database-directories-file db_temp_folder.txt -G isv_msumfsd.sql3
    $ bin/verify_isv.py config_base.py config_isv_eyes.py config_msumfsd_spoof.py config_grid.py --database-directories-file db_temp_folder.txt -G isv_msumfsd.sql3
    $ bin/verify.py config_base.py config_gaborgraph_eyes.py config_mobio.py config_grid.py -G gaborgraph_mobio.sql3 -s gaborgraph-male
    $ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replay_licit.py config_grid.py -G gaborgraph_replay_licit.sql3
    $ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replay_spoof.py config_grid.py -G gaborgraph_replay_spoof.sql3
    $ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replaymobile_licit.py config_grid.py -G gaborgraph_replaymobile_licit.sql3 --force
    $ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replaymobile_spoof.py config_grid.py -G gaborgraph_replaymobile_spoof.sql3 --force
    $ bin/verify.py config_base.py config_gaborgraph_eyes.py config_msumfsd_licit.py config_grid.py --database-directories-file db_temp_folder.txt -G gaborgraph_msumfsd.sql3
    $ bin/verify.py config_base.py config_gaborgraph_eyes.py config_msumfsd_spoof.py config_grid.py --database-directories-file db_temp_folder.txt -G gaborgraph_msumfsd.sql3
    
    ---
    config_base.py
    ---
    allow_missing_files = True
    verbose = 3
    env = ['OPENBLAS_NUM_THREADS=1', 'MKL_NUM_THREADS=1']
    
    ---
    config_isv_eyes.py
    ---
    from bob.bio.face.config.preprocessor.tan_triggs import preprocessor
    from bob.bio.face.config.extractor.dct_blocks import extractor
    from bob.bio.gmm.config.algorithm.isv import algorithm
    sub_directory = 'isv'
    
    ---
    config_isv_topleft.py
    ---
    from bob.bio.face.config.extractor.dct_blocks import extractor
    from bob.bio.gmm.config.algorithm.isv import algorithm
    from bob.bio.face.preprocessor import TanTriggs
    from bob.bio.face.config.preprocessor.face_crop_eyes import preprocessor_head
    preprocessor = TanTriggs(face_cropper=preprocessor_head)
    sub_directory = 'isv'
    
    ---
    config_mobio.py
    ---
    from bob.bio.face.config.database.mobio import mobio_male as database
    zt_norm = True
    groups = ['dev', 'eval']
    
    ---
    config_replay_licit.py
    ---
    from bob.bio.face.config.database.replay import replay_licit as database
    groups = ['world', 'dev', 'eval']
    skip_kmeans = True
    skip_gmm = True
    skip_isv = True
    skip_extractor_training = True
    skip_projector_training = True
    skip_enroller_training = True
    
    ---
    config_grid.py
    ---
    import bob.bio.base
    
    # define a queue with demanding parameters
    grid = bob.bio.base.grid.Grid(
        training_queue='4G',
        # preprocessing
        preprocessing_queue='4G',
        # feature extraction
        extraction_queue='4G',
        # feature projection
        projection_queue='4G',
        # model enrollment
        enrollment_queue='4G',
        # scoring
        scoring_queue='4G'
    )

    If you take some time and look at it, you will see that I am taking advantage of both configuration files and --options to run several different experiments.

    Edited by Amir MOHAMMADI
  • Yes, the documentation needs to include all this. Also, the baselines.py script (which is defined in bob.bio.face, AFAIR) needs to be deprecated and replaced by default configuration files. I hope that I find some time for that this week.

    @tiago.pereira this looks OK, but I would put some of the more common options (like verbose, groups) a little bit higher. I will have a look at your implementation and propose something...

    @amohammadi I think this is the way the command line is supposed to work. BTW: Did you know that you can have resources inside your config file? For example, it is safe to write:

    preprocessor = 'tan-triggs'
    algorithm = 'isv'

    instead of mentioning the configuration files. Also, the world group is automatically added when any tool needs training. Adding it manually does not make much sense / is not required.

  • > BTW: Did you know that you can have resources inside your config file? For example, it is safe to write:

    Yes, but the problem with that is that those are entry points, which can be replaced by other packages. Importing them is more verbose but also more robust.

    > Also, the world group is automatically added when any tool needs training. Adding it manually does not make much sense / is not required.

    Some of my databases have enroll and probe in the world group and I want their scores too.

  • added 1 commit

    • a31554f1 - Proposed a better split for the options.

    Compare with previous version

  • I have implemented a better split for the options. Unfortunately, this only works for options defined in bob.bio.base, but not for the additional options defined in bob.bio.gmm, e.g.: https://gitlab.idiap.ch/bob/bob.bio.gmm/blob/master/bob/bio/gmm/tools/command_line.py#L14 These parameters are always listed last. I could hack together something to add these options where they belong, but this would most probably require global variables and communication between the packages...

    Let me give it a try...
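
    One conceivable mechanism, sketched here with purely illustrative names (not the actual API): bob.bio.base keeps module-level lists that define which options go into which section of the configuration file, and a dependent package extends them when it registers its own command line options:

    # in bob.bio.base (illustrative):
    required_options = ['database', 'preprocessor', 'extractor', 'algorithm', 'grid']
    common_options = ['verbose', 'groups', 'temp_directory', 'result_directory']

    # in bob.bio.gmm, before the configuration file is written:
    #   from bob.bio.base.tools import command_line
    #   command_line.common_options.append('some_gmm_specific_option')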

  • added 1 commit

    • ff77c3c1 - Made different lists specifiable by dependent packages (e.g. bob.bio.gmm)

    Compare with previous version

  • Manuel Günther unmarked as a Work In Progress

  • Manuel Günther mentioned in commit 0c672d5e

  • mentioned in issue #78 (closed)
