First simple version of writing empty configuration files
Closes #45 (closed)
I have implemented a short function that generates an empty configuration file with a complete list of options, including the command-line parser help. This function also works for derived class scripts, such as `verify_gmm.py`. So far, all (non-suppressed) options are written in the same order as they are added to the `parser`. There is no grouping of options, because I could not figure out how this would work using the `parser` itself -- if grouping is required, a different alternative to this function needs to be implemented.

@amohammadi @tiago.pereira @andre.anjos Could you please check this function (simply run `./bin/verify.py -H test.py`) to see how it might be improved (e.g., by adding a more informative file header)?
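For readers who want to see the mechanics: the idea of such a generator can be sketched over a plain `argparse` parser by iterating its registered actions and writing each one as a commented `dest = default` line. This is a minimal illustration, not the actual bob.bio.base code; the `write_config` name and the use of the private `parser._actions` list are assumptions.

```python
import argparse

def write_config(parser, filename):
    """Sketch: dump every non-suppressed option of ``parser`` as a
    commented ``name = default`` line, in the order of registration."""
    with open(filename, 'w') as f:
        for action in parser._actions:
            # skip positional arguments, --help, and suppressed options
            if (not action.option_strings
                    or action.help == argparse.SUPPRESS
                    or action.default == argparse.SUPPRESS):
                continue
            f.write("# %s\n" % (action.help or action.dest))
            f.write("#%s = %r\n\n" % (action.dest, action.default))

parser = argparse.ArgumentParser()
parser.add_argument('--database', help='Database and the protocol')
parser.add_argument('-v', '--verbose', action='count', default=0,
                    help='Increase the verbosity level')
write_config(parser, 'test.py')
```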
added 1 commit

- c6382e00 - Implemented check that all configuration file variables are consumed (passes als…
mentioned in issue #45 (closed)
Wow, this is great!!!!
Looking at this file, you realize the complexity of setting up an experiment.

One thing that we can do to make a newcomer's life simpler is to provide, as output, the minimum possible set of options. For example, a minimum could be:

```
# preprocessor = ...
# extractor = ...
# algorithm = ...
# database = ...
# temp-directory = ...
# result-directory = ...
```
We can use the variable `metavar` (with some modifications) to filter this information. What do you think?

Indeed, we should be able to do that. I thought about having a fixed (ordered) list of parameters that go first, and then a list of possible options that can be modified afterwards. I would like to use the groups that we define in the parser, but this is not easily possible, unfortunately.

I am not sure if `metavar` will make any difference here...

Edited by Manuel Günther
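To illustrate the "fixed (ordered) list" idea above (and the mandatory/optional split of the following commit), one possible sketch keeps an explicit, ordered list of destination names and partitions the parser's actions around it. The `MANDATORY` list and the function name are illustrative only, not the actual implementation.

```python
# illustrative only: the options that should appear first, in this order
MANDATORY = ['database', 'preprocessor', 'extractor', 'algorithm', 'grid']

def split_actions(parser):
    """Partition the parser's optional actions into a 'mandatory' part
    (written first, in MANDATORY order) and an 'optional' remainder."""
    actions = [a for a in parser._actions if a.option_strings]
    mandatory = sorted((a for a in actions if a.dest in MANDATORY),
                       key=lambda a: MANDATORY.index(a.dest))
    optional = [a for a in actions if a.dest not in MANDATORY]
    return mandatory, optional
```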
added 1 commit

- 90880951 - Created 2 classes of arguments (mandatory and optional)
The last commit provides the following output.
Is it cleaner?
```
# Configuration file automatically generated at 2017-02-27 for /remote/idiap.svm/user.active/tpereira/gitlab/workspace-BobBio-dev/bin/verify.py

# Database and the protocol; registered databases are: ['arface', 'atnt', 'banca', 'caspeal', 'frgc', 'gbu', 'ijba', 'lfw-restricted', 'lfw-unrestricted', 'mobio-female', 'mobio-image', 'mobio-male', 'msu-mfsd-mod-licit', 'msu-mfsd-mod-spoof', 'multipie', 'multipie-pose', 'replay-img-licit', 'replay-img-spoof', 'replaymobile-img-licit', 'replaymobile-img-spoof', 'scface', 'xm2vts']
#database = None

# Data preprocessing; registered preprocessors are: ['base', 'face-crop-eyes', 'face-detect', 'filename', 'histogram', 'histogram-crop', 'histogram-landmark', 'inorm-lbp', 'inorm-lbp-crop', 'inorm-lbp-landmark', 'landmark-detect', 'self-quotient', 'self-quotient-crop', 'self-quotient-landmark', 'tan-triggs', 'tan-triggs-crop', 'tan-triggs-landmark']
#preprocessor = None

# Feature extraction; registered feature extractors are: ['dct-blocks', 'eigenface', 'grid-graph', 'lgbphs', 'linearize']
#extractor = None

# Biometric recognition; registered algorithms are: ['bic', 'bic-jets', 'distance-cosine', 'distance-euclidean', 'gabor-jet', 'histogram', 'lda', 'pca', 'pca+lda', 'pca+plda', 'plda']
#algorithm = None

# Configuration for the grid setup; if not specified, the commands are executed sequentially on the local machine; registered grid resources are ['demanding', 'gpu', 'grid', 'local-p16', 'local-p4', 'local-p8'].
#grid = None

##################################################
############### OPTIONAL ARGUMENTS ###############
##################################################

# If one of your configuration files is an actual command, please specify the lists of required libraries (imports) to execute this command
#imports = ['bob.bio.base']

# If resources with identical names are defined in several packages, prefer the one from the given package
#preferred_package = None

# The sub-directory where the files of the current experiment should be stored. Please specify a directory name with a name describing your experiment
#sub_directory = None

# The groups (i.e., 'dev', 'eval') for which the models and scores should be generated; by default, only the 'dev' group is evaluated
#groups = ['dev']

# Overwrite the protocol that is stored in the database by the given one (might not be applicable for all databases).
#protocol = None

# The directory for temporary files; if --temp-directory is not specified, "/idiap/temp/tpereira/[database-name]/[sub-directory]" is used
#temp_directory = None

# The directory for resulting score files; if --result-directory is not specified, "/idiap/user/tpereira/[database-name]/[sub-directory]" is used
#result_directory = None

# Name of the file to write the feature extractor into.
#extractor_file = 'Extractor.hdf5'

# Name of the file to write the feature projector into.
#projector_file = 'Projector.hdf5'

# Name of the file to write the model enroller into.
#enroller_file = 'Enroller.hdf5'

# The database file in which the submitted jobs will be written; relative to the current directory (only valid with the --grid option).
#gridtk_database_file = 'submitted.sql3'

# The file where the configuration of all parts of the experiments are written; relative to the --result-directory.
#experiment_info_file = 'Experiment.info'

# An optional file, where database directories are stored (to avoid changing the database configurations)
#database_directories_file = '/idiap/home/tpereira/.bob_bio_databases.txt'

# Name of the directory of the preprocessed data.
#preprocessed_directory = 'preprocessed'

# Name of the directory of the extracted features.
#extracted_directory = 'extracted'

# Name of the directory where the projected data should be stored.
#projected_directory = 'projected'

# Name of the directory where the models (and T-Norm models) should be stored
#model_directories = ['models', 'tmodels']

# Name of the directory (relative to --result-directory) where to write the results to
#score_directories = ['nonorm', 'ztnorm']

# Name of the directories (of --temp-directory) where to write the ZT-norm values; only used with --zt-norm
#zt_directories = ['zt_norm_A', 'zt_norm_B', 'zt_norm_C', 'zt_norm_D', 'zt_norm_D_sameValue']

# Name of the directory (relative to --temp-directory) where the log files are written; only used with --grid
#grid_log_directory = 'gridtk_logs'

# Increase the verbosity level from 0 (only error messages) to 1 (warnings), 2 (log messages), 3 (debug information) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).
#verbose = 0

# Only report the commands that will be executed, but do not execute them.
#dry_run = False

# Force to erase former data if already exist
#force = False

# Writes score files which are compressed with tar.bz2.
#write_compressed_score_files = False

# Try to recursively stop the dependent jobs from the SGE grid queue, when a job failed
#stop_on_failure = False

# The jobs submitted to the grid have dependencies on the given job ids.
#external_dependencies = []

# Measure and report the time required by the execution of the tool chain (only on local machine)
#timer = None

# Starts the local scheduler after submitting the jobs to the local queue (by default, local jobs must be started by hand, e.g., using ./bin/jman --local -vv run-scheduler -x)
#run_local_scheduler = False

# Runs the local scheduler with the given nice value
#nice = 10

# If selected, local scheduler jobs that finished with the given status are deleted from the --gridtk-database-file; otherwise the jobs remain in the database
#delete_jobs_finished_with_status = None

# Performs score calibration after the scores are computed.
#calibrate_scores = False

# Enable the computation of ZT norms
#zt_norm = False

# If given, missing files will not stop the processing; this is helpful if not all files of the database can be processed; missing scores will be NaN.
#allow_missing_files = False

# This flag is a shortcut for running the commands on the local machine with the given amount of parallel threads; equivalent to --grid bob.bio.base.grid.Grid("local", number_of_parallel_threads=X) --run-local-scheduler --stop-on-failure.
#parallel = None

# Passes specific environment variables to the job.
#env = []

# Skip the preprocessing step.
#skip_preprocessing = False

# Skip the extractor-training step.
#skip_extractor_training = False

# Skip the extraction step.
#skip_extraction = False

# Skip the projector-training step.
#skip_projector_training = False

# Skip the projection step.
#skip_projection = False

# Skip the enroller-training step.
#skip_enroller_training = False

# Skip the enrollment step.
#skip_enrollment = False

# Skip the score-computation step.
#skip_score_computation = False

# Skip the concatenation step.
#skip_concatenation = False

# Skip the calibration step.
#skip_calibration = False

# If specified, executes only the given parts of the tool chain.
#execute_only = None
```
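As a usage sketch (assuming the generator behaves as shown above): write the template, uncomment and edit the options you need, and pass the edited file back to the script as a positional configuration file, in the same style as the examples later in this thread:

```
$ ./bin/verify.py -H test.py   # write the commented template
$ ${EDITOR} test.py            # uncomment and adapt the options you need
$ ./bin/verify.py test.py      # run the experiment with that configuration
```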
@tiago.pereira this looks great. It's a bit long, but that is simply because there are so many options.
Generally, I would ask for documentation (advertising that this might be the easiest way to run experiments, and that you can also divide the options into several config files) and unit testing (maybe just running `verify.py -H config.py` once) for this merge request, too.

Here is an example of how I use `verify.py` nowadays:
```
$ bin/verify_isv.py config_base.py config_isv_eyes.py config_mobio.py --grid demanding -G isv_mobio.sql3 -s isv-male
$ bin/verify_isv.py config_base.py config_isv_topleft.py config_replay_licit.py config_grid.py -G isv_replay_licit.sql3
$ bin/verify_isv.py config_base.py config_isv_topleft.py config_replay_spoof.py config_grid.py -G isv_replay_spoof.sql3
$ bin/verify_isv.py config_base.py config_isv_topleft.py config_replaymobile_licit.py config_grid.py -G isv_replaymobile_licit.sql3 --force
$ bin/verify_isv.py config_base.py config_isv_topleft.py config_replaymobile_spoof.py config_grid.py -G isv_replaymobile_spoof.sql3 --force
$ bin/verify_isv.py config_base.py config_isv_eyes.py config_msumfsd_licit.py config_grid.py --database-directories-file db_temp_folder.txt -G isv_msumfsd.sql3
$ bin/verify_isv.py config_base.py config_isv_eyes.py config_msumfsd_spoof.py config_grid.py --database-directories-file db_temp_folder.txt -G isv_msumfsd.sql3
$ bin/verify.py config_base.py config_gaborgraph_eyes.py config_mobio.py config_grid.py -G gaborgraph_mobio.sql3 -s gaborgraph-male
$ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replay_licit.py config_grid.py -G gaborgraph_replay_licit.sql3
$ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replay_spoof.py config_grid.py -G gaborgraph_replay_spoof.sql3
$ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replaymobile_licit.py config_grid.py -G gaborgraph_replaymobile_licit.sql3 --force
$ bin/verify.py config_base.py config_gaborgraph_topleft.py config_replaymobile_spoof.py config_grid.py -G gaborgraph_replaymobile_spoof.sql3 --force
$ bin/verify.py config_base.py config_gaborgraph_eyes.py config_msumfsd_licit.py config_grid.py --database-directories-file db_temp_folder.txt -G gaborgraph_msumfsd.sql3
$ bin/verify.py config_base.py config_gaborgraph_eyes.py config_msumfsd_spoof.py config_grid.py --database-directories-file db_temp_folder.txt -G gaborgraph_msumfsd.sql3

--- config_base.py ---
allow_missing_files = True
verbose = 3
env = ['OPENBLAS_NUM_THREADS=1', 'MKL_NUM_THREADS=1']

--- config_isv_eyes.py ---
from bob.bio.face.config.preprocessor.tan_triggs import preprocessor
from bob.bio.face.config.extractor.dct_blocks import extractor
from bob.bio.gmm.config.algorithm.isv import algorithm
sub_directory = 'isv'

--- config_isv_topleft.py ---
from bob.bio.face.config.extractor.dct_blocks import extractor
from bob.bio.gmm.config.algorithm.isv import algorithm
from bob.bio.face.preprocessor import TanTriggs
from bob.bio.face.config.preprocessor.face_crop_eyes import preprocessor_head
preprocessor = TanTriggs(face_cropper=preprocessor_head)
sub_directory = 'isv'

--- config_mobio.py ---
from bob.bio.face.config.database.mobio import mobio_male as database
zt_norm = True
groups = ['dev', 'eval']

--- config_replay_licit.py ---
from bob.bio.face.config.database.replay import replay_licit as database
groups = ['world', 'dev', 'eval']
skip_kmeans = True
skip_gmm = True
skip_isv = True
skip_extractor_training = True
skip_projector_training = True
skip_enroller_training = True

--- config_grid.py ---
import bob.bio.base

# define a queue with demanding parameters
grid = bob.bio.base.grid.Grid(
    training_queue='4G',
    # preprocessing
    preprocessing_queue='4G',
    # feature extraction
    extraction_queue='4G',
    # feature projection
    projection_queue='4G',
    # model enrollment
    enrollment_queue='4G',
    # scoring
    scoring_queue='4G')
```
If you take some time and look at it, you will see that I am taking advantage of both configuration files and command-line `--options` to run several different experiments.

Edited by Amir MOHAMMADI

Yes, the documentation needs to include all of this. Also, the `baselines.py` script (which is defined in `bob.bio.face`, AFAIR) needs to be deprecated and replaced by default configuration files. I hope that I find some time for that this week.

@tiago.pereira this looks OK, but I would put some more common options (like `verbose`, `groups`) a little bit higher. I will have a look at your implementation and propose something...

@amohammadi I think this is the way the command line is supposed to work. BTW: Did you know that you can have resources inside your config file? For example, it is safe to write:
```
preprocessor = 'tan-triggs'
algorithm = 'isv'
```
instead of mentioning the configuration files. Also, the `world` group is automatically added when any tool needs training, so adding it manually does not make much sense / is not required.

> BTW: Did you know that you can have resources inside your config file? For example, it is safe to write:
Yes, but the problem with that is that those are entry points, which can be replaced by other packages. Importing them is more verbose but also more robust.
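To make the entry-point concern concrete: named resources such as `'tan-triggs'` are resolved through setuptools entry points, so any installed package can register the same name. A minimal sketch of such a lookup follows; the group name `'bob.bio.preprocessor'` and the `load_resource` signature are illustrative assumptions, not the exact bob.bio.base API.

```python
import pkg_resources

def load_resource(name, group='bob.bio.preprocessor'):
    """Sketch: resolve a named resource via setuptools entry points."""
    matches = [ep for ep in pkg_resources.iter_entry_points(group)
               if ep.name == name]
    if not matches:
        raise ValueError("no resource named %r in group %r" % (name, group))
    # several installed packages may register the same name; which one
    # wins is ambiguous -- exactly the fragility described above
    return matches[0].load()

preprocessor = load_resource('tan-triggs')
```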
> Also, the world group is automatically added when any tool needs training. Adding it manually does not make much sense / is not required.
Some of my databases have enroll and probe data in the `world` group, and I want their scores too.

I have implemented a better split for the options. Unfortunately, this only works for options defined in `bob.bio.base`, but not for the additional options defined in `bob.bio.gmm`, e.g.: https://gitlab.idiap.ch/bob/bob.bio.gmm/blob/master/bob/bio/gmm/tools/command_line.py#L14 These parameters are always listed last. I could hack together something to add these options to where they should belong, but this would most probably involve global variables and communication between the packages... Let me give it a try...
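What "making the lists specifiable by dependent packages" (the next commit) could look like, sketched with purely hypothetical names: `bob.bio.base` keeps module-level lists of option names, and `bob.bio.gmm` extends them before the configuration file is generated.

```python
# hypothetical content of bob/bio/base/tools/command_line.py
MANDATORY_OPTIONS = ['database', 'preprocessor', 'extractor', 'algorithm', 'grid']
COMMON_OPTIONS = ['verbose', 'groups', 'temp_directory', 'result_directory']

# hypothetical snippet in bob.bio.gmm, executed before the file is written:
# append the GMM-specific option names so they are grouped correctly
from bob.bio.base.tools import command_line
command_line.COMMON_OPTIONS.extend(['gmm_training_iterations'])
```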
added 1 commit
- ff77c3c1 - Made different lists specifiable by dependent packages (e.g. bob.bio.gmm)
mentioned in issue bob.bio.face#13 (closed)
assigned to @mguenther
mentioned in commit 0c672d5e
mentioned in issue #78 (closed)