Suggests configuration file should be the default "way" with `verify.py`
In this merge request, the idea is we emphasize people should rather use configuration files with verify.py
. The change is minor, but allows users to run the script issuing:
$ ./bin/verify.py -vv <my-config>
I mark it as a work-in-progress, because I still wanted to have <my-config>
to optionally be a resource. @mguenther: could you please make so? I'm not sure I'm capable of doing this with ruining something else. This will allow us to distribute baselines inside packages, w/o requiring users to know where they are installed.
Merge request reports
Activity
First of all, I am not sure, if it is different to have an option or an optional parameter. As the list of command line options of
verify.py
can be long, it might be difficult to spot the parameter, while for the option , we can easily spot it as it always comes after the-c
option. Also, I don't really like mixing options and parameters in the first place. Anyways, I am not sure how to use optional parameters inargparser
. I have never used those -- I tried once and failed miserably.I, however, agree to motivate users to use the
--configuration-file
/-c
option. But this should, IMHO, be done within the documentation of theverify.py
script, and there I see little difference between telling them to use the the-c
option or to use it as an argument.The second point, having a configuration file as a resource, might require larger modifications to the library. For now, all registered resources come from inside a configuration file, but you would want to have the complete configuration file as one option. Indeed, it would be nice to be able to get rid of the
baselines.py
script -- it was making me headaches from the beginning and its implementation is far from being perfect. Let me just give myself (or whoever feels brave enough to tackle this issue) some pointers:- Loading resources (one from inside a configuration file) is done here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/utils/resources.py#L70, particularly in this line: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/utils/resources.py#L116
- Reading configuration files is done here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/utils/resources.py#L25
This can be done in two ways:
- specifying a
keyword
(like'algorithm'
), it will load the configuration file and return itpreprocessor
content; example usage (indirectly throughload_resource
): https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L162, i.e., here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/utils/resources.py#L106 - without
keyword
(None
), it will load the configuration file and return its full content in a ``namespace`: as currently used here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L220 and here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/script/grid_search.py#L409
- specifying a
So, the solution would probably need to generate a combination of these two, i.e., load a resource using
load_resource
(for which we will need to have a specialkeyword
), and then callload_config_file
withoutkeyword
(see 2.) to get the contents of the configuration file.Well, here is the issue: As of today, the
verify.py
command is a bit overloaded. It handles configuration for the whole baseline. While that makes sense (i.e., having a command that handles everything), it is also a bit overwhelming to see all those options which are just printed without an order or grouping. As myself, people wonder which options affect what? We should strive, through the releases, to make it crystal clear from the documentation, but also from the help message itself.Options vs. arguments: There is a fundamental difference between an option and an argument. Options must be optional, whereas arguments should not. If you use the command
ls
, you'll understand what I mean. In fact, most unix commands follow this philosophy. I thought we just do the same ;-)Unfortunately, as of today, there are non-optional options on the command-line, which is a bit non-sensical. Ideally, options which are non-optional should be made arguments and the remaining options which are truly optional should be grouped, if possible, so a little bit of sense is provided from the command line help utility. A section with examples (epilog) would also be welcome. Migrating it to use
docopt
/schema
instead of the oldargparse
would be a bonus.The question is - how do we improve this w/o breaking the existing command line? I thought one nice way would be to drive people away from the command-line options and provide more support with configuration files. Now, should configuration files be options or arguments? Because this is how people should use the system, it must be made an argument. It makes no sense to run a
verify.py
w/o one. To mark that, we make it an argument. However, if we make it a non-optional argument, then we break the current command-line usage. Therefore, I implemented it in a backwards-compatible way, yet with the philosophy as per above.OK, I see. I don't really mind, if we use the
configuration_file
as an option or as a parameter. However, in your current way of speaking, it is actually an option, as it factually is optional, the same way as all the other options, which are optional by now (previously, there were 5 non-optional options, such as--database
or--algorithm
). Note that the options are already grouped.I agree that an epilog with examples is very useful. On the other hand, I am not a big fan of
docopt
-- I thinkargparse
is the successor ofdocopt
for a good reason. It makes specifying default options and choices and so on so much easier. Also, remember that we actually extend/overwrite the argument parser options, e.g., here: https://gitlab.idiap.ch/bob/bob.bio.gmm/blob/master/bob/bio/gmm/script/verify_gmm.py#L32 and we automatically add command line options here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L212 I am not sure howdocopt
orschema
would allow to do that this easily.I will try to find some time to look into having configuration files as resources next week.
It is an option today because I don't want to break the current usage. I took the safer road, but I prepared the command-line so it is upgradeable in the future once it becomes common practice to use a configuration file.
Another thing we should consider is having one or more
nargs="+"
(instead ofnargs="*"
). This will allow us to group the database + protocol in one configuration and the baseline algorithm in the other, so we can do:$ ./bin/verify.py mobio_male ubm_gmm $ ./bin/verify.py mobio_male eigenfaces $ ./bin/verify.py atnt_idiap eigenfaces
First, I understand that this is optional because you don't want to break the current usage. I would even suggest that it stays optional (either as argument or as option). Though we would encourage people to use a configuration file, the command line is designed in a way that you can specify everything inline, i.e., without requiring to provide any kind of configuration file, for example see here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/test/test_scripts.py#L124 Sometimes it is just faster to provide one of the tools inline, rather than writing it in a configuration file.
Two notes on that:
- Whatever single tool you specify on the command line (rather than in the configuration file) has precedence over what is given on the command line. Hence, you can overwrite the algorithm specified in the configuration file using the
--algorithm
option. So, in order to run a baseline (given in a configuration file) on another database you can simply say:
./bin/verify.py eigenfaces.py --database mobio-male
(the
.py
aftereigenfaces
in the configuration file is still required, but we might be able to remove that when registered as resource, see above). The only thing that you want to have removed is the--database
option, but I think this makes it much more clear that you run theeigenfaces
baseline, but not on the baseline database.- Loading two configuration files at the same time should be relatively simple to do. Simply call https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L158 several times for several configuration files (or later: resources). Be careful to do that in reverse order, so that the second argument will be used, and not the first -- as this is AFAIK the default way things should work when the same parameter/option is specified several times on the command line.
On the other hand, we would need to have two kinds of configuration files registered as resources for everything. For example, each database configuration file (such as: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/setup.py#L87) must be registered as a database, and as a configuration file (the one that I still need to implement). Unless: I find a clever way to combine those two...
- Whatever single tool you specify on the command line (rather than in the configuration file) has precedence over what is given on the command line. Hence, you can overwrite the algorithm specified in the configuration file using the
A configuration file can contain all or part of the information in any case. So:
./bin/verify.py mobio-male eigenfaces
should be enough, given the resource "mobio-male" contains the Mobio database pre-configured for the male protocol and eigenfaces should contain the rest. I'm not sure options make it clearer: in fact it is bit confusing to have a protocol preset on a database, giving you have a
--protocol
option available. I'm trying to think about the final user, and to make it as dead-simple as possible so we can avoid long detailed messages for help on the mailing list.I do like the idea of keeping the command line, with override-able defaults as command-lines. That makes it explicit something is being changed and that is a good practice.
Yet another request: As of today, the resources program does not provide a lot of information about each resource, except for its name and the module it originates from. It would be useful if
./bin/resources.py
showed the documentation for each resource declared, taking into consideration the module's docstring. That would help authors to specify what is in the resource.The full list of resources available in each package should also be made integral part of the documentation, as that determines what exactly is contained in each resource, and how things are configured. This would allow authors to list their documented resources and get a nice doc made for them, with all required details, including access to the source code.
Added 1 commit:
- a0022914 - [config_file] Implement multi-config readout
I just implemented what I had in mind. The changes are backward-compatible, so none of your code regarding the other resources was touched. You can now load configuration files in three different ways that can be mixed and matched as the user wishes. This should work (resource, module name and local file):
./bin/verify mobio-male bob.bio.face.baselines.eigenfaces ./verbose.py
Furthermore, I have patched the documentation and the application
./bin/resources.py
to also listconfig
resources.I have not yet addressed my other request (documentation). Anyways, because resources are proper files, the standard Python parsing should do for the time being.
35 36 Parameters: 37 38 paths (list): A list of resources, modules or files (in order) to collect 39 resources from 40 41 42 Returns: 43 44 module: A valid Python module you can use to configure your tool 45 46 ''' 47 48 def _attach_resources(src, dst): 49 for k in dir(src): 50 dst.__dict__[k] = getattr(src, k) 43 44 module: A valid Python module you can use to configure your tool 45 46 ''' 47 48 def _attach_resources(src, dst): 49 for k in dir(src): 50 dst.__dict__[k] = getattr(src, k) 51 52 import random 53 54 name = "".join(random.sample(ascii_letters, 10)) 55 retval = imp.new_module(name) 56 57 #loads used modules recursively, attach results to module to return 58 if len(paths) > 1: 72 _attach_resources(tmp, retval) 73 return retval 74 except ImportError: 75 #module does not exist, ignore it 76 pass 77 except Exception as e: 78 raise IOError("The configuration module '%s' could not " \ 79 "be loaded: %s" % (paths[-1], e)) 80 81 #if you get to this point, then its not a resource nor a loadable module, is 82 #it on the file system? 83 if not os.path.exists(paths[-1]): 84 raise IOError("The configuration file, resource or module '%s' " \ 85 "could not be found, loaded or imported" % paths[-1]) 86 87 exec(compile(open(paths[-1], "rb").read(), paths[-1], 'exec'), retval.__dict__) Added 1 commit:
- 630867cc - [resources] Avoid exec() as it has bugs on Python 2.7
Added 1 commit:
- 453519ea - Turned recursion over config's into an iteration
@andre.anjos What do you think about 453519ea?
@pkorshunov could you check if changes here will not break
bob.pad.base
?