Propagated --allow-missing-files to the UBM training
Solves #21 (closed)
- Resolved by Manuel Günther
mentioned in merge request bob.bio.base!103 (merged)
added 1 commit
- 19822e24 - Propagated --allow-missing-files to the UBM training
mentioned in merge request bob.bio.base!104 (merged)
added 1 commit
- f4e6b484 - Propagated --allow-missing-files to the UBM training
added 1 commit
- 89ba845e - Propagated --allow-missing-files to ISV and iVector training and created tests for all combinations
It's ready to be merged.
I propagated this flag to the gmm, isv, and ivector trainers (sequential and parallel).
Do you mind reviewing this one, @mguenther? No rush.
Thanks
assigned to @mguenther
- Resolved by Tiago de Freitas Pereira
Why didn't you use the filter_missing_files function? It is defined here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/8d7a645cb8e419da36304dbdac7b50d93af299a8/bob/bio/base/utils/io.py#L14 and is used, for example, here: https://gitlab.idiap.ch/bob/bob.bio.base/blob/8d7a645cb8e419da36304dbdac7b50d93af299a8/bob/bio/base/tools/extractor.py#L163
I think this would have avoided re-implementing vstack_features in the first place, since the filtering would have happened before. Also, I don't think that your is_missing_file function is really required.
Well, I didn't know about the existence of this function, but is_missing_file uses the logger in case a file is missing.
Maybe we should move this functionality to the function filter_missing_files; is that sensible to you? We may lose some efficiency, but I think it is a good compromise.
Thanks
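The suggestion above, filtering out missing files before stacking so that the stacking step needs no special handling, can be sketched roughly as follows. This is only an illustration under stated assumptions: filter_existing and stack_rows are hypothetical stand-ins, not the actual filter_missing_files or vstack_features from bob.bio.base, and plain lists of rows stored as JSON replace the real feature files and NumPy arrays.

```python
import json
import os
import tempfile


def filter_existing(paths):
    """Hypothetical stand-in: keep only the paths that exist on disk."""
    return [p for p in paths if os.path.exists(p)]


def stack_rows(paths):
    """Hypothetical stand-in for vertical stacking: concatenate the rows of all files.

    Assumes every path exists, so no missing-file handling is needed here.
    """
    rows = []
    for path in paths:
        with open(path) as f:
            rows.extend(json.load(f))
    return rows


def demo():
    """Write two small feature files, leave a third one missing, then filter and stack."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.json")
        b = os.path.join(tmp, "b.json")
        missing = os.path.join(tmp, "missing.json")  # never created
        with open(a, "w") as f:
            json.dump([[1.0, 2.0]], f)
        with open(b, "w") as f:
            json.dump([[3.0, 4.0], [5.0, 6.0]], f)
        # Filter first, then stack: the stacking code never sees the missing path.
        return stack_rows(filter_existing([a, missing, b]))
```

Calling demo() stacks the rows of the two existing files and silently drops the missing one, which is the behaviour the filtering step is meant to provide when collecting training data.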
> Well, I didn't know about the existence of this function,
It is always good to have a look at how other related functions implement this feature...
> but is_missing_file uses the logger in case a file is missing.
As the files are the result of another process (preprocessor, extractor), they have already been reported as missing: https://gitlab.idiap.ch/bob/bob.bio.base/blob/8d7a645cb8e419da36304dbdac7b50d93af299a8/bob/bio/base/tools/extractor.py#L125
I am not sure that it is required to report them twice.
> Maybe we should move this functionality to the function filter_missing_files; is that sensible to you? We may lose some efficiency, but I think it is a good compromise.
Indeed, I saw a similar test in preprocess, extract, and project, e.g.: https://gitlab.idiap.ch/bob/bob.bio.base/blob/8d7a645cb8e419da36304dbdac7b50d93af299a8/bob/bio/base/tools/extractor.py#L106 and related functions in bob.bio.base. We should move your is_missing_file to the bob.bio.base.utils module and use it inside these functions.
Bottom line: when collecting training data, you should use filter_missing_files. For other functions like project, we should use is_missing_file, since these get parallelized and the file lists might be wrong if we filter them beforehand. As I also log missing files already (cf. the link to extract above), you should keep it there.
added 1 commit
- fb3321c3 - Propagated --allow-missing-files to the UBM training
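The second half of the bottom line in the review above, a per-file check inside parallelized steps, might look roughly like the sketch below. It rests on an assumption that the thread does not spell out: each parallel job receives an index range into the full, unfiltered file list, so filtering beforehand would shift those indices. The names is_missing and process_job, and their signatures, are hypothetical and are not the bob.bio.base API.

```python
import logging
import os
import tempfile

logger = logging.getLogger("sketch")


def is_missing(path, allow_missing_files):
    """Hypothetical per-file check: True if the file is absent and may be skipped."""
    if os.path.exists(path):
        return False
    if not allow_missing_files:
        raise IOError(f"Required file is missing: {path}")
    logger.debug("Skipping missing file %s", path)
    return True


def process_job(paths, start, end, allow_missing_files=True):
    """Process the slice [start, end) of the full, unfiltered file list.

    The slice indices stay valid because nothing was removed from the list.
    """
    done = []
    for path in paths[start:end]:
        if is_missing(path, allow_missing_files):
            continue
        done.append(os.path.basename(path))  # placeholder for the real work
    return done


def demo():
    """Four expected files, one of them missing, split across two jobs."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"f{i}") for i in range(4)]
        for i in (0, 2, 3):  # f1 is deliberately never written
            with open(paths[i], "w") as f:
                f.write("x")
        return process_job(paths, 0, 2), process_job(paths, 2, 4)
```

In this sketch both jobs index into the same list, so a missing file only shortens the output of the job that owns it. With allow_missing_files=False, the first missing file raises an error instead of being skipped.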
Hey @mguenther, everything is green now. Could you please have a final look at this one?
Thanks