Included metadata during the feature extraction.
Hi guys,
I have a feature extractor that needs to use some meta information stored in bob.bio.base.database.BioFile.
In this MR, I added a keyword argument called metadata to preprocessor.__call__ and extractor.__call__ that ships an instance of bob.bio.base.database.BioFile.
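Adding a keyword to __call__ raises a compatibility question: preprocessors and extractors written before this change do not accept it. Below is a minimal sketch of one possible way to handle this, not necessarily what the MR does. The toolchain forwards the file object only to components whose __call__ declares a metadata parameter (or **kwargs). FileStub, accepts_metadata, call_with_metadata and both extractor classes are hypothetical stand-ins, not bob.bio.base API.

```python
import inspect
from dataclasses import dataclass


@dataclass
class FileStub:
    """Hypothetical stand-in for bob.bio.base.database.BioFile."""
    client_id: str
    path: str


def accepts_metadata(component) -> bool:
    """True if the component's __call__ takes a `metadata` keyword or **kwargs."""
    params = inspect.signature(component).parameters.values()
    return any(
        p.name == "metadata" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


def call_with_metadata(component, *args, metadata=None):
    """Forward metadata only to components that declare they can use it."""
    if accepts_metadata(component):
        return component(*args, metadata=metadata)
    return component(*args)


class LegacyExtractor:
    # Written before this MR: knows nothing about metadata.
    def __call__(self, data):
        return [sum(data) / len(data)]


class MetadataAwareExtractor:
    # Opts in to the new keyword; metadata stays optional.
    def __init__(self):
        self.seen_paths = []

    def __call__(self, data, metadata=None):
        if metadata is not None:
            self.seen_paths.append(metadata.path)
        return [sum(data) / len(data)]


sample = FileStub(client_id="c1", path="c1/sample1")
print(call_with_metadata(LegacyExtractor(), [1.0, 3.0], metadata=sample))  # prints [2.0]
```

Checking the signature keeps existing components untouched. The alternative is to add metadata=None to every __call__ in the package and its satellites.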
Some design decisions that may generate discussion are commented on in the MR diff. Please have a look there.
Extending this to other elements in the toolchain (enrollment, scoring) could provide a solution to !111 (closed).
@mguenther, do you have some time to review this one? Thanks in advance.
Activity
- Resolved by Manuel Günther
added 1 commit
- fa41c3ca - Included metadata during the feature extraction. Ongoing with cd workspace_HTFace/
mentioned in merge request !111 (closed)
added enhancement label
assigned to @mguenther
Just some extra info, @mguenther: our Mac mini died two days ago, which is why the Mac builds are stuck.
Rescue measures are being taken to save its life, but so far so bad :-(
@amohammadi I think we could also use parallel preprocessors for this, but @tiago.pereira's solution is small enough.
@tiago.pereira Is there a reason why you have implemented this only for preprocessors and extractors, and not for algorithms? For example, the enroll function might obtain a list of BioFile objects, and the project function might also want to use information from the BioFile.
Also, now that we have this new feature, we need to update the documentation. As you mentioned yourself, this will introduce some noise, which needs to be documented properly. Otherwise no one will ever know that this feature exists.
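Below is a sketch of the algorithm-side extension asked about above. It assumes project receives one file object and enroll receives a list with one file object per enrollment feature, in the same order. ToyAlgorithm and FileStub are hypothetical and do not mirror the real bob.bio.base Algorithm signatures. The client-id check is only an example of a harmless use of the metadata.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FileStub:
    """Hypothetical stand-in for bob.bio.base.database.BioFile."""
    client_id: str
    path: str


class ToyAlgorithm:
    """Hypothetical algorithm whose project/enroll accept optional metadata."""

    def project(self, feature: Sequence[float],
                metadata: Optional[FileStub] = None) -> List[float]:
        # Identity projection; a real algorithm could consult metadata here.
        return list(feature)

    def enroll(self, features: Sequence[Sequence[float]],
               metadata: Optional[Sequence[FileStub]] = None) -> List[float]:
        if metadata is not None:
            # Assumed contract: one file object per enrollment feature.
            if len(metadata) != len(features):
                raise ValueError("expected one metadata entry per enrollment feature")
            # Example use: all enrollment files should belong to one client.
            if len({m.client_id for m in metadata}) != 1:
                raise ValueError("enrollment files belong to different clients")
        # Model = element-wise mean of the enrollment features.
        n = len(features)
        return [sum(values) / n for values in zip(*features)]


algo = ToyAlgorithm()
files = [FileStub("c1", "c1/a"), FileStub("c1", "c1/b")]
model = algo.enroll([[1.0, 2.0], [3.0, 4.0]], metadata=files)  # [2.0, 3.0]
```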
assigned to @tiago.pereira
> @tiago.pereira Is there a reason why you have implemented this only for preprocessors and extractors, and not for algorithms? For example, the enroll function might obtain a list of BioFile objects, and the project function might also want to use information from the BioFile.
No, there's no reason; it's just a matter of finding time to implement it. I'll do it in this MR.
> Also, now that we have this new feature, we need to update the documentation. As you mentioned yourself, this will introduce some noise, which needs to be documented properly. Otherwise no one will ever know that this feature exists.
Yes, now that the feature has some support, I will add it to the documentation.
Thanks for looking at it.
After giving this some thought, I think the biggest issue is that this can easily lead to incorrect toolchains. If you have access to the class of the samples all the time, nothing stops users from misusing it, whether on purpose or unintentionally. Only a few weeks ago, one of the postdocs here was training two different PCAs for the two classes (PAD) in the extraction step, and he was getting perfect results :) This happened because he was hacking around the designed toolchains, running verify.py twice and copying files around by hand.
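This is a deliberately exaggerated sketch of the concern, not the postdoc's actual code. Once the extractor can read the ground-truth class from the metadata, any label-dependent transform leaks the label into the features. Here a simple per-class offset stands in for "one PCA per class". Both classes are drawn from the same noise distribution, yet a trivial threshold separates them perfectly. The dict-based metadata and the is_attack key are hypothetical.

```python
import random
from statistics import mean


def leaky_extract(sample, metadata):
    """MISUSE, for illustration: the transform depends on the ground-truth label."""
    # Stand-in for a per-class transform selected via the label in metadata.
    offset = 10.0 if metadata["is_attack"] else 0.0
    return [x + offset for x in sample]


def evaluate(num_samples=200, seed=0):
    rng = random.Random(seed)
    correct = 0
    for i in range(num_samples):
        is_attack = i % 2 == 1
        # Both classes come from the SAME distribution: there is no real signal.
        sample = [rng.random() for _ in range(16)]
        feature = leaky_extract(sample, {"is_attack": is_attack})
        predicted_attack = mean(feature) > 5.0
        correct += predicted_attack == is_attack
    return correct / num_samples


print(evaluate())  # prints 1.0, despite the data carrying no information
```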
I don't know right now how you could cheat in the toolchain by having access to the identities in biometric recognition experiments, but I am sure some users will misuse this if it is easily available as metadata. One of the strong points of bob.bio.base is that it makes it almost impossible to run a wrong experiment, and I am not sure this merge request is going in that direction.
Hey @amohammadi,
My motivation to create this feature was to have access to the image modality (VIS, NIR, sketch, thermal, etc.) during feature extraction. As far as I remember, @vkrivokuca needed access to the client id during enrollment in order to apply a template protection strategy (forgive me if this is not correct). Both motivations are clean and honest.
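Below is a sketch of the modality use case just described: the extractor picks per-modality processing from the metadata. It assumes the file object exposes a modality attribute. That attribute is hypothetical, since the real BioFile may not carry it and a database could expose it differently. The per-modality handlers are placeholders and the VIS default is an arbitrary choice. FileStub and ModalityAwareExtractor are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class FileStub:
    """Hypothetical stand-in for a BioFile that carries a modality field."""
    path: str
    modality: str  # e.g. "VIS", "NIR", "SKETCH", "THERMAL"


def normalise(values: Sequence[float]) -> List[float]:
    # Min-max scaling to [0, 1]; a constant input maps to all zeros.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def invert(values: Sequence[float]) -> List[float]:
    # Placeholder for sketches (dark strokes on light paper).
    return [1.0 - v for v in normalise(values)]


class ModalityAwareExtractor:
    """Hypothetical extractor that selects processing by modality."""

    def __init__(self):
        self.handlers: Dict[str, Callable[[Sequence[float]], List[float]]] = {
            "VIS": normalise,
            "NIR": normalise,
            "SKETCH": invert,
        }

    def __call__(self, data, metadata: Optional[FileStub] = None):
        # Without metadata, fall back to VIS (an arbitrary default for this sketch).
        modality = metadata.modality.upper() if metadata is not None else "VIS"
        handler = self.handlers.get(modality)
        if handler is None:
            raise ValueError(f"no handler for modality {modality!r}")
        return handler(data)


extractor = ModalityAwareExtractor()
print(extractor([0.0, 5.0, 10.0], metadata=FileStub("s1", "sketch")))  # prints [1.0, 0.5, 0.0]
```

Failing loudly on an unknown modality, rather than silently falling back, makes it harder for a mislabeled file to slip through unnoticed.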
I understand your concern, and yes, misuse is entirely possible with or without this feature, but I don't think it is the role of bob.bio.base to prevent that.
The best way to prevent that is sunlight. "Sunlight is said to be the best of disinfectants"; for that reason, I try to share my code with others as much as possible. I don't have secret packages with secret results: they are either public to the world or public to Idiap. For the same reason, our group works towards public software.
Working in a group is the best way to deal with those issues. That's what I think.