Included metadata during the feature extraction.

Merged Tiago de Freitas Pereira requested to merge meta-information into master

Hi guys,

I have a feature extractor that needs to use some meta information that is stored in bob.bio.base.database.BioFile.

In this MR, I added to preprocessor.__call__ and extractor.__call__ a keyword called metadata that will carry an instance of bob.bio.base.database.BioFile. Some design decisions that may generate discussion are commented in the MR diff. Please have a look there.
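
To make the idea concrete, here is a minimal sketch of how an optional metadata keyword could be passed without breaking existing extractors. It is not the actual diff: the BioFile class below is a hypothetical stand-in for bob.bio.base.database.BioFile, and accepts_metadata, run_extractor and both extractor classes are names invented for this illustration.

```python
import inspect


class BioFile:
    """Hypothetical stand-in for bob.bio.base.database.BioFile (assumed fields only)."""

    def __init__(self, client_id, path, file_id):
        self.client_id = client_id
        self.path = path
        self.file_id = file_id


def accepts_metadata(func):
    """Return True if the callable declares a parameter named `metadata`."""
    return "metadata" in inspect.signature(func).parameters


class LegacyExtractor:
    """An existing extractor that knows nothing about metadata."""

    def __call__(self, data):
        return [2 * x for x in data]


class MetadataExtractor:
    """An extractor that opts in by declaring the `metadata` keyword."""

    def __call__(self, data, metadata=None):
        source = metadata.path if metadata is not None else None
        return {"features": list(data), "source": source}


def run_extractor(extractor, data, bio_file):
    """Pass the file object only to extractors that declare `metadata`."""
    if accepts_metadata(extractor.__call__):
        return extractor(data, metadata=bio_file)
    return extractor(data)
```

With this kind of check, extractors that do not declare the keyword are called exactly as before, which is one way to keep the impact on existing code small.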

Extending this to other elements in the toolchain (enrolling, scoring) could provide a solution to !111 (closed).

@mguenther, do you have some time to review this one? Thanks in advance.

Edited by Tiago de Freitas Pereira

Activity

  • added 1 commit

    • fa41c3ca - Included metadata durint the feature extraction. Ongoing with cd workspace_HTFace/

    Compare with previous version

  • Tiago de Freitas Pereira changed title from WIP: Included metadata durint the feature extraction. Ongoing with cd workspace_HTFace/ to WIP: Included metadata durint the feature extraction.

  • Tiago de Freitas Pereira changed the description

  • Tiago de Freitas Pereira changed title from WIP: Included metadata durint the feature extraction. to WIP: Included metadata during the feature extraction.

  • mentioned in merge request !111 (closed)

  • added 1 commit

    • 481affbd - Appended the metadata during preprocessing

    Compare with previous version

  • Tiago de Freitas Pereira unmarked as a Work In Progress

  • Tiago de Freitas Pereira changed the description

  • Just some extra info @mguenther; our Mac mini died 2 days ago, so that's why the mac builds are stuck.

    Rescue measures are being taken to save its life, but so far so bad :-(

  • I am not really looking forward to this being merged but I also understand the limitations of the software so I am not against it. Have you considered parallel preprocessors?

  • Hey, what is the matter with this one? It is not unorthodox to deal with metadata.

    I solved it with 6 lines of code for each element (extractor and preprocessor), with zero impact on the overall system.

  • Manuel Günther resolved all discussions

  • @amohammadi I think we could also use parallel preprocessors for this, but the solution of @tiago.pereira is small enough.

    @tiago.pereira Is there a reason why you have implemented this only for preprocessors and extractors, and not for algorithms? For example, the enroll function might obtain a list of BioFiles, and the project function might also want to use information from the BioFile.

    Also, now that we have this new feature, we need to update the documentation. As you mentioned yourself, this will introduce some noise, which needs to be documented properly. Otherwise no-one will ever know that this feature exists.
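
    As a rough sketch of what the same keyword could look like for enrollment (everything here is an assumption for illustration: EnrollFile is a hypothetical BioFile-like object, and MeanEnroller is not an existing algorithm in the package):

```python
class EnrollFile:
    """Hypothetical BioFile-like object carrying only the fields used here."""

    def __init__(self, client_id, path):
        self.client_id = client_id
        self.path = path


class MeanEnroller:
    """Toy algorithm: the model is the element-wise mean of the enrollment features."""

    def enroll(self, enroll_features, metadata=None):
        # `metadata` is assumed to be a list with one file object per feature.
        if metadata is not None:
            if len(metadata) != len(enroll_features):
                raise ValueError("expected one metadata entry per feature")
            client_ids = {f.client_id for f in metadata}
            if len(client_ids) != 1:
                raise ValueError("all enrollment files must belong to one client")
        n = len(enroll_features)
        # zip(*...) walks the features column by column.
        return [sum(column) / n for column in zip(*enroll_features)]
```

    Here the metadata is used only for a sanity check; an algorithm that needs the client id itself (for instance, to pick a per-client key) would read it from the same list.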

  • @tiago.pereira Is there a reason why you have implemented this only for preprocessors and extractors, and not for algorithms? For example, the enroll function might obtain a list of BioFiles, and the project function might also want to use information from the BioFile.

    No, there's no reason, it's just a matter of time to implement it. I'll do this in this MR.

    Also, now that we have this new feature, we need to update the documentation. As you mentioned yourself, this will introduce some noise, which needs to be documented properly. Otherwise no-one will ever know that this feature exists.

    Yes, now that I have some support for the feature, I will append this to the documentation.

    Thanks for looking at it.

  • After giving this some thought, I think the biggest issue with this is that it can easily lead to incorrect toolchains. If you have access to the class of the samples all the time, there is nothing stopping users from misusing this, be it on purpose or unintentionally. Only a few weeks ago, one of the postdocs here was training two different PCAs for his two classes (PAD) in the extraction step, and he was getting perfect results :) This happened because he was hacking around the designed toolchains, running verify.py twice and copying files around by hand.

    I don't know right now how you can cheat in the toolchain if you have access to the identities in biometric recognition experiments, but I am sure some users will misuse this if it is easily available as metadata. One of the strong points of bob.bio.base is that it makes it almost impossible to do a wrong experiment, and I am not sure this merge request is going in that direction.

  • Hey @amohammadi,

    My motivation for creating this feature was to have access to the image modality (VIS, NIR, Sketch, Thermal, etc.) during the feature extraction. As far as I remember, @vkrivokuca needed to have access to the client id during the enrolment in order to apply some template protection strategy (forgive me if this is not correct). Both motivations are clean and honest.
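
    Something along these lines is what I have in mind for the modality case. This is only a sketch: the modality attribute on the file object and the ModalityFile and ModalityAwareExtractor names are assumptions made up for the example, and the per-modality scaling is a placeholder for whatever modality-specific processing is really needed.

```python
class ModalityFile:
    """Hypothetical BioFile-like object; the `modality` attribute is an assumption."""

    def __init__(self, path, modality):
        self.path = path
        self.modality = modality


class ModalityAwareExtractor:
    """Toy extractor that picks a scaling factor based on the image modality."""

    def __init__(self, scale_by_modality, default_scale=1.0):
        self.scale_by_modality = dict(scale_by_modality)
        self.default_scale = default_scale

    def __call__(self, data, metadata=None):
        # Fall back to the default when no metadata (or no modality) is given.
        modality = getattr(metadata, "modality", None)
        scale = self.scale_by_modality.get(modality, self.default_scale)
        return [scale * x for x in data]
```

    Note that only the modality is read here, never the client id or any class label.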

    I understand your concern and yes, misuse is totally possible with or without this feature, but I think the role of bob.bio.base is not to prevent that.

    The best way to prevent that is sunlight. "Sunlight is said to be the best of disinfectants"; for that reason I try to share my code with others as much as possible. I don't have secret packages with secret results; either they are public to the world or they are public to Idiap. For the same reason, our group works towards public software.

    Working in a group is the best way to deal with those issues. That's what I think.
