Skip to content
Snippets Groups Projects

Resolve "strange behaviour when retrieving objects from bob.bio.base filelist database query"

Closes #59 (closed)

I have implemented a way to create a ListReader object for each protocol. In this way, the #59 (closed) should be solved. However, I have not yet created a test case for that. Just the normal test cases still run (at least on my machine).

@heusch: Could you please test if your problem is solved with this implementation?

There are still several functions that rely on the internally stored self.protocol instead of taking the protocol as a parameter. I think the main reason for that is that some functions overwrite ZTBioDatabase base class functions that usually are not overwritten. Maybe, we can have an optional protocol parameter in these functions and use the same trick as in the others:

if protocol is None:
  protocol = self.protocol

(where self.protocol itself can be None).

@amohammadi: Do you think this is a reasonable approach?

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • @mguenther It's working fine, thanks !

    Edited by Guillaume HEUSCH
  • @mguenther Thank you for fixing this. Would it be possible to have a test case for this bug too? (having a test filelist database with two protocols)

    Also, I thought I fixed the usage of self.protocol with lines like this: https://gitlab.idiap.ch/bob/bob.bio.base/blob/f7cb7c9fef60915e909e550b63b8cf154b17416b/bob/bio/base/database/filelist/query.py#L264 but I still see some methods like this: https://gitlab.idiap.ch/bob/bob.bio.base/blob/f7cb7c9fef60915e909e550b63b8cf154b17416b/bob/bio/base/database/filelist/query.py#L379 that make use of self.protocol.

    I don't quite understand fully how this filelist based interface is working so if you want to refactor/modify it please go ahead.

    Please update the documentation accordingly too.

    Edited by Amir MOHAMMADI
  • mentioned in issue #59 (closed)

  • Author Maintainer

    I think the main reason, why we see different handling of the protocol parameter and self.protocol is that this database is a mixture of two. First, the old bob.db.verification.filelist.Database required to have self.protocol stored internally, in order to query the database in a reasonable way. Second, it implements functions of the BioDatabase, where the protocol is given as a parameter.

    In fact, the BioDatabase base class has the same issue. It stores self.protocol internally: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/database/database.py#L106 and then has (abstract) functions, where the protocol is passed as a parameter, e.g.: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/database/database.py#L283

    @amohammadi I think you had a good reason to have the interface designed that way. I didn't like that implementation for exactly the reason that we have here: If we need to re-implement more than only the abstract functions, we would see this mixture of protocol and self.protocol.

    So, the best solution would be to refactor the BioDatabase interface to never require the protocol to be passed as a parameter, but always use self.protocol, even in the derived XXXBioDatabase classes. However, this would -- once again -- be a major refactoring, including updating all derived XXXBioDatabase's, which I don't see happening again. Hence, I assume that we now have to live with this mixture of protocol parameters and self.protocol member variables.

    I will add a test case using two different protocol directories.

  • added 1 commit

    • 21954612 - Implemented better behavior when protocol is present; added possibility to only have 'dev' group

    Compare with previous version

  • Author Maintainer

    I have found some more issues with the protocol, which I have now solved.

    I also added the possibility to have only the dev set available, which was not possible before -- at least it raised exceptions when it tried to access the eval files when the default groups=None was specified. This is now fixed, by limiting the groups() to the existing groups/subdirectories.

    A new test case is added, too, so now the same database object can be queried with different protocol's. Still, there are some functions which do not accept a protocol parameter, see note above. Maybe we can just add an optional protocol parameter and use the same handling if protocol is None: protocol = self.protocol as in the other functions?

    @amohammadi Any objections? Do you think, this would work with the abc.abstractmethod? The interfaces of the base class (BioDatabase) and the derived class (FileListBioDatabase) functions would be (slightly) different...

  • added 1 commit

    Compare with previous version

  • So, the best solution would be to refactor the BioDatabase interface to never require the protocol to be passed as a parameter, but always use self.protocol, even in the derived XXXBioDatabase classes. However, this would -- once again -- be a major refactoring, including updating all derived XXXBioDatabase's, which I don't see happening again. Hence, I assume that we now have to live with this mixture of protocol parameters and self.protocol member variables.

    Why can't we have it the other way around? Meaning: make the FileSelector aware of protocol and change the BioDatabase methods to require protocol and not use self.protocol. All low-level databases (SQL ones) are written in a way to ask for protocol each time they are queried. The high levels (abstract methods) are like that too. I think refactoring bob.bio.base internally would be easier.

    Maybe we can just add an optional protocol parameter and use the same handling if protocol is None: protocol = self.protocol as in the other functions?

    If FileSelector was protocol aware, you could easily do this for both BioDatabase and FileListBioDatabase to never depend on self.protocol.

    Another option would be to just remove the --protocol option from verify.py and use self.protocol always but that may break something.

    @andre.anjos do you have any insights here?

  • Author Maintainer

    In fact, the --protocol command line option only overwrites database.protocol: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L304 It is an easy way to run experiments on a different protocol without requiring to change the configuration file. Hence, the protocol is always used from the BioDatabase, and never stored in the FileSelector. Removing it from the command line would not change anything with respect to this issue.

    I think logically the protocol belongs to the database. So, I would say to use self.protocol instead of passing protocol as a parameter in all functions. I can understand, why you are against this -- in the derived XXXBioDatabase class you will use an internal variable stored in the BioDatabase base class. However, as this variable is not private, I think this is a very valid approach and widely used in object-oriented programming.

    Note that either solution would only require to change the bob.bio packages, there is no need to change any low-level database again.

    @andre.anjos @tiago.pereira: what do you think, which option should be used? Or shall we stay with the mixed implementation that we have right now (and for which I have implemented a fix in this PR)?

  • Nobody seems to care about this issue :)

    @mguenther If you want to change the high-level implementations, you can but I am not willing to do that because it takes a lot of time. There are bob.bio.face,spear,video,vein that you need to touch. Also, when you modify the filelist database, you need to make sure bob.db.voxforge,fargo are still working.

  • Amir MOHAMMADI unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Amir MOHAMMADI mentioned in commit 7439f94f

    mentioned in commit 7439f94f

  • Thank you @mguenther. This merge request is good as is. Let's discuss further refactoring (if you really want to do it) in another issue.

  • mentioned in issue #50 (closed)

  • Sorry if I interfere into the discussion.

    I would be fine with using self.protocol internally inside BioDatabase, but I would prefer not to change abstract methods, which are implemented in the high-level database implementations. Abstract methods need to have protocol passed as the parameter. It insures that the person who implements high-level abstraction for the new database package (or a filelist db) knows that protocol is required and the low-level database should return objects depending on that protocol variable. We cannot assume that this person will know anything about self.protocol from the base class and its importance. The database implementations in Bob are already hard enough task, especially for the newcomers, so I would not want to impose on them to know such things as an existence of a global variable of a base class that they must use inside their methods.

    Imposing self.protocol of the base class on the higher level implementations would look like a bad programming practice I'd rather avoid.

    Edited by Pavel KORSHUNOV
  • Author Maintainer

    Thanks, @pkorshunov, for your input. I think you have a good point. People implementing an XXXBioDatabase should not be confronted with such things as they'd need to know the internal structure of the BioDatabase base class.

    Note that your suggestion is exactly what currently is implemented. Although this gives an odd behavior when implementing specialized databases as I have done in this PR. Since I think this should not happen regularly, and should not be required by any external implementor of an XXXBioDatabase, I think we can live with the current solution.

Please register or sign in to reply
Loading