Resolve "strange behaviour when retrieving objects from bob.bio.base filelist database query"
Closes #59 (closed)
I have implemented a way to create a ListReader
object for each protocol. In this way, the #59 (closed) should be solved.
However, I have not yet created a test case for that. Just the normal test cases still run (at least on my machine).
@heusch: Could you please test if your problem is solved with this implementation?
There are still several functions that rely on the internally stored self.protocol
instead of taking the protocol
as a parameter.
I think the main reason for that is that some functions overwrite ZTBioDatabase
base class functions that usually are not overwritten.
Maybe, we can have an optional protocol parameter in these functions and use the same trick as in the others:
if protocol is None:
protocol = self.protocol
(where self.protocol
itself can be None
).
@amohammadi: Do you think this is a reasonable approach?
Merge request reports
Activity
@mguenther It's working fine, thanks !
Edited by Guillaume HEUSCH@mguenther Thank you for fixing this. Would it be possible to have a test case for this bug too? (having a test filelist database with two protocols)
Also, I thought I fixed the usage of
self.protocol
with lines like this: https://gitlab.idiap.ch/bob/bob.bio.base/blob/f7cb7c9fef60915e909e550b63b8cf154b17416b/bob/bio/base/database/filelist/query.py#L264 but I still see some methods like this: https://gitlab.idiap.ch/bob/bob.bio.base/blob/f7cb7c9fef60915e909e550b63b8cf154b17416b/bob/bio/base/database/filelist/query.py#L379 that make use ofself.protocol
.I don't quite understand fully how this filelist based interface is working so if you want to refactor/modify it please go ahead.
Please update the documentation accordingly too.
Edited by Amir MOHAMMADImentioned in issue #59 (closed)
I think the main reason, why we see different handling of the
protocol
parameter andself.protocol
is that this database is a mixture of two. First, the oldbob.db.verification.filelist.Database
required to haveself.protocol
stored internally, in order to query the database in a reasonable way. Second, it implements functions of theBioDatabase
, where theprotocol
is given as a parameter.In fact, the
BioDatabase
base class has the same issue. It storesself.protocol
internally: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/database/database.py#L106 and then has (abstract) functions, where theprotocol
is passed as a parameter, e.g.: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/database/database.py#L283@amohammadi I think you had a good reason to have the interface designed that way. I didn't like that implementation for exactly the reason that we have here: If we need to re-implement more than only the abstract functions, we would see this mixture of
protocol
andself.protocol
.So, the best solution would be to refactor the
BioDatabase
interface to never require theprotocol
to be passed as a parameter, but always useself.protocol
, even in the derivedXXXBioDatabase
classes. However, this would -- once again -- be a major refactoring, including updating all derivedXXXBioDatabase
's, which I don't see happening again. Hence, I assume that we now have to live with this mixture ofprotocol
parameters andself.protocol
member variables.I will add a test case using two different protocol directories.
added 1 commit
- 21954612 - Implemented better behavior when protocol is present; added possibility to only have 'dev' group
I have found some more issues with the protocol, which I have now solved.
I also added the possibility to have only the
dev
set available, which was not possible before -- at least it raised exceptions when it tried to access theeval
files when the defaultgroups=None
was specified. This is now fixed, by limiting thegroups()
to the existing groups/subdirectories.A new test case is added, too, so now the same database object can be queried with different
protocol
's. Still, there are some functions which do not accept aprotocol
parameter, see note above. Maybe we can just add an optionalprotocol
parameter and use the same handlingif protocol is None: protocol = self.protocol
as in the other functions?@amohammadi Any objections? Do you think, this would work with the
abc.abstractmethod
? The interfaces of the base class (BioDatabase
) and the derived class (FileListBioDatabase
) functions would be (slightly) different...So, the best solution would be to refactor the BioDatabase interface to never require the protocol to be passed as a parameter, but always use self.protocol, even in the derived XXXBioDatabase classes. However, this would -- once again -- be a major refactoring, including updating all derived XXXBioDatabase's, which I don't see happening again. Hence, I assume that we now have to live with this mixture of protocol parameters and self.protocol member variables.
Why can't we have it the other way around? Meaning: make the
FileSelector
aware of protocol and change theBioDatabase
methods to requireprotocol
and not useself.protocol
. All low-level databases (SQL ones) are written in a way to ask forprotocol
each time they are queried. The high levels (abstract methods) are like that too. I think refactoringbob.bio.base
internally would be easier.Maybe we can just add an optional protocol parameter and use the same handling if protocol is None: protocol = self.protocol as in the other functions?
If
FileSelector
was protocol aware, you could easily do this for bothBioDatabase
andFileListBioDatabase
to never depend onself.protocol
.Another option would be to just remove the
--protocol
option fromverify.py
and useself.protocol
always but that may break something.@andre.anjos do you have any insights here?
In fact, the
--protocol
command line option only overwritesdatabase.protocol
: https://gitlab.idiap.ch/bob/bob.bio.base/blob/master/bob/bio/base/tools/command_line.py#L304 It is an easy way to run experiments on a different protocol without requiring to change the configuration file. Hence, the protocol is always used from theBioDatabase
, and never stored in theFileSelector
. Removing it from the command line would not change anything with respect to this issue.I think logically the
protocol
belongs to the database. So, I would say to useself.protocol
instead of passingprotocol
as a parameter in all functions. I can understand, why you are against this -- in the derivedXXXBioDatabase
class you will use an internal variable stored in theBioDatabase
base class. However, as this variable is not private, I think this is a very valid approach and widely used in object-oriented programming.Note that either solution would only require to change the
bob.bio
packages, there is no need to change any low-level database again.@andre.anjos @tiago.pereira: what do you think, which option should be used? Or shall we stay with the mixed implementation that we have right now (and for which I have implemented a fix in this PR)?
Nobody seems to care about this issue :)
@mguenther If you want to change the high-level implementations, you can but I am not willing to do that because it takes a lot of time. There are
bob.bio.face,spear,video,vein
that you need to touch. Also, when you modify thefilelist
database, you need to make surebob.db.voxforge,fargo
are still working.mentioned in commit 7439f94f
Thank you @mguenther. This merge request is good as is. Let's discuss further refactoring (if you really want to do it) in another issue.
mentioned in issue #50 (closed)
Sorry if I interfere into the discussion.
I would be fine with using
self.protocol
internally insideBioDatabase
, but I would prefer not to change abstract methods, which are implemented in the high-level database implementations. Abstract methods need to haveprotocol
passed as the parameter. It insures that the person who implements high-level abstraction for the new database package (or a filelist db) knows thatprotocol
is required and the low-level database should return objects depending on that protocol variable. We cannot assume that this person will know anything aboutself.protocol
from the base class and its importance. The database implementations inBob
are already hard enough task, especially for the newcomers, so I would not want to impose on them to know such things as an existence of a global variable of a base class that they must use inside their methods.Imposing
self.protocol
of the base class on the higher level implementations would look like abad
programming practice I'd rather avoid.Edited by Pavel KORSHUNOVThanks, @pkorshunov, for your input. I think you have a good point. People implementing an
XXXBioDatabase
should not be confronted with such things as they'd need to know the internal structure of theBioDatabase
base class.Note that your suggestion is exactly what currently is implemented. Although this gives an odd behavior when implementing specialized databases as I have done in this PR. Since I think this should not happen regularly, and should not be required by any external implementor of an
XXXBioDatabase
, I think we can live with the current solution.