There used to be a test case in bob/bio/base/test/dummy/filelist.py that was removed during the restructuring. This test case not only tested the interface to the filelist database (which is now bob.db.bio_filelist), which is non-trivial, but also served as an example of how to create a filelist database.
With the new database interface, do you think that there is an easy way to use your own (e.g. file list based) database with the bob.bio interface? Shall we re-add a generic BioFileList interface that would make it easy to simply add a user-database based on bob.db.bio_filelist?
The idea is to make it easier for people to create their own file-list-based databases. For people unfamiliar with the Bob database structure, creating their own filelist database is already a challenge. I want to avoid losing those people by forcing them to add yet another XXXBioDatabase, which might be too difficult to understand. Rather, I want to provide them with a generic interface (e.g., what was the FileListDatabase in the old version of bob.bio.base) that they can simply use. Understanding this is already difficult enough.
I was not aware that we can use the bob.db.bio_filelist.Database instead of a BioDatabase, as the former does not derive from the latter. The same applies to the File objects, which do not derive from BioFile.
Hence, there is a need for yet another layer -- unless we change bob.db.bio_filelist.Database to derive from bob.bio.base.database.BioDatabase. But I think that we wanted to keep the actual database implementations independent of bob.bio so that they can be used without it.
Note that the bob.db.bio_filelist.Database does not even derive from bob.db.base.Database, a fact that should be thought about, too.
I think this one slipped through during refactoring. The idea was that databases should have a low-level database interface (which is why bob.db.bio_filelist was given that name and not bob.bio.db_filelist) and a separate high-level implementation that uses the low-level one.
So we discussed this internally with @andre.anjos and @pkorshunov. Here is what we decided should be done, and I will also explain why. Let us know whether it makes sense and whether we should do it:
Move the basic functionality of bob.db.bio_filelist to bob.db.base so that we can later create such filelist databases for the pad and diarization frameworks more easily.
Move only the implementation of BioDatabase and ZTBioDatabase into bob.bio.db (we will bring back bob.bio.db), but keep the database-specific implementations where they are.
bob.bio.db will be a very small package, and we will import BioDatabase and ZTBioDatabase in bob.bio.base so that current imports and databases will not break.
We will create a class called FileListBioDatabase in bob.bio.db, which contains the rest of bob.db.bio_filelist (the biometric-specific parts) and inherits from ZTBioDatabase.
Filelist databases (like bob.db.voxforge) will depend on bob.db.base and bob.bio.db to implement themselves. (This is why we are bringing back bob.bio.db: so that databases do not depend on bob.bio.base.)
Database packages like bob.db.voxforge will depend on bob.bio.db, which is high-level, but these databases only contain file lists, which are specific to biometrics too, and since they are task specific that should be OK.
Filelist databases should at least have a config file, which will create an instance of bob.bio.db.FileListBioDatabase that returns bob.bio.modality.ModalityBioFile files.
This will still make it super easy to create a database from a list of files and one config file.
An example of a config-only database without any package and low-level interface would be:
```python
from bob.bio.db import FileListBioDatabase
from bob.bio.spear.database import AudioBioFile

timit_database = FileListBioDatabase(name='timit', base_dir='/path/to/timit/file/list', biofileclass=AudioBioFile)
```
An example of a database with a package and a somewhat low-level interface (like `bob.db.voxforge`) would be:
```python
# low-level inside the package in query.py
from bob.bio.db import FileListBioDatabase, BioFile


class VoxforgeDatabase(FileListBioDatabase):

    def __init__(self, biofileclass=BioFile, **kwargs):
        # call base class constructor
        from pkg_resources import resource_filename
        lists = resource_filename(__name__, 'lists')
        super(VoxforgeDatabase, self).__init__('voxforge', lists, biofileclass, **kwargs)


voxforge_low_database = VoxforgeDatabase()
```
```python
# high-level of it
from bob.bio.spear.database import AudioBioFile
from bob.db.voxforge import VoxforgeDatabase

voxforge_high_database = VoxforgeDatabase(biofileclass=AudioBioFile)
```
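To make the config-only idea concrete without depending on any Bob package, here is a minimal, self-contained sketch. It is not the bob.bio implementation: the classes are toy stand-ins that only reuse the names from the proposal, and the `world.lst` file name, the two-column `<path> <client_id>` list format, the `make_path` helper and its `.wav` default are all assumptions made for illustration.

```python
import os
import tempfile


class BioFile(object):
    """Toy stand-in for a biometric file object (not the real bob class)."""

    def __init__(self, client_id, path):
        self.client_id = client_id
        self.path = path


class AudioBioFile(BioFile):
    """Toy modality-specific file class."""

    def make_path(self, directory, extension='.wav'):
        # Build the full file name; the '.wav' default is an assumption.
        return os.path.join(directory, self.path + extension)


class FileListBioDatabase(object):
    """Toy database that is fully described by a directory of list files."""

    def __init__(self, name, base_dir, biofileclass=BioFile):
        self.name = name
        self.base_dir = base_dir
        self.biofileclass = biofileclass

    def objects(self, list_name='world.lst'):
        # Each two-column line is assumed to read "<path> <client_id>".
        files = []
        with open(os.path.join(self.base_dir, list_name)) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    files.append(self.biofileclass(client_id=parts[1], path=parts[0]))
        return files


# The "config file" part: write a tiny list and instantiate the database.
list_dir = tempfile.mkdtemp()
with open(os.path.join(list_dir, 'world.lst'), 'w') as f:
    f.write('dr1/fcjf0/sa1 fcjf0\ndr1/mdab0/sa1 mdab0\n')

timit_database = FileListBioDatabase('timit', list_dir, biofileclass=AudioBioFile)
files = timit_database.objects()
```

The point of the `biofileclass` argument is that the same list-reading code can hand back modality-specific file objects, so a user only writes the last three statements.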
> Move basic functionality of bob.db.bio_filelist to bob.db.base so that we can create these filelist databases for pad and diarization frameworks easier later.

It is too soon to say that. So far, no one has thought about how to organize diarization databases.
> Move only the implementation of BioDatabase and ZTBioDatabase into bob.bio.db (we will bring back bob.bio.db) but keep the database specific implementations where they are.
> bob.bio.db will be a very small package and we will import BioDatabase and ZTBioDatabase in bob.bio.base so that current imports and databases will not break.

Two things to say about this:
Another component to maintain.
This will be another layer of abstraction and will complicate things even further for newcomers (they are already complicated, I can tell).
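For reference, the re-export mentioned in the quoted proposal (the classes live in a small package but stay importable from the old location) can be sketched as follows. This is only an illustration of the pattern: `toy_bio_db` and `toy_bio_base` are hypothetical in-memory modules standing in for bob.bio.db and bob.bio.base, and the two classes are empty stand-ins.

```python
import sys
import types


class BioDatabase(object):
    """Toy stand-in for the database interface class."""


class ZTBioDatabase(BioDatabase):
    """Toy stand-in for the derived interface class."""


# Hypothetical small package that would own the two classes.
toy_bio_db = types.ModuleType('toy_bio_db')
toy_bio_db.BioDatabase = BioDatabase
toy_bio_db.ZTBioDatabase = ZTBioDatabase
sys.modules['toy_bio_db'] = toy_bio_db

# Hypothetical larger package that re-exports them under the old location.
toy_bio_base = types.ModuleType('toy_bio_base')
toy_bio_base.BioDatabase = toy_bio_db.BioDatabase
toy_bio_base.ZTBioDatabase = toy_bio_db.ZTBioDatabase
sys.modules['toy_bio_base'] = toy_bio_base

# Old and new import paths resolve to the very same class object.
from toy_bio_base import BioDatabase as OldBioDatabase
from toy_bio_db import BioDatabase as NewBioDatabase

same_class = OldBioDatabase is NewBioDatabase
```

Because both paths name one class object, existing `isinstance` checks and subclasses keep working; the cost is the extra package to maintain, which is the concern raised above.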
> We will create a class called FileListBioDatabase in bob.bio.db which contain the rest of bob.db.bio_filelist which are biometric specific and inherits from ZTBioDatabase.

Basically, it will contain all of the core of bob.db.bio_filelist, because this package is already task specific.
> Filelist databases (like bob.db.voxforge) will depend on bob.db.base and bob.bio.db to implement themselves. (See we are bringing back bob.bio.db so that databases do not depend on bob.bio.base)
> These package database like bob.db.voxforge will depend on bob.bio.db which is high-level but these databases only

I don't see how this is simpler than what we have now.
I liked the design that you proposed, but I think we can reach the same result with the things that we have now.
Independent of the direction that we'll take, let's keep in mind that there are others using these components.
Drastic changes (this will be a 4.x) create noise for everybody.
We have only just come out of the refactoring.
I can understand your way of thinking, and I see what you are trying to do there. You want to be able to have specialized XXXBioFile's inside the FileListBioDatabase. But I am afraid that we -- once again -- add a level of abstraction (low-level FileListDatabase and high-level FileListBioDatabase), which might be difficult to understand and, hence, needs very good documentation.
Especially, when a user wants to have their own file-list-based databases, they have to implement two databases (as far as I understood), while my solution implemented in !52 (closed) would need only one database. I see that my solution does not take care of different BioFiles, but this might be easily added. Also, I know that I am replicating the parameters of the bio_filelist.Database constructor, which is not the perfect solution in my eyes. Anyways, I want to keep it as simple as possible. Adding two layers of abstraction is not simple -- in my eyes.
Also, my solution does not require reviving bob.bio.db, a concept that I haven't understood from the beginning. In my view, there is absolutely no need for such a package. You might want to move stuff from bob.bio.base to bob.db.base, but having -- yet another -- layer of abstraction, i.e., bob.bio.db, does not seem reasonable to me. And it does not seem to be reasonable for you either: you removed my old level of abstraction (bob.db.verification.utils), which implemented basically what you want in bob.bio.db, while I think your current solution (w/o bob.bio.db) works just fine.
Let me add the BioFile parameter to the current FileListBioDatabase constructor in !52 (closed), and you can check if this fits your needs.
> But I am afraid that we -- once again -- add a level of abstraction (low-level FileListDatabase and high-level FileListBioDatabase), which might be difficult to understand and, hence, needs very good documentation.

There can never be a low-level FileListDatabase; file-list-based databases are too task specific for that. Look at my timit example. I had to create something like a low-level interface for bob.db.voxforge since it already had a package.
> Especially, when a user wants to have their own file-list-based databases, they have to implement two databases (as far as I understood)

No, again, look at my timit example. A config file is all that they have to write.
@pkorshunov has implemented the details that I explained here. He will open a pull request soon. Your feedback is appreciated.
@pkorshunov Could you also look into this problem: bob#235 (closed), and make sure that it does not happen, at least in filelist-based databases?
@mguenther my timit example is just what I imagined it would look like after we implement these things; it was not supposed to work yet. Also, what @pkorshunov did is very similar to yours, but it will delete the bob.db.bio_filelist package. When we discussed this offline with @andre.anjos and @pkorshunov, that was the best solution we could think of: filelist-based databases are inherently high-level-only databases, so there is no point in keeping the filelist implementation separate from its framework (bob.bio.base here).
@amohammadi: I agree, it makes total sense to merge bob.db.bio_filelist into bob.bio.base.database. I didn't really like the fact that some of my parameters were forwarded to the base class (in bob.bio.base), and some were used in bob.db.bio_filelist.
I was just not sure if the bob.db.bio_filelist would be useful in other, non-biometrics-related applications. But as its name already suggests, it is not :-)
Let's do it this way: I will let @pkorshunov open a PR and then review it, suggesting possible improvements.
Just one footnote that I have just realized: When we remove the bob.db.bio_filelist interface, any pure filelist-based database (such as bob.db.nist_sre12 has been) will depend on bob.bio.base.
I know that the third option would be preferable, but as I don't possess the original images and protocol files, I would be unable to test such a protocol...
I think if you are going to maintain it, add it in bob.bio.face. Otherwise, create a separate package for it.
> if I should generate a separate database package, which would depend on bob.bio.base instead?

That's why we want to have bob.bio.db: if somebody creates a database, they would not have to pull in the dependencies of bob.bio.base. Instead, the database would just depend on bob.bio.db.
So, what would be the difference between bob.bio.db and the already existing bob.db.bio_filelist? Both create another layer of confusion for people trying to implement a filelist-based database. And, as you have said yourself, we have been working hard to reduce the number of layers and packages. You are proposing to implement yet another package (bob.bio.db) just in order to remove one (bob.db.bio_filelist)?
The question is whether we need the bob.db.colorferet package to work independently of bob.bio.base. We have just discussed removing bob.db.bio_filelist since it has no meaning outside of bob.bio. Hence -- according to our reasoning -- a database that uses a file list has no meaning outside of bob.bio either. If you disagree with the latter statement, this means that we need to keep bob.db.bio_filelist as a separate package. If you agree, I think it is fine for bob.db.colorferet to depend on bob.bio.
As you can see, I am still very much against bob.bio.db.
I have moved bob.db.bio_filelist inside bob.bio.db. I also moved BioDatabase and BioFile from bob.bio.base here as well.
So, what remains is to delete BioDatabase and BioFile from bob.bio.base and to point all Bio and Pad databases (I created a similar bob.pad.db as well) to bob.bio.db (or bob.pad.db) instead of bob.bio.base (or bob.pad.base).
Sorry, but this looks just like mixing up two different things that have very little in common. I would at least have thought that the BioFileListDatabase would be derived from the BioDatabase, but apparently it is not. Hence, any test on the type of the database will fail.
So, why don't we have a proper implementation of the BioFileListDatabase that has all the contents of the old bob.db.bio_filelist.Database and directly derives from BioDatabase, i.e., implements all required functions? Then, setting up a database is as simple as providing the directories. This makes it much simpler for people to use the BioFileListDatabase directly, rather than requiring them to implement yet another database that derives from BioDatabase and uses BioFileListDatabase.
I am still against bob.bio.db, particularly in its current form, which mixes up things (the BioDatabase interface and the BioFileListDatabase that is independent of BioDatabase) that should not be mixed up in that way. Just put the damn thing into bob.bio.base.
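The type-test concern can be illustrated with a small sketch. Everything here is a toy model under stated assumptions: the class names other than `BioDatabase` are hypothetical, `objects()` is reduced to returning a list of paths, and `is_usable` merely stands in for whatever `isinstance` check the framework performs.

```python
class BioDatabase(object):
    """Toy stand-in for the database interface the framework checks for."""

    def objects(self):
        raise NotImplementedError("derived classes must implement objects()")


class StandaloneFileListDatabase(object):
    """File-list database that does NOT derive from BioDatabase."""

    def __init__(self, paths):
        self.paths = list(paths)

    def objects(self):
        return list(self.paths)


class WrapperDatabase(BioDatabase):
    """Extra layer a user has to write when the list database is standalone."""

    def __init__(self, paths):
        self._db = StandaloneFileListDatabase(paths)

    def objects(self):
        return self._db.objects()


class DerivedFileListDatabase(BioDatabase):
    """File-list database that derives from BioDatabase directly."""

    def __init__(self, paths):
        self.paths = list(paths)

    def objects(self):
        return list(self.paths)


def is_usable(database):
    # Hypothetical framework-side check on the type of the database.
    return isinstance(database, BioDatabase)


paths = ['s1/a', 's2/b']
standalone_ok = is_usable(StandaloneFileListDatabase(paths))
wrapper_ok = is_usable(WrapperDatabase(paths))
derived_ok = is_usable(DerivedFileListDatabase(paths))
```

In this model the standalone class fails the check even though it offers the same method, the wrapper passes at the cost of a second class, and the directly derived class passes with a single class.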
Ok, I can make BioFileListDatabase depend on BioDatabase. But I changed almost nothing from what is in bob.db.bio_filelist.Database, I just moved it. Where do you see bob.db.bio_filelist.Database depending on BioDatabase? The only thing I see is that it actually depends on Object:
Sorry, it seems that I was not precise enough. Reading my sentence again, I see how you got confused. The bob.db.bio_filelist.Database never depended on BioDatabase. I was asking why the new version (currently inside bob.bio.db, hopefully soon in bob.bio.base) does not depend on BioDatabase.
@mguenther You know that the main reason for moving all of this into bob.bio.db was to avoid databases depending on bob.bio.base? This is the central point here. If we do not care whether databases depend on bob.bio.base, then there is no need for bob.bio.db.