New place to put extractors

Hey @heusch,

I would suggest making the package structure more generic. As the names suggest, it is currently limited to extractors applied to images. But:

Input can be anything, not just images, so the image folder is unnecessary from my point of view, unless you really want to separate extractors for different types of samples.
We have three general blocks in Bob pipelines: preprocessor, extractor and algorithm. Any of these can be PyTorch-based, so it would be good to have folders for those as well.

Thanks!

In my view, having separate folders for image and audio is a suitable architecture. Handling variable-length input is important for audio but not critical for images. In addition, these extractors (image, audio) can hardly be used interchangeably. Currently, I have implemented the audio embedding extractor as a preprocessor block (since this block has access to the raw data). Should we call them extractors at a higher level, or have separate folders for preprocessors and extractors?
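To illustrate the variable-length point, here is a minimal, framework-free sketch (not the actual PyTorch extractor): the waveform is cut into frames, toy per-frame features are computed, and averaging over time yields a vector whose size does not depend on the input length. The frame size (400 samples) and hop (160 samples), i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate, and the two toy features are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames, dropping the incomplete tail."""
    if len(signal) < frame_len:
        # Too short for one full frame: zero-pad it to a single frame.
        return [list(signal) + [0.0] * (frame_len - len(signal))]
    return [list(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame: Sequence[float]) -> List[float]:
    """Toy per-frame features: log-energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


def embed(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """Average frame-level features over time: same output size for any input length."""
    feats = [frame_features(f) for f in frame_signal(signal, frame_len, hop)]
    return [sum(col) / len(feats) for col in zip(*feats)]


sr = 16000  # assumed sampling rate
short_sig = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]  # 0.5 s
long_sig = [math.sin(2 * math.pi * 440 * t / sr) for t in range(2 * sr)]    # 2 s
print(len(short_sig), len(long_sig))                # 8000 32000
print(len(embed(short_sig)), len(embed(long_sig)))  # 2 2
```

In a PyTorch model the same idea typically shows up as a temporal pooling layer over frame-level activations; fixed-size image inputs do not need that step, which is one reason image and audio extractors are hard to swap.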

@ssarfjoo, it is OK to have image, audio, etc. folders if you believe that makes the structure easier to understand.

For preprocessors, extractors and algorithms, I would suggest we stick to the Bob terminology and have separate directories, because they have different parent classes. For example, I already have an MLP algorithm.
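A minimal sketch of why grouping by parent class is natural: the three block types expose different interfaces. The base classes below are hypothetical, heavily simplified stand-ins loosely modelled on Bob's preprocessor, extractor and algorithm roles, not the real bob.bio.base signatures. `folder_for` is a hypothetical helper, and `CosineAlgorithm` is a toy example, not the MLP algorithm mentioned above.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, List


# Hypothetical, simplified stand-ins for the three Bob block types
# (NOT the real bob.bio.base signatures).
class PreprocessorBase(ABC):
    @abstractmethod
    def __call__(self, data: Any, annotations: Any = None) -> Any:
        """Raw sample (plus optional annotations) -> preprocessed sample."""


class ExtractorBase(ABC):
    @abstractmethod
    def __call__(self, data: Any) -> Any:
        """Preprocessed sample -> feature."""


class AlgorithmBase(ABC):
    @abstractmethod
    def enroll(self, features: List[Any]) -> Any:
        """Several features of one identity -> model."""

    @abstractmethod
    def score(self, model: Any, probe: Any) -> float:
        """Model vs. probe feature -> similarity score."""


# Hypothetical layout: one sub-package per parent class.
LAYOUT = {
    "preprocessor": PreprocessorBase,
    "extractor": ExtractorBase,
    "algorithm": AlgorithmBase,
}


def folder_for(block: Any) -> str:
    """Return the folder a block belongs in, based on its parent class."""
    for folder, base in LAYOUT.items():
        if isinstance(block, base):
            return folder
    raise TypeError(f"{type(block).__name__} is not a known block type")


class CosineAlgorithm(AlgorithmBase):
    """Toy algorithm: model = mean feature vector, score = cosine similarity."""

    def enroll(self, features: List[List[float]]) -> List[float]:
        return [sum(col) / len(features) for col in zip(*features)]

    def score(self, model: List[float], probe: List[float]) -> float:
        dot = sum(m * p for m, p in zip(model, probe))
        norms = math.sqrt(sum(m * m for m in model)) * math.sqrt(sum(p * p for p in probe))
        return dot / norms


algo = CosineAlgorithm()
model = algo.enroll([[1.0, 0.0], [1.0, 0.0]])
print(folder_for(algo), algo.score(model, [1.0, 0.0]))  # algorithm 1.0
```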

That's fine with me. @heusch, what do you think?

Hi guys,

To address the two points mentioned by @onikisins:

I would also prefer to have separate folders for image / speech / {insert_whatever_here}. As Saeed said, I think it would help categorize things.
Agreed on the Bob terminology, but this is relevant only if you use your extractor and/or algorithm within the bob.bio or bob.pad framework (e.g. pretraining a model to be used later does not fall into this case). Also, I think that preprocessors should not be placed here, but that's just my opinion. It's hard to say whether preprocessors are "data"-specific (e.g. for preprocessing a face) or "architecture"-specific (e.g. a particular architecture expects a specific input format).

From what I investigated, for audio datasets the preprocessor is currently "data"-specific: its inputs are the sampling rate and the raw speech data loaded from the Database interface. If we remove the preprocessor from bob.learn.pytorch, what would the alternative architecture be? E.g., is it possible to access the raw data at the extractor level without creating redundant temporary data? In the current preprocessor architecture in Bob, the raw speech is copied to the output of the preprocessor, which is not an efficient implementation.

is this possible to have access to raw data in extractor level without making redundant temp data

Yes, it is: just provide the preprocessed directory and set the skip_preprocessing flag to True, either on the command line or in the config file, when running verify.py or spoof.py. Does that answer your question?
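A minimal sketch of the config-file route. It assumes the configuration variables mirror the `--preprocessed-directory` and `--skip-preprocessing` command-line options of verify.py; the path is a placeholder, not a real location.

```python
# config.py: sketch of a configuration file passed to verify.py or spoof.py.
# Assumption: the variable names mirror the --preprocessed-directory and
# --skip-preprocessing command-line options.

# Placeholder path: the directory that already holds the preprocessed data.
preprocessed_directory = "/path/to/preprocessed"

# Reuse the data in preprocessed_directory instead of running the preprocessor again.
skip_preprocessing = True
```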

I think your preprocessor should live in bob.learn.pytorch, bob.bio.spear or bob.pad.voice, or even in your project directory. This is a tricky question, since I don't actually know what is considered generic enough to be included in bob.bio.spear or bob.pad.voice...

Since this preprocessor is tied to the extractor, it is better placed in bob.learn.pytorch, so we need a preprocessor folder here too. In that case, is it correct to say that the preprocessor's read_data function should read the raw data from the Database instead of reading a bob.io.base.HDF5File from the preprocessed directory? And that write_data should usually do nothing?
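A minimal sketch of that read_data/write_data idea, using hypothetical stand-in classes rather than the real bob.bio.base ones: write_data is a no-op, and read_data maps the preprocessed file path back to a database key and loads the raw audio. `InMemoryDatabase` and the path-to-key convention are assumptions for illustration. This only works if `__call__` returns the raw data unchanged, since anything it computes is never stored.

```python
import os
from typing import Dict, List, Optional, Tuple

Audio = Tuple[int, List[float]]  # (sampling rate, raw samples)


class InMemoryDatabase:
    """Hypothetical stand-in for a Bob Database interface that serves raw audio."""

    def __init__(self, samples: Dict[str, Audio]):
        self._samples = samples

    def load(self, key: str) -> Audio:
        return self._samples[key]


class RawAudioPreprocessor:
    """Sketch: no copy on disk; read_data goes back to the database instead."""

    def __init__(self, database: InMemoryDatabase, preprocessed_directory: str):
        self.database = database
        self.preprocessed_directory = preprocessed_directory

    def __call__(self, audio: Audio, annotations: Optional[dict] = None) -> Audio:
        # Must return the raw data unchanged: whatever is computed here is never stored.
        return audio

    def write_data(self, data: Audio, filename: str) -> None:
        # Deliberately a no-op: the raw speech is not duplicated on disk.
        pass

    def read_data(self, filename: str) -> Audio:
        # Assumed convention: <preprocessed_directory>/<key>.<ext> -> <key>.
        rel = os.path.relpath(filename, self.preprocessed_directory)
        key = os.path.splitext(rel)[0].replace(os.sep, "/")
        return self.database.load(key)


db = InMemoryDatabase({"spk1/utt1": (16000, [0.0, 0.25, -0.25])})
pre = RawAudioPreprocessor(db, "/tmp/preprocessed")
path = os.path.join("/tmp/preprocessed", "spk1", "utt1.hdf5")
pre.write_data(pre(db.load("spk1/utt1")), path)  # writes nothing
print(pre.read_data(path))  # (16000, [0.0, 0.25, -0.25])
```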

Guys, if you want my opinion, do not use the bob.bio.base or bob.pad.base classes to implement your extractors, algorithms, etc. Write them as standalone classes, so that you can easily use them as a preprocessor or an extractor. For an example, see: https://gitlab.idiap.ch/bob/bob.ip.tensorflow_extractor/blob/d491c9833eff2e368aba03ffacb81aee089f8658/bob/ip/tensorflow_extractor/FaceNet.py#L47

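    To use this class as a bob.bio.base extractor::

        from bob.bio.base.extractor import Extractor
        from bob.ip.tensorflow_extractor import FaceNet

        # Mix the standalone FaceNet class into Bob's Extractor base class
        class FaceNetExtractor(FaceNet, Extractor):
            pass

        extractor = FaceNetExtractor()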

This way, you can use them as a preprocessor, an extractor, or maybe an Algorithm, depending on your preference. The FaceNet class above can also be used as a preprocessor through our CallablePreprocessor.
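The same pattern can be tried without Bob or TensorFlow installed. In this sketch, `Extractor` and `Preprocessor` are hypothetical stand-ins for the bob.bio.base base classes, `CallableWrapper` only mimics the idea of CallablePreprocessor (its real signature may differ), and `Embedder` plays the role of FaceNet: a standalone class that knows nothing about Bob.

```python
from typing import Any, Callable, List


class Extractor:
    """Hypothetical stand-in for bob.bio.base.extractor.Extractor."""

    def __init__(self, requires_training: bool = False):
        self.requires_training = requires_training


class Preprocessor:
    """Hypothetical stand-in for bob.bio.base.preprocessor.Preprocessor."""


class Embedder:
    """Framework-agnostic core (the role FaceNet plays): data -> feature."""

    def __init__(self, scale: float = 2.0):
        self.scale = scale

    def __call__(self, data: List[float]) -> List[float]:
        return [self.scale * x for x in data]


class EmbedderExtractor(Embedder, Extractor):
    """Role 1: mix the core class into an extractor."""

    def __init__(self, **kwargs: Any):
        # Neither stand-in base calls super().__init__(), so initialise both explicitly.
        Embedder.__init__(self, **kwargs)
        Extractor.__init__(self)


class CallableWrapper(Preprocessor):
    """Role 2: wrap any callable as a preprocessor (mimics the CallablePreprocessor idea)."""

    def __init__(self, func: Callable, accepts_annotations: bool = False):
        self.func = func
        self.accepts_annotations = accepts_annotations

    def __call__(self, data: Any, annotations: Any = None) -> Any:
        if self.accepts_annotations:
            return self.func(data, annotations)
        return self.func(data)


extractor = EmbedderExtractor(scale=3.0)
preprocessor = CallableWrapper(Embedder(scale=3.0))
print(extractor([1.0, 2.0]), preprocessor([1.0, 2.0]))  # [3.0, 6.0] [3.0, 6.0]
```

`EmbedderExtractor` calls each base `__init__` explicitly: with a bare `pass` body, only the first base's constructor would run, because these stand-ins do not use cooperative `super()`.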

This is what I mean: the main functionality is implemented in the preprocessor, and the extractor just passes its input through.
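A minimal sketch of that split, with hypothetical stand-in classes: the preprocessor computes a (toy) embedding from the raw audio, and the extractor simply returns its input so that the usual preprocess → extract chain still runs. The mean/energy "embedding" is a placeholder for the real network, and `run_chain` is a hypothetical helper that skips all disk I/O.

```python
from typing import List, Optional, Tuple

Audio = Tuple[int, List[float]]  # (sampling rate, raw samples)


class EmbeddingPreprocessor:
    """Hypothetical: does the real work (a toy mean/energy 'embedding' here)."""

    def __call__(self, audio: Audio, annotations: Optional[dict] = None) -> List[float]:
        _rate, samples = audio
        n = len(samples)
        mean = sum(samples) / n
        energy = sum(x * x for x in samples) / n
        return [mean, energy]


class PassThroughExtractor:
    """Hypothetical: only there so the preprocess -> extract chain still runs."""

    def __call__(self, data: List[float]) -> List[float]:
        return data


def run_chain(audio: Audio, preprocessor, extractor) -> List[float]:
    """Mimic the preprocess -> extract steps for one sample, without disk I/O."""
    return extractor(preprocessor(audio))


feature = run_chain((16000, [1.0, -1.0, 1.0, -1.0]), EmbeddingPreprocessor(), PassThroughExtractor())
print(feature)  # [0.0, 1.0]
```

One trade-off to keep in mind, assuming each step writes its output to disk as in the file-based Bob pipeline: the pass-through step stores a second copy of the same features.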

OK then, feel free to implement it in a new preprocessor folder. You may want to create a new branch for that. Thanks!

added 2 commits

922a147a - [extractor] added LightCNN9 extractor
5761f2c0 - [extractor] added LightCNN based extractors, and corresponding unit tests

added 1 commit

c5ebec5d - added bob.bio.base in both requirements and conda recipe (needed for extractors)

@ssarfjoo I'm going to merge this, so feel free to check out master once it is merged and start your branch from there (or check out whatever you may use/need).

Thanks

unmarked as a Work In Progress

changed the description

merged

mentioned in commit 9c130f65

Merged by Guillaume HEUSCH on Feb 6, 2019 (9:30am UTC)
