New database interface for PAD
Hi @amohammadi, @ydayer
Here is a proposal for a new DB interface for PAD.
It follows the same guidelines used in bob.bio.base.
The implemented features are listed below:
- Uses CSV files instead of LST files, so you can ship metadata with them. The file structure is the same as before, so porting existing databases should be straightforward.
- The CSVPADDataset can transparently read the current LST files we have (I've created a sample loader that handles that).
- The CSVPADDataset can read its files either from a directory structure or from a tarball.
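
To illustrate the last point, here is a minimal sketch of how a loader could read the same CSV file from either a directory or a tarball. It is not the code in this MR: the helper names `open_definition_file` and `read_rows`, the `protocol1/train.csv` path, and the column names used in the note below are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
import os
import tarfile


def open_definition_file(root, relative_path):
    """Return a text stream for `relative_path` stored under `root`.

    `root` may be a directory or a tarball (hypothetical helper, not
    part of this MR); callers do not need to know which one they got.
    """
    if os.path.isdir(root):
        return open(os.path.join(root, relative_path), newline="")
    with tarfile.open(root) as tar:
        member = tar.extractfile(relative_path)
        # Copy the content into memory so the stream outlives the tar handle
        return io.StringIO(member.read().decode("utf-8"), newline="")


def read_rows(root, relative_path):
    """Parse a CSV file into a list of dicts, one per sample row."""
    with open_definition_file(root, relative_path) as f:
        return list(csv.DictReader(f))
```

With this sketch, `read_rows("data/csv_dataset", "protocol1/train.csv")` and `read_rows("data/csv_dataset.tar.gz", "protocol1/train.csv")` would return the same rows, provided the tarball stores the file under the same relative path. Any extra CSV column (for example an assumed `attack_type` column) comes back as an additional key in each row, which is what makes shipping metadata cheap.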
Below is an example of how to use CSVPADDataset, reading from a file structure and from a tarball:

    import os

    import bob.io.base.test_utils

    # CSVPADDataset is the class added by this MR; import it from the
    # module where this MR defines it.


    def run(path):
        dataset = CSVPADDataset(path, "protocol1")

        # Train
        assert len(dataset.fit_samples()) == 5
        # 2 out of 5 are bonafides
        assert sum([s.is_bonafide for s in dataset.fit_samples()]) == 2

        # Dev
        assert len(dataset.predict_samples()) == 5
        # 2 out of 5 are bonafides
        assert sum([s.is_bonafide for s in dataset.predict_samples()]) == 2

        # Eval
        assert len(dataset.predict_samples(group="eval")) == 7
        # 3 out of 7 are bonafides
        assert sum([s.is_bonafide for s in dataset.predict_samples(group="eval")]) == 3


    # The same checks run against a directory and against a tarball
    csv_example_dir = os.path.realpath(
        bob.io.base.test_utils.datafile(".", __name__, "data/csv_dataset")
    )
    csv_example_tarball = os.path.realpath(
        bob.io.base.test_utils.datafile(".", __name__, "data/csv_dataset.tar.gz")
    )

    run(csv_example_dir)
    run(csv_example_tarball)