Script for feature extraction from database

In many cases, we just want a script that extracts the features for all samples of a database (using a configurable Transformer), so that they can be used in a different process. Currently, no such script is available.

I propose adding a script along these lines:

import argparse
import os

parser = argparse.ArgumentParser(
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    description='Extract features from the given dataset'
)

parser.add_argument("--transformer", "-e", required=True, help="Select the transformer to be used")
parser.add_argument("--dataset", "-d", required=True, help="Select the dataset from which to extract features")
parser.add_argument("--output-directory", "-o", required=True, help="Select the directory where to write the data to")

args = parser.parse_args()

import bob.bio.base
import bob.core
import bob.io.base

logger = bob.core.log.setup("bob.paper.osijbc")
bob.core.log.set_verbosity_level(logger, 2)

database = bob.bio.base.load_resource(args.dataset, "database")
transformer = bob.bio.base.load_resource(args.transformer, "transformer")

for idx, samples in enumerate(database.all_samples()):
    logger.info('Extracting features for sample set %d', idx)
    features = transformer.transform(samples)

    for feature in features:
        output = os.path.join(args.output_directory, feature.key + ".hdf5")
        logger.debug('Writing file %s', output)
        bob.io.base.save(feature.data, output, True)

To be consistent with our other scripts, I would recommend using click instead of argparse. Unfortunately, I am not familiar with click and have no time right now to learn how to implement click commands. Would anyone with more experience with click be willing to take this over?
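
As a starting point for whoever picks this up, here is a minimal sketch of how the same script might look with click. It keeps the three options and the bob calls exactly as they appear in the script above and has not been run against a real database or transformer. The `extract_features` helper and its `save` parameter are hypothetical names introduced here so that the loop can be exercised without bob installed. They are not part of any existing bob API.

```python
import logging
import os

import click

logger = logging.getLogger("extract_features")


def extract_features(database, transformer, output_directory, save):
    """Transform every sample set of `database` and save each feature.

    Hypothetical helper: `save(data, path)` is passed in so that the loop
    itself does not depend on bob. Returns the list of written paths.
    """
    written = []
    for idx, samples in enumerate(database.all_samples()):
        logger.info("Extracting features for sample set %d", idx)
        for feature in transformer.transform(samples):
            path = os.path.join(output_directory, feature.key + ".hdf5")
            logger.debug("Writing file %s", path)
            save(feature.data, path)
            written.append(path)
    return written


@click.command()
@click.option("--transformer", "-e", required=True,
              help="Select the transformer to be used")
@click.option("--dataset", "-d", required=True,
              help="Select the dataset from which to extract features")
@click.option("--output-directory", "-o", required=True,
              type=click.Path(file_okay=False),
              help="Select the directory where to write the data to")
def main(transformer, dataset, output_directory):
    """Extract features from the given dataset."""
    # Imported inside the command so that --help works without loading bob.
    import bob.bio.base
    import bob.io.base

    # Same resource-loading calls as in the argparse script above.
    database = bob.bio.base.load_resource(dataset, "database")
    pipeline = bob.bio.base.load_resource(transformer, "transformer")

    extract_features(
        database,
        pipeline,
        output_directory,
        # Third argument asks bob.io.base.save to create missing directories.
        save=lambda data, path: bob.io.base.save(data, path, True),
    )
```

How the command gets registered is left open here: it could be exposed through an entry point like the other scripts, or run directly by calling `main()` under the usual `if __name__ == "__main__":` guard.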