vstack_features is not optimized
I compared the time needed to load data via vstack_features with the time needed to load it directly from bob.io.base.HDF5File. Direct loading was roughly 19 times faster than vstack_features. The code used for the test is shown below:
import functools
import numpy as np
import time
import bob.bio.gmm.script.verify_ivector
import bob.bio.base
import bob.io.base  # imported explicitly because HDF5File is used below
from bob.bio.gmm import tools, algorithm
from bob.bio.base import tools as base_tools
from bob.bio.gmm.script.verify_ivector import parse_arguments, execute
from bob.bio.gmm.tools.utils import read_feature
from bob.bio.base.utils.io import vstack_features
from bob.bio.base.tools.FileSelector import FileSelector
command_line_parameters = None
args = parse_arguments(command_line_parameters)
fs = FileSelector.instance()
limit_files = 100
reader = functools.partial(read_feature, args.extractor)
training_list = fs.training_list('extracted', 'train_projector')
training_list = training_list[:limit_files]
t1 = time.time()
data = vstack_features(reader, training_list, allow_missing_files=True)
t2 = time.time()
print('vstack time: ' + str(t2-t1))
t3 = time.time()
data2 = []
for i in range(len(training_list)):
    f = bob.io.base.HDF5File(training_list[i])
    data2.append(f.read('array'))
data2 = np.array(data2)
t4 = time.time()
print('direct time: ' + str(t4-t3))
and the output is:
vstack time: 5.793696880340576
direct time: 0.30901122093200684
To rule out caching effects, I also ran the two tests separately, and the result was the same.
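To make the comparison easier to reproduce, here is a minimal timing sketch that does not depend on bob. It assumes nothing about the internals of vstack_features. The names best_of, fake_reader and load_and_vstack are hypothetical, and fake_reader only stands in for read_feature by returning synthetic (frames, dims) arrays. Note that np.array(data2) in the test above builds a 3-D array when all files have the same shape, while a vertical stack gives a 2-D array, so the sketch uses np.vstack as the baseline.

```python
import time

import numpy as np


def best_of(func, repeats=3):
    """Call func `repeats` times; return (shortest wall-clock time, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result


def fake_reader(path):
    # Hypothetical stand-in for read_feature: synthetic (frames, dims) data,
    # not real HDF5 input.
    rng = np.random.default_rng(len(path))
    return rng.standard_normal((50, 20))


paths = ["file_%d.hdf5" % i for i in range(100)]


def load_and_vstack():
    # Read every file, then stack all frames along axis 0 in one call.
    return np.vstack([fake_reader(p) for p in paths])


elapsed, stacked = best_of(load_and_vstack)
print("vstack baseline: %.4f s, shape %s" % (elapsed, stacked.shape))
```

Taking the best of several runs reduces the influence of disk caching and other one-off effects. Replacing fake_reader with the real reader and timing vstack_features with the same helper should give numbers that are directly comparable.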