bob.bio.base, issue #125
Issue created Jul 06, 2018 by Saeed SARFJOO (@ssarfjoo, Developer)

vstack_features is not optimized

I timed loading the same set of feature files in two ways: through vstack_features, and by reading each file directly with bob.io.base.HDF5File. Direct loading was roughly 19 times faster than vstack_features. The test code is shown below:

import functools
import time

import numpy as np
import bob.io.base  # imported explicitly because bob.io.base.HDF5File is used below
from bob.bio.gmm.script.verify_ivector import parse_arguments
from bob.bio.gmm.tools.utils import read_feature
from bob.bio.base.utils.io import vstack_features
from bob.bio.base.tools.FileSelector import FileSelector

# Set up the experiment and take the first 100 extracted training files.
command_line_parameters = None
args = parse_arguments(command_line_parameters)
fs = FileSelector.instance()
limit_files = 100

reader = functools.partial(read_feature, args.extractor)
training_list = fs.training_list('extracted', 'train_projector')
training_list = training_list[:limit_files]

# Case 1: load all files through vstack_features.
t1 = time.time()
data = vstack_features(reader, training_list, allow_missing_files=True)
t2 = time.time()
print('vstack time: ' + str(t2 - t1))

# Case 2: read each file directly with HDF5File.
t3 = time.time()
data2 = []
for path in training_list:
    f = bob.io.base.HDF5File(path)
    data2.append(f.read('array'))
data2 = np.array(data2)
t4 = time.time()
print('direct time: ' + str(t4 - t3))

The output is:

vstack time: 5.793696880340576
direct time: 0.30901122093200684

To rule out caching effects, I also ran the two tests separately, and the result was the same.
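
For anyone who wants to repeat the comparison, here is a minimal timing sketch that tries to keep file-cache effects out of the measurement. It is not part of bob: the helper names `best_of` and `compare` are made up for illustration, and the two loaders in the demo are stand-ins that would need to be replaced by the vstack_features call and the direct HDF5File loop from the test above.

```python
import time


def best_of(func, repeats=3):
    """Call func() `repeats` times; return (fastest wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare(loaders, repeats=3):
    """Time each named loader and return {name: best_seconds}.

    Each loader gets one untimed warm-up call first, so every timed
    run starts with the files already touched once.
    """
    timings = {}
    for name, func in loaders.items():
        func()  # warm-up call, not timed
        timings[name], _ = best_of(func, repeats)
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in loaders; swap in the real ones to reproduce the issue.
    loaders = {
        "slow": lambda: [time.sleep(0.001) for _ in range(5)],
        "fast": lambda: list(range(5)),
    }
    for name, seconds in compare(loaders).items():
        print(f"{name}: {seconds:.6f} s")
```

Taking the fastest of several runs, rather than a single run, reduces the influence of one-off disk or scheduler delays on the reported ratio.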
