Do not cache data in DelayedSample
This matters because loading DelayedSamples and stacking them in a SampleBatch keeps the data in memory twice if the samples cache it. For example:
    import bob.pipelines as mario
    import numpy as np
    from functools import partial

    a = np.zeros((1000, 1000))

    def load(i):
        # normally we load an array from disk
        return a[i]

    samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
    samples[:2]
    # [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
    #  DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]

    a2 = np.array(mario.SampleBatch(samples))
    np.shares_memory(a, a2)
    # False
Since np.shares_memory returns False, you can see that SampleBatch always makes a copy of the data, so caching data in delayed samples always leads to double memory usage.
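To make the cost concrete, here is a minimal sketch that uses only the Python standard library and does not touch bob.pipelines. The classes CachingSample and NonCachingSample, the stack helper, and memory_after_stacking are all hypothetical stand-ins invented for this illustration; they only mimic the idea of a delayed sample and of a batch that copies its inputs.

```python
from functools import partial

ROW_BYTES = 1000  # size of one fake "row" (assumption for this sketch)


def load(i):
    # Stand-in for reading row i from disk: returns a fresh buffer each call.
    return bytearray(ROW_BYTES)


class CachingSample:
    """Hypothetical delayed sample that keeps the loaded data around."""

    def __init__(self, load):
        self._load = load
        self._cache = None

    @property
    def data(self):
        if self._cache is None:
            self._cache = self._load()
        return self._cache

    def retained_bytes(self):
        return 0 if self._cache is None else len(self._cache)


class NonCachingSample:
    """Hypothetical delayed sample that reloads on every access."""

    def __init__(self, load):
        self._load = load

    @property
    def data(self):
        return self._load()

    def retained_bytes(self):
        return 0


def stack(samples):
    # Like stacking a batch: copies every sample's data into one new buffer.
    return b"".join(s.data for s in samples)


def memory_after_stacking(sample_cls, n=100):
    samples = [sample_cls(partial(load, i=i)) for i in range(n)]
    batch = stack(samples)
    retained = sum(s.retained_bytes() for s in samples)
    # (bytes held by the batch, bytes still held by the samples)
    return len(batch), retained


print(memory_after_stacking(CachingSample))     # (100000, 100000)
print(memory_after_stacking(NonCachingSample))  # (100000, 0)
```

With caching, the samples still hold as many bytes as the stacked batch, so the data lives in memory twice. Without caching, only the batch keeps the data once stacking is done.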