Do not cache data in DelayedSample
This matters because loading DelayedSamples and stacking them in a SampleBatch copies the data, so anything cached inside the samples is kept in memory twice. For example:
import bob.pipelines as mario
import numpy as np
from functools import partial
a = np.zeros((1000, 1000))
def load(i):
    # normally we load an array from disk
    return a[i]
samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
samples[:2]
# [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
# DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]
a2 = np.array(mario.SampleBatch(samples))
np.shares_memory(a, a2)
# False
As you can see, SampleBatch always leads to a copy of the data, so caching data in delayed samples always leads to double memory usage.
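To put a number on the extra memory, here is a minimal sketch that uses only NumPy. `LazySample`, its `cache` flag, and `cached_bytes` are hypothetical stand-ins written for this illustration; they are not the bob.pipelines classes, and `np.stack` is used here in place of SampleBatch to play the role of the stacking step.

```python
import numpy as np
from functools import partial


class LazySample:
    """Hypothetical stand-in for a delayed sample (not the bob.pipelines class)."""

    def __init__(self, load, cache=False):
        self._load = load
        self._cache = cache
        self._data = None

    @property
    def data(self):
        # Return the cached array if one was kept, otherwise load it again.
        if self._data is not None:
            return self._data
        value = self._load()
        if self._cache:
            self._data = value
        return value


def load(i, shape=(1000,)):
    # Stands in for reading one array from disk; returns a fresh array each call.
    return np.full(shape, float(i))


def cached_bytes(samples):
    # Bytes still held inside the samples after they have been accessed.
    return sum(s._data.nbytes for s in samples if s._data is not None)


cached = [LazySample(partial(load, i), cache=True) for i in range(100)]
uncached = [LazySample(partial(load, i), cache=False) for i in range(100)]

# Stacking copies every sample into one new contiguous array.
batch_cached = np.stack([s.data for s in cached])
batch_uncached = np.stack([s.data for s in uncached])

extra_cached = cached_bytes(cached)      # same size as the batch itself
extra_uncached = cached_bytes(uncached)  # nothing is retained

print(batch_cached.nbytes, extra_cached, extra_uncached)
```

With caching, the samples hold as many bytes as the stacked batch, so the data exists twice. Without caching, each loaded array can be freed as soon as it has been copied into the batch.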
Edited by Amir MOHAMMADI