Arrayset new features and improvements
Created by: laurentes
There are two new features that would be nice for the bob.io.Arrayset class.
- There is currently no way to clear the content of an Arrayset. The two options are (a) to iterate over the elements and delete them one by one, or (b) to delete the Python object. It would be nice to introduce a clear() method.
- There are only two possible states for an Arrayset:
  - external, where the content is in a file
  - inlined, where the content is stored in a set of bob.io.Arrays (as many as the number of samples in the Arrayset)

  The inlined version might introduce a significant overhead when the Arrayset consists of many small Arrays, since in this case there are as many Array headers as samples. One solution would be to have a third state which stores the samples in a single Array, and where samples are obtained by slicing this Array over the third dimension.
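To make the proposed third state more concrete, here is a minimal sketch of the idea using plain NumPy. The class name PackedArrayset and its methods are hypothetical and not part of bob.io; for simplicity the samples are stacked along the first axis rather than the third, which is an assumption of this sketch. It also includes the clear() method requested above.

```python
import numpy


class PackedArrayset:
    """Hypothetical container (NOT part of bob.io): all samples live in one array."""

    def __init__(self, data):
        # One contiguous block of memory; axis 0 indexes the samples.
        self._data = numpy.ascontiguousarray(data)

    def __len__(self):
        return self._data.shape[0]

    def __getitem__(self, index):
        # Basic indexing returns a view on the block, so no per-sample copy is made.
        return self._data[index]

    def clear(self):
        # Drop every sample at once, keeping the sample shape and dtype.
        empty_shape = (0,) + self._data.shape[1:]
        self._data = numpy.empty(empty_shape, dtype=self._data.dtype)


samples = PackedArrayset(numpy.random.rand(1024, 50))
first = samples[0]  # a view of 50 doubles taken from the single block
print(len(samples), first.shape)

samples.clear()
print(len(samples))
```

With this layout there is a single array header for the whole set, whatever the number of samples.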
Below is an example to highlight the problem:
First, we allocate 50 samples of 1024*1024 doubles each, that is 400 MB overall.
import numpy, bob
A=bob.io.Arrayset(numpy.random.rand(50,1024*1024))
The memory usage reported by the command-line tool "top" is roughly 400 MB.
Second, we allocate 1024*1024 samples of 50 doubles each, that is 400 MB overall as well.
import numpy, bob
A=bob.io.Arrayset(numpy.random.rand(1024*1024,50))
The memory usage reported is about 800 MB, which means an overhead of 100%. This situation occurs in particular in common UBM/GMM experiments, so it would be nice to find a fix.
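The figures above can be checked with a short calculation. This is only a rough sketch: it assumes 8-byte doubles and reads the "MB" values reported by top as MiB, and the 800 figure is the approximate reading quoted above, not a precise measurement.

```python
BYTES_PER_DOUBLE = 8
MIB = 1024 * 1024


def payload_mib(n_samples, sample_length):
    """Raw data size in MiB, ignoring any per-array header."""
    return n_samples * sample_length * BYTES_PER_DOUBLE / MIB


few_large = payload_mib(50, 1024 * 1024)   # 400.0
many_small = payload_mib(1024 * 1024, 50)  # 400.0

# Approximate usage reported for the many-small-samples case (assumed MiB).
reported_mib = 800
n_samples = 1024 * 1024
overhead_bytes_per_sample = (reported_mib - many_small) * MIB / n_samples

print(few_large, many_small, overhead_bytes_per_sample)
```

Under these assumptions the overhead is roughly 400 bytes per sample, which is as large as the 400 bytes of data each sample carries.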
Any suggestion is welcome.