load_scores extremely memory hungry
The new implementation of score loading is memory hungry, as it stores the whole score file in memory. For large score files that have long client_ids and labels, this can easily be too much for a normal desktop machine.
To split the score file into positives and negatives, most of the information (for example, the labels) is completely irrelevant.
I remember having this problem with an older version of bob.measure, which is why I implemented the score reading using a generator function (i.e., yielding the file line by line) instead of keeping all information of the score file in memory at the same time.
I will provide a better alternative to the 'load_scores' function as a generator function, which does not store the whole score file in memory.
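
To illustrate the idea, here is a rough sketch of generator-based score reading. It is not the actual bob.measure code: the names iter_scores and split_scores are hypothetical, and I am assuming a whitespace-separated four-column layout (claimed_id real_id label score) where a line counts as positive when claimed_id equals real_id.

```python
import io


def iter_scores(lines):
    """Yield (claimed_id, real_id, score) one line at a time.

    Hypothetical helper. Assumes four whitespace-separated columns:
    claimed_id real_id label score. The label is parsed but discarded.
    """
    for line in lines:
        line = line.strip()
        # Skip empty lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        claimed_id, real_id, _label, score = line.split()
        yield claimed_id, real_id, float(score)


def split_scores(lines):
    """Collect only the float scores, split into negatives and positives."""
    negatives, positives = [], []
    for claimed_id, real_id, score in iter_scores(lines):
        if claimed_id == real_id:
            positives.append(score)
        else:
            negatives.append(score)
    return negatives, positives


# Small in-memory example; a real score file would be passed as an open file object.
sample = io.StringIO(
    "client_a client_a probe_001 0.9\n"
    "client_a client_b probe_002 0.1\n"
    "client_b client_b probe_003 0.8\n"
)
negatives, positives = split_scores(sample)
```

With this approach only one line is held at a time, and the ids and labels are dropped as soon as the line has been classified, so memory use is driven by the number of scores rather than by the length of the strings in the file.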