Load score more efficiently for negatives and positives

Closed Amir MOHAMMADI requested to merge minimal_load into master

Fixes #19 (closed)

@mguenther I think this is better than what you are trying to do. What do you think?

  • I will give this a try on Monday.

    But, as I mentioned in #19 (closed), this is not the minimal way to load scores: it still requires keeping both the real_id and the claimed_id of all score pairs in memory at the same time.
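To illustrate the point about not holding the identity strings in memory, here is a minimal sketch of a streaming split. It is not the code of any of the three branches. The function name `split_four_column_streaming` is hypothetical, and the column order (claimed_id, real_id, test_label, score) is an assumption about the four-column format. Each line is compared and discarded immediately, so only the scores are retained.

```python
import io
from array import array


def split_four_column_streaming(lines):
    """Split four-column score lines into (negatives, positives).

    Hypothetical helper: assumes each line reads
    'claimed_id real_id test_label score'.
    Only the scores are stored; the identity strings are dropped per line.
    """
    negatives = array("d")  # compact storage, 8 bytes per score
    positives = array("d")
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        claimed_id, real_id, _label, score = line.split()
        # a pair is positive when the claimed identity matches the real one
        target = positives if claimed_id == real_id else negatives
        target.append(float(score))
    return negatives, positives


# Tiny in-memory stand-in for a score file (made-up identities and scores).
example = io.StringIO(
    "alice alice probe1 0.9\n"
    "alice bob probe2 0.1\n"
    "bob bob probe3 0.8\n"
)
negatives, positives = split_four_column_streaming(example)
```

Passing an open file object instead of the `StringIO` works the same way, since the function only iterates over lines.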

  • OK, I have done some profiling of the three alternatives of loading and splitting score files. I have used https://pypi.python.org/pypi/memory_profiler to do the memory and time profiling.

    I have a larger score file with eight million scores (219 MB raw file):

    $ wc -l scores-dev 
    8012444 scores-dev

    I have written a short script to load the score file:

    import bob.measure
    score_file = "scores-dev"  # the score file counted above
    negatives, positives = bob.measure.load.split_four_column(score_file)

    and I have run the script with all three branches: master, minimal_load and 19-load_scores-extremely-memory-hungry, using:

    $ bin/mprof run -T 1 ./bin/python load_scores.py
    $ bin/mprof plot -no {master,minimal,mine}.pdf

    The resulting plots are attached: master.pdf, mine.pdf, minimal.pdf

    As you can see, the minimal_load branch and the master branch need approximately the same amount of memory (3GB vs. 3.5GB), while mine takes 300MB. Also note the time differences (x-axes): master: 180 sec, minimal_load: 140 sec, mine: 80 sec. Note that all experiments were run on a local disk, to avoid network latency when loading the score file.

    @amohammadi Do you now agree that my version works better? Can we close this PR?
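For a quick comparison without plotting, peak memory can also be approximated from inside Python. The sketch below uses the standard-library `tracemalloc` module rather than `mprof`, so it only counts allocations made through Python's allocator and its numbers will not match the process-level figures above. The helper `peak_memory_bytes` and the two toy loaders are hypothetical stand-ins, not the functions that were profiled.

```python
import tracemalloc
from array import array


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def load_as_strings(n):
    # toy stand-in for a loader that keeps one Python string per entry
    return [str(i) for i in range(n)]


def load_as_floats(n):
    # toy stand-in for a loader that keeps only compact float scores
    return array("d", (float(i) for i in range(n)))


_, peak_strings = peak_memory_bytes(load_as_strings, 100_000)
_, peak_floats = peak_memory_bytes(load_as_floats, 100_000)
print("strings:", peak_strings, "bytes; floats:", peak_floats, "bytes")
```

The same wrapper could be applied to a real loading function to compare branches, with the caveat that memory allocated outside Python's allocator may not be counted.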

  • Amir MOHAMMADI Status changed to closed

  • OK @mguenther, please go ahead and create a pull request for your implementation. But, as I said before, I use these methods elsewhere, so I don't want their API (their call signature and their return value format) to change.

  • mentioned in merge request bob.bio.base!250 (merged)
