Issue created Nov 18, 2020 by Amir MOHAMMADI (@amohammadi, Owner)

Do not cache data in DelayedSample

This matters because loading DelayedSamples and stacking them in a SampleBatch keeps the data in memory twice: once in the samples' cache and once in the stacked array. For example:

import bob.pipelines as mario
import numpy as np
from functools import partial

a = np.zeros((1000, 1000))

def load(i):
    # normally we load an array from disk
    return a[i]

samples = [mario.DelayedSample(partial(load, i=i)) for i in range(len(a))]
samples[:2]
# [DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=0)),
# DelayedSample(load=functools.partial(<function load at 0x7fb1c90250d0>, i=1))]

a2 = np.array(mario.SampleBatch(samples))
np.shares_memory(a, a2)
# False

So SampleBatch always makes a copy of the data, which means that caching data in delayed samples always leads to double memory usage.
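To make the memory cost concrete, here is a minimal sketch that does not use bob.pipelines at all. `CachingSample` and `NonCachingSample` are hypothetical stand-ins written for this illustration, not the library's classes, and `np.stack` stands in for whatever stacking SampleBatch performs. It counts the bytes the samples still hold after the batch has been built:

```python
from functools import partial

import numpy as np

N_SAMPLES, N_FEATURES = 100, 1000


def load(i):
    # Stand-in for reading one row from disk: returns a fresh array on each call.
    return np.full(N_FEATURES, float(i))


class CachingSample:
    """Hypothetical delayed sample that keeps the loaded data after first access."""

    def __init__(self, load):
        self._load = load
        self._data = None

    @property
    def data(self):
        if self._data is None:
            self._data = self._load()
        return self._data

    def held_bytes(self):
        return 0 if self._data is None else self._data.nbytes


class NonCachingSample:
    """Hypothetical delayed sample that reloads on every access and keeps nothing."""

    def __init__(self, load):
        self._load = load

    @property
    def data(self):
        return self._load()

    def held_bytes(self):
        return 0


def stack_and_measure(sample_cls):
    samples = [sample_cls(partial(load, i)) for i in range(N_SAMPLES)]
    # np.stack always allocates a new array, so the batch is a copy of the rows.
    batch = np.stack([s.data for s in samples])
    held = sum(s.held_bytes() for s in samples)
    return batch.nbytes, held


batch_bytes, cached_bytes = stack_and_measure(CachingSample)
_, uncached_bytes = stack_and_measure(NonCachingSample)
print(batch_bytes, cached_bytes, uncached_bytes)
```

With caching, the samples keep as many bytes alive as the batch itself, so the data sits in memory twice for as long as the sample list is referenced. Without caching, the temporary rows can be freed as soon as they have been copied into the batch.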

Edited Nov 23, 2020 by Amir MOHAMMADI