bob / bob.io.base · Issues · #20 · Closed
Issue created Jun 19, 2018 by Manuel Günther (@mguenther), Maintainer

HDF5 contents not available until file is actually closed

This is more of a question than a bug report, but it might still result in an action.

I have a process that writes a large HDF5 file using bob.io.base.HDF5File. I want to check with an external tool (e.g., h5ls) which entries have already been written into the file. Unfortunately, h5ls does not work: it reports the error "unable to open file", although the file on disk is already several gigabytes large.

I have written a small test script that demonstrates a possible solution, namely using the flush operation:

import bob.io.base
import subprocess

# open a new HDF5 file for writing
h = bob.io.base.HDF5File("test.hdf5", 'w')

h.set("Data", range(10))
print("Before flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# explicitly push the buffered contents to disk
h.flush()
print("After flush:")
subprocess.call(['h5ls', 'test.hdf5'])

h.set("Data2", range(10, 20))
print("After new data add:")
subprocess.call(['h5ls', 'test.hdf5'])

# deleting the object closes the file, which flushes everything
del h
print("After delete:")
subprocess.call(['h5ls', 'test.hdf5'])

The output is:

Before flush:
test.hdf5: unable to open file
After flush:
Data                     Dataset {10}
After new data add:
Data                     Dataset {10}
After delete:
Data                     Dataset {10}
Data2                    Dataset {10}

Hence, to be able to see the contents of the file via h5ls (or similar tools), we need to flush the contents explicitly. My question would be: should we automatically flush after we have added or changed the contents of the file? Is there any reason not to flush every time, for example that the flush operation might be expensive, @andre.anjos?
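
If an unconditional auto-flush turns out to be too expensive, a thin helper on the user side already gives this behaviour today. A minimal sketch, using only the set and flush calls shown above (the set_and_flush name is my own, not part of the API):

import bob.io.base
import subprocess

def set_and_flush(h5file, key, value):
    # write the value and immediately push the buffers to disk,
    # so that external tools such as h5ls can see the new entry
    h5file.set(key, value)
    h5file.flush()

h = bob.io.base.HDF5File("test.hdf5", 'w')
set_and_flush(h, "Data", range(10))
subprocess.call(['h5ls', 'test.hdf5'])  # "Data" is visible without closing the file

Of course, if flushing is cheap enough, doing it inside set itself would be the more convenient place.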
