HDF5 contents not available until file is actually closed
This is more of a question than a bug report, but it might still result in an action.
I have a process that writes a large HDF5 file using bob.io.base.HDF5File. I want to check with an external tool (e.g., h5ls) which entries have already been written to the file. Unfortunately, h5ls does not work: it reports the error "unable to open file", although the file on disk is already several gigabytes large.
I have written a small test script that demonstrates a possible solution, i.e., using the flush operation:
import bob.io.base
import subprocess

# open a new HDF5 file for writing and add a first dataset
h = bob.io.base.HDF5File("test.hdf5", 'w')
h.set("Data", range(10))

print("Before flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# explicitly flush the HDF5 buffers to disk
h.flush()
print("After flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# add a second dataset without flushing
h.set("Data2", range(10, 20))
print("After new data add:")
subprocess.call(['h5ls', 'test.hdf5'])

# deleting the handle closes the file and flushes everything
del h
print("After delete:")
subprocess.call(['h5ls', 'test.hdf5'])
The output is:
Before flush:
test.hdf5: unable to open file
After flush:
Data Dataset {10}
After new data add:
Data Dataset {10}
After delete:
Data Dataset {10}
Data2 Dataset {10}
Hence, to be able to see the contents of the file via h5ls (or a similar tool), we need to flush the content. My question would be: should we automatically flush after we have added or changed the contents of the file? Is there any reason (for example, that the flush operation might be expensive) not to flush every time, @andre.anjos?
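For illustration, here is a minimal sketch of what such an automatic flush could look like on the caller side. It relies only on the set and flush methods used above; set_and_flush is a hypothetical helper written for this example and is not part of the bob.io.base API:
import bob.io.base

def set_and_flush(h5file, key, data):
    # Hypothetical helper (not part of bob.io.base): write a dataset and
    # immediately flush the HDF5 buffers so that external tools such as
    # h5ls can see the new entry before the file is closed.
    h5file.set(key, data)
    h5file.flush()

# usage sketch
h = bob.io.base.HDF5File("test.hdf5", 'w')
set_and_flush(h, "Data", list(range(10)))  # entry is visible to h5ls right away
del h  # closing the file flushes any remaining buffers
If flushing after every write turns out to be too expensive for large arrays, the same idea could be applied less aggressively, e.g., flushing only every N writes or on demand.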