HDF5 contents not available until file is actually closed
This is more of a question than a bug report, but it might still result in an action.
I have a process that writes a large HDF5 file using bob.io.base.HDF5File. I want to check with an external tool (e.g., h5ls) which entries have already been written to the file. Unfortunately, h5ls does not work: it reports "unable to open file", although the file on disk is already several gigabytes in size.
I have written a small test script that shows a possible solution, namely using the flush operation:
import bob.io.base
import subprocess

# open a new HDF5 file for writing and add a first dataset
h = bob.io.base.HDF5File("test.hdf5", 'w')
h.set("Data", range(10))

# try to list the file contents with an external tool before flushing
print("Before flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# flush explicitly and try again
h.flush()
print("After flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# add a second dataset without flushing
h.set("Data2", range(10, 20))
print("After new data add:")
subprocess.call(['h5ls', 'test.hdf5'])

# delete the Python object, which closes the file
del h
print("After delete:")
subprocess.call(['h5ls', 'test.hdf5'])
The output is:
Before flush:
test.hdf5: unable to open file
After flush:
Data Dataset {10}
After new data add:
Data Dataset {10}
After delete:
Data Dataset {10}
Data2 Dataset {10}
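For comparison, the same behaviour can be reproduced with h5py directly (a minimal sketch, assuming h5py, numpy and h5ls are installed), which suggests the buffering happens inside the HDF5 library itself rather than in bob.io.base:

import subprocess

import h5py
import numpy

# write a dataset to a fresh file without flushing
f = h5py.File("test_h5py.hdf5", "w")
f.create_dataset("Data", data=numpy.arange(10))

print("Before flush:")
subprocess.call(['h5ls', 'test_h5py.hdf5'])  # typically fails to open the file

# force the library to write its buffers to disk
f.flush()
print("After flush:")
subprocess.call(['h5ls', 'test_h5py.hdf5'])  # now lists the dataset

f.close()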
Hence, to be able to see the contents of the file via h5ls (or a similar tool), we need to flush the file explicitly. My question is: should we automatically flush after adding or changing the contents of the file? Is there any reason not to flush every time, for example that the flush operation might be expensive, @andre.anjos?
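Just to make the question concrete, an automatic flush would essentially amount to something like the following hypothetical helper (set_and_flush is not part of bob.io.base, it is only a sketch of the idea using the set and flush calls shown above):

import subprocess

import bob.io.base

def set_and_flush(h5file, path, data):
    """Hypothetical helper: write a value and flush right away, so external
    tools such as h5ls see the new entry without waiting for the file to be
    closed."""
    h5file.set(path, data)
    h5file.flush()  # this is the call whose cost is in question

# usage: every write becomes immediately visible to external readers
h = bob.io.base.HDF5File("test.hdf5", 'w')
set_and_flush(h, "Data", range(10))
subprocess.call(['h5ls', 'test.hdf5'])  # already lists "Data"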