bob.io.base issue #20: HDF5 contents not available until file is actually closed
https://gitlab.idiap.ch/bob/bob.io.base/-/issues/20
Reported by Manuel Günther (siebenkopf@googlemail.com), last updated 2018-06-20

This is more of a question than a bug report, but it might still result in an action.
I have a process that writes a large HDF5 file using `bob.io.base.HDF5File`. I want to check with an external tool (e.g., `h5ls`) which entries have already been written to the file. Unfortunately, `h5ls` does not work: it reports `unable to open file`, even though the file on disk is already several gigabytes large.
I have written a small test script that demonstrates a possible solution, namely using the `flush` operation:
```
import subprocess

import bob.io.base

# Open a new HDF5 file for writing and add a first dataset.
# (list(range(...)) instead of range(...) keeps this working on Python 3.)
h = bob.io.base.HDF5File("test.hdf5", 'w')
h.set("Data", list(range(10)))

print("Before flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# Explicitly flush so the contents written so far reach the disk.
h.flush()
print("After flush:")
subprocess.call(['h5ls', 'test.hdf5'])

# Add a second dataset without flushing again.
h.set("Data2", list(range(10, 20)))
print("After new data add:")
subprocess.call(['h5ls', 'test.hdf5'])

# Deleting the object closes the file, which implicitly flushes it.
del h
print("After delete:")
subprocess.call(['h5ls', 'test.hdf5'])
```
The output is:
```
Before flush:
test.hdf5: unable to open file
After flush:
Data Dataset {10}
After new data add:
Data Dataset {10}
After delete:
Data Dataset {10}
Data2 Dataset {10}
```
Hence, to see the contents of the file via `h5ls` (or similar tools) while it is still open, we need to `flush` it explicitly. My question is: should we automatically flush after adding or changing the contents of the file? Is there any reason not to `flush` on every write, for example because the `flush` operation might be expensive, @andre.anjos?
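
For illustration, here is a minimal sketch of what opt-in automatic flushing could look like as a pure-Python wrapper. `AutoFlushHDF5File` is a hypothetical helper, not part of bob.io.base, and the per-write flush it performs is exactly the potential cost in question:
```
import bob.io.base

class AutoFlushHDF5File(object):
    # Hypothetical helper (not part of bob.io.base): wraps an HDF5File
    # and flushes after every set(), so that external tools such as
    # h5ls see newly written entries immediately.

    def __init__(self, filename, mode='w'):
        self._hdf5 = bob.io.base.HDF5File(filename, mode)

    def set(self, key, value):
        self._hdf5.set(key, value)
        # Flushing on every write keeps the on-disk file readable by
        # other processes, at the price of extra I/O per call.
        self._hdf5.flush()

    def __getattr__(self, name):
        # Delegate all other operations to the wrapped HDF5File.
        return getattr(self._hdf5, name)
```
With such a wrapper, the `h5ls` call in the test script above would already succeed after the first `set`, without an explicit `flush`.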