Commit 1395d29f authored by Zohreh MOSTAANI

[core][doc] removed io from doc. added to backend docs

parent 8e5b65fe
.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2016 Idiap Research Institute, ..
.. Contact: ..
.. ..
.. This file is part of the beat.core module of the BEAT platform. ..
.. ..
.. Commercial License Usage ..
.. Licensees holding valid commercial BEAT licenses may use this file in ..
.. accordance with the terms contained in a written agreement between you ..
.. and Idiap. For further information contact ..
.. ..
.. Alternatively, this file may be used under the terms of the GNU Affero ..
.. Public License version 3 as published by the Free Software and appearing ..
.. in the file LICENSE.AGPL included in the packaging of this file. ..
.. The BEAT platform is distributed in the hope that it will be useful, but ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. ..
.. You should have received a copy of the GNU Affero Public License along ..
.. with the BEAT platform. If not, see ..
.. _developerguide-io:
.. _developerguide-io-introduction:

The requirements for the platform when reading/writing data are:

* Ability to manage large and complex data
* Portability to allow the use of heterogeneous environments

Based on our experience and on these requirements, we investigated
the use of HDF5. Unfortunately, HDF5 is not convenient for handling
structures such as arrays of variable-size elements, for instance,
arrays of strings.

Therefore, we decided to rely on our own binary format.

.. _developerguide-io-strategy:

Binary Format
-------------

Our binary format does *not* contain information about the format of the data
itself, and it is hence necessary to know this format a priori. This means that
the format cannot be inferred from the content of a file.

We rely on the following fundamental C-style formats:

* int8
* int16
* int32
* int64
* uint8
* uint16
* uint32
* uint64
* float32
* float64
* complex64 (first real value, and then imaginary value)
* complex128 (first real value, and then imaginary value)
* bool (written as a byte)
* string

An element of such a basic format is written in the C-style way, using
little-endian byte ordering.
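
As a minimal sketch of the little-endian encoding described above, the
fixed-size formats can be mapped to ``struct`` format codes. The names
``FORMATS``, ``pack_scalar`` and ``unpack_scalar`` are hypothetical and are not
part of the platform; ``string`` is left out because its byte layout is not
described here.

```python
import struct

# Hypothetical mapping from the fundamental format names to little-endian
# ``struct`` codes ("<" selects little-endian, standard sizes, no padding).
FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float32": "<f", "float64": "<d",
    "bool": "<?",          # written as a single byte
    "complex64": "<2f",    # real part first, then imaginary part
    "complex128": "<2d",   # real part first, then imaginary part
}


def pack_scalar(type_name, value):
    """Encode one value of a fixed-size fundamental format."""
    fmt = FORMATS[type_name]
    if type_name.startswith("complex"):
        value = complex(value)
        return struct.pack(fmt, value.real, value.imag)
    return struct.pack(fmt, value)


def unpack_scalar(type_name, data):
    """Decode one value; the format must be known a priori."""
    fields = struct.unpack(FORMATS[type_name], data)
    if type_name.startswith("complex"):
        return complex(*fields)
    return fields[0]
```

For instance, ``pack_scalar("int32", 1)`` yields four bytes with the least
significant byte first, and ``unpack_scalar("int32", ...)`` reverses it.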

Besides, dataformats always consist of arrays or dictionaries of such
fundamental formats or compound formats.

An array of elements is saved as follows. First, the shape of the array is
saved using a *uint64* value for each dimension. Next, the elements of the
array are saved in C-style order.
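
The array layout above can be sketched as follows, assuming the elements are
given as a flat sequence already in C-style (row-major) order. The helpers
``pack_array`` and ``unpack_array`` are hypothetical names, and ``fmt`` is a
``struct`` element code chosen by the caller. Because the file carries no type
information, the reader has to be told the number of dimensions and the element
format.

```python
import struct


def pack_array(shape, values, fmt="d"):
    """Write the shape (one uint64 per dimension), then the elements."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("number of values does not match the shape")
    header = struct.pack("<%dQ" % len(shape), *shape)
    body = struct.pack("<%d%s" % (count, fmt), *values)
    return header + body


def unpack_array(data, ndim, fmt="d"):
    """Read an array whose rank and element format are known a priori."""
    shape = struct.unpack_from("<%dQ" % ndim, data, 0)
    count = 1
    for dim in shape:
        count *= dim
    # Each uint64 of the shape occupies 8 bytes.
    values = struct.unpack_from("<%d%s" % (count, fmt), data, 8 * ndim)
    return shape, list(values)
```

A 2x3 array of ``int32`` values therefore occupies 16 bytes of shape followed
by 24 bytes of elements.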

A dictionary of elements is saved as follows. First, the keys are sorted in
lexicographic order. Then, the values associated with each of these keys are
saved following this ordering.
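
A minimal sketch of the dictionary rule, assuming every value is a fixed-size
scalar: only the values are written, in the order of the sorted keys, so the
reader must know the keys and their formats beforehand. ``pack_record``,
``unpack_record`` and the ``field_formats`` mapping are hypothetical names used
for illustration only.

```python
import struct


def pack_record(record, field_formats):
    """Write the values of ``record`` in the order of its sorted keys."""
    out = b""
    for key in sorted(record):
        out += struct.pack("<" + field_formats[key], record[key])
    return out


def unpack_record(data, field_formats):
    """Read the values back, walking the same sorted key order."""
    record = {}
    offset = 0
    for key in sorted(field_formats):
        fmt = "<" + field_formats[key]
        (record[key],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return record
```

With ``field_formats = {"b": "i", "a": "H"}``, the value of ``a`` is written
before the value of ``b``, whatever the insertion order of the dictionary.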

The platform is data-driven and always processes chunks of data. Therefore,
data are always written in chunks, each chunk being preceded by a
text-formatted header indicating the start and end indices, followed by the
size (in bytes) of the chunk.
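
The chunked layout can be illustrated with the sketch below. The exact header
syntax is not specified in this section, so the code *assumes* a single ASCII
line of the form ``start end size`` terminated by a newline; this layout and
the names ``write_chunk`` and ``read_chunks`` are hypothetical.

```python
import io


def write_chunk(stream, start, end, payload):
    """Write an assumed 'start end size' text header, then the raw bytes."""
    header = "%d %d %d\n" % (start, end, len(payload))
    stream.write(header.encode("ascii"))
    stream.write(payload)


def read_chunks(stream):
    """Yield (start, end, payload) for each chunk until end of stream."""
    while True:
        line = stream.readline()
        if not line:
            return
        start, end, size = (int(field) for field in line.split())
        # The size from the header tells how many payload bytes to read.
        yield start, end, stream.read(size)


buffer = io.BytesIO()
write_chunk(buffer, 0, 9, b"abc")
write_chunk(buffer, 10, 19, b"\n\x00")
buffer.seek(0)
chunks = list(read_chunks(buffer))
```

Storing the size in the header lets a reader skip a chunk without decoding
its payload.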

In the Python backend of the platform, this binary format has been
successfully implemented using the ``struct`` module.