Commit 26bde396 authored by Zohreh MOSTAANI

[general][doc] removed input/output from general documentation

parent a4efc976
Pipeline #24581 passed with stages
in 4 minutes and 58 seconds
.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.core module of the BEAT platform. ..
.. ..
.. Commercial License Usage ..
.. Licensees holding valid commercial BEAT licenses may use this file in ..
.. accordance with the terms contained in a written agreement between you ..
.. and Idiap. For further information contact tto@idiap.ch ..
.. ..
.. Alternatively, this file may be used under the terms of the GNU Affero ..
.. Public License version 3 as published by the Free Software and appearing ..
.. in the file LICENSE.AGPL included in the packaging of this file. ..
.. The BEAT platform is distributed in the hope that it will be useful, but ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE. ..
.. ..
.. You should have received a copy of the GNU Affero Public License along ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/. ..

.. _developerguide-io:

===============
 Inputs/Outputs
===============

.. _developerguide-io-introduction:

Introduction
------------

The requirements for the platform when reading/writing data are:

* Ability to manage large and complex data
* Portability to allow the use of heterogeneous environments

Based on our experience and on these requirements, we investigated
the use of HDF5. Unfortunately, HDF5 is not convenient for handling
structures such as arrays of variable-size elements, for instance,
arrays of strings.

Therefore, we decided to rely on our own binary format.

.. _developerguide-io-strategy:

Binary Format
-------------

Our binary format does *not* contain information about the format of the data
itself, and it is hence necessary to know this format a priori. This means that
the format cannot be inferred from the content of a file.

We rely on the following fundamental C-style formats:

* int8
* int16
* int32
* int64
* uint8
* uint16
* uint32
* uint64
* float32
* float64
* complex64 (first real value, and then imaginary value)
* complex128 (first real value, and then imaginary value)
* bool (written as a byte)
* string

An element of such a basic format is written in the C-style way, using
little-endian byte ordering.
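
As an illustration, the fundamental formats listed above map naturally onto
little-endian ``struct`` codes. The following is a minimal sketch, not the
platform's actual code: the names ``STRUCT_CODES``, ``pack_scalar`` and
``unpack_scalar`` are hypothetical, and strings are left out because their
encoding is not described here.

```python
import struct

# Hypothetical lookup table (not part of the platform's API): maps the
# fundamental format names to little-endian ``struct`` codes.
STRUCT_CODES = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float32": "<f", "float64": "<d",
    "complex64": "<ff",    # real part first, then imaginary part
    "complex128": "<dd",   # real part first, then imaginary part
    "bool": "<?",          # written as a single byte
}


def pack_scalar(fmt_name, value):
    """Serialize one fundamental value using little-endian byte order."""
    code = STRUCT_CODES[fmt_name]
    if fmt_name.startswith("complex"):
        value = complex(value)
        return struct.pack(code, value.real, value.imag)
    return struct.pack(code, value)


def unpack_scalar(fmt_name, data):
    """Inverse of pack_scalar; the format name must be known a priori."""
    fields = struct.unpack(STRUCT_CODES[fmt_name], data)
    if fmt_name.startswith("complex"):
        return complex(fields[0], fields[1])
    return fields[0]
```

For example, ``pack_scalar("uint16", 258)`` yields the two bytes ``02 01``,
the least significant byte coming first.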

In addition, data formats always consist of arrays or dictionaries of such
fundamental formats or of compound formats.

An array of elements is saved as follows. First, the shape of the array is
saved using a *uint64* value for each dimension. Next, the elements of the
array are saved in C-style order.
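
The array layout can be sketched as follows. This is only an illustration
under stated assumptions: ``pack_array`` and ``unpack_array`` are hypothetical
helpers, the input is a rectangular nested list, and the number of dimensions
is assumed to come from the data format declaration rather than from the file.

```python
import struct


def pack_array(nested, elem_code="i"):
    """Serialize a rectangular nested list: shape first, then elements.

    elem_code is a struct code for the element type ("i" is int32).
    """
    # Derive the shape by walking down the first element of each level.
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None

    # Flatten in C-style (row-major) order: the last index varies fastest.
    flat = [nested]
    for _ in shape:
        flat = [item for sub in flat for item in sub]

    header = struct.pack("<%dQ" % len(shape), *shape)  # one uint64 per dimension
    body = struct.pack("<%d%s" % (len(flat), elem_code), *flat)
    return header + body


def unpack_array(data, ndim, elem_code="i"):
    """Read back (shape, flat_elements); ndim must be known a priori."""
    shape = struct.unpack_from("<%dQ" % ndim, data, 0)
    count = 1
    for extent in shape:
        count *= extent
    flat = struct.unpack_from("<%d%s" % (count, elem_code), data, 8 * ndim)
    return shape, list(flat)
```

A 2x3 array of int32 values therefore takes 16 bytes for the shape and 24
bytes for the elements.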

A dictionary of elements is saved as follows. First, the keys are sorted
in lexicographic order. Then, the values associated with each of
these keys are saved following this ordering.
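
A minimal sketch of the dictionary layout, assuming that only the values are
written (the keys being known from the data format declaration). The helpers
``pack_dict`` and ``unpack_dict`` and the ``codes`` mapping are hypothetical
names introduced for this example.

```python
import struct


def pack_dict(values, codes):
    """Serialize the values of a dictionary in lexicographic key order.

    codes maps each key to a struct code (hypothetical schema description).
    """
    chunks = []
    for key in sorted(values):  # lexicographic ordering of the keys
        chunks.append(struct.pack("<" + codes[key], values[key]))
    return b"".join(chunks)  # assumption: values only, keys are not written


def unpack_dict(data, codes):
    """Read the values back, walking the keys in the same sorted order."""
    out, offset = {}, 0
    for key in sorted(codes):
        fmt = "<" + codes[key]
        (out[key],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return out
```

With the keys ``value`` and ``index``, the value of ``index`` is written
first regardless of the order in which the dictionary was built.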

The platform is data-driven and always processes chunks of data. Therefore,
data are always written by chunks, each chunk being preceded by a text-formatted
header indicating the start and end indices, followed by the size (in bytes) of
the chunk.
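
The exact layout of the chunk header is not specified here. Purely as an
assumption for illustration, the sketch below uses a single ASCII line of
three space-separated integers (start index, end index, size in bytes)
terminated by a newline; ``write_chunk`` and ``read_chunk`` are hypothetical
names.

```python
import io


def write_chunk(stream, start, end, payload):
    """Write one chunk: a text header followed by the binary payload."""
    # Assumed header layout: "<start> <end> <size>\n" in ASCII.
    header = "%d %d %d\n" % (start, end, len(payload))
    stream.write(header.encode("ascii"))
    stream.write(payload)


def read_chunk(stream):
    """Return (start, end, payload), or None at the end of the stream."""
    line = stream.readline()
    if not line:
        return None
    start, end, size = (int(field) for field in line.split())
    # The size from the header tells how many payload bytes to read, so
    # newline bytes inside the binary payload are harmless.
    return start, end, stream.read(size)


# Usage: two chunks written to, then read back from, an in-memory stream.
buf = io.BytesIO()
write_chunk(buf, 0, 9, b"\x01\x02\x03")
write_chunk(buf, 10, 19, b"\xff\n")
buf.seek(0)
first = read_chunk(buf)
second = read_chunk(buf)
```

Storing the size in the header lets a reader skip a chunk without decoding
its content.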

In the Python backend of the platform, this binary format has been
successfully implemented using the ``struct`` module.