Commit ab23ee34 authored by Pavel KORSHUNOV

Updated documentation

parent 235e4466
Pipeline #3912 canceled with stage
in 22 minutes and 31 seconds
......@@ -20,8 +20,14 @@
==========================
This package is part of the signal-processing and machine learning toolbox
Bob_. It contains basic audio processing utilities.
Bob_. It contains basic audio processing utilities. Currently, the following features are available:
cepstral coefficients computed using rectangular (RFCC), mel-scaled triangular (MFCC) [Davis1980]_,
inverted mel-scaled triangular (IMFCC), and linear triangular (LFCC) filters [Furui1981]_, as well as
spectral flux-based (SSFC) [Scheirer1997]_ and subband centroid frequency (SCFC) [Le2011]_ features.
We plan to update and add more features in the near future.
*Please note that the implementation of MFCC and LFCC features has changed compared to earlier versions of the package,
as we corrected the pre-emphasis and DCT computations. The delta and delta-delta computations were slightly changed too.*
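
For readers unfamiliar with the three steps named in the note above, here is a minimal pure-Python sketch of their generic textbook forms. It is not the package's implementation: the function names, the default coefficient ``0.97``, the unnormalised DCT-II, and the edge handling in ``delta`` are all assumptions chosen for illustration.

```python
import math


def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass: y[0] = x[0], y[n] = x[n] - coeff * x[n-1].

    The coefficient 0.97 is a common default, assumed here for illustration.
    """
    if not signal:
        return []
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def dct_ii(values, n_ceps):
    """Unnormalised DCT-II of (log) filter-bank energies, first n_ceps terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_ceps)
    ]


def delta(frames, win=2):
    """Regression-based delta over +/- win frames; edge frames are repeated.

    Applying this function to its own output gives delta-delta features.
    """
    denom = 2.0 * sum(d * d for d in range(1, win + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for d in range(1, win + 1):
                nxt = frames[min(t + d, last)][c]
                prv = frames[max(t - d, 0)][c]
                acc += d * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out
```

Differences in any of these choices (the pre-emphasis coefficient, the DCT normalisation, or the delta window) change the resulting feature values, which is why features computed with different package versions should not be mixed.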
Installation
------------
......@@ -39,8 +45,18 @@ Contact
For questions or to report issues with this software package, contact our
development `mailing list`_.
.. [Davis1980] S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic
   word recognition in continuously spoken sentences", in IEEE Transactions on Acoustics, Speech, and Signal Processing,
   1980, vol. 28, num. 4, pages 357-366.
.. [Furui1981] S. Furui, "Cepstral analysis technique for automatic speaker verification", in
   IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, vol. 29, num. 2, pages 254-272.
.. [Scheirer1997] E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator",
   in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, vol. 2, pages 1331-1334.
.. [Le2011] P. N. Le, E. Ambikairajah, J. Epps, V. Sethu, and E. H. C. Choi, "Investigation of spectral centroid features for cognitive load classification",
   in Speech Communication, April 2011, vol. 53, num. 4, pages 540-551.
.. Place your references here:
.. _bob: https://www.idiap.ch/software/bob
.. _installation: https://gitlab.idiap.ch/bob/bob/wikis/Installation
.. _mailing list: https://groups.google.com/forum/?fromgroups#!forum/bob-devel
.. vim: set fileencoding=utf-8 :
.. Andre Anjos <andre.anjos@idiap.ch>
.. Pavel Korshunov <pavel.korshunov@idiap.ch>
.. Mon 17 Feb 2014 16:22:21 CET
.. testsetup:: aptest
......@@ -23,18 +24,31 @@
User Guide
************
This section will give a deeper insight in some simple and some more complex
audio processing utilities of |project|. Currently, only cepstral extraction
module is available. We are planning to update and add more features in the
This section gives more insight into both simple and more complex
audio processing utilities of |project|. Currently, the following features are available:
cepstral coefficients computed using rectangular (RFCC), mel-scaled triangular (MFCC) [Davis1980]_,
inverted mel-scaled triangular (IMFCC), and linear triangular (LFCC) filters [Furui1981]_, as well as
spectral flux-based (SSFC) [Scheirer1997]_ and subband centroid frequency (SCFC) [Le2011]_ features.
We plan to update and add more features in the near future.
.. [Davis1980] S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic
   word recognition in continuously spoken sentences", in IEEE Transactions on Acoustics, Speech, and Signal Processing,
   1980, vol. 28, num. 4, pages 357-366.
.. [Furui1981] S. Furui, "Cepstral analysis technique for automatic speaker verification", in
   IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, vol. 29, num. 2, pages 254-272.
.. [Scheirer1997] E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator",
   in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, vol. 2, pages 1331-1334.
.. [Le2011] P. N. Le, E. Ambikairajah, J. Epps, V. Sethu, and E. H. C. Choi, "Investigation of spectral centroid features for cognitive load classification",
   in Speech Communication, April 2011, vol. 53, num. 4, pages 540-551.
Simple audio processing
=======================
Below are three examples showing how to read a wave file and how to compute linear frequency cepstral coefficients (LFCC) and mel frequency cepstral coefficients (MFCC).
Other features can be computed in a similar fashion (please check the Python API for details).
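
LFCC and MFCC differ mainly in how the filter bank is laid out along the frequency axis: linearly spaced for LFCC, mel-spaced for MFCC. The sketch below illustrates only that difference. It is independent of |project|: the helper names are hypothetical, and the mel formula ``2595 * log10(1 + f / 700)`` is one common convention assumed here, which may differ from the one the package uses.

```python
import math


def hz_to_mel(f):
    """Common mel-scale approximation (an assumption, see the text above)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centres(n_filters, f_min, f_max, mel_scale):
    """Centre frequencies (Hz) of n_filters triangular filters.

    The centres are equally spaced either in Hz (LFCC-style) or on the
    mel scale (MFCC-style) between f_min and f_max, end points excluded.
    """
    if mel_scale:
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    else:
        lo, hi = float(f_min), float(f_max)
    step = (hi - lo) / (n_filters + 1)
    points = [lo + step * i for i in range(1, n_filters + 1)]
    return [mel_to_hz(p) for p in points] if mel_scale else points


# Four filters between 0 Hz and 4000 Hz, laid out both ways.
lfcc_centres = filter_centres(4, 0.0, 4000.0, mel_scale=False)
mfcc_centres = filter_centres(4, 0.0, 4000.0, mel_scale=True)
```

With the mel layout the centres crowd towards low frequencies and spread out towards high ones, so the resulting coefficients give more resolution to the lower part of the spectrum.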
Reading audio files
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~
The usual native formats can be read with the :py:mod:`scipy.io.wavfile` module. Other
wave formats can be read with other Python modules such as :py:mod:`pysox`. An
......
......@@ -13,7 +13,8 @@
.. todolist::
This module contains base functionality from Bob bound to Python, available in
the C++ counterpart ``bob::ap``. It includes audio processing utilities.
the C++ counterpart ``bob::ap``. It includes audio processing utilities that can be used
to compute the following audio features: MFCC, IMFCC, LFCC, RFCC, SCFC, SSFC, and SCMC.
Documentation
-------------
......