Commit f0f1c01d authored by André Anjos

Adds basic user guide

parent 98ff69be
.. vim: set fileencoding=utf-8 :
.. Andre Anjos <>
.. Mon 17 Feb 2014 16:22:21 CET
.. testsetup:: aptest
import bob
import numpy
import math
import os
def F(m, f):
    from pkg_resources import resource_filename
    return resource_filename('bob.%s.test' % m, os.path.join('data', f))
wave_path = F('ap', 'sample.wav')
import sys
sys.stdout = open(os.devnull, 'w')
rate, signal =
sys.stdout = sys.__stdout__
User Guide
This section gives a deeper insight into some simple and some more complex
audio processing utilities of |project|. Currently, only the cepstral
extraction module is available. We plan to update it and add more features in
the near future.
Simple audio processing
The three examples below show how to read a wave file and how to compute Linear Frequency Cepstral Coefficients (LFCC) and Mel Frequency Cepstral Coefficients (MFCC).
Reading audio files
The usual native formats can be read with the **** module. Other
wave formats can be read with other Python modules such as **pysox**. An
example wave file is available at **bob/ap/test/data/sample.wav**.
.. doctest:: aptest
>>> import #doctest: +SKIP
>>> rate, signal = #doctest: +SKIP
>>> print(rate)
>>> print(signal)
[ 28 72 58 ..., -301 89 230]
In the above example, the sampling rate of the audio signal is **8 kHz** and
the signal array is of type **int16**.
The duration of the signal (in seconds) can be computed directly:
.. doctest:: aptest
>>> print(int(len(signal)/rate))
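The relationship between the sampling rate, the number of samples and the duration can be checked without any audio library beyond the Python standard library. The sketch below is not part of |project|: it writes a synthetic 16-bit mono tone with the standard ``wave`` module, reads it back and computes the duration. The helper names ``write_tone`` and ``read_wav`` are hypothetical and exist only for this illustration.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, rate=8000, seconds=2, freq=440.0):
    """Write a mono 16-bit sine tone to `path` (synthetic test data)."""
    n_samples = rate * seconds
    samples = [int(10000 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_samples)]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample, i.e. int16
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n_samples, *samples))


def read_wav(path):
    """Return (rate, samples) for a mono 16-bit WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n = w.getnframes()
        raw = w.readframes(n)
    return rate, list(struct.unpack("<%dh" % n, raw))


path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_tone(path)
rate, signal = read_wav(path)

# Duration in seconds = number of samples / samples per second
duration = len(signal) / rate
```

With a rate of 8000 Hz and 16000 samples, ``duration`` is 2 seconds.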
LFCC and MFCC Extraction
The LFCC and MFCC coefficients can be extracted from an audio signal by using
:py:func:`bob.ap.Ceps`. To do so, the user can specify several parameters.
Typically, these are set in a configuration file. The following
values are the default ones:
.. doctest:: aptest
>>> win_length_ms = 20 # The window length of the cepstral analysis in milliseconds
>>> win_shift_ms = 10 # The window shift of the cepstral analysis in milliseconds
>>> n_filters = 24 # The number of filter bands
>>> n_ceps = 19 # The number of cepstral coefficients
>>> f_min = 0. # The minimal frequency of the filter bank
>>> f_max = 4000. # The maximal frequency of the filter bank
>>> delta_win = 2 # The integer delta value used for computing the first and second order derivatives
>>> pre_emphasis_coef = 0.97 # The coefficient used for the pre-emphasis
>>> dct_norm = True # Whether the cepstral coefficients are multiplied by the DCT normalization factor
>>> mel_scale = True # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale
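To illustrate what ``pre_emphasis_coef`` controls, here is a minimal sketch of a first-order pre-emphasis filter, ``y[n] = x[n] - a * x[n-1]``. This is the textbook form of the filter and is only assumed to match what :py:func:`bob.ap.Ceps` does internally; in particular, the handling of the first sample (kept unchanged here) is an assumption, and ``pre_emphasis`` is a hypothetical helper name.

```python
def pre_emphasis(x, coef=0.97):
    """Apply y[n] = x[n] - coef * x[n-1]; the first sample is kept as is.

    Assumption: this is the common textbook filter, not necessarily the
    exact implementation used by the library.
    """
    if not x:
        return []
    return [x[0]] + [x[i] - coef * x[i - 1] for i in range(1, len(x))]


# A constant signal is strongly attenuated after the first sample,
# which shows that the filter suppresses low frequencies.
flat = pre_emphasis([1.0, 1.0, 1.0], coef=0.5)
```

A coefficient close to 1, such as the default 0.97, removes most of the slowly varying part of the signal and so boosts high frequencies relative to low ones.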
Once the parameters are specified, :py:func:`bob.ap.Ceps` can be called as follows:
.. doctest:: aptest
>>> c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min, f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
>>> signal = numpy.asarray(signal, dtype='float64') # the input vector must be of type **float**
>>> mfcc = c(signal)
>>> print(len(mfcc))
>>> print(len(mfcc[0]))
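The first dimension of the output is the number of analysis frames, and the second is the number of coefficients per frame. The sketch below shows how the window parameters in milliseconds translate into samples and frames. It assumes the common convention that only complete windows are kept; the exact rounding used by :py:func:`bob.ap.Ceps` may differ, and ``frame_count`` is a hypothetical helper.

```python
def frame_count(n_samples, rate, win_length_ms=20, win_shift_ms=10):
    """Number of complete analysis windows that fit in the signal.

    Assumption: frames that would run past the end of the signal are dropped.
    """
    win_length = rate * win_length_ms // 1000  # window length in samples
    win_shift = rate * win_shift_ms // 1000    # window shift in samples
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // win_shift


# At 8 kHz, a 20 ms window spans 160 samples and a 10 ms shift spans 80.
n_frames = frame_count(16000, 8000)
```

For a 2 second signal at 8 kHz this convention gives 199 frames, roughly one frame per 10 ms.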
LFCCs can be computed instead of MFCCs by setting ``mel_scale`` to ``False``:
.. doctest:: aptest
>>> c.mel_scale = False
>>> lfcc = c(signal)
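The difference between the two settings lies in how the filter bank is spaced between ``f_min`` and ``f_max``. The sketch below uses a widely used Mel formula, ``mel = 2595 * log10(1 + f / 700)``, to place band edges evenly on the Mel scale. Whether |project| uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to Mel (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """Return n_filters + 2 edge frequencies (Hz), evenly spaced in Mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


edges = mel_band_edges(0.0, 4000.0, 24)
# Band widths in Hz grow with frequency on the Mel scale,
# whereas a linear (LFCC) filter bank would use equal widths.
widths = [b - a for a, b in zip(edges, edges[1:])]
```

On a linear scale all bands have the same width in Hz; on the Mel scale the low-frequency bands are narrow and the high-frequency bands are wide.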
The user can also choose to extract the energy. This is typically used for
Voice Activity Detection (VAD). Please check ``spkRecLib`` or ``FaceRecLib``
for more details about VAD.
.. doctest:: aptest
>>> c.with_energy = True
>>> lfcc_e = c(signal)
>>> print(len(lfcc_e))
>>> print(len(lfcc_e[0]))
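As a rough illustration of why energy is useful for VAD, the sketch below computes the log energy of a single frame. The flooring constant and the function name ``frame_log_energy`` are assumptions made for this example; they are not taken from |project|.

```python
import math


def frame_log_energy(frame, floor=1e-10):
    """Natural log of the sum of squared samples, floored to avoid log(0).

    Assumption: a generic definition of frame energy, for illustration only.
    """
    energy = sum(s * s for s in frame)
    return math.log(max(energy, floor))


silence = frame_log_energy([0.0] * 160)
speech_like = frame_log_energy([1000.0, -1200.0, 900.0, -800.0] * 40)
```

Frames containing speech have a much higher log energy than silent frames, so a threshold on this value gives a simple first approximation of a voice activity detector.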
It is also possible to compute first and second derivatives for those features:
.. doctest:: aptest
>>> c.with_delta = True
>>> c.with_delta_delta = True
>>> lfcc_e_d_dd = c(signal)
>>> print(len(lfcc_e_d_dd))
>>> print(len(lfcc_e_d_dd[0]))
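The following sketch shows one common way to compute derivative features, the regression formula over ``delta_win`` neighbouring frames, and how the feature dimension grows. Both helpers are hypothetical: the edge handling (repeating the first and last frame) and the assumption that energy adds one coefficient and that each derivative order appends a block of the same size are conventions assumed here, not confirmed details of :py:func:`bob.ap.Ceps`.

```python
def delta(frames, win=2):
    """First-order regression delta over +/- `win` frames (edges repeated)."""
    n = len(frames)
    dim = len(frames[0])
    norm = 2.0 * sum(k * k for k in range(1, win + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, win + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at the last frame
                prv = frames[max(t - k, 0)][d]      # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / norm)
        out.append(row)
    return out


def feature_dim(n_ceps, with_energy=False, with_delta=False,
                with_delta_delta=False):
    """Assumed size of one feature vector for the given options."""
    base = n_ceps + (1 if with_energy else 0)
    blocks = 1 + (1 if with_delta else 0) + (1 if with_delta_delta else 0)
    return base * blocks


ramp = [[float(t)] for t in range(6)]  # one coefficient rising by 1 per frame
d1 = delta(ramp, win=2)
d2 = delta(d1, win=2)                  # second derivative: delta of the delta
```

Under these assumptions, 19 cepstral coefficients plus energy give 20 values per frame, and adding first and second derivatives gives 60.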
@@ -17,6 +17,7 @@ Reference
.. toctree::
:maxdepth: 2
Indices and tables