Commit 1bcfbbaf authored by Amir MOHAMMADI

Put the guide in a separate file

parent 44b98486
Pipeline #8779 canceled with stages in 13 minutes and 40 seconds
.. py:currentmodule:: bob.kaldi

.. testsetup:: *

   from __future__ import print_function
   import pkg_resources
   import bob.kaldi
   import bob.io.audio
   import tempfile
   import os

=======================
Using Kaldi in Python
=======================

MFCC Extraction
---------------

Two functions are provided to extract MFCC features,
:py:func:`bob.kaldi.mfcc` and :py:func:`bob.kaldi.mfcc_from_path`. The former
accepts the speech samples as a :obj:`numpy.ndarray`, whereas the latter
accepts a filename as :obj:`str`:

1. :py:func:`bob.kaldi.mfcc`

   .. doctest::

      >>> sample = pkg_resources.resource_filename('bob.kaldi', 'test/data/sample16k.wav')
      >>> data = bob.io.audio.reader(sample)
      >>> feat = bob.kaldi.mfcc(data.load()[0], data.rate, normalization=False)
      >>> print(feat.shape)
      (317, 39)

2. :py:func:`bob.kaldi.mfcc_from_path`

   .. doctest::

      >>> sample = pkg_resources.resource_filename('bob.kaldi', 'test/data/sample16k.wav')
      >>> feat = bob.kaldi.mfcc_from_path(sample)
      >>> print(feat.shape)
      (317, 39)
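
Both functions return a two-dimensional :obj:`numpy.ndarray` with one row per
frame and one column per coefficient (here, 317 frames of 39 coefficients).
As a minimal sketch (the file list below is hypothetical), features from
several utterances can be extracted and stacked, e.g. to assemble training
data for a UBM:

.. code-block:: python

   import numpy
   import bob.kaldi

   # Hypothetical list of 16 kHz WAV files
   wav_files = ['utt1.wav', 'utt2.wav', 'utt3.wav']

   # Extract MFCCs per file and stack them along the frame axis
   training_feats = numpy.vstack(
       [bob.kaldi.mfcc_from_path(f) for f in wav_files])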
UBM training and evaluation
---------------------------

Both diagonal and full-covariance Universal Background Models (UBMs)
are supported; speakers can be enrolled and scored:

.. doctest::

   >>> # Train a small diagonal GMM
   >>> projector = tempfile.NamedTemporaryFile()
   >>> dubm = bob.kaldi.ubm_train(feat, projector.name, num_gauss=2, num_gselect=2, num_iters=2)
   >>> # Train a small full-covariance GMM
   >>> ubm = bob.kaldi.ubm_full_train(feat, projector.name, num_gselect=2, num_iters=2)
   >>> # Enrollment - MAP adaptation of the UBM-GMM
   >>> spk_model = bob.kaldi.ubm_enroll(feat, dubm)
   >>> # GMM scoring
   >>> score = bob.kaldi.gmm_score(feat, spk_model, dubm)
   >>> print('%.3f' % score)
   0.282
   >>> os.unlink(projector.name)
   >>> os.unlink(projector.name + '.dubm')
   >>> os.unlink(projector.name + '.fubm')
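
The doctest above scores the same features that were used for enrollment. As
a minimal sketch of a more realistic trial, the single test utterance can be
split into disjoint enrollment and test parts, reusing ``feat`` and ``dubm``
from above (the split point is arbitrary; a real setup uses separate
utterances):

.. code-block:: python

   import bob.kaldi

   # Disjoint enrollment/test portions of the MFCC array from above
   enroll_feat, test_feat = feat[:200], feat[200:]

   # MAP-adapt the UBM towards the enrollment data
   spk_model = bob.kaldi.ubm_enroll(enroll_feat, dubm)

   # Score the held-out frames against the speaker model and the UBM
   score = bob.kaldi.gmm_score(test_feat, spk_model, dubm)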
The following guide describes how to run complete speaker recognition
experiments:

1. To run the UBM-GMM with MAP adaptation speaker recognition experiment, run:

   .. code-block:: sh

      verify.py -d 'mobio-audio-male' -p 'energy-2gauss' -e 'mfcc-kaldi' -a 'gmm-kaldi' -s exp-gmm-kaldi --groups {dev,eval} -R '/your/work/directory/' -T '/your/temp/directory' -vv

2. To run the i-vector+PLDA speaker recognition experiment, run:

   .. code-block:: sh

      verify.py -d 'mobio-audio-male' -p 'energy-2gauss' -e 'mfcc-kaldi' -a 'ivector-plda-kaldi' -s exp-ivector-plda-kaldi --groups {dev,eval} -R '/your/work/directory/' -T '/your/temp/directory' -vv

3. Results:

   +---------------------------------------------------+--------+--------+
   | Experiment description                            | EER    | HTER   |
   +===================================================+========+========+
   | -e 'mfcc-kaldi', -a 'gmm-kaldi', 100 GMM          | 18.53% | 14.52% |
   +---------------------------------------------------+--------+--------+
   | -e 'mfcc-kaldi', -a 'gmm-kaldi', 512 GMM          | 17.51% | 12.44% |
   +---------------------------------------------------+--------+--------+
   | -e 'mfcc-kaldi', -a 'ivector-plda-kaldi', 64 GMM  | 12.26% | 11.97% |
   +---------------------------------------------------+--------+--------+
   | -e 'mfcc-kaldi', -a 'ivector-plda-kaldi', 256 GMM | 11.35% | 11.46% |
   +---------------------------------------------------+--------+--------+
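
In the table, EER (equal error rate) is the error rate at the threshold where
false acceptances and false rejections are equally frequent on the development
set, and HTER (half total error rate) averages the two error rates on the
evaluation set at that development threshold. A minimal sketch of how these
could be computed with ``bob.measure``, assuming ``dev_neg``/``dev_pos`` and
``eval_neg``/``eval_pos`` are arrays of impostor and genuine scores collected
from the experiments above:

.. code-block:: python

   import bob.measure

   # Fix the decision threshold at the EER point of the development scores
   threshold = bob.measure.eer_threshold(dev_neg, dev_pos)

   # Apply that fixed threshold to the evaluation scores
   far, frr = bob.measure.farfrr(eval_neg, eval_pos, threshold)
   hter = (far + frr) / 2
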
.. include:: links.rst
..
.. Copyright (C) 2011-2014 Idiap Research Institute, Martigny, Switzerland
.. _bob.kaldi:
=======================
Bob wrapper for Kaldi
=======================
.. todolist::
This package provides a pythonic API for Kaldi_ functionality so it can be
seamlessly integrated with Python-based workflows.
Documentation
-------------
.. toctree::
   :maxdepth: 2

   guide
   py_api

Indices and tables
==================

* :ref:`search`