Commit ab81a752 authored by Theophile GENTILHOMME

replace doctest by code-block as unpredictable log outputs were printed and +SKIP or +ELLIPSIS did not solve the problem

parent beb5a907
Pipeline #19524 passed with stage in 36 minutes and 17 seconds
 .. py:currentmodule:: bob.kaldi

-.. testsetup:: *
-
-   from __future__ import print_function
-   import pkg_resources
-   import bob.kaldi
-   import bob.io.audio
-   import tempfile
-   import numpy

 ================================
  Voice Activity Detection (VAD)
 ================================
@@ -22,11 +12,11 @@ The function expects the speech samples as :obj:`numpy.ndarray` and the sampling
 rate as :obj:`float`, and returns an array of VAD labels :obj:`numpy.ndarray`
 with the labels of 0 (zero) or 1 (one) per speech frame:

-.. doctest::
+.. code-block:: python

    >>> sample = pkg_resources.resource_filename('bob.kaldi', 'test/data/sample16k.wav')
    >>> data = bob.io.audio.reader(sample)
-   >>> VAD_labels = bob.kaldi.compute_vad(data.load()[0], data.rate) #doctest: +SKIP
+   >>> VAD_labels = bob.kaldi.compute_vad(data.load()[0], data.rate)
    >>> print (len(VAD_labels))
    317
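For readers skimming the diff, the frame-level 0/1 labels can be pictured with a minimal, self-contained energy-based VAD sketch in plain numpy. This is only an illustration of the output format, not the Kaldi pipeline behind `bob.kaldi.compute_vad`; the signal, frame size, and threshold below are invented:

```python
import numpy as np

# Invented 1-second, 16 kHz signal: quiet first half, louder "speech" second half
rng = np.random.default_rng(0)
signal = np.concatenate([0.01 * rng.standard_normal(8000),
                         0.50 * rng.standard_normal(8000)])

frame_len = 400   # 25 ms frames at 16 kHz
hop = 160         # 10 ms hop
n_frames = 1 + (len(signal) - frame_len) // hop

# Per-frame log-energy, thresholded into 0/1 VAD labels
energies = np.array([np.log((signal[i * hop:i * hop + frame_len] ** 2).sum() + 1e-10)
                     for i in range(n_frames)])
vad_labels = (energies > energies.mean()).astype(int)
print(n_frames, vad_labels[0], vad_labels[-1])  # quiet frames -> 0, loud frames -> 1
```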
@@ -39,9 +29,9 @@ with headset microphone recordings is used for forward pass of mfcc
 features. The VAD decision is computed by comparing the silence
 posterior feature with the silence threshold.

-.. doctest::
+.. code-block:: python

-   >>> DNN_VAD_labels = bob.kaldi.compute_dnn_vad(data.load()[0], data.rate) #doctest: +SKIP
+   >>> DNN_VAD_labels = bob.kaldi.compute_dnn_vad(data.load()[0], data.rate)
    >>> print (len(DNN_VAD_labels))
    317
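The threshold comparison described in the hunk above reduces to a one-liner; here is a toy version where the posterior matrix and the 0.5 threshold are invented (the real decision lives inside `bob.kaldi.compute_dnn_vad`; column 0 merely plays the role of the silence posterior):

```python
import numpy as np

# Invented per-frame class posteriors; column 0 stands in for the silence posterior
posts = np.array([[0.9, 0.05, 0.05],
                  [0.2, 0.10, 0.70],
                  [0.6, 0.20, 0.20],
                  [0.1, 0.30, 0.60]])

silence_threshold = 0.5
# Speech (label 1) wherever the silence posterior falls below the threshold
dnn_vad = (posts[:, 0] < silence_threshold).astype(int)
print(dnn_vad.tolist())  # -> [0, 1, 0, 1]
```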
@@ -59,17 +49,17 @@ the filename as :obj:`str`:

 1. :py:func:`bob.kaldi.mfcc`

-.. doctest::
+.. code-block:: python

-   >>> feat = bob.kaldi.mfcc(data.load()[0], data.rate, normalization=False) #doctest: +SKIP
+   >>> feat = bob.kaldi.mfcc(data.load()[0], data.rate, normalization=False)
    >>> print (feat.shape)
    (317, 39)

 2. :py:func:`bob.kaldi.mfcc_from_path`

-.. doctest::
+.. code-block:: python

-   >>> feat = bob.kaldi.mfcc_from_path(sample) #doctest: +SKIP
+   >>> feat = bob.kaldi.mfcc_from_path(sample)
    >>> print (feat.shape)
    (317, 39)
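The `(317, 39)` shape is frames x coefficients. Assuming Kaldi's usual 25 ms window and 10 ms hop (these defaults are not stated on this page, so treat them as an assumption), the frame count follows the standard framing formula, and 39 is consistent with 13 cepstra plus deltas and double-deltas:

```python
# Assumed Kaldi-style analysis parameters (25 ms window, 10 ms hop at 16 kHz)
sample_rate = 16000
win = int(0.025 * sample_rate)   # 400 samples
hop = int(0.010 * sample_rate)   # 160 samples

# Invert the framing formula to see how long a 317-frame file would be
n_samples = 317 * hop + (win - hop)
n_frames = 1 + (n_samples - win) // hop
print(n_frames, n_samples / sample_rate)  # 317 frames from a ~3.2 s signal
```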
@@ -79,18 +69,18 @@ UBM training and evaluation
 Both diagonal and full covariance Universal Background Models (UBMs)
 are supported, speakers can be enrolled and scored:

-.. doctest::
+.. code-block:: python

    >>> # Train small diagonal GMM
    >>> diag_gmm_file = tempfile.NamedTemporaryFile()
    >>> full_gmm_file = tempfile.NamedTemporaryFile()
-   >>> dubm = bob.kaldi.ubm_train(feat, diag_gmm_file.name, num_gauss=2, num_gselect=2, num_iters=2) #doctest: +SKIP
+   >>> dubm = bob.kaldi.ubm_train(feat, diag_gmm_file.name, num_gauss=2, num_gselect=2, num_iters=2)
    >>> # Train small full GMM
-   >>> ubm = bob.kaldi.ubm_full_train(feat, dubm, full_gmm_file.name, num_gselect=2, num_iters=2) #doctest: +SKIP
+   >>> ubm = bob.kaldi.ubm_full_train(feat, dubm, full_gmm_file.name, num_gselect=2, num_iters=2)
    >>> # Enrollment - MAP adaptation of the UBM-GMM
-   >>> spk_model = bob.kaldi.ubm_enroll(feat, dubm) #doctest: +SKIP
+   >>> spk_model = bob.kaldi.ubm_enroll(feat, dubm)
    >>> # GMM scoring
-   >>> score = bob.kaldi.gmm_score(feat, spk_model, dubm) #doctest: +SKIP
+   >>> score = bob.kaldi.gmm_score(feat, spk_model, dubm)
    >>> print ('%.3f' % score)
    0.282
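The score printed above can be read as a log-likelihood ratio between the MAP-adapted speaker model and the UBM. A toy numpy sketch of that ratio with invented 1-D diagonal GMMs follows; nothing here matches the Kaldi estimators or the 0.282 value:

```python
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    # Mean per-sample log-likelihood under a 1-D diagonal-covariance GMM
    comp = (-0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)
    return np.mean(np.logaddexp.reduce(np.log(weights) + comp, axis=1))

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, size=500)  # invented "speaker" data centred at 1.0

ubm = dict(weights=np.array([0.5, 0.5]), means=np.array([-2.0, 2.0]),
           variances=np.array([1.0, 1.0]))
spk = dict(weights=np.array([0.5, 0.5]), means=np.array([0.5, 1.5]),
           variances=np.array([1.0, 1.0]))  # "MAP-adapted" towards the data

score = diag_gmm_loglik(x, **spk) - diag_gmm_loglik(x, **ubm)
print(score > 0)  # the adapted model should fit its own data better
```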
@@ -101,7 +91,7 @@ The implementation is based on Kaldi recipe SRE10_. It includes
 ivector extractor training from full-diagonal GMMs, PLDA model
 training, and PLDA scoring.

-.. doctest::
+.. code-block:: python

    >>> plda_file = tempfile.NamedTemporaryFile()
    >>> mean_file = tempfile.NamedTemporaryFile()
@@ -111,11 +101,11 @@ training, and PLDA scoring.

    >>> train_feats = numpy.load(features)
    >>> test_feats = numpy.loadtxt(test_file)
    >>> # Train PLDA model; plda[0] - PLDA model, plda[1] - global mean
-   >>> plda = bob.kaldi.plda_train(train_feats, plda_file.name, mean_file.name) #doctest: +SKIP
+   >>> plda = bob.kaldi.plda_train(train_feats, plda_file.name, mean_file.name)
    >>> # Speaker enrollment (calculate average iVectors for the first speaker)
-   >>> enrolled = bob.kaldi.plda_enroll(train_feats[0], plda[1]) #doctest: +SKIP
+   >>> enrolled = bob.kaldi.plda_enroll(train_feats[0], plda[1])
    >>> # Calculate PLDA score
-   >>> score = bob.kaldi.plda_score(test_feats, enrolled, plda[0], plda[1]) #doctest: +SKIP
+   >>> score = bob.kaldi.plda_score(test_feats, enrolled, plda[0], plda[1])
    >>> print ('%.4f' % score)
    -23.9922
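As the inline comment in the hunk notes, enrollment averages the speaker's i-vectors after removing the global mean. A toy sketch of enroll-then-score using cosine scoring in place of PLDA (PLDA proper needs the trained covariance models, so the speaker statistics and dimensions below are all invented):

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented "i-vectors": 10 per speaker, dimension 20, two toy speakers
spk_a = rng.normal(1.0, 1.0, size=(10, 20))
spk_b = rng.normal(-1.0, 1.0, size=(10, 20))
global_mean = np.vstack([spk_a, spk_b]).mean(axis=0)

# Enrollment: average the speaker's mean-subtracted i-vectors
enrolled_a = (spk_a - global_mean).mean(axis=0)

def cosine_score(test_iv, model):
    t = test_iv - global_mean
    return float(t @ model / (np.linalg.norm(t) * np.linalg.norm(model)))

test_same = rng.normal(1.0, 1.0, size=20)   # drawn like speaker A
test_diff = rng.normal(-1.0, 1.0, size=20)  # drawn like speaker B
print(cosine_score(test_same, enrolled_a) > cosine_score(test_diff, enrolled_a))
```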
@@ -134,18 +124,18 @@ and noise, indexed 0, 1 and 2, respectively. These posteriors are thus
 used for silence detection in :py:func:`bob.kaldi.compute_dnn_vad`,
 but might be used also for the laughter and noise detection as well.

-.. doctest::
+.. code-block:: python

    >>> nnetfile = pkg_resources.resource_filename('bob.kaldi', 'test/dnn/ami.nnet.txt')
    >>> transfile = pkg_resources.resource_filename('bob.kaldi', 'test/dnn/ami.feature_transform.txt')
-   >>> feats = bob.kaldi.cepstral(data.load()[0], 'mfcc', data.rate, normalization=False) #doctest: +SKIP
+   >>> feats = bob.kaldi.cepstral(data.load()[0], 'mfcc', data.rate, normalization=False)
    >>> nnetf = open(nnetfile)
    >>> trnf = open(transfile)
    >>> dnn = nnetf.read()
    >>> trn = trnf.read()
    >>> nnetf.close()
    >>> trnf.close()
-   >>> ours = bob.kaldi.nnet_forward(feats, dnn, trn) #doctest: +SKIP
+   >>> ours = bob.kaldi.nnet_forward(feats, dnn, trn)
    >>> print (ours.shape)
    (317, 43)
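A forward pass is just stacked affine transforms and non-linearities ending in a softmax. Here is a toy numpy version with random weights, reusing only the input/output shapes from the hunk above (the real ami.nnet.txt topology is not reproduced, and the 64 hidden units are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_in, n_hidden, n_out = 317, 39, 64, 43  # hidden size is invented

feats = rng.standard_normal((n_frames, n_in))      # stand-in for cepstral features
W1, b1 = rng.standard_normal((n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.standard_normal((n_hidden, n_out)), np.zeros(n_out)

hidden = np.tanh(feats @ W1 + b1)                  # affine + squashing
logits = hidden @ W2 + b2
posts = np.exp(logits - logits.max(axis=1, keepdims=True))
posts /= posts.sum(axis=1, keepdims=True)          # softmax -> per-frame posteriors
print(posts.shape)  # -> (317, 43)
```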
@@ -193,7 +183,7 @@ independent. The training of such model has following pipeline:

 * Iterative alignment and update stage.

-.. doctest::
+.. code-block:: python

    >>> fstfile = pkg_resources.resource_filename('bob.kaldi', 'test/hmm/L.fst')
    >>> topofile = pkg_resources.resource_filename('bob.kaldi', 'test/hmm/topo.txt')
@@ -206,7 +196,7 @@ independent. The training of such model has following pipeline:

    >>> topof = open(topofile)
    >>> topo = topof.read()
    >>> topof.close()
-   >>> model = bob.kaldi.train_mono(train_set, labels, fstfile, topo, phfile , numgauss=2, num_iters=2) #doctest: +SKIP
+   >>> model = bob.kaldi.train_mono(train_set, labels, fstfile, topo, phfile , numgauss=2, num_iters=2)
    >>> print (model.find('TransitionModel'))
    1
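The "iterative alignment and update stage" in the pipeline above alternates between aligning frames to models and re-estimating the models. A toy segmental-k-means loop over 1-D observations and two invented "phones" shows just that alternation (the real trainer works on FSTs and HMM transition models, nothing like this sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
# Invented observations from two toy "phone" distributions
obs = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])

means = np.array([1.0, 4.0])  # rough initial phone means
for _ in range(5):
    # Alignment: assign every frame to the nearest phone model
    align = np.argmin(np.abs(obs[:, None] - means), axis=1)
    # Update: re-estimate each phone mean from its aligned frames
    means = np.array([obs[align == k].mean() for k in range(2)])

print(np.round(means, 1))  # means drift towards the true 0.0 and 5.0
```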
@@ -219,11 +209,11 @@ a forward pass with pre-trained phone DNN, and finds :math:`argmax()`
 of the output posterior features. Looking at the DNN labels, the
 phones are decoded per frame.

-.. doctest::
+.. code-block:: python

    >>> sample = pkg_resources.resource_filename('bob.kaldi', 'test/data/librivox.wav')
    >>> data = bob.io.audio.reader(sample)
-   >>> post, labs = bob.kaldi.compute_dnn_phone(data.load()[0], data.rate) #doctest: +SKIP
+   >>> post, labs = bob.kaldi.compute_dnn_phone(data.load()[0], data.rate)
    >>> mdecoding = numpy.argmax(post,axis=1) # max decoding
    >>> print (labs[mdecoding[250]]) # the last spoken sound of sample is N (of the word DOMAIN)
    N
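The argmax decoding step in isolation, with an invented posterior matrix and label list (the real `labs` and posteriors come from `bob.kaldi.compute_dnn_phone`):

```python
import numpy as np

labs = ['sil', 'ah', 'n']           # invented phone labels
post = np.array([[0.8, 0.1, 0.1],   # invented per-frame posteriors
                 [0.1, 0.7, 0.2],
                 [0.1, 0.2, 0.7],
                 [0.2, 0.1, 0.7]])

mdecoding = np.argmax(post, axis=1)          # best phone index per frame
decoded = [labs[i] for i in mdecoding]
print(decoded)  # -> ['sil', 'ah', 'n', 'n']
```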