Commit 439a7371 authored by Amir MOHAMMADI

Merge branch 'review_kaldi' into 'master'

Add documentation

See merge request !1
parents e011b90d 82d71e40
include COPYING README.rst buildout.cfg develop.cfg version.txt
include LICENSE README.rst buildout.cfg develop.cfg version.txt
recursive-include doc conf.py *.rst
recursive-include bob *.wav *.txt *.npy *.ivector *.ie
recursive-include bob/kaldi/test/data *.wav *.txt *.npy *.ivector *.ie
@@ -2,27 +2,41 @@
.. Milos Cernak <milos.cernak@idiap.ch>
.. Tue Apr 4 15:28:26 CEST 2017
.. image:: http://img.shields.io/badge/docs-stable-yellow.svg
:target: http://pythonhosted.org/bob.kaldi/index.html
.. image:: http://img.shields.io/badge/docs-latest-orange.svg
:target: https://www.idiap.ch/software/bob/docs/latest/bob/bob.kaldi/master/index.html
.. image:: https://gitlab.idiap.ch/bob/bob.kaldi/badges/master/build.svg
:target: https://gitlab.idiap.ch/bob/bob.kaldi/commits/master
.. image:: https://img.shields.io/badge/gitlab-project-0000c0.svg
:target: https://gitlab.idiap.ch/bob/bob.kaldi
.. image:: http://img.shields.io/pypi/v/bob.kaldi.svg
:target: https://pypi.python.org/pypi/bob.kaldi
.. image:: http://img.shields.io/pypi/dm/bob.kaldi.svg
:target: https://pypi.python.org/pypi/bob.kaldi
===========================
Python Bindings for Kaldi
===========================
This package provides pythonic bindings for Kaldi_ functionality so it can be
seemlessly integrated with Python-based workflows.
seamlessly integrated with Python-based workflows. It is a part of the signal-
processing and machine learning toolbox Bob_.
Installation
------------
To install this package -- alone or together with other `Packages of Bob
<https://github.com/idiap/bob/wiki/Packages>`_ -- please read the `Installation
Instructions <https://github.com/idiap/bob/wiki/Installation>`_. For Bob_ to
be able to work properly, some dependent packages are required to be installed.
Please make sure that you have read the `Dependencies
<https://github.com/idiap/bob/wiki/Dependencies>`_ for your operating system.
This package depends on both Bob_ and Kaldi_. To install Bob_ follow our
installation_ instructions. Kaldi_ is also bundled in our conda channels, which
means you can easily install Kaldi_ using conda too. After you have installed
Bob_, please follow these instructions to install Kaldi_ too.
This package also requires that Kaldi_ is properly installed alongside the
Python interpreter you're using, under the directory ``<PREFIX>/lib/kaldi``,
along with all necessary scripts and compiled binaries.
.. code-block:: sh

   # BOB_ENVIRONMENT is the name of your conda environment.
   $ source activate BOB_ENVIRONMENT
   $ conda install kaldi
   $ pip install bob.kaldi
Documentation
@@ -31,9 +45,18 @@ Documentation
For further documentation on this package, please read the `Stable Version
<http://pythonhosted.org/bob.kaldi/index.html>`_ or the `Latest Version
<https://www.idiap.ch/software/bob/docs/latest/bioidiap/bob.kaldi/master/index.html>`_
of the documentation. For a list of tutorials on this or the other packages ob
of the documentation. For a list of tutorials on this or the other packages of
Bob_, or information on submitting issues, asking questions and starting
discussions, please visit its website.
Contact
-------
For questions or reporting issues to this software package, contact our
development `mailing list`_.
.. _bob: https://www.idiap.ch/software/bob
.. _kaldi: http://kaldi-asr.org/
.. _mailing list: https://www.idiap.ch/software/bob/discuss
.. _installation: https://www.idiap.ch/software/bob/install
@@ -11,8 +11,6 @@ from .ivector import plda_train
from .ivector import plda_enroll
from .ivector import plda_score
from . import test
def get_config():
"""Returns a string containing the configuration information.
@@ -6,38 +6,69 @@
import os
import numpy as np
from . import io
from subprocess import PIPE, Popen
from os.path import join
import tempfile
import shutil
import logging
logger = logging.getLogger("bob.kaldi")
logger = logging.getLogger(__name__)
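The switch from a hard-coded logger name to ``logging.getLogger(__name__)`` keeps module loggers nested under the package logger, so configuring ``bob.kaldi`` once covers every submodule. A minimal sketch (the submodule name ``bob.kaldi.gmm`` is assumed for illustration):

```python
import logging

# Dotted logger names form a hierarchy: settings on the package logger
# propagate to its module-level children.
pkg = logging.getLogger('bob.kaldi')      # package-level logger
mod = logging.getLogger('bob.kaldi.gmm')  # what __name__ resolves to in a submodule

assert mod.parent is pkg                  # the module logger nests under the package
pkg.setLevel(logging.DEBUG)
assert mod.getEffectiveLevel() == logging.DEBUG  # inherited from the package
```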
def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
min_gaussian_weight=0.0001, num_gauss=2048, num_gauss_init=0,
num_gselect=30, num_iters_init=20, num_iters=4,
remove_low_count_gaussians=True):
""" Implements train_diag_ubm.sh
Parameters:
feats (numpy.ndarray): A 2D numpy ndarray object containing MFCCs.
Returns:
numpy.ndarray: The trained ubm model final.dubm.
"""Implements Kaldi egs/sre10/v1/train_diag_ubm.sh
Parameters
----------
feats : numpy.ndarray
A 2D numpy ndarray object containing MFCCs.
ubmname : str
A path to the UBM model.
num_threads : :obj:`int`, optional
Number of threads used for statistics accumulation.
num_frames : :obj:`int`, optional
Number of feature vectors to store in memory and train on
(randomly chosen from the input features).
min_gaussian_weight : :obj:`float`, optional
Kaldi MleDiagGmmOptions: Min Gaussian weight before we
remove it.
num_gauss : :obj:`int`, optional
Number of Gaussians in the model.
num_gauss_init : :obj:`int`, optional
Number of Gaussians in the model initially (if nonzero and
less than num_gauss, we'll do mixture splitting).
num_gselect : :obj:`int`, optional
Number of Gaussians to keep per frame.
num_iters_init : :obj:`int`, optional
Number of iterations of training for initialization of the
single diagonal GMM.
num_iters : :obj:`int`, optional
Number of iterations of training.
remove_low_count_gaussians : :obj:`bool`, optional
Kaldi MleDiagGmmOptions: If true, remove Gaussians that
fall below the floors.
Returns
-------
str
A path to the trained UBM model.
"""
# 1. Initialize a single diagonal GMM
binary1 = 'gmm-global-init-from-feats'
cmd1 = [binary1]
binary2 = 'subsample-feats'
binary3 = 'gmm-gselect'
binary4 = 'gmm-global-acc-stats'
binary5 = 'gmm-global-est'
# 1. Initialize a single diagonal GMM
cmd1 = [binary1] # gmm-global-init-from-feats
cmd1 += [
'--num-threads=' + str(num_threads),
'--num-frames=' + str(num_frames),
@@ -46,16 +77,13 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
'--num-gauss-init=' + str(num_gauss_init),
'--num-iters=' + str(num_iters_init),
]
with tempfile.NamedTemporaryFile(delete=False, suffix='.dubm') as initfile:
with tempfile.NamedTemporaryFile(delete=False, suffix='.dubm') as \
initfile, tempfile.NamedTemporaryFile(suffix='.log') as logfile:
cmd1 += [
'ark:-',
initfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe1 = Popen(cmd1, stdin=PIPE, stdout=PIPE, stderr=logfile)
# write ark file into pipe.stdin
io.write_mat(pipe1.stdin, feats, key=b'abc')
# pipe1.stdin.close()
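The commented-out ``pipe1.stdin.close()`` hints at the handshake this pipeline relies on: the child binary only sees end-of-input once stdin is closed, which ``communicate()`` later does implicitly. A self-contained sketch of that pattern, using a Python child process in place of a Kaldi binary:

```python
import subprocess
import sys

# Child that reads all of stdin, then writes it back upper-cased.
child = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# communicate() writes the payload, closes stdin (signalling EOF),
# and waits for the child -- the same role it plays after io.write_mat.
out, _ = child.communicate(b'abc')
print(out.strip())  # b'ABC'
```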
@@ -67,9 +95,9 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
# 2. Store Gaussian selection indices on disk-- this speeds up the
# training passes.
# subsample-feats --n=$subsample ark:- ark:- |"
binary = 'subsample-feats'
cmd = [binary]
with tempfile.NamedTemporaryFile(suffix='.ark') as arkfile:
cmd = [binary2] # subsample-feats
with tempfile.NamedTemporaryFile(suffix='.ark') as arkfile, \
tempfile.NamedTemporaryFile(delete=False, suffix='.gz') as gselfile:
cmd += [
'--n=5',
'ark:-',
@@ -82,10 +110,7 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
with tempfile.NamedTemporaryFile(delete=False, suffix='.gz') as gselfile:
binary2 = 'gmm-gselect'
cmd2 = [binary2]
cmd2 = [binary3] # gmm-gselect
cmd2 += [
'--n=' + str(num_gselect),
initfile.name,
@@ -103,21 +128,21 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
logger.debug("%s", logtxt)
inModel = initfile.name
for x in range(0, num_iters):
logger.info("Training pass " + str(x))
# Accumulate stats.
with tempfile.NamedTemporaryFile(suffix='.acc') as accfile:
binary3 = 'gmm-global-acc-stats'
cmd3 = [binary3]
cmd3 = [binary4] # gmm-global-acc-stats
cmd3 += [
'--gselect=ark,s,cs:gunzip -c ' + gselfile.name + '|',
inModel,
'ark:' + arkfile.name,
accfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe3 = Popen (cmd3, stdin=PIPE, stdout=PIPE, stderr=logfile)
with tempfile.NamedTemporaryFile(
suffix='.log') as logfile:
pipe3 = Popen(cmd3, stdin=PIPE,
stdout=PIPE, stderr=logfile)
# write ark file into pipe.stdin
# io.write_mat(pipe3.stdin, feats, key='abc')
# pipe3.stdin.close()
@@ -130,10 +155,9 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
opt = '--remove-low-count-gaussians=false'
else:
opt = '--remove-low-count-gaussians=true'
binary4 = 'gmm-global-est'
cmd4 = [binary4]
with tempfile.NamedTemporaryFile(delete=False, suffix='.dump') as estfile:
cmd4 = [binary5] # gmm-global-est
with tempfile.NamedTemporaryFile(
delete=False, suffix='.dump') as estfile:
cmd4 += [
opt,
'--min-gaussian-weight=' + str(min_gaussian_weight),
@@ -142,7 +166,8 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
estfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe4 = Popen (cmd4, stdin=PIPE, stdout=PIPE, stderr=logfile)
pipe4 = Popen(cmd4, stdin=PIPE,
stdout=PIPE, stderr=logfile)
pipe4.communicate()
with open(logfile.name) as fp:
logtxt = fp.read()
@@ -158,37 +183,70 @@ def ubm_train(feats, ubmname, num_threads=4, num_frames=500000,
return ubmname + '.dubm'
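The command lists above are assembled by manual string concatenation (``'--num-threads=' + str(num_threads)`` and so on). A hypothetical helper, not part of bob.kaldi, shows how such Kaldi-style flags could be built uniformly, including Kaldi's lowercase ``true``/``false`` convention for booleans:

```python
def kaldi_opts(**kwargs):
    """Format keyword arguments as Kaldi-style --key=value flags
    (hypothetical helper; bob.kaldi concatenates strings by hand)."""
    opts = []
    for key, value in sorted(kwargs.items()):
        if isinstance(value, bool):
            value = str(value).lower()  # Kaldi binaries expect 'true'/'false'
        opts.append('--%s=%s' % (key.replace('_', '-'), value))
    return opts

print(kaldi_opts(num_gauss=2048, remove_low_count_gaussians=True))
# ['--num-gauss=2048', '--remove-low-count-gaussians=true']
```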
def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4, min_gaussian_weight=1.0e-04):
""" Implements egs/sre10/v1/train_full_ubm.sh
def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4,
min_gaussian_weight=1.0e-04):
""" Implements Kaldi egs/sre10/v1/train_full_ubm.sh
Parameters
----------
feats : numpy.ndarray
A 2D numpy ndarray object containing MFCCs.
dubmname : str
A path to the UBM model.
num_gselect : :obj:`int`, optional
Number of Gaussians to keep per frame.
num_iters : :obj:`int`, optional
Number of iterations of training.
min_gaussian_weight : :obj:`float`, optional
Kaldi MleDiagGmmOptions: Min Gaussian weight before we
remove it.
Returns
-------
str
A path to the trained full-covariance UBM model.
"""
binary1 = 'gmm-global-to-fgmm'
binary2 = 'fgmm-global-to-gmm'
binary3 = 'subsample-feats'
binary4 = 'gmm-gselect'
binary5 = 'fgmm-global-acc-stats'
binary6 = 'fgmm-global-est'
origdubm = dubmname
dubmname += '.dubm'
# 1. Init (diagonal GMM to full-cov. GMM)
# gmm-global-to-fgmm $srcdir/final.dubm $dir/0.ubm || exit 1;
binary1 = 'gmm-global-to-fgmm'
cmd1 = [binary1]
cmd1 = [binary1] # gmm-global-to-fgmm
inModel = ''
with tempfile.NamedTemporaryFile(delete=False, suffix='.ubm') as initfile:
with tempfile.NamedTemporaryFile(delete=False, suffix='.ubm') as \
initfile, tempfile.NamedTemporaryFile(suffix='.log') as logfile:
inModel = initfile.name
cmd1 += [
dubmname,
inModel,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe1 = Popen(cmd1, stdin=PIPE, stdout=PIPE, stderr=logfile)
pipe1.communicate()
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
# 2. doing Gaussian selection (using diagonal form of model; selecting $num_gselect indices)
# gmm-gselect --n=$num_gselect "fgmm-global-to-gmm $dir/0.ubm - |" "$feats" \
# 2. doing Gaussian selection (using diagonal form of model; \
# selecting $num_gselect indices)
# gmm-gselect --n=$num_gselect "fgmm-global-to-gmm $dir/0.ubm - \
# |" "$feats" \
# "ark:|gzip -c >$dir/gselect.JOB.gz" || exit 1;
binary2 = 'fgmm-global-to-gmm'
cmd2 = [binary2]
with tempfile.NamedTemporaryFile(suffix='.dubm') as dubmfile:
cmd2 = [binary2] # fgmm-global-to-gmm
with tempfile.NamedTemporaryFile(suffix='.dubm') as dubmfile, \
tempfile.NamedTemporaryFile(suffix='.ark') as arkfile, \
tempfile.NamedTemporaryFile(suffix='.gz') as gselfile:
cmd2 += [
inModel,
dubmfile.name,
@@ -199,11 +257,8 @@ def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4, min_gaussian_we
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
# subsample-feats --n=$subsample ark:- ark:- |"
binary = 'subsample-feats'
cmd = [binary]
with tempfile.NamedTemporaryFile(suffix='.ark') as arkfile:
cmd = [binary3] # subsample-feats
cmd += [
'--n=5',
'ark:-',
@@ -216,10 +271,7 @@ def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4, min_gaussian_we
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
binary3 = 'gmm-gselect'
cmd3 = [binary3]
with tempfile.NamedTemporaryFile(suffix='.gz') as gselfile:
cmd3 = [binary4] # gmm-gselect
cmd3 += [
'--n=' + str(num_gselect),
dubmfile.name,
@@ -227,59 +279,61 @@ def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4, min_gaussian_we
'ark:|gzip -c >' + gselfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe3 = Popen (cmd3, stdin=PIPE, stdout=PIPE, stderr=logfile)
pipe3 = Popen(cmd3, stdin=PIPE,
stdout=PIPE, stderr=logfile)
# io.write_mat(pipe3.stdin, feats, key='abc')
# pipe3.stdin.close()
pipe3.communicate()
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
# 3 est num_iters times
for x in range(0, num_iters):
logger.info("Training pass " + str(x))
# Accumulate stats.
with tempfile.NamedTemporaryFile(suffix='.acc') as accfile:
binary4 = 'fgmm-global-acc-stats'
cmd4 = [binary4]
with tempfile.NamedTemporaryFile(
suffix='.acc') as accfile:
cmd4 = [binary5] # fgmm-global-acc-stats
cmd4 += [
'--gselect=ark,s,cs:gunzip -c ' + gselfile.name + '|',
'--gselect=ark,s,cs:gunzip -c ' +
gselfile.name + '|',
inModel,
'ark:' + arkfile.name,
accfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe4 = Popen (cmd4, stdin=PIPE, stdout=PIPE, stderr=logfile)
with tempfile.NamedTemporaryFile(
suffix='.log') as logfile:
pipe4 = Popen(cmd4, stdin=PIPE,
stdout=PIPE, stderr=logfile)
# io.write_mat(pipe4.stdin, feats, key='abc')
pipe4.communicate()
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
# Don't remove low-count Gaussians till last iter.
if x < num_iters - 1:
opt = '--remove-low-count-gaussians=false'
else:
opt = '--remove-low-count-gaussians=true'
binary5 = 'fgmm-global-est'
cmd5 = [binary5]
with tempfile.NamedTemporaryFile(delete=False, suffix='.dump') as estfile:
cmd5 = [binary6] # fgmm-global-est
with tempfile.NamedTemporaryFile(
delete=False, suffix='.dump') as estfile:
cmd5 += [
opt,
'--min-gaussian-weight=' + str(min_gaussian_weight),
'--min-gaussian-weight=' +
str(min_gaussian_weight),
inModel,
accfile.name,
estfile.name,
]
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe5 = Popen (cmd5, stdin=PIPE, stdout=PIPE, stderr=logfile)
with tempfile.NamedTemporaryFile(
suffix='.log') as logfile:
pipe5 = Popen(cmd5, stdin=PIPE,
stdout=PIPE, stderr=logfile)
pipe5.communicate()
with open(logfile.name) as fp:
logtxt = fp.read()
logger.debug("%s", logtxt)
os.unlink(inModel)
inModel = estfile.name
@@ -288,32 +342,37 @@ def ubm_full_train(feats, dubmname, num_gselect=20, num_iters=4, min_gaussian_we
return origdubm + '.fubm'
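Each estimation pass above writes the re-estimated model into a fresh ``delete=False`` temporary file and then unlinks the model from the previous pass, so exactly one model file survives the loop. The file-chaining pattern in isolation (a sketch with placeholder contents):

```python
import os
import tempfile

in_model = None
for iteration in range(3):
    # delete=False keeps the file on disk after the context manager exits,
    # so the next pass (or the caller) can read it by name.
    with tempfile.NamedTemporaryFile(delete=False, suffix='.dump') as est:
        est.write(b'model after pass %d' % iteration)
    if in_model is not None:
        os.unlink(in_model)  # drop the superseded model from the previous pass
    in_model = est.name      # the next pass reads from here

assert os.path.exists(in_model)  # only the final model remains on disk
```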
def ubm_enroll(feats, ubm_file):
"""Performes MAP adaptation of GMM-UBM model.
Parameters:
Parameters
----------
feats : numpy.ndarray
A 2D numpy ndarray object containing MFCCs.
ubm_file : str
A path to the Kaldi global GMM.
feats (numpy.ndarray): A 2D numpy ndarray object containing MFCCs.
ubm_file (string) : A Kaldi global GMM.
Returns
-------
str
A path to the enrolled GMM.
Returns:
"""
numpy.ndarray: The enrolled GMM.
binary1 = 'gmm-global-acc-stats'
binary2 = 'global-gmm-adapt-map'
"""
# 1. Accumulate stats for training a diagonal-covariance GMM.
binary1 = 'gmm-global-acc-stats'
cmd1 = [binary1]
cmd1 = [binary1] # gmm-global-acc-stats
cmd1 += [
ubm_file,
'ark:-',
'-',
]
binary2 = 'global-gmm-adapt-map'
cmd2 = [binary2]
with tempfile.NamedTemporaryFile(delete=False, suffix='.ubm') as estfile:
cmd2 = [binary2] # global-gmm-adapt-map
with tempfile.NamedTemporaryFile(delete=False, suffix='.ubm') as \
estfile, tempfile.NamedTemporaryFile(suffix='.log') as logfile:
cmd2 += [
'--update-flags=m',
ubm_file,
@@ -321,10 +380,9 @@ def ubm_enroll(feats, ubm_file):
estfile.name,
]
# with open(os.devnull, "w") as fnull:
with tempfile.NamedTemporaryFile(suffix='.log') as logfile:
pipe1 = Popen(cmd1, stdin=PIPE, stdout=PIPE, stderr=logfile)
pipe2 = Popen (cmd2, stdin=pipe1.stdout, stdout=PIPE, stderr=logfile)
pipe2 = Popen(cmd2, stdin=pipe1.stdout,
stdout=PIPE, stderr=logfile)
# write ark file into pipe1.stdin
io.write_mat(pipe1.stdin, feats, key=b'abc')
pipe1.stdin.close()
@@ -336,24 +394,29 @@ def ubm_enroll(feats, ubm_file):
return estfile.name
# gmm-global-acc-stats /idiap/temp/mcernak/mobio/mobio_kaldi_norm/Projector.hdf5 ark,t:feats.txt - | ../../Tools/kaldi/src/gmmbin/global-gmm-adapt-map --update-flags="m" /idiap/temp/mcernak/mobio/mobio_kaldi_norm/Projector.hdf5 - spk.ubm
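With ``--update-flags=m``, ``global-gmm-adapt-map`` re-estimates only the Gaussian means. The underlying MAP rule interpolates the data statistics with the UBM prior mean via a relevance factor; a one-dimensional numerical sketch (the relevance factor ``tau=16.0`` is an assumed, typical value, not read from the code):

```python
def map_adapt_mean(mu_ubm, sum_x, n, tau=16.0):
    """MAP update of one Gaussian mean: blend the first-order
    statistics sum_x (with occupation count n) against the prior
    mean mu_ubm, weighted by the relevance factor tau (assumed)."""
    return (sum_x + tau * mu_ubm) / (n + tau)

# No adaptation data keeps the prior mean; plenty of data
# pulls the mean toward the data average sum_x / n.
assert map_adapt_mean(1.0, 0.0, 0.0) == 1.0
print(map_adapt_mean(0.0, 1600.0, 84.0))  # 16.0
```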
def gmm_score(feats, gmm_file, ubm_file):
"""Print out per-frame log-likelihoods for input utterance.
Parameters:
feats (numpy.ndarray): A 2D numpy ndarray object containing MFCCs.
ubm_file (string) : A Kaldi global GMM.
Parameters
----------
feats : numpy.ndarray
A 2D numpy ndarray object containing MFCCs.
gmm_file (string) : A Kaldi adapted global GMM.
gmm_file : str
A path to Kaldi adapted global GMM.
ubm_file : str
A path to Kaldi global GMM.
Returns:
float: The average of per-frame log-likelihoods.
Returns
-------
float
The average of per-frame log-likelihoods.
"""
binary1 = 'gmm-global-get-frame-likes'
models = [
gmm_file,
ubm_file
@@ -362,7 +425,6 @@ def gmm_score(feats, gmm_file, ubm_file):
# import ipdb; ipdb.set_trace()
for i, m in enumerate(models):
# 1. Accumulate stats for training a diagonal-covariance GMM.
binary1 = 'gmm-global-get-frame-likes'
cmd1 = [binary1]
cmd1 += [
'--average=true',
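``gmm_score`` runs ``gmm-global-get-frame-likes --average=true`` twice, once with the enrolled speaker model and once with the UBM. The score commonly reported for GMM-UBM systems is the difference of those two average log-likelihoods; a sketch of that scoring rule (not the literal bob.kaldi code):

```python
def gmm_llr(frame_likes_spk, frame_likes_ubm):
    """Average per-frame log-likelihood ratio between a speaker GMM
    and the UBM; positive values favour the speaker model (sketch)."""
    avg_spk = sum(frame_likes_spk) / len(frame_likes_spk)
    avg_ubm = sum(frame_likes_ubm) / len(frame_likes_ubm)
    return avg_spk - avg_ubm

# Made-up per-frame log-likelihoods for one utterance:
score = gmm_llr([-40.0, -41.0, -39.0], [-40.5, -41.5, -39.5])
print('%.3f' % score)  # 0.500
```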
@@ -8,33 +8,61 @@ import os
import numpy as np
from . import io
from subprocess import PIPE, Popen
# import subprocess
from os.path import join
import tempfile
from os.path import isfile
import logging
logger = logging.getLogger("bob.kaldi")
logger = logging.getLogger(__name__)
# from signal import signal, SIGPIPE, SIG_DFL
# signal(SIGPIPE,SIG_DFL)
def mfcc (data, rate=8000, preemphasis_coefficient=0.97, raw_energy=True, frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23, cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0, snip_edges=True, normalization=True):
def mfcc(data, rate=8000, preemphasis_coefficient=0.97, raw_energy=True,
frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23,
cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0,
snip_edges=True, normalization=True):
"""Computes the MFCCs for a given input signal
Parameters:
data (numpy.ndarray): A 1D numpy ndarray object containing 64-bit float
Parameters
----------
data : numpy.ndarray
A 1D numpy ndarray object containing 64-bit float
numbers with the audio signal to calculate the MFCCs from. The input
needs to be normalized between [-1, 1].
rate (float): The sampling rate of the input signal in ``data``.
Returns:
numpy.ndarray: The MFCCs calculated for the input signal (2D array of
rate : float
The sampling rate of the input signal in ``data``.
preemphasis_coefficient : :obj:`float`, optional
Coefficient for use in signal preemphasis
raw_energy : :obj:`bool`, optional
If true, compute energy before preemphasis and windowing
frame_length : :obj:`int`, optional
Frame length in milliseconds
frame_shift : :obj:`int`, optional
Frame shift in milliseconds
num_ceps : :obj:`int`, optional
Number of cepstra in MFCC computation (including C0)
num_mel_bins : :obj:`int`, optional
Number of triangular mel-frequency bins
cepstral_lifter : :obj:`int`, optional
Constant that controls scaling of MFCCs
low_freq : :obj:`int`, optional
Low cutoff frequency for mel bins
high_freq : :obj:`int`, optional
High cutoff frequency for mel bins (if < 0, offset from Nyquist)
dither : :obj:`float`, optional
Dithering constant (0.0 means no dither)
snip_edges : :obj:`bool`, optional
If true, end effects will be handled by outputting only frames
that completely fit in the file, and the number of frames
depends on the frame-length. If false, the number of frames
depends only on the frame-shift, and we reflect the data at
the ends.
normalization : :obj:`bool`, optional
If true, the input samples in ``data`` are normalized to [-1, 1].
Returns
-------
numpy.ndarray
The MFCCs calculated for the input signal (2D array of
32-bit floats).
"""
@@ -88,27 +116,56 @@ def mfcc (data, rate=8000, preemphasis_coefficient=0.97, raw_energy=True, frame_
io.write_wav(pipe1.stdin, data, rate)
pipe1.stdin.close()
# # wait for piped execution to finish
# pipe3.communicate()
# read ark from pipe3.stdout
ret = [mat for name, mat in io.read_mat_ark(pipe3.stdout)][0]
return ret
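The shape of the returned matrix follows from ``frame_length``, ``frame_shift`` and ``snip_edges``. A sketch of Kaldi's frame-count convention, which explains why a fixed-length file always yields the same number of rows:

```python
def num_frames(num_samples, rate=8000, frame_length=25, frame_shift=10,
               snip_edges=True):
    """Frame count for the given window settings (milliseconds),
    following Kaldi's convention (sketch)."""
    win = int(rate * frame_length / 1000)   # window size in samples
    shift = int(rate * frame_shift / 1000)  # frame shift in samples
    if snip_edges:
        # count only frames that fit entirely inside the signal
        return 0 if num_samples < win else 1 + (num_samples - win) // shift
    # otherwise the count depends on the frame shift alone
    return (num_samples + shift // 2) // shift

print(num_frames(16000, rate=16000))                    # 98
print(num_frames(16000, rate=16000, snip_edges=False))  # 100
```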
def mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97, raw_energy=True, frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23, cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0, snip_edges=True):
def mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97,
raw_energy=True, frame_length=25, frame_shift=10,
num_ceps=13, num_mel_bins=23, cepstral_lifter=22,
low_freq=20, high_freq=0, dither=1.0, snip_edges=True):
"""Computes the MFCCs for a given input signal recorded into a file
Parameters:
filename (str): A path to a valid WAV or NIST Sphere file to read data from
channel (int): The audio channel to read from inside the file
Returns:
numpy.ndarray: The MFCCs calculated for the input signal (2D array of
Parameters
----------
filename : str
A path to a valid WAV or NIST Sphere file to read data from
channel : int
The audio channel to read from inside the file
preemphasis_coefficient : :obj:`float`, optional
Coefficient for use in signal preemphasis
raw_energy : :obj:`bool`, optional
If true, compute energy before preemphasis and windowing
frame_length : :obj:`int`, optional
Frame length in milliseconds
frame_shift : :obj:`int`, optional
Frame shift in milliseconds
num_ceps : :obj:`int`, optional
Number of cepstra in MFCC computation (including C0)
num_mel_bins : :obj:`int`, optional
Number of triangular mel-frequency bins
cepstral_lifter : :obj:`int`, optional
Constant that controls scaling of MFCCs
low_freq : :obj:`int`, optional
Low cutoff frequency for mel bins
high_freq : :obj:`int`, optional
High cutoff frequency for mel bins (if < 0, offset from Nyquist)
dither : :obj:`float`, optional
Dithering constant (0.0 means no dither)
snip_edges : :obj:`bool`, optional
If true, end effects will be handled by outputting only frames
that completely fit in the file, and the number of frames
depends on the frame-length. If false, the number of frames
depends only on the frame-shift, and we reflect the data at
the ends
Returns
-------
numpy.ndarray
The MFCCs calculated for the input signal (2D array of
32-bit floats).
"""
@@ -148,6 +205,8 @@ def mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97, raw_energy
]
# import ipdb; ipdb.set_trace()
assert isfile(filename)
with open(os.devnull, "w") as fnull:
pipe1 = Popen(cmd1, stdin=PIPE, stdout=PIPE, stderr=fnull)
pipe2 = Popen(cmd2, stdout=PIPE, stdin=pipe1.stdout, stderr=fnull)
@@ -163,12 +222,14 @@ def mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97, raw_energy
ret = [mat for name, mat in io.read_mat_ark(pipe3.stdout)][0]
return ret
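``preemphasis_coefficient=0.97`` corresponds to the standard first-order high-pass filter ``y[n] = x[n] - 0.97 * x[n-1]`` applied before windowing; Kaldi treats the first sample of each window as its own predecessor. A pure-Python sketch:

```python
def preemphasize(frame, coeff=0.97):
    """First-order pre-emphasis y[n] = x[n] - coeff * x[n-1];
    the first sample uses itself as predecessor (Kaldi convention)."""
    if not frame:
        return []
    out = [frame[0] - coeff * frame[0]]  # special-cased first sample
    out += [frame[i] - coeff * frame[i - 1] for i in range(1, len(frame))]
    return out

# A constant (DC) signal is almost entirely suppressed:
print([round(v, 2) for v in preemphasize([1.0, 1.0, 1.0])])  # [0.03, 0.03, 0.03]
```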
# def compute_vad(feats, vad_energy_mean_scale=0.5, vad_energy_threshold=5, vad_frames_context=0, vad_proportion_threshold=0.6):
# def compute_vad(feats, vad_energy_mean_scale=0.5, vad_energy_threshold=5,
# vad_frames_context=0, vad_proportion_threshold=0.6):
# """Computes speech/non-speech segments given a Kaldi feature matrix
# Parameters:
# feats (matrix): A 2-D numpy array, with log-energy being in the first component of each feature vector
# feats (matrix): A 2-D numpy array, with log-energy being in the first
# component of each feature vector
# Returns:
@@ -198,7 +259,8 @@ def mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97, raw_energy
# ]
# with tempfile.NamedTemporaryFile(suffix='.seg') as segfile:
# binary2 = utils.kaldi_path(['src', 'ivectorbin', 'create-split-from-vad'])
# binary2 = utils.kaldi_path(
# ['src', 'ivectorbin', 'create-split-from-vad'])
# cmd2 = [binary2]
# cmd2 += [
@@ -8,15 +8,15 @@
import pkg_resources
import numpy as np
import bob.io.audio
import io
import bob.kaldi
def test_mfcc():
sample = pkg_resources.resource_filename(__name__, 'data/sample16k.wav')
reference = pkg_resources.resource_filename(__name__, 'data/sample16k-mfcc.txt')
reference = pkg_resources.resource_filename(
__name__, 'data/sample16k-mfcc.txt')
data = bob.io.audio.reader(sample)
@@ -27,10 +27,12 @@ def test_mfcc():
assert np.allclose(ours, theirs, 1e-03, 1e-05)
def test_mfcc_from_path():
sample = pkg_resources.resource_filename(__name__, 'data/sample16k.wav')
reference = pkg_resources.resource_filename(__name__, 'data/sample16k-mfcc.txt')
reference = pkg_resources.resource_filename(
__name__, 'data/sample16k-mfcc.txt')
ours = bob.kaldi.mfcc_from_path(sample)
theirs = np.loadtxt(reference)
@@ -55,7 +57,8 @@ def test_mfcc_from_path():
# segsref.append([ start, end ])
# segsref = np.array(segsref, dtype='int32')
# feats = [mat for name,mat in io.read_mat_ark( pkg_resources.resource_filename(__name__,'data/sample16k.ark') )][0]
# feats = [mat for name,mat in io.read_mat_ark(
# pkg_resources.resource_filename(__name__,'data/sample16k.ark') )][0]
# segs = bob.kaldi.compute_vad(feats)
@@ -14,6 +14,7 @@ import os
import bob.kaldi
def test_ubm_train():
temp_file = bob.io.base.test_utils.temporary_filename()
@@ -65,6 +66,7 @@ def test_ubm_enroll():
assert os.path.exists(spk_model)
def test_gmm_score():
temp_dubm_file = bob.io.base.test_utils.temporary_filename()
@@ -14,6 +14,7 @@ import os
import bob.kaldi
def test_ivector_train():
temp_dubm_file = bob.io.base.test_utils.temporary_filename()
@@ -29,16 +30,18 @@ def test_ivector_train():
ubm = bob.kaldi.ubm_full_train(array, temp_dubm_file,
num_gselect=2, num_iters=2)
# Train small ivector extractor
ivector = bob.kaldi.ivector_train(array, temp_dubm_file, num_gselect
= 2, ivector_dim = 20, num_iters = 2)
ivector = bob.kaldi.ivector_train(
array, temp_dubm_file, num_gselect=2, ivector_dim=20, num_iters=2)
assert os.path.exists(ivector)
def test_ivector_extract():
temp_dubm_file = bob.io.base.test_utils.temporary_filename()
sample = pkg_resources.resource_filename(__name__, 'data/sample16k.wav')
reference = pkg_resources.resource_filename(__name__, 'data/sample16k.ivector')
reference = pkg_resources.resource_filename(
__name__, 'data/sample16k.ivector')
data = bob.io.audio.reader(sample)
# MFCC
@@ -50,9 +53,8 @@ def test_ivector_extract():
ubm = bob.kaldi.ubm_full_train(array, temp_dubm_file,
num_gselect=2, num_iters=2)
# Train small ivector extractor
ivector = bob.kaldi.ivector_train(array, temp_dubm_file, num_gselect
= 2, ivector_dim = 20, num_iters =
2)
ivector = bob.kaldi.ivector_train(
array, temp_dubm_file, num_gselect=2, ivector_dim=20, num_iters=2)
# Extract ivector
ivector_array = bob.kaldi.ivector_extract(array, temp_dubm_file,
num_gselect=2)
@@ -65,7 +67,8 @@ def test_ivector_extract():
def test_plda_train():
temp_file = bob.io.base.test_utils.temporary_filename()
features = pkg_resources.resource_filename(__name__, 'data/feats-mobio.npy')
features = pkg_resources.resource_filename(
__name__, 'data/feats-mobio.npy')
feats = np.load(features)
@@ -79,7 +82,8 @@ def test_plda_train():
def test_plda_enroll():
temp_file = bob.io.base.test_utils.temporary_filename()
features = pkg_resources.resource_filename(__name__, 'data/feats-mobio.npy')
features = pkg_resources.resource_filename(
__name__, 'data/feats-mobio.npy')
feats = np.load(features)
@@ -92,8 +96,10 @@ def test_plda_enroll():
def test_plda_score():
temp_file = bob.io.base.test_utils.temporary_filename()
test_file = pkg_resources.resource_filename(__name__, 'data/test-mobio.ivector')
features = pkg_resources.resource_filename(__name__, 'data/feats-mobio.npy')
test_file = pkg_resources.resource_filename(
__name__, 'data/test-mobio.ivector')
features = pkg_resources.resource_filename(
__name__, 'data/feats-mobio.npy')
train_feats = np.load(features)
test_feats = np.loadtxt(test_file)
@@ -106,6 +112,3 @@ def test_plda_score():
score = bob.kaldi.plda_score(test_feats, enrolled, temp_file)
assert np.allclose(score, [-23.9922], 1e-03, 1e-05)
.. py:currentmodule:: bob.kaldi
.. testsetup:: *
from __future__ import print_function
import pkg_resources
import bob.kaldi
import bob.io.audio
import tempfile
import os
=======================
Using Kaldi in Python
=======================
MFCC Extraction
---------------
Two functions are implemented to extract MFCC features:
:py:func:`bob.kaldi.mfcc` and :py:func:`bob.kaldi.mfcc_from_path`. The former
function accepts the speech samples as :obj:`numpy.ndarray`, whereas the latter
accepts the filename as :obj:`str`:
1. :py:func:`bob.kaldi.mfcc`
.. doctest::
>>> sample = pkg_resources.resource_filename('bob.kaldi', 'test/data/sample16k.wav')
>>> data = bob.io.audio.reader(sample)
>>> feat = bob.kaldi.mfcc(data.load()[0], data.rate, normalization=False)
>>> print (feat.shape)
(317, 39)
2. :py:func:`bob.kaldi.mfcc_from_path`
.. doctest::
>>> feat = bob.kaldi.mfcc_from_path(sample)
>>> print (feat.shape)
(317, 39)
UBM training and evaluation
---------------------------
Both diagonal and full covariance Universal Background Models (UBMs)
are supported, and speakers can be enrolled and scored:
.. doctest::
>>> # Train small diagonal GMM
>>> projector = tempfile.NamedTemporaryFile()
>>> dubm = bob.kaldi.ubm_train(feat, projector.name, num_gauss=2, num_gselect=2, num_iters=2)
>>> # Train small full GMM
>>> ubm = bob.kaldi.ubm_full_train(feat, projector.name, num_gselect=2, num_iters=2)
>>> # Enrollment - MAP adaptation of the UBM-GMM
>>> spk_model = bob.kaldi.ubm_enroll(feat, dubm)
>>> # GMM scoring
>>> score = bob.kaldi.gmm_score(feat, spk_model, dubm)
>>> print ('%.3f' % score)
0.282
>>> os.unlink(projector.name + '.dubm')
>>> os.unlink(projector.name + '.fubm')
The following guide describes how to run complete speaker recognition experiments:
1. To run the UBM-GMM with MAP adaptation speaker recognition experiment, run:
.. code-block:: sh
verify.py -d 'mobio-audio-male' -p 'energy-2gauss' -e 'mfcc-kaldi' -a 'gmm-kaldi' -s exp-gmm-kaldi --groups {dev,eval} -R '/your/work/directory/' -T '/your/temp/directory' -vv
2. To run the ivector+plda speaker recognition experiment, run:
.. code-block:: sh
verify.py -d 'mobio-audio-male' -p 'energy-2gauss' -e 'mfcc-kaldi' -a 'ivector-plda-kaldi' -s exp-ivector-plda-kaldi --groups {dev,eval} -R '/your/work/directory/' -T '/your/temp/directory' -vv
3. Results:
+---------------------------------------------------+--------+--------+
| Experiment description                            | EER    | HTER   |
+---------------------------------------------------+--------+--------+
| -e 'mfcc-kaldi', -a 'gmm-kaldi', 100GMM           | 18.53% | 14.52% |
+---------------------------------------------------+--------+--------+
| -e 'mfcc-kaldi', -a 'gmm-kaldi', 512GMM           | 17.51% | 12.44% |
+---------------------------------------------------+--------+--------+
| -e 'mfcc-kaldi', -a 'ivector-plda-kaldi', 64GMM   | 12.26% | 11.97% |
+---------------------------------------------------+--------+--------+
| -e 'mfcc-kaldi', -a 'ivector-plda-kaldi', 256GMM  | 11.35% | 11.46% |
+---------------------------------------------------+--------+--------+
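For reading the table: EER is the error rate at the threshold where the false-acceptance and false-rejection rates are equal, and HTER is their average at a threshold fixed on the development set. A toy sketch of the two rates (the score values are made-up numbers, not experiment output):

```python
def far_frr(scores_impostor, scores_genuine, threshold):
    """False-acceptance and false-rejection rates at a threshold;
    HTER at that threshold is (FAR + FRR) / 2 (sketch)."""
    far = sum(s >= threshold for s in scores_impostor) / len(scores_impostor)
    frr = sum(s < threshold for s in scores_genuine) / len(scores_genuine)
    return far, frr

imp = [-2.0, -1.0, 0.0, 1.0]   # impostor trial scores (made up)
gen = [0.5, 1.5, 2.5, 3.5]     # genuine trial scores (made up)
print(far_frr(imp, gen, 0.75))  # (0.25, 0.25) -> FAR == FRR, the EER point here
```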
.. include:: links.rst
@@ -7,14 +7,14 @@
.. _bob.kaldi:
======================
Bob/Kaldi Extensions
======================
=======================
Bob wrapper for Kaldi
=======================
.. todolist::
This module contains information on how to build and maintain |project|
Kaldi_ extensions written in pure Python or a mix of C/C++ and Python.
This package provides a pythonic API for Kaldi_ functionality so it can be
seamlessly integrated with Python-based workflows.
Documentation
-------------
@@ -22,6 +22,7 @@ Documentation
.. toctree::
:maxdepth: 2
guide
py_api
@@ -32,4 +33,5 @@ Indices and tables
* :ref:`modindex`
* :ref:`search`
.. include:: links.rst