Commit 71df5dc0 authored by Pavel KORSHUNOV's avatar Pavel KORSHUNOV

Merge branch 'master' of gitlab.idiap.ch:master-biometrics/04-lab-speaker

parents adfb4892 54c6ce3b
Pipeline #24325 failed in 3 seconds
......@@ -11,10 +11,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### With VoxForge Dataset: \n",
"### On VoxForge Dataset: \n",
" - free, public dataset\n",
" - contains audio-recordings of 30 speakers (multiple recordings per speaker)\n",
" - short audio-recordings: ~3sec. long.\n",
" - short audio-recordings: ~3 sec. long.\n",
" - www.voxforge.org/downloads"
]
},
......@@ -24,7 +24,7 @@
"source": [
"## We will use the SPEAR toolkit from Bob -- bob.bio.spear\n",
"\n",
"- www.idiap.ch/software/bob/docs/bob/bob.bio.spear/stable/index.html"
"- Documentation: www.idiap.ch/software/bob/docs/bob/bob.bio.spear/stable/index.html"
]
},
{
......@@ -62,7 +62,19 @@
"\\end{split}\n",
"\\end{equation}\n",
"\n",
"### Here we use a GMM to represent the UBM, and separate GMMs for every enrolled client (speaker_model)."
"## Here we use a GMM to represent the UBM (Universal Background Model), and a separate GMM for every enrolled client (speaker-model)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Three Steps for Speaker-Recognition Experiment:\n",
"\n",
"\n",
"1. __Train__: train UBM-GMM using 'World' data\n",
"2. __Enroll__: create speaker-model for each speaker to be recognized\n",
"3. __Probe__: use a new, previously unseen (i.e., not used in step 1 or 2) speech-sample to try to recognize an enrolled speaker "
]
},
{
......@@ -83,11 +95,12 @@
"source": [
"## Enrollment Procedure\n",
"\n",
"Input: _x_: speech_sample, _u_: UBM, _i_: identity\n",
"\n",
"Input: x: speech-samples, U: UBM, i: Identity\n",
"\n",
"1. Compute F: array of MFCC features from _x_\n",
"2. Apply MAP adaptation on UBM_GMM to generate Speaker_GMM\n",
"3. Store Speaker_GMM model for Speaker _i_.\n",
"2. Apply MAP adaptation on U to generate Speaker-GMM\n",
"3. Store Speaker-GMM model for Speaker _i_.\n",
"\n",
"\n",
"![](figures/asv_enrollment.png)"
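The MAP-adaptation step above can be sketched in plain NumPy. This is a mean-only "relevance MAP" sketch under stated assumptions: `map_adapt_means`, its argument layout, and the relevance factor of 16 are illustrative choices, not bob.bio.spear's implementation.

```python
# Mean-only relevance-MAP adaptation of a diagonal-covariance UBM-GMM
# towards a speaker's enrollment features (illustrative sketch, not the
# bob.bio.spear implementation).
import numpy as np

def map_adapt_means(F, ubm_means, ubm_weights, ubm_covs, relevance=16.0):
    """Adapt UBM means towards enrollment features F (n_frames x n_dims)."""
    # Posterior responsibility of each UBM component for each frame,
    # computed from diagonal-covariance Gaussian log-likelihoods.
    diff = F[:, None, :] - ubm_means[None, :, :]                    # (T, C, D)
    log_lik = -0.5 * np.sum(diff**2 / ubm_covs
                            + np.log(2 * np.pi * ubm_covs), axis=2)  # (T, C)
    log_post = np.log(ubm_weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)                 # stabilise
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                         # (T, C)

    n_c = post.sum(axis=0)                                 # soft count per component
    ex_c = post.T @ F / np.maximum(n_c[:, None], 1e-10)    # per-component data mean
    alpha = n_c / (n_c + relevance)                        # adaptation factor
    # Components that "saw" many frames move towards the data;
    # the rest stay close to the UBM means.
    return alpha[:, None] * ex_c + (1 - alpha[:, None]) * ubm_means
```

The returned array is the adapted speaker-model means; storing them (step 3) completes enrollment for speaker _i_.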
......@@ -99,15 +112,19 @@
"source": [
"## Probe Algorithm:\n",
"\n",
"Input: _x_: probe speech sample, _u_: UBM, _s_: Speaker-model of 'claimed identity'\n",
"Input: x: probe speech sample, U: UBM, $I_c$: Claimed identity\n",
"\n",
"1. Retrieve 's', the speaker-model for claimed identity $I_c$.\n",
"2. Compute F: array of MFCC features from x\n",
"3. Compute A = ln(p(F| s)): log(probability that speech-sample was produced by speaker-model of claimed-identity)\n",
"4. Compute B = ln(p(F| U)): log(probability that speech was produced by some other person in the world)\n",
"5. Score = (A - B)\n",
"6. If Score > $c_2$ (where $c_2$ is a pre-determined threshold) accept that speech-sample x comes from speaker $I_c$\n",
"\n",
"![](figures/asv_probing.png)\n",
"\n",
"1. Compute F: array of MFCC features from x\n",
"2. Compute A = ln(p(F|speaker_model)): log(probability that speech-sample was produced by speaker-model of claimed-identity)\n",
"3. Compute B = ln(p(F|UBM)): log(probability that speech was produced by some other person in the world)\n",
"4. Return score = (A - B)\n",
"5. If score > $c_2$, accept that _x_ comes from speaker _s_\n",
"\n",
"![](figures/asv_probing.png)\n"
"### Threshold $c_2$ may be selected using the 'Dev' set."
]
},
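Steps 2-5 of the probe algorithm reduce to a log-likelihood ratio, which can be sketched in a few lines of NumPy. The names `gmm_avg_loglik` and `llr_score`, and the `(weights, means, covs)` tuples, are illustrative assumptions, not the toolkit's API.

```python
# Log-likelihood-ratio scoring of a probe against a speaker model and the UBM
# (illustrative sketch; bob.bio.spear's scoring differs in the details).
import numpy as np

def gmm_avg_loglik(F, weights, means, covs):
    """Average per-frame log-likelihood of features F under a diagonal GMM."""
    diff = F[:, None, :] - means[None, :, :]                          # (T, C, D)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(diff**2 / covs
                               + np.log(2 * np.pi * covs), axis=2))   # (T, C)
    m = log_comp.max(axis=1, keepdims=True)                           # log-sum-exp
    return float(np.mean(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def llr_score(F, speaker_gmm, ubm):
    """A = ln p(F|speaker), B = ln p(F|UBM); return Score = A - B."""
    return gmm_avg_loglik(F, *speaker_gmm) - gmm_avg_loglik(F, *ubm)
```

Accepting the claim (step 6) is then simply `llr_score(F, speaker_gmm, ubm) > c2`.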
{
......@@ -148,7 +165,7 @@
"import speaker_lib as sl\n",
"#from speaker_lib import load_scores\n",
"\n",
"dev_score_file = \"data/voxforge_denoised_16K_scores_dev.txt\"\n",
"dev_score_file = \"data/scores_vf_denoised_16k_dev.txt\"\n",
"#print(\"My file is %s\" %my_file_name)\n",
"#my_file = Path(my_file_name)\n",
"#assert my_file.is_file(), \"File %s does not exist. Quitting.\" %my_file_name\n",
......@@ -203,7 +220,7 @@
" - Genuine presentation classified as Impostor: False non-match rate (FNMR)\n",
" - Impostor presentation classified as Genuine: False Match Rate (FMR)\n",
"\n",
"DET curve shows how much FNMR you should expect as you adjust the Threshold (left to right) to achieve a desired FMR."
"DET curve shows how much FNMR you should expect as you adjust the Threshold (low to high) to achieve a target FMR."
]
},
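Both error rates can be computed at any threshold with a few lines of NumPy; `fmr_fnmr` below is an illustrative helper, not part of bob.measure.

```python
# FMR and FNMR at a single decision threshold (illustrative helper).
import numpy as np

def fmr_fnmr(zei_scores, gen_scores, threshold):
    """Error rates at a given threshold.

    FMR : fraction of impostor (ZEI) scores at/above the threshold
    FNMR: fraction of genuine scores below the threshold
    """
    zei = np.asarray(zei_scores)
    gen = np.asarray(gen_scores)
    fmr = float(np.mean(zei >= threshold))
    fnmr = float(np.mean(gen < threshold))
    return fmr, fnmr
```

Sweeping the threshold and plotting the (FMR, FNMR) pairs traces out exactly the DET curve described above.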
{
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expt. 1: Speaker verification using clean, high-quality (16 kHz sampling) data"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -28,24 +35,6 @@
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"code_dir = os.getcwd()\n",
"print(code_dir)\n",
"assert os.path.isdir(\"data\"), \"PANIC! 'data' folder does not exist!\"\n",
"os.chdir(\"data\")\n",
"\n",
"print(os.getcwd())\n",
"os.chdir(code_dir)\n",
"print(os.getcwd())"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -99,7 +88,7 @@
"\n",
"from speaker_lib import load_scores\n",
"\n",
"my_file_name = \"data/voxforge_denoised_16K_scores_dev.txt\"\n",
"my_file_name = \"data/scores_vf_denoised_16k_dev.txt\"\n",
"# The file is expected to be in the 4-column format devised for bob-score-files.\n",
" \n",
"dev16K_zei_scores, dev16K_gen_scores = load_scores(my_file_name)\n",
......@@ -121,9 +110,11 @@
"\n",
"### We have 10 enrolled clients\n",
" - For every client we have 30 'genuine' presentations, that is, 30 feature-files per client\n",
" - For any one client, we can use the presentations of the other clients as 'zero-effort-impostor' (ZEI) presentations\n",
" - So, for every client we have 270 (9 $\\times$ 30) ZEI presentations\n",
" - In total, we will have 300 genuine scores and 2,700 ZEI scores"
" - For a given client, we can use the presentations of the other clients as 'zero-effort-impostor' (ZEI) presentations\n",
" - Thus, for every client we have 270 (9 $\\times$ 30) ZEI presentations\n",
" - In total, we will have 300 genuine scores and 2,700 ZEI scores\n",
" \n",
"### Ideally, we want that the genuine presentations of a given client should be accepted (score higher than chosen threshold), and the ZEI presentations for that client should be rejected (scores lower than threshold)."
]
},
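The score counts above follow directly from the protocol; as a quick sanity check (pure arithmetic, no real data):

```python
# Score counts implied by the VoxForge protocol described above.
n_clients = 10          # enrolled clients
n_per_client = 30       # genuine presentations (feature-files) per client

n_genuine = n_clients * n_per_client                 # 10 x 30 = 300
n_zei_per_client = (n_clients - 1) * n_per_client    # 9 x 30 = 270
n_zei = n_clients * n_zei_per_client                 # 10 x 270 = 2,700
```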
{
......@@ -275,7 +266,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## What happens if we record probe samples with 8K sampling rate (phone-quality) ?\n",
"## Expt. 2: What happens if we probe with low-quality (8 kHz sampling, phone-quality) data?\n",
"\n",
"### Compare DET curves of recognition experiment with '16K' and '8K' data ...\n",
"\n",
......@@ -289,7 +280,7 @@
"metadata": {},
"outputs": [],
"source": [
"score_file_list = ['voxforge_denoised_16K_scores_dev.txt','scores_vf_denoised_8k_dev.txt']\n",
"score_file_list = ['scores_vf_denoised_16k_dev.txt','scores_vf_denoised_8k_dev.txt']\n",
"labels = ['16K', '8K']\n",
"plots.plot_multidet(score_file_list, labels, base_path=\"./data\")"
]
......@@ -310,18 +301,13 @@
"plots.plot_multiroc(score_file_list, labels, base_path=\"./data\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## add epc"
"## Expt. 3: Using EPC (Expected Performance Curves) to compare different systems\n",
"\n",
"### The lower the curve, the better the system, for a given $\\alpha$"
]
},
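To make the $\alpha$-dependence concrete, one EPC point can be sketched as follows. This is an assumption about the procedure, not the `bob.measure.plot.epc` implementation: for each $\alpha$, pick the threshold minimising the weighted error $\alpha \cdot \mathrm{FMR} + (1-\alpha) \cdot \mathrm{FNMR}$ on the 'Dev' scores, then report HTER at that fixed threshold on the 'Eval' scores.

```python
# One EPC point: threshold fixed on 'Dev' scores, HTER measured on 'Eval'
# scores (illustrative sketch of the procedure).
import numpy as np

def epc_point(dev_zei, dev_gen, eval_zei, eval_gen, alpha):
    def weighted_error(thr, zei, gen):
        fmr = float(np.mean(np.asarray(zei) >= thr))   # false matches
        fnmr = float(np.mean(np.asarray(gen) < thr))   # false non-matches
        return alpha * fmr + (1.0 - alpha) * fnmr, fmr, fnmr

    # Candidate thresholds: every observed dev score.
    candidates = np.sort(np.concatenate([dev_zei, dev_gen]))
    thr = min(candidates, key=lambda t: weighted_error(t, dev_zei, dev_gen)[0])

    # HTER on the eval set, at the threshold chosen on dev only.
    _, fmr, fnmr = weighted_error(thr, eval_zei, eval_gen)
    return (fmr + fnmr) / 2.0
```

Repeating this for a grid of $\alpha$ values yields the curve that `plot_multiepc` draws below.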
{
......@@ -330,18 +316,14 @@
"metadata": {},
"outputs": [],
"source": [
"dev_file_list = ['voxforge_denoised_16K_scores_dev.txt','scores_vf_denoised_8k_dev.txt']\n",
"eval_file_list = ['voxforge_denoised_16K_scores_dev.txt','scores_vf_denoised_8k_eval.txt']\n",
"expt_labels = ['16K', '8K']\n",
"plots.plot_multiepc(dev_file_list, eval_file_list, expt_labels, base_path=\"./data\")"
"from plots import plot_multiepc\n",
"\n",
"system_score_files = {}\n",
"system_score_files['16K'] = ['scores_vf_denoised_16k_dev.txt', 'scores_vf_denoised_16k_dev.txt'] #2 elements: 1st- dev_score_file; 2nd- eval_score_file\n",
"system_score_files['8K'] = ['scores_vf_denoised_8k_dev.txt', 'scores_vf_denoised_8k_eval.txt'] \n",
"\n",
"plot_multiepc(system_score_files, base_path=\"./data\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
......
......@@ -249,7 +249,7 @@ def plot_multidet(file_names, labels, base_path="./data"):
pyplot.show()
def plot_multiepc(dev_filenames, eval_filenames, labels, base_path="./data"):
def plot_multiepc(score_files_dict, base_path="./data"):
"""
Plot EPC curves for several systems
......@@ -265,18 +265,15 @@ def plot_multiepc(dev_filenames, eval_filenames, labels, base_path="./data"):
"""
from speaker_lib import load_scores
# make sure 'dev_filenames' and 'eval_filenames' have same no. of items
assert (len(dev_filenames) == len(eval_filenames)), "File-name sets mismatch! Quitting."
assert (len(labels) == len(dev_filenames)), "Label-set does not match file-name set! Quitting."
score_dict={}
system_labels = list(score_files_dict.keys())
# EPC curve
pyplot.figure(figsize=(16,8))
# score-var names: 'd'for dev, 'e' for eval; 'z' for zei, 'g' for genuine.
for i, l in enumerate(labels):
d_z, d_g = load_scores(os.path.join(base_path, dev_filenames[i]))
e_z, e_g = load_scores(os.path.join(base_path, eval_filenames[i]))
for i, l in enumerate(system_labels):
d_z, d_g = load_scores(os.path.join(base_path, score_files_dict[l][0]))
e_z, e_g = load_scores(os.path.join(base_path, score_files_dict[l][1]))
#score_dict[label[i]] = [d_z, d_g, e_z, e_g]
bob.measure.plot.epc(d_z, d_g, e_z, e_g, npoints=100, linestyle='-', label=l)
......@@ -285,8 +282,9 @@ def plot_multiepc(dev_filenames, eval_filenames, labels, base_path="./data"):
#bob.measure.plot.det_axis([0.01, 99, 0.01, 99])
pyplot.grid(True)
pyplot.title("(EPC)")
pyplot.xlabel("alfa")
pyplot.ylabel(" ")
pyplot.xlabel(r'$\alpha$' )
pyplot.ylabel('HTER (%)' )
pyplot.ylim(-1, 101)
pyplot.legend()
pyplot.show()
......