This repository is based on the Torch framework and allows you to reproduce the results of the following paper:
---
@INPROCEEDINGS{Muckenhirn_ICASSP_2018,
author = {Muckenhirn, Hannah and Magimai.-Doss, Mathew and Marcel, S{\'{e}}bastien},
title = {Towards directly modeling raw speech signal for speaker verification using CNNs},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
year = {2018},
}
---
If you use this code and/or its results, please cite the paper.
# Torch Installation
To install Torch, follow the instructions given [here](http://torch.ch/docs/getting-started.html). The experiments require one additional package, `sndfile`, which you can install with the following command:
```
luarocks install sndfile
```
# Database
The data used is a subset (300 speakers) of the Voxforge database (English corpus).
## I. First step: train a speaker identification system on the training dataset
### 1. Create compressed files containing the data
The training set is split into two subsets: a training subset and a validation subset (90% and 10% of the data, respectively). The convolutional neural network (CNN) is trained on the training subset, while a validation error is computed on the validation subset to perform early stopping.
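The early-stopping scheme described above can be sketched as follows. This is a minimal illustration only; `trainOneEpoch` and `computeError` are hypothetical helper names, not functions provided by this repository:

```lua
-- Illustration only: generic early stopping on the 90%/10% split.
-- trainOneEpoch and computeError are hypothetical placeholders.
local bestValidErr, patience, badEpochs = math.huge, 5, 0
for epoch = 1, maxEpochs do
  trainOneEpoch(model, trainSubset)                   -- 90% of the training set
  local validErr = computeError(model, validSubset)   -- remaining 10%
  if validErr < bestValidErr then
    bestValidErr, badEpochs = validErr, 0
    torch.save('best_model.bin', model)               -- keep the best model so far
  else
    badEpochs = badEpochs + 1
    if badEpochs >= patience then break end           -- stop when validation error plateaus
  end
end
```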
You first need to create six compressed files: three for the training subset and three for the validation subset. In both cases, the three files contain the audio data, the speaker labels, and the voice activity detection (VAD) labels, respectively. The VAD labels are not mandatory, but they were observed to improve the performance of the system. They were computed separately with the bob framework and are provided in the folder `files`.
You can pass additional arguments when calling the function `trainSpeakerIdentification`.
## II. Second step: train one CNN for each speaker in the development and evaluation sets
### 1. Create compressed files containing the data
Both the development and evaluation sets are split into two subsets: the enrollment data and the probe data. One CNN is trained on the enrollment data of each speaker, with samples randomly chosen from the training set used as negative samples. The enrollment data is therefore further split into training and validation data: the training data is used to train the CNN, while the validation data is used for early stopping.
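The per-speaker training protocol above can be sketched as follows. All names here are hypothetical placeholders, not the repository's actual functions:

```lua
-- Illustration only: one binary CNN per speaker, positives from the
-- speaker's enrollment data, negatives from the training ("small world") set.
for _, speaker in ipairs(devSpeakers) do
  local posTrain, posValid = splitEnrollment(speaker)           -- enrollment data, split for early stopping
  local negTrain, negValid = smallworldTrain, smallworldValid   -- randomly chosen training-set speakers
  trainBinaryCNN(speaker, posTrain, negTrain, posValid, negValid)
end
```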
To generate the files for the enrollment data of the development and evaluation sets:
```bash
th generateFilesSpeakerVerification_enroll.lua -speakersTrain files/dev/dev_model_train -speakersValid files/dev/dev_model_valid -negativeTrain files/small_world_for_verif/smallworld_train -negativeValid files/small_world_for_verif/smallworld_valid -speakersTrainVAD files/dev/dev_model_train_VAD -speakersValidVAD files/dev/dev_model_valid_VAD -negativeTrainVAD files/small_world_for_verif/smallworld_train_VAD -negativeValidVAD files/small_world_for_verif/smallworld_valid_VAD -folderData <folder_database> -output <folder_compressed_data_dev>
th generateFilesSpeakerVerification_enroll.lua -speakersTrain files/eval/eval_model_train -speakersValid files/eval/eval_model_valid -negativeTrain files/small_world_for_verif/smallworld_train -negativeValid files/small_world_for_verif/smallworld_valid -speakersTrainVAD files/eval/eval_model_train_VAD -speakersValidVAD files/eval/eval_model_valid_VAD -negativeTrainVAD files/small_world_for_verif/smallworld_train_VAD -negativeValidVAD files/small_world_for_verif/smallworld_valid_VAD -folderData <folder_database> -output <folder_compressed_data_eval>
```
Note that in both cases the only arguments that you need to modify are: `-folderData` and `-output`.
To generate the files for the probe data of the development and evaluation sets:
```bash
th generateFilesSpeakerVerification_probe.lua -probe files/dev/dev_probe -probeVAD files/dev/dev_probe_VAD -folderData <folder_database> -output <folder_compressed_data_dev>
th generateFilesSpeakerVerification_probe.lua -probe files/eval/eval_probe -probeVAD files/eval/eval_probe_VAD -folderData <folder_database> -output <folder_compressed_data_eval>
```
Note that in both cases the only arguments that you need to modify are: `-folderData` and `-output`.
### 2. Train one CNN per speaker
The equal error rate and half total error rate were computed with the [bob](https://www.idiap.ch/software/bob/) framework.
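For illustration, an equal error rate can be computed by brute force from lists of genuine and impostor scores as below. This sketch is not part of the repository, and the reported numbers were obtained with bob, not with this code:

```lua
-- Illustration only: brute-force equal error rate (EER) computation.
-- Tries every observed score as a decision threshold and returns the
-- operating point where false rejection and false acceptance rates are closest.
local function eer(genuine, impostor)
  local all = {}
  for _, s in ipairs(genuine) do table.insert(all, s) end
  for _, s in ipairs(impostor) do table.insert(all, s) end
  table.sort(all)
  local best, bestDiff = 0, math.huge
  for _, t in ipairs(all) do
    local fr, fa = 0, 0
    for _, s in ipairs(genuine) do if s < t then fr = fr + 1 end end   -- false rejections
    for _, s in ipairs(impostor) do if s >= t then fa = fa + 1 end end -- false acceptances
    local frr, far = fr / #genuine, fa / #impostor
    if math.abs(frr - far) < bestDiff then
      bestDiff, best = math.abs(frr - far), (frr + far) / 2
    end
  end
  return best   -- rate at the threshold where FAR is closest to FRR
end
```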
# Models
If you want to use the pre-trained model (the one obtained in the first step, "train a speaker identification system on the training dataset"), you can load the model `model/speaker_identification_model.bin` in Lua as follows:
```lua
model = torch.load('model/speaker_identification_model.bin')
```