[doc] update computation time for voxceleb leaderboard · 878424e3
Flavio TARSETTI authored 2 years ago

voxceleb.rst 2.66 KiB

VoxCeleb Dataset

Dataset Description

VoxCeleb is a collection of voice recordings of celebrities extracted from various YouTube videos. It contains:

Set	Subset	Identities	Sample count
train		1211	148642
dev / eval	references	40	4874
	probes		37720

GMM

[Min. criterion: EER ] Threshold on Development set: 1.062216e-01
	Development
Failure to Acquire	0.0%
False Match Rate	18.8% (3538/18860)
False Non Match Rate	18.8% (3538/18860)
False Accept Rate	18.8%
False Reject Rate	18.8%
Half Total Error Rate	18.8%
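
The "Min. criterion: EER" line means the threshold was chosen on the development set at the point where the False Match Rate and the False Non Match Rate are equal. A minimal sketch of that idea follows. It uses toy scores invented for illustration and assumes that a score at or above the threshold is accepted. The function names are hypothetical, and this is not necessarily how bob computes the threshold internally.

```python
def error_rates(impostor, genuine, threshold):
    """Return (FMR, FNMR), assuming scores >= threshold are accepted."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def eer_threshold(impostor, genuine):
    """Pick the observed score at which FMR and FNMR are closest."""
    candidates = sorted(set(impostor) | set(genuine))

    def gap(threshold):
        fmr, fnmr = error_rates(impostor, genuine, threshold)
        return abs(fmr - fnmr)

    return min(candidates, key=gap)


# Toy scores, invented for illustration only (not VoxCeleb data).
impostor_scores = [-2.0, -1.5, -1.0, -0.5, 0.5]
genuine_scores = [-0.8, 0.0, 1.0, 1.5, 2.0]

threshold = eer_threshold(impostor_scores, genuine_scores)
fmr, fnmr = error_rates(impostor_scores, genuine_scores, threshold)
print(f"threshold={threshold}, FMR={fmr:.1%}, FNMR={fnmr:.1%}")
```

On real score files the two rates rarely match exactly, so the chosen threshold is the one that minimizes their difference.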

Command used:

$ bob bio pipeline -d voxceleb gmm-mobio -l sge-demanding -o results/gmm_voxceleb -n 512

Ran in 10 hours on 128 [1] CPU nodes of the SGE Grid.

ISV

TODO

Speechbrain ECAPA-TDNN

[Min. criterion: EER ] Threshold on Development set: -7.288057e-01
	Development
Failure to Acquire	0.0%
False Match Rate	1.0% (189/18860)
False Non Match Rate	1.0% (189/18860)
False Accept Rate	1.0%
False Reject Rate	1.0%
Half Total Error Rate	1.0%
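
As a sanity check, the percentages in the GMM and ECAPA-TDNN tables can be recomputed from the raw counts they report. The sketch below assumes that the Half Total Error Rate is the mean of the False Accept Rate and the False Reject Rate, and that with a Failure to Acquire rate of 0% these equal the False Match Rate and the False Non Match Rate. The `summarize` helper is hypothetical and written only for this illustration.

```python
def summarize(errors, comparisons, fta=0.0):
    """Recompute the reported rates from raw counts.

    Assumption: FAR = FMR * (1 - FTA) and FRR = FTA + FNMR * (1 - FTA),
    so with FTA = 0 they reduce to FMR and FNMR.
    """
    fmr = errors / comparisons
    fnmr = errors / comparisons  # at the EER threshold both counts coincide
    far = fmr * (1 - fta)
    frr = fta + fnmr * (1 - fta)
    return {"fmr": fmr, "fnmr": fnmr, "hter": (far + frr) / 2}


# Counts taken from the two result tables (18860 comparisons each).
systems = {"GMM": 3538, "Speechbrain ECAPA-TDNN": 189}

for name, errors in systems.items():
    rates = summarize(errors, 18860)
    print(f"{name}: FMR={rates['fmr']:.1%} "
          f"FNMR={rates['fnmr']:.1%} HTER={rates['hter']:.1%}")
```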

Command used:

$ bob bio pipeline -d voxceleb -p speechbrain-ecapa-voxceleb -l sge-demanding -o results/speechbrain_voxceleb

Ran in around 9 minutes (no training) on 128 [1] CPU nodes of the SGE Grid.

Note

The reference result reported for ECAPA-TDNN is 0.8% EER on VoxCeleb. However, that result was obtained on a customized version of the dataset (VoxCeleb (cleaned)), which ignores 109 probe files (presumably containing wrong data) that are included in our dataset.

Footnotes

[1]	The number of nodes is a requested maximum and can vary depending on the number of jobs currently running on the grid as well as on the scheduler's load estimation. The execution time can therefore vary as well.