Commit 5e56ef57 authored by Manuel Günther

Moved download_and_untar script from bash to python, fixing #1 and #2

parent b160c092
include README.rst bootstrap-buildout.py buildout.cfg COPYING version.txt requirements.txt
recursive-include doc *.py *.rst
recursive-include bob *.lst
recursive-include scripts *.sh
......@@ -37,9 +37,13 @@ Please make sure that you have read the `Dependencies <https://github.com/idiap/
Getting the data
----------------
The data can be downloaded from its original URL (on Voxforge), or by running ``./scripts/download_and_untar.sh``, which takes as input the path in which the data will be stored::
The data can be downloaded from its original URL (on Voxforge), or by running ``./bin/download_and_untar_voxforge.py``, which takes as input the path in which the data will be stored (using ``VOXFORGE_DATABASE`` as default)::
$ ./scripts/download_and_untar.sh PATH/TO/WAV/DIRECTORY
$ ./bin/download_and_untar_voxforge.py PATH/TO/WAV/DIRECTORY
.. note::
Running this script requires this package to be installed.
If you use a different installation strategy (such as ``pip``), the script might be placed in a different directory.
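The script reads the optional target directory straight from ``sys.argv`` and falls back to ``VOXFORGE_DATABASE``. As a minimal sketch, assuming one wanted the same behaviour through ``argparse`` (which also provides ``--help``), the argument handling could look like this; ``parse_arguments`` is a hypothetical name and not part of the package.

```python
import argparse

# Default target directory used when no path is given (same default as the script).
DEFAULT_DIRECTORY = "VOXFORGE_DATABASE"


def parse_arguments(argv=None):
    """Parse the command line of a download script (hypothetical argparse variant)."""
    parser = argparse.ArgumentParser(
        description="Download and extract the Voxforge archives used by bob.db.voxforge.")
    # nargs="?" makes the positional argument optional, so the default applies.
    parser.add_argument(
        "directory", nargs="?", default=DEFAULT_DIRECTORY,
        help="directory in which the data will be stored (default: %(default)s)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_arguments().directory)
```

Calling ``parse_arguments([])`` yields the default directory, while ``parse_arguments(["PATH/TO/WAV/DIRECTORY"])`` uses the given path.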
Documentation
......@@ -50,5 +54,3 @@ For a list of tutorials on this or the other packages of Bob_, or information on
.. _bob: https://www.idiap.ch/software/bob
.. _voxforge: http://www.voxforge.org
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# @author: Manuel Gunther <siebenkopf@googlemail.com>
# @date: Tue Dec 29 09:23:53 MST 2015
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
def main():
  import pkg_resources
  import sys
  import os
  if sys.version_info[0] <= 2:
    import urllib2 as urllib
  else:
    import urllib.request as urllib
  import tarfile

  directory = sys.argv[1] if len(sys.argv) > 1 else "VOXFORGE_DATABASE"
  if not os.path.exists(directory):
    print ("Creating intermediate directory '%s', where downloaded archives will be placed" % directory)
    os.makedirs(directory)

  baselink = "http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit"
  listfile = pkg_resources.resource_filename("bob.db.voxforge", "lists/list_of_tgz_files.lst")

  with open(listfile) as lf:
    for line in lf:
      line = line.strip()
      if not line:
        # skip empty lines in the list file
        continue
      basename = os.path.splitext(line)[0]
      outfile = os.path.join(directory, basename)
      if os.path.exists(outfile):
        print ("Skipping existing entry '%s'" % outfile)
        continue

      url = baselink + "/" + line
      tempfile = os.path.join(directory, line)
      try:
        print ("Downloading file '%s' to '%s'" % (url, tempfile))
        response = urllib.urlopen(url)
        with open(tempfile, 'wb') as dfile:
          dfile.write(response.read())
        response.close()
        print ("Extracting file '%s' to '%s'" % (tempfile, outfile))
        tar = tarfile.open(tempfile, 'r')
        tar.extractall(directory)
        tar.close()
      except Exception as e:
        print ("ERROR: Downloading and unpacking of '%s' was not successful: %s" % (tempfile, e))
        # TODO: should we just try to re-download, or leave it to the user to call this script again?
      finally:
        # TODO: should we leave possibly broken files here, so that it can be inspected later?
        # The archive does not exist if the download failed before the file was opened.
        if os.path.exists(tempfile):
          os.remove(tempfile)
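The two TODO comments above ask whether broken archives should be kept for inspection. A minimal sketch of one possible answer, assuming a hypothetical helper ``download_and_extract`` with a hypothetical ``keep_broken`` flag (neither exists in the package): it handles a single archive, reports success as a boolean so a caller could decide to retry, and only deletes the temporary archive if it actually exists. Note that ``tarfile.extractall`` trusts the member paths stored in the archive.

```python
import os
import tarfile

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2
    from urllib2 import urlopen


def download_and_extract(url, directory, keep_broken=False):
    """Download one .tgz archive from ``url`` into ``directory`` and extract it there.

    Returns True on success, False on failure. The downloaded archive is removed
    afterwards, except after a failure when ``keep_broken`` is set.
    """
    # Name the temporary archive after the last component of the URL.
    archive = os.path.join(directory, url.rstrip("/").split("/")[-1])
    success = False
    try:
        response = urlopen(url)
        try:
            with open(archive, "wb") as handle:
                handle.write(response.read())
        finally:
            response.close()
        with tarfile.open(archive, "r") as tar:
            tar.extractall(directory)
        success = True
    except Exception as error:
        print("ERROR: Downloading and unpacking of '%s' was not successful: %s" % (archive, error))
    finally:
        # Only delete what exists; optionally keep a broken archive for inspection.
        if os.path.exists(archive) and (success or not keep_broken):
            os.remove(archive)
    return success
```

Returning a boolean instead of only printing leaves the retry policy to the caller, which is one way to resolve the first TODO without hard-coding a retry count.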
......@@ -246,7 +246,7 @@ autodoc_default_flags = ['members', 'undoc-members', 'inherited-members', 'show-
# For inter-documentation mapping:
from bob.extension.utils import link_documentation
intersphinx_mapping = link_documentation(['python', 'bob.db.base', 'bob.db.verification.utils', 'bob.db.verification.filelist', 'bob.spear'])
intersphinx_mapping = link_documentation(['python', 'bob.db.base', 'bob.db.verification.utils', 'bob.db.verification.filelist', 'bob.bio.spear'])
def setup(app):
......
......@@ -18,9 +18,9 @@
In this package, we design a speaker recognition protocol that uses a **small subset of the English audio files** (only 6561 files) belonging to **30 speakers** selected at random.
This subset is split into three equivalent parts: Training (10 speakers), Development (10 speakers) and Test (10 speakers) sets.
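The protocol assigns each of the 30 speakers to exactly one of three groups of 10. The actual assignment is fixed by the file lists shipped with the package; the following is only a hypothetical sketch, assuming a seeded random shuffle, of how a speaker-disjoint three-way split of this size can be produced. ``split_speakers`` and its group names are illustrative, not part of the package.

```python
import random


def split_speakers(speakers, seed=0, group_size=10):
    """Randomly assign speakers to training, development and test groups of equal size."""
    if len(speakers) != 3 * group_size:
        raise ValueError("expected %d speakers, got %d" % (3 * group_size, len(speakers)))
    shuffled = list(speakers)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    return {
        "train": shuffled[:group_size],
        "dev": shuffled[group_size:2 * group_size],
        "test": shuffled[2 * group_size:],
    }
```

Because every speaker ends up in exactly one group, no speaker seen during training appears in the development or test sets.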
This package serves as a toy speaker recognition database for testing :ref:`bob.spear <bob.spear>`.
This package serves as a toy speaker recognition database for testing :ref:`bob.bio.spear <bob.bio.spear>`.
:ref:`bob.spear <bob.spear>` was developed at Idiap_ during its participation in the `NIST SRE 2012 evaluation`_.
:ref:`bob.bio.spear <bob.bio.spear>` was developed at Idiap_ during its participation in the `NIST SRE 2012 evaluation`_.
If you use this package and/or its results, please cite the following publications:
1. The original paper presented at the NIST SRE 2012 workshop:
......@@ -55,11 +55,15 @@ If you use this package and/or its results, please cite the following publicatio
Getting the data
----------------
The original data can be downloaded directly from Voxforge_, or by running ``./scripts/download_and_untar.sh`` that takes as input the path in which the data will be stored:
The original data can be downloaded directly from Voxforge_, or by running ``./bin/download_and_untar_voxforge.py``, which takes as input the path in which the data will be stored (using ``VOXFORGE_DATABASE`` as default)::
.. code-block:: sh
$ ./scripts/download_and_untar.sh PATH/TO/WAV/DIRECTORY
$ ./bin/download_and_untar_voxforge.py PATH/TO/WAV/DIRECTORY
.. note::
Running this script requires this package to be installed.
If you use a different installation strategy (such as ``pip``), the script might be placed in a different directory.
Documentation
......@@ -82,5 +86,3 @@ Indices and tables
.. _voxforge: http://www.voxforge.org
.. _nist sre 2012 evaluation: http://www.nist.gov/itl/iad/mig/sre12.cfm
.. _idiap: http://www.idiap.ch
# Elie Khoury <Elie.Khoury@idiap.ch>
# Date: Thu Aug 22 18:17:29 CEST 2013
#
# Copyright (C) 2012-2013 Idiap Research Institute, Martigny, Switzerland
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# This script will download the audio files used in the protocol.
# It will first download the tgz files, and then decompress them.
if [ "$#" -ne 1 ]; then
  echo "Usage: $0 DATA_DIR"
  exit 1
fi
baselink="http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit"
directory=$1
mkdir -p $directory
while read filename; do
  basefilename=`basename $filename .tgz`
  echo $basefilename
  if [ ! -d "$directory/$basefilename" ]; then
    wget $baselink/$filename
    tar -zxvf $filename
    mv $basefilename $directory/.
    rm $filename
  fi
done < bob/db/voxforge/lists/list_of_tgz_files.lst # where the list of files is stored
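Both the removed shell script (``basename $filename .tgz`` plus ``[ ! -d ... ]``) and its Python replacement (``os.path.splitext`` plus ``os.path.exists``) skip archives that have already been extracted. The sketch below isolates that check, assuming, as both scripts do, that ``X.tgz`` unpacks into a directory named ``X``. The function name ``pending_archives`` is hypothetical.

```python
import os


def pending_archives(archive_names, directory):
    """Return the archives whose extracted directory does not yet exist in ``directory``."""
    pending = []
    for name in archive_names:
        name = name.strip()
        if not name:  # ignore blank lines in the list file
            continue
        # "foo.tgz" is expected to unpack into the directory "foo"
        extracted = os.path.join(directory, os.path.splitext(name)[0])
        if not os.path.isdir(extracted):
            pending.append(name)
    return pending
```

Unlike the shell loop, which reads the list through a path relative to the current working directory, the Python script locates the list with ``pkg_resources.resource_filename``, so it works from any directory once the package is installed.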
......@@ -47,6 +47,11 @@ setup(
    entry_points = {
      # scripts to download the database
      'console_scripts' : [
        'download_and_untar_voxforge.py = bob.db.voxforge.download_and_untar:main'
      ],
      # declare database to bob
      'bob.db': [
        'voxforge = bob.db.voxforge.driver:Interface',
......
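The ``console_scripts`` entry ``download_and_untar_voxforge.py = bob.db.voxforge.download_and_untar:main`` makes the installer generate an executable that imports the module and calls ``main()``. As a sketch of how such a string resolves to a callable, the example below uses ``importlib.metadata.EntryPoint`` (Python 3.8+); the helper ``resolve_console_script`` is hypothetical, and the check uses a standard-library target because ``bob.db.voxforge`` may not be installed.

```python
from importlib.metadata import EntryPoint


def resolve_console_script(spec, group="console_scripts"):
    """Split a 'name = module:attr' entry point string and import the callable it names."""
    name, _, value = spec.partition("=")
    entry_point = EntryPoint(name=name.strip(), value=value.strip(), group=group)
    # load() imports the module part and looks up the attribute after the colon.
    return entry_point.name, entry_point.load()
```

For example, ``resolve_console_script("download_and_untar_voxforge.py = os.path:join")`` returns the script name together with ``os.path.join``.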