Commit e8c47eab authored by André Anjos

[many] Fix dos->unix end-of-lines

parent a65508d6
========================================================================
Biometric recognition on the CPqD Biometric Database (BioCPqD Phase 1)
========================================================================
1. All publications that report on research that uses the Corpus on the BEAT
platform will acknowledge the BioCPqD database and the BEAT platform by
referring to the following publication::
@article{Violato_CPqD_2013,
author = {R. P. V. Violato and M. Uliani Neto and F. O. Simoes and T. F. Pereira and M. A. Angeloni},
title = {BioCPqD: uma base de dados biometricos com amostras de face e voz de individuos brasileiros},
year = {2013},
journal = {Cadernos CPqD Tecnologia, Campinas, Brazil},
volume = {9},
pages = {7--18},
publisher = {CPqD},
url = {http://www.cpqd.com.br/cadernosdetecnologia/Vol9_N2_jul_dez_2013/artigo1.html},
}
2. Bob as the core framework used to run the experiments::
@inproceedings{Anjos_ACMMM_2012,
author = {A. Anjos and L. El Shafey and R. Wallace and M. G\"unther and C. McCool and S. Marcel},
title = {Bob: a free signal processing and machine learning toolbox for researchers},
year = {2012},
month = oct,
booktitle = {20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan},
publisher = {ACM Press},
url = {http://publications.idiap.ch/downloads/papers/2012/Anjos_Bob_ACMMM12.pdf},
}
Overview
--------
This database was designed to provide data that was recorded in a natural way,
using various devices in different environments. Hence, algorithms that
perform well on this database are expected to be suitable for other real-world
applications that do not require a predefined audio/video recording setup.
Database participants were selected among employees of CPqD Foundation who
volunteered to make recordings. A unique ID was assigned for each participant,
composed of a prefix (M for male and F for female) followed by a 4-digit number
(odd for males and even for females). Each participant recorded up to five
sessions, with a time lapse of at least 10 days between sessions.
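The ID rule above can be checked mechanically. The following is a minimal sketch, assuming the prefix letter is directly followed by the four digits (for example ``M0001``); that exact layout and the helper name ``is_valid_participant_id`` are illustrative assumptions, not part of the database tooling.

```python
import re

# Assumed layout: one prefix letter immediately followed by four digits,
# e.g. "M0001". The text only states "prefix followed by a 4-digit number".
_ID_PATTERN = re.compile(r"([MF])(\d{4})")


def is_valid_participant_id(participant_id):
    """Return True if the ID follows the prefix/parity rule described above."""
    match = _ID_PATTERN.fullmatch(participant_id)
    if match is None:
        return False
    prefix, number = match.group(1), int(match.group(2))
    # Odd numbers identify male participants, even numbers female participants.
    expected_prefix = "M" if number % 2 == 1 else "F"
    return prefix == expected_prefix


if __name__ == "__main__":
    for candidate in ("M0001", "F0002", "M0002", "X0003"):
        print(candidate, is_valid_participant_id(candidate))
```

A check like this can catch mislabeled entries in file lists before an experiment is run.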
Sessions consisted of 27 recorded sentences, whose content was specified in a
script. Each sentence was recorded on three different device types:
* Laptops (audio and video content);
* Smartphones (audio and video content);
* Phone calls (only audio).
For each device type, a set of devices was used, as specified below:
* Laptops:
- Compaq 510 with embedded mic and camera;
- Toshiba with USB Logitech QuickCam Pro 9000 webcam;
- DELL Latitude embedded mic and camera.
* Smartphones:
- Samsung Galaxy S II;
- Apple iPhone 4;
- Apple iPhone 4.
* Phone calls:
- landline phone call;
- personal mobile phone call.
Recordings were made in three environments with different characteristics:
garden, restaurant (public indoor) and office. The idea behind this strategy
was to exploit the influence of environmental noise in audio recordings and the
effect of illumination and background conditions in the video recordings.
Since the database includes recordings captured on different devices of
different types and in different environments, it allows a large number of
experimental setups.
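To give a rough idea of how quickly the number of setups grows, the sketch below enumerates recording conditions as (device type, environment) pairs and then pairs every enrollment condition with every probe condition. It assumes that every device type was used in every environment, which the description above does not state explicitly; the label strings are illustrative only.

```python
from itertools import product

# Illustrative labels taken from the lists above; not identifiers used by the database.
DEVICE_TYPES = ("laptop", "smartphone", "phone_call")
ENVIRONMENTS = ("garden", "restaurant", "office")


def recording_conditions():
    """All (device type, environment) pairs, assuming a full crossing."""
    return list(product(DEVICE_TYPES, ENVIRONMENTS))


def cross_condition_setups():
    """All (enrollment condition, probe condition) pairs."""
    conditions = recording_conditions()
    return list(product(conditions, conditions))


if __name__ == "__main__":
    print(len(recording_conditions()))    # number of recording conditions
    print(len(cross_condition_setups()))  # number of enrollment/probe pairings
```

Distinguishing the individual devices within each type would multiply the count further.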
Content
-------
The data collection followed a simple recording protocol that was replicated
for all sessions. For each session there was a corresponding script describing
the whole content to be recorded, as follows:
Text reading:
* a pre-defined text (extracted from the database's consent form);
* four phonetically rich sentences (randomly selected among 562 options);
* passphrase: three repetitions of a single sentence (the same sentence for all
participants in all sessions).
Spontaneous speech:
* answers to generic questions (all participants answered all 15 questions
selected from a fixed set, distributed across the 5 sessions in random
order);
* a fake name;
* a fake address;
* a fake birth date;
* a fake ID number;
* a fake phone number;
* two command words (all participants spoke 10 words across the 5 sessions in
random order).
Numbers, digits, time values and alphanumeric strings:
* a monetary amount between 10 and 10 000, randomly generated;
* a number between 10 and 1000, randomly generated;
* a number between 1000 and 10 million, randomly generated;
* three repetitions of a random digit sequence (the first read at a slow pace
and the others read naturally);
* a fake credit card number;
* an alphanumeric string composed of 6 characters, randomly generated;
* a time value, selected among a predefined set with 181 samples, equally
distributed among participants.
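As a consistency check, the script items listed above can be tallied against the 27 recorded sentences per session mentioned in the overview. This sketch assumes that each list entry counts as one recorded sentence (repetitions counted individually) and that the 15 generic questions are spread evenly over the 5 sessions; both are assumptions, and the dictionary keys are illustrative.

```python
# Number of recorded items per session for each script entry listed above.
SESSION_SCRIPT = {
    "text reading": {
        "pre-defined text": 1,
        "phonetically rich sentences": 4,
        "passphrase repetitions": 3,
    },
    "spontaneous speech": {
        # Assumption: 15 questions spread evenly over 5 sessions.
        "answers to generic questions": 15 // 5,
        "fake name": 1,
        "fake address": 1,
        "fake birth date": 1,
        "fake ID number": 1,
        "fake phone number": 1,
        "command words": 2,
    },
    "numbers, digits, time values and alphanumeric strings": {
        "monetary amount": 1,
        "number between 10 and 1000": 1,
        "number between 1000 and 10 million": 1,
        "digit sequence repetitions": 3,
        "fake credit card number": 1,
        "alphanumeric string": 1,
        "time value": 1,
    },
}


def items_per_session(script):
    """Sum the item counts over all parts of the session script."""
    return sum(sum(part.values()) for part in script.values())


if __name__ == "__main__":
    print(items_per_session(SESSION_SCRIPT))
```

Under these assumptions the tally is 8 + 10 + 9 = 27, which agrees with the figure given in the overview.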
Note that all content was recorded in Brazilian Portuguese.
The BioCPqD Phase I database provides unbiased biometric verification protocols,
one for male and one for female participants, based on the MOBIO database
protocols. These protocols partition the database into three different groups:
* a Training set: used to train the parameters of the algorithm to be tested,
e.g., to create the projection matrix, Universal Background Models, etc.;
* a Development set: used to evaluate hyper-parameters of the tested algorithms;
* a Test set: used to evaluate the generalization performance of the tested
algorithms with previously unseen data.
Both development and test sets are further split into an enrollment subset
(used to enroll participants' models), and a probe set (whose files will be
tested against all participants' models).
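The enrollment/probe split implies an exhaustive scoring scheme: every probe file is compared with every enrolled model. The sketch below illustrates that idea in plain Python; it does not use the database API, and the function name, file names and IDs are hypothetical.

```python
def build_trials(model_ids, probes):
    """Pair every probe with every enrolled model.

    model_ids: iterable of enrolled participant IDs.
    probes: iterable of (probe_file, true_participant_id) tuples.
    Returns a list of (model_id, probe_file, is_genuine) tuples.
    """
    trials = []
    for model_id in model_ids:
        for probe_file, true_id in probes:
            # A trial is genuine when the probe belongs to the model's participant.
            trials.append((model_id, probe_file, true_id == model_id))
    return trials


if __name__ == "__main__":
    # Hypothetical IDs and file names, for illustration only.
    models = ["M0001", "M0003"]
    probes = [("probe_a", "M0001"), ("probe_b", "M0003"), ("probe_c", "M0001")]
    for trial in build_trials(models, probes):
        print(trial)
```

With 2 models and 3 probes this yields 6 trials, of which 3 are genuine and 3 are impostor comparisons.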
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
#
# @author: Marcus de Assis Angeloni <massis@cpqd.com.br>
# @author: Tiago de Freitas Pereira <tiagofrepereira@gmail.com>
# @date: Wed Nov 12 13:00:00 2014
#
from .query import Database
from bob.db.verification.filelist.models import File, Client
__all__ = dir()
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
#
# Marcus de Assis Angeloni <massis@cpqd.com.br>
# Tiago de Freitas Pereira <tiagofrepereira@gmail.com>
# Wed Nov 12 13:00:00 2014
# Based on:
#
# Laurent El Shafey <laurent.el-shafey@idiap.ch>
# Fri Aug 23 16:51:41 CEST 2013
#
# Copyright (C) 2011-2012 Idiap Research Institute, Martigny, Switzerland
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""Commands the CPqD database can respond to.
"""

import os
import sys
from bob.db.driver import Interface as BaseInterface


def dumplist(args):
    """Dumps lists of files based on your criteria"""

    from .query import Database
    db = Database()

    r = db.objects(
        purposes=args.purpose,
        groups=args.group,
    )

    output = sys.stdout
    if args.selftest:
        from bob.db.utils import null
        output = null()

    for f in r:
        output.write('%s\n' % f.make_path(directory=args.directory, extension=args.extension))

    return 0


def checkfiles(args):
    """Checks existence of files based on your criteria"""

    from .query import Database
    db = Database()

    r = db.objects()

    # go through all files, check if they are available on the filesystem
    good = []
    bad = []
    for f in r:
        if os.path.exists(f.make_path(args.directory, args.extension)):
            good.append(f)
        else:
            bad.append(f)

    # report
    output = sys.stdout
    if args.selftest:
        from bob.db.utils import null
        output = null()

    if bad:
        for f in bad:
            output.write('Cannot find file "%s"\n' % f.make_path(args.directory, args.extension))
        output.write('%d files (out of %d) were not found at "%s"\n' %
                     (len(bad), len(r), args.directory))

    return 0


class Interface(BaseInterface):

    def name(self):
        return 'cpqd'

    def version(self):
        import pkg_resources  # part of setuptools
        return pkg_resources.require('bob.db.%s' % self.name())[0].version

    def files(self):
        return ()

    def type(self):
        return 'text'

    def add_commands(self, parser):

        from . import __doc__ as docs

        subparsers = self.setup_parser(parser,
                                       "CPqD database", docs)

        import argparse

        # the "dumplist" action
        parser = subparsers.add_parser('dumplist', help=dumplist.__doc__)
        parser.add_argument('-d', '--directory', default='', help="if given, this path will be prepended to every entry returned.")
        parser.add_argument('-e', '--extension', default='', help="if given, this extension will be appended to every entry returned.")
        parser.add_argument('-u', '--purpose', help="if given, this value will limit the output files to those designed for the given purposes.", choices=('enrol', 'probe', ''))
        parser.add_argument('-g', '--group', help="if given, this value will limit the output files to those belonging to a particular protocolar group.", choices=('dev', 'eval', 'world', ''))
        parser.add_argument('--self-test', dest="selftest", action='store_true', help=argparse.SUPPRESS)
        parser.set_defaults(func=dumplist)  # action

        # the "checkfiles" action
        parser = subparsers.add_parser('checkfiles', help=checkfiles.__doc__)
        parser.add_argument('-l', '--list-directory', required=True, help="The directory which contains the file lists.")
        parser.add_argument('-d', '--directory', dest="directory", default='', help="if given, this path will be prepended to every entry returned.")
        parser.add_argument('-e', '--extension', dest="extension", default='', help="if given, this extension will be appended to every entry returned.")
        parser.add_argument('--self-test', dest="selftest", action='store_true', help=argparse.SUPPRESS)
        parser.set_defaults(func=checkfiles)  # action
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# @author: Marcus de Assis Angeloni <massis@cpqd.com.br>
# @author: Tiago de Freitas Pereira <tiagofrepereira@gmail.com>
# @date: Wed Nov 12 13:00:00 2014
#
# Based on:
# @author: Elie Khoury <Elie.Khoury@idiap.ch>
# @date: Thu Aug 22 17:49:19 CEST 2013
#
# Copyright (C) 2012-2013 Idiap Research Institute, Martigny, Switzerland
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import bob.db.verification.filelist
import bob.db.verification.utils  # needed explicitly by annotations() below


class Database(bob.db.verification.filelist.Database):
    """Wrapper class for the CPqD database for face/speaker recognition (http://www.cpqd.com.br/).

    This class defines a simple protocol for training, dev and test by splitting
    the audio/video files of the database into three main parts.
    """

    def __init__(self, **kwargs):
        # call base class constructor
        from pkg_resources import resource_filename
        lists = resource_filename(__name__, 'lists')
        bob.db.verification.filelist.Database.__init__(self, lists, keep_read_lists_in_memory=False, **kwargs)

    def annotations(self, file):
        """Reads the annotations for the given file and returns them in a dictionary.

        file
          The ``File`` object, or the path of an annotation file, for which the
          annotations should be read.

        Return value
          The annotations as a dictionary: {'reye':(re_y,re_x), 'leye':(le_y,le_x)}
        """
        # a plain string is treated as the path of the annotation file itself
        if isinstance(file, str):
            return bob.db.verification.utils.annotations.read_annotation_file(file, 'eyecenter')
        else:
            # otherwise build the annotation path from the File object
            file_name = file.make_path(self.m_annotation_directory, self.m_annotation_extension)
            return bob.db.verification.utils.annotations.read_annotation_file(file_name, 'eyecenter')

    def tobjects(self, protocol=None, model_ids=None, groups=None):
        # No TObjects
        return []

    def zobjects(self, protocol=None, groups=None):
        # No ZObjects
        return []
; vim: set fileencoding=utf-8 :
; Marcus de Assis Angeloni <massis@cpqd.com.br>
; Tiago de Freitas Pereira <tiagofrepereira@gmail.com>
; Wed Nov 12 13:00:00 2014
[buildout]
parts = scripts
develop = .
eggs = bob.db.verification.filelist
       bob.db.cpqd
       vanity
newest = false
[scripts]
recipe = bob.buildout:scripts
; vim: set fileencoding=utf-8 :
; Marcus de Assis Angeloni <massis@cpqd.com.br>
; Tiago de Freitas Pereira <tiagofrepereira@gmail.com>
; Wed Nov 12 13:00:00 2014
[buildout]
parts = scripts
develop = .
eggs = bob.db.verification.filelist
       bob.db.cpqd
newest = false
[scripts]
recipe = bob.buildout:scripts