bob / bob.bio.base
Commit e2fa582c, authored 11 months ago by Yannick DAYER
Merge branch 'fix/empty-column' into 'master'

Scores loading fixes

Closes #194 and #196

See merge request !328
Parents: 9a790353, f996874d
Merge request: !328 Scores loading fixes
Pipeline #89061 passed 11 months ago (stages: qa, test, doc, dist, deploy)
Showing 1 changed file: src/bob/bio/base/score/load.py (48 additions, 30 deletions)
@@ -2,14 +2,14 @@
 # vim: set fileencoding=utf-8 :
 # Mon 23 May 2011 16:23:05 CEST
 """A set of utilities to load score files with different formats.
 """
 import csv
 import logging
 import os
 import tarfile
+from collections import defaultdict
 from pathlib import Path
 import dask.dataframe
@@ -94,8 +94,8 @@ def four_column(filename):
     str: The claimed identity -- the client name of the model that was used in
     the comparison
-    str: The real identity -- the client name of the probe that was used in the
-    comparison
+    str: The real identity -- the client name of the probe that was used in
+    the comparison
     str: A label of the probe -- usually the probe file name, or the probe id
@@ -153,15 +153,19 @@ def get_split_dataframe(filename):
     -------
     dataframe: negatives, contains the list of scores (and metadata) for which
-    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id`` columns are
-    different. (see :ref:`bob.bio.base.pipeline_simple_advanced_features`)
+    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id``
+    columns are different. (see
+    :ref:`bob.bio.base.pipeline_simple_advanced_features`)
     dataframe: positives, contains the list of scores (and metadata) for which
-    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id`` columns are
-    identical. (see :ref:`bob.bio.base.pipeline_simple_advanced_features`)
+    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id``
+    columns are identical. (see
+    :ref:`bob.bio.base.pipeline_simple_advanced_features`)
     """
-    df = dask.dataframe.read_csv(filename)
+    df = dask.dataframe.read_csv(
+        filename, dtype=defaultdict(lambda: str, {"score": float})
+    )
     genuines = df[df.probe_subject_id == df.bio_ref_subject_id]
     impostors = df[df.probe_subject_id != df.bio_ref_subject_id]
@@ -184,15 +188,19 @@ def split_csv_scores(filename, score_column: str = "score"):
     -------
     array: negatives, 1D float array containing the list of scores, for which
-    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id`` columns are
-    different. (see :ref:`bob.bio.base.pipeline_simple_advanced_features`)
+    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id``
+    columns are different. (see
+    :ref:`bob.bio.base.pipeline_simple_advanced_features`)
     array: positives, 1D float array containing the list of scores, for which
-    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id`` columns are
-    identical. (see :ref:`bob.bio.base.pipeline_simple_advanced_features`)
+    the fields of the ``bio_ref_subject_id`` and ``probe_subject_id``
+    columns are identical. (see
+    :ref:`bob.bio.base.pipeline_simple_advanced_features`)
     """
-    df = dask.dataframe.read_csv(filename)
+    df = dask.dataframe.read_csv(
+        filename, dtype=defaultdict(lambda: str, {"score": float})
+    )
     genuines = df[df.probe_subject_id == df.bio_ref_subject_id]
     impostors = df[df.probe_subject_id != df.bio_ref_subject_id]
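The `dtype` mapping added to both `read_csv` calls above sends every column to `str` except `score`, which is parsed as `float`. Given the branch name, the change presumably stops type inference from altering identifier or empty columns, though the diff itself does not say so. The following is a minimal sketch of how that mapping behaves, using only the standard library instead of dask. The sample data and the `load_rows` helper are hypothetical and not part of the commit.

```python
import csv
import io
from collections import defaultdict

# Same mapping as in the commit: unknown columns fall back to str,
# only the "score" column is converted to float.
dtype = defaultdict(lambda: str, {"score": float})

# Hypothetical sample data: numeric-looking subject ids with leading zeros.
SAMPLE = (
    "bio_ref_subject_id,probe_subject_id,score\n"
    "001,001,0.9\n"
    "001,002,0.1\n"
)


def load_rows(text):
    """Parse CSV text and convert each field with the dtype mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: dtype[column](value) for column, value in row.items()}
        for row in reader
    ]


rows = load_rows(SAMPLE)
# Same split as in the diff: matching ids are genuines, the rest impostors.
genuines = [r for r in rows if r["probe_subject_id"] == r["bio_ref_subject_id"]]
impostors = [r for r in rows if r["probe_subject_id"] != r["bio_ref_subject_id"]]
```

Because the identifiers stay strings, `"001"` keeps its leading zeros and the equality comparison between the two id columns is a plain string comparison.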
@@ -262,8 +270,8 @@ def five_column(filename):
     str: A label for the model -- usually the model file name, or the model id
-    str: The real identity -- the client name of the probe that was used in the
-    comparison
+    str: The real identity -- the client name of the probe that was used in
+    the comparison
     str: A label of the probe -- usually the probe file name, or the probe id
@@ -346,7 +354,8 @@ def scores(filename, ncolumns=None):
     Parameters:
     filename: :py:class:`str`, ``file-like``:
-        The file object that will be opened with :py:func:`open_file` containing the scores.
+        The file object that will be opened with :py:func:`open_file` containing
+        the scores.
     ncolumns: any
         ignored
@@ -461,8 +470,8 @@ def load_score(filename, ncolumns=None, minimal=False, **kwargs):
     specifying the number of columns in the score file. If None is provided,
     the number of columns will be guessed.
-    minimal (:py:class:`bool`, optional): If True, only loads ``claimed_id``, ``real_id``,
-    and ``scores``.
+    minimal (:py:class:`bool`, optional): If True, only loads ``claimed_id``,
+    ``real_id``, and ``scores``.
     **kwargs: Keyword arguments passed to :py:func:`numpy.genfromtxt`
@@ -624,9 +633,10 @@ def _estimate_score_file_format(filename, ncolumns=None):
 def _iterate_score_file(filename, csv_score_column: str = "score"):
-    """Opens the score file for reading and yields the score file line by line in a tuple/list.
+    """Opens the score file and yields the score file lines in a tuple/list.
-    The last element of the line (which is the score) will be transformed to float, the other elements will be str
+    The last element of the line (which is the score) will be transformed to
+    float, the other elements will be str.
     """
     if iscsv(filename):
         for row in _iterate_csv_score_file(
@@ -635,7 +645,7 @@ def _iterate_score_file(filename, csv_score_column: str = "score"):
             yield [
                 row["bio_ref_subject_id"],
                 row["probe_subject_id"],
-                row["probe_key"],
+                row["probe_template_id"],
                 row[csv_score_column],
             ]
     else:
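The CSV branch above yields four fields per row, with `probe_template_id` now replacing `probe_key`. The sketch below illustrates the row layout described in the docstring (strings first, float score last). It reads from an in-memory string with the standard library, and the `iterate_csv_rows` helper and the sample data are hypothetical. Where the real code performs the float conversion is not visible in this diff, so doing it inline here is an assumption.

```python
import csv
import io


def iterate_csv_rows(text, score_column="score"):
    """Yield [reference id, probe id, probe template id, score] per CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield [
            row["bio_ref_subject_id"],
            row["probe_subject_id"],
            row["probe_template_id"],
            float(row[score_column]),  # only the score becomes a float
        ]


# Hypothetical sample data using the column names from the hunk above.
SAMPLE = (
    "bio_ref_subject_id,probe_subject_id,probe_template_id,score\n"
    "s1,s1,p1,0.75\n"
    "s1,s2,p2,-1.5\n"
)

lines = list(iterate_csv_rows(SAMPLE))
```

A file that still uses the old `probe_key` header would raise `KeyError` in this sketch, which is one way such a column rename can surface.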
@@ -667,7 +677,9 @@ def _iterate_csv_score_file(filename, score_column: str = "score"):
 def _split_scores(
     score_lines, real_id_index, claimed_id_index=0, score_index=-1
 ):
-    """Take the output of :py:func:`four_column` or :py:func:`five_column` and return negatives and positives."""
+    """Take the output of :py:func:`four_column` or :py:func:`five_column` and
+    return negatives and positives.
+    """
     positives, negatives = [], []
     for line in score_lines:
         which = (
@@ -687,7 +699,9 @@ def _split_cmc_scores(
     claimed_id_index=0,
     score_index=-1,
 ):
-    """Takes the output of :py:func:`four_column` or :py:func:`five_column` and return cmc scores."""
+    """Takes the output of :py:func:`four_column` or :py:func:`five_column` and
+    return cmc scores.
+    """
     if probe_name_index is None:
         probe_name_index = real_id_index + 1
     # extract positives and negatives
@@ -712,12 +726,16 @@ def _split_cmc_scores(
     # get all scores in the desired format
     return [
         (
-            numpy.array(neg_dict[probe_name], numpy.float64)
-            if probe_name in neg_dict
-            else None,
-            numpy.array(pos_dict[probe_name], numpy.float64)
-            if probe_name in pos_dict
-            else None,
+            (
+                numpy.array(neg_dict[probe_name], numpy.float64)
+                if probe_name in neg_dict
+                else None
+            ),
+            (
+                numpy.array(pos_dict[probe_name], numpy.float64)
+                if probe_name in pos_dict
+                else None
+            ),
         )
         for probe_name in probe_names
     ]
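The hunk above only wraps each conditional expression in its own parentheses, so it reads as a formatting change rather than a behavioural one. The sketch below compares both layouts on hypothetical data. It replaces `numpy.array(..., numpy.float64)` with a plain-list stand-in (`as_floats`) to stay within the standard library, so it shows the structure of the result and not the exact array types.

```python
# Hypothetical per-probe score dictionaries; probe "p2" has no positives.
neg_dict = {"p1": [0.1, 0.2], "p2": [0.3]}
pos_dict = {"p1": [0.9]}
probe_names = ["p1", "p2"]


def as_floats(values):
    """Stand-in for numpy.array(values, numpy.float64) in this sketch."""
    return [float(v) for v in values]


# Old layout: two conditional expressions separated by a comma.
old_style = [
    (
        as_floats(neg_dict[name]) if name in neg_dict else None,
        as_floats(pos_dict[name]) if name in pos_dict else None,
    )
    for name in probe_names
]

# New layout: each conditional expression wrapped in its own parentheses.
new_style = [
    (
        (as_floats(neg_dict[name]) if name in neg_dict else None),
        (as_floats(pos_dict[name]) if name in pos_dict else None),
    )
    for name in probe_names
]
```

Both lists hold one `(negatives, positives)` pair per probe, with `None` standing in for a missing side.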