bob / bob.bio.face · Commits · e9d32305

Commit e9d32305 authored 4 years ago by Tiago de Freitas Pereira

Inject samples example

parent 7488d20e
1 merge request: !112 Feature extractors
Pipeline #51281 failed 4 years ago (stage: build)

Showing 1 changed file: notebooks/inject_samples.ipynb (new file, 350 additions, 0 deletions)
%% Cell type:markdown id: tags:
# Injecting extra samples into vanilla biometrics protocols
Sometimes our experiments go beyond "simple" database protocols.
Sometimes we just want to analyze the impact of a few extra samples on our experiments without writing a whole dataset interface for that.
This notebook shows how to "inject" samples that don't belong to any protocol into an existing protocol.
We'll showcase how to inject samples to perform score normalization.
## Preparing the database
We'll showcase this injection using the MEDS dataset.
%% Cell type:code id: tags:
```python
dask_client = None

OUTPUT_PATH = ""
PATH_INJECTED_DATA = ""


##### CHANGE YOUR DATABASE HERE
from bob.bio.face.database import MEDSDatabase

database = MEDSDatabase(protocol="verification_fold1")

# Fetching the keys
#references = database.zprobes()[0].references
references = database.probes(group="eval")[0].references + database.probes(group="dev")[0].references
```
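%% Cell type:markdown id: tags:
As a quick sanity check, we can look at what we just fetched. This minimal sketch only assumes that `references` is the list of reference ids built above:
%% Cell type:code id: tags:
```python
# Peek at the reference ids the injected probes will be scored against
print(f"{len(references)} reference ids fetched")
print(references[:5])
```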
%% Cell type:markdown id: tags:
# Loading samples that will be injected
Here we'll load the samples to be injected for Z-norm and T-norm.
%% Cell type:code id: tags:
```python
# PATH
import os
import functools
import bob.io.base

# Fetching real data
#treferences = database.treferences()
#zprobes = database.zprobes()

eyes_annotations = {'leye': (61, 120),
                    'reye': (61, 63)}

treferences_lst = ["0/0_ethnicity_0.png",
                   "0/0_ethnicity_7.png"]

zprobes_lst = ["1/1_ethnicity_0.png",
               "1/1_ethnicity_7.png"]

from bob.pipelines import Sample, DelayedSample, SampleSet

# Convert every element of a list into a SampleSet
def list_to_sampleset(lst, base_path, eyes_annotations, references):
    sample_sets = []
    for i, l in enumerate(lst):
        sample = DelayedSample(functools.partial(bob.io.base.load, os.path.join(base_path, l)),
                               key=l,
                               reference_id=str(i),
                               annotations=eyes_annotations)
        sset = SampleSet(samples=[sample],
                         key=l,
                         reference_id=str(i),
                         references=references)
        sample_sets.append(sset)
    return sample_sets


treferences = list_to_sampleset(treferences_lst, PATH_INJECTED_DATA, eyes_annotations, references=None)
zprobes = list_to_sampleset(zprobes_lst, PATH_INJECTED_DATA, eyes_annotations, references=references)
```
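%% Cell type:markdown id: tags:
Before wiring the injected samples into the pipeline, it can help to inspect one of them. This sketch only uses the attributes set by `list_to_sampleset` above; note that the image itself is loaded lazily:
%% Cell type:code id: tags:
```python
# Each injected probe is a SampleSet wrapping a single DelayedSample
sset = zprobes[0]
print("key:", sset.key)
print("reference_id:", sset.reference_id)
print("annotations:", sset.samples[0].annotations)
# Accessing `.data` triggers the delayed load:
# image = sset.samples[0].data  # reads PATH_INJECTED_DATA/1/1_ethnicity_0.png
```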
%% Cell type:markdown id: tags:
## Preparing the pipeline
Here we use the ArcFace model from InsightFace (https://github.com/deepinsight/insightface).
Feel free to swap in another embedding from `bob.bio.face.embeddings`; a sketch of the swap follows the pipeline printout below.
%% Cell type:code id: tags:
```python
import os
from bob.bio.base.pipelines.vanilla_biometrics import checkpoint_vanilla_biometrics
from bob.bio.base.pipelines.vanilla_biometrics import dask_vanilla_biometrics
from bob.bio.base.pipelines.vanilla_biometrics import ZTNormPipeline, ZTNormCheckpointWrapper
from bob.bio.base.pipelines.vanilla_biometrics import CSVScoreWriter

from bob.bio.face.embeddings.mxnet import arcface_insightFace_lresnet100

pipeline = arcface_insightFace_lresnet100(annotation_type=database.annotation_type,
                                          fixed_positions=None,
                                          memory_demanding=False)

## SCORE WRITER
# Here we want the pipeline to write the scores with METADATA
pipeline.score_writer = CSVScoreWriter(os.path.join(OUTPUT_PATH, "./tmp"))

# Aggregating with checkpoint
pipeline = checkpoint_vanilla_biometrics(pipeline, OUTPUT_PATH)

#pipeline = dask_vanilla_biometrics(ZTNormCheckpointWrapper(ZTNormPipeline(pipeline), OUTPUT_PATH))
# AGGREGATING WITH ZTNORM
pipeline = ZTNormPipeline(pipeline)
pipeline.ztnorm_solver = ZTNormCheckpointWrapper(
    pipeline.ztnorm_solver, os.path.join(OUTPUT_PATH, "normed-scores")
)
pipeline = dask_vanilla_biometrics(pipeline, partition_size=200)

print(pipeline.transformer)
```
%% Output
Pipeline(steps=[('ToDaskBag', ToDaskBag(partition_size=200)),
('samplewrapper-1',
DaskWrapper(estimator=CheckpointWrapper(estimator=SampleWrapper(estimator=FaceCrop(annotator=BobIpMTCNN(),
cropped_image_size=(112,
112),
cropped_positions={'leye': (55,
81),
'reye': (55,
42)}),
fit_extra_arguments=(),
transform_extra_arguments=(('annotations',
'annotations'),)),
fe...
save_func=<function save at 0x7fccf501c560>))),
('samplewrapper-2',
DaskWrapper(estimator=CheckpointWrapper(estimator=SampleWrapper(estimator=ArcFaceInsightFace_LResNet100(),
fit_extra_arguments=(),
transform_extra_arguments=()),
features_dir='/idiap/temp/tpereira/inject-example/samplewrapper-2',
load_func=<function load at 0x7fccf501c3b0>,
save_func=<function save at 0x7fccf501c560>)))])
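%% Cell type:markdown id: tags:
To swap the embedding, replace the `arcface_insightFace_lresnet100` call with another constructor from `bob.bio.face.embeddings`. The sketch below uses a hypothetical `my_embedding_pipeline` as a stand-in for whichever constructor you pick, assuming it takes the same three arguments:
%% Cell type:code id: tags:
```python
# Hypothetical swap: any constructor with the same signature drops in here
# from bob.bio.face.embeddings.<backend> import my_embedding_pipeline
# pipeline = my_embedding_pipeline(annotation_type=database.annotation_type,
#                                  fixed_positions=None,
#                                  memory_demanding=False)
```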
%% Cell type:markdown id: tags:
## Setting the Dask client (optional step; do it if you want to use the grid)
**MAKE ABSOLUTELY SURE THAT YOU RUN `SETSHELL grid` BEFORE STARTING THE NOTEBOOK**
%% Cell type:code id: tags:
```python
from dask.distributed import Client
from bob.pipelines.distributed.sge import SGEMultipleQueuesCluster

cluster = SGEMultipleQueuesCluster(min_jobs=1)
dask_client = Client(cluster)
```
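%% Cell type:markdown id: tags:
If you don't have access to the grid, skip the cell above: `compute_scores` below falls back to Dask's single-threaded scheduler whenever `dask_client` is `None`, so the experiment also runs locally:
%% Cell type:code id: tags:
```python
# Alternative to the SGE cluster: leave the client unset and the pipeline
# runs locally on the single-threaded scheduler (see compute_scores below)
# dask_client = None
```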
%% Cell type:markdown id: tags:
## Running the vanilla biometrics pipeline
With the pipeline and the injected samples in place, we score the `dev` and `eval` groups and write the raw, Z-, T-, S-, and ZT-normalized scores:
%% Cell type:code id: tags:
```python
import os

def post_process_scores(pipeline, scores, path):
    written_scores = pipeline.write_scores(scores)
    return pipeline.post_process(written_scores, path)

def _build_filename(score_file_name, suffix):
    return os.path.join(score_file_name, suffix)

from dask.delayed import Delayed
import dask.bag

def compute_scores(result, dask_client):
    if isinstance(result, (Delayed, dask.bag.Bag)):
        if dask_client is not None:
            result = result.compute(scheduler=dask_client)
        else:
            print("`dask_client` not set. Your pipeline will run locally")
            result = result.compute(scheduler="single-threaded")
    return result

background_model_samples = database.background_model_samples()
for group in ["dev", "eval"]:

    score_file_name = os.path.join(OUTPUT_PATH, f"scores-{group}")
    biometric_references = database.references(group=group)
    probes = database.probes(group=group)

    (
        raw_scores,
        z_normed_scores,
        t_normed_scores,
        zt_normed_scores,
        s_normed_scores,
    ) = pipeline(
        background_model_samples,
        biometric_references,
        probes,
        zprobes,
        treferences,
        allow_scoring_with_all_biometric_references=True,
    )

    # RAW SCORES
    raw_scores = post_process_scores(
        pipeline, raw_scores, _build_filename(score_file_name, "raw_scores")
    )
    _ = compute_scores(raw_scores, dask_client)

    # Z-SCORES
    z_normed_scores = post_process_scores(
        pipeline,
        z_normed_scores,
        _build_filename(score_file_name, "z_normed_scores"),
    )
    _ = compute_scores(z_normed_scores, dask_client)

    # T-SCORES
    t_normed_scores = post_process_scores(
        pipeline,
        t_normed_scores,
        _build_filename(score_file_name, "t_normed_scores"),
    )
    _ = compute_scores(t_normed_scores, dask_client)

    # S-SCORES
    s_normed_scores = post_process_scores(
        pipeline,
        s_normed_scores,
        _build_filename(score_file_name, "s_normed_scores"),
    )
    _ = compute_scores(s_normed_scores, dask_client)

    # ZT-SCORES
    zt_normed_scores = post_process_scores(
        pipeline,
        zt_normed_scores,
        _build_filename(score_file_name, "zt_normed_scores"),
    )
    _ = compute_scores(zt_normed_scores, dask_client)
```
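%% Cell type:markdown id: tags:
The `CSVScoreWriter` writes the scores as CSV, one row per comparison, with the sample metadata as extra columns. A minimal sketch for loading them back; it assumes the post-processed raw scores end up as CSV files under `OUTPUT_PATH/scores-dev/raw_scores` (the path built by `_build_filename` above), so adjust the glob if the layout differs on your setup:
%% Cell type:code id: tags:
```python
import glob
import pandas as pd

# Hypothetical layout: one or more CSV chunks under the raw_scores directory
for score_file in glob.glob(os.path.join(OUTPUT_PATH, "scores-dev", "raw_scores", "*")):
    scores = pd.read_csv(score_file)
    print(score_file, scores.shape)
```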
%% Cell type:markdown id: tags:
Once the scores are written, we shut down the Dask client to release the SGE workers.
%% Cell type:code id: tags:
```python
# KILL THE SGE WORKERS
dask_client.shutdown()
```