Commit bed15697 (verified), authored 1 year ago by Yannick DAYER
doc(doctest): fix output of xarray showing sizes.
Parent: 751c8154
No related branches, tags, or merge requests found.
Pipeline #87989 passed 1 year ago (stages: qa, test, doc, dist, deploy).
Showing 1 changed file: doc/xarray.rst (+34 additions, -34 deletions)
...
@@ -91,12 +91,12 @@ samples in an :any:`xarray.Dataset` using :any:`dask.array.Array`'s:
>>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3)
>>> dataset # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 6kB
    Dimensions:  (sample: 150, dim_0: 4)
    Dimensions without coordinates: sample, dim_0
    Data variables:
-       target   (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       data     (sample, dim_0) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
+       target   (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       data     (sample, dim_0) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>
You can see here that our ``samples`` were converted to a dataset of dask
arrays. The dataset is made of two *dimensions*: ``sample`` and ``dim_0``. We
...
@@ -118,12 +118,12 @@ about ``data`` in our samples:
>>> meta = xr.DataArray(samples[0].data, dims=("feature"))
>>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3, meta=meta)
>>> dataset # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 6kB
    Dimensions:  (sample: 150, feature: 4)
    Dimensions without coordinates: sample, feature
    Data variables:
-       target   (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       data     (sample, feature) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
+       target   (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       data     (sample, feature) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>
Now, we want to build a pipeline that instead of numpy arrays, processes this
dataset instead. We can do that with our :any:`DatasetPipeline`. A dataset
...
@@ -170,12 +170,12 @@ output of ``lda.decision_function``.
>>> ds = pipeline.decision_function(dataset)
>>> ds # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 5kB
    Dimensions:  (sample: 150, c: 3)
    Dimensions without coordinates: sample, c
    Data variables:
-       target   (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       data     (sample, c) float64 dask.array<chunksize=(50, 3), meta=np.ndarray>
+       target   (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       data     (sample, c) float64 4kB dask.array<chunksize=(50, 3), meta=np.ndarray>
To get the results as numpy arrays you can call ``.compute()`` on xarray
or dask objects:
...
@@ -183,12 +183,12 @@ or dask objects:
.. doctest::
>>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 5kB
    Dimensions:  (sample: 150, c: 3)
    Dimensions without coordinates: sample, c
    Data variables:
-       target   (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-       data     (sample, c) float64 28.42 -15.84 -59.68 20.69 ... -57.81 3.79 6.92
+       target   (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+       data     (sample, c) float64 4kB 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
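(An editorial aside, not part of this commit's diff: once ``.compute()`` has run, the dataset's variables are numpy-backed, so the raw array of a variable can be read through xarray's ``.values`` accessor. A minimal sketch reusing the ``ds`` from the doctest above, which would yield the ``data`` variable as a plain :any:`numpy.ndarray`:

    >>> ds.compute().data.values  # doctest: +SKIP
)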
Our operations were not lazy here (you can't see in the docs that it was not
...
@@ -222,12 +222,12 @@ For new and unknown dimension sizes use `np.nan`.
>>> ds = pipeline.fit(dataset).decision_function(dataset)
>>> ds # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 5kB
    Dimensions:  (sample: 150, class: 3)
    Dimensions without coordinates: sample, class
    Data variables:
-       target   (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       data     (sample, class) float64 dask.array<chunksize=(50, 3), meta=np.ndarray>
+       target   (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       data     (sample, class) float64 4kB dask.array<chunksize=(50, 3), meta=np.ndarray>
This time nothing was computed. We can get the results by calling
...
@@ -236,12 +236,12 @@ This time nothing was computed. We can get the results by calling
.. doctest::
>>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 5kB
    Dimensions:  (sample: 150, class: 3)
    Dimensions without coordinates: sample, class
    Data variables:
-       target   (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-       data     (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+       target   (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+       data     (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92
>>> ds.data.data.visualize(format="svg") # doctest: +SKIP
In the visualization of the dask graph below, you can see that dask is only
...
@@ -274,13 +274,13 @@ features. Let's add the ``key`` metadata to our dataset first:
>>> meta = xr.DataArray(samples[0].data, dims=("feature"))
>>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3, meta=meta)
>>> dataset # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 7kB
    Dimensions:  (sample: 150, feature: 4)
    Dimensions without coordinates: sample, feature
    Data variables:
-       target   (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       key      (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-       data     (sample, feature) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
+       target   (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       key      (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+       data     (sample, feature) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>
.. testsetup::
...
@@ -314,13 +314,13 @@ features:
>>> ds = pipeline.fit(dataset).decision_function(dataset)
>>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 6kB
    Dimensions:  (sample: 150, class: 3)
    Dimensions without coordinates: sample, class
    Data variables:
-       target   (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-       key      (sample) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
-       data     (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+       target   (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+       key      (sample) int64 1kB 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
+       data     (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92
Now if you repeat the operations, the checkpoints will be used:
...
@@ -328,13 +328,13 @@ Now if you repeat the operations, the checkpoints will be used:
>>> ds = pipeline.fit(dataset).decision_function(dataset)
>>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 6kB
    Dimensions:  (sample: 150, class: 3)
    Dimensions without coordinates: sample, class
    Data variables:
-       target   (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-       key      (sample) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
-       data     (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+       target   (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+       key      (sample) int64 1kB 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
+       data     (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92
>>> ds.data.data.visualize(format="svg") # doctest: +SKIP
...
@@ -388,13 +388,13 @@ Now in our pipeline, we want to drop ``nan`` samples after PCA transformations:
... )
>>> ds = pipeline.fit(dataset).decision_function(dataset)
>>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-   <xarray.Dataset>
+   <xarray.Dataset> Size: 3kB
    Dimensions:  (sample: 75, class: 3)
    Dimensions without coordinates: sample, class
    Data variables:
-       target   (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-       key      (sample) int64 1 3 5 7 9 11 13 15 ... 137 139 141 143 145 147 149
-       data     (sample, class) float64 21.74 -13.45 -54.81 ... -58.76 4.178 8.07
+       target   (sample) int64 600B 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+       key      (sample) int64 600B 1 3 5 7 9 11 13 ... 137 139 141 143 145 147 149
+       data     (sample, class) float64 2kB 21.74 -13.45 -54.81 ... 4.178 8.07
You can see that we have 75 samples now instead of 150 samples. The
``dataset_map`` option is generic. You can apply any operation in this function.
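(An illustrative sketch, hypothetical and not part of this commit: the callable given to ``dataset_map`` receives the intermediate :any:`xarray.Dataset` and returns a transformed one, so dropping nan samples could be written as:

    >>> def drop_nan_samples(ds):
    ...     # hypothetical helper: keep only samples whose values contain no nan
    ...     return ds.dropna(dim="sample")
)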