Verified Commit bed15697 authored by Yannick DAYER

doc(doctest): fix output of xarray showing sizes.

parent 751c8154
Pipeline #87989 passed
@@ -91,12 +91,12 @@ samples in an :any:`xarray.Dataset` using :any:`dask.array.Array`'s:
 >>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3)
 >>> dataset # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 6kB
 Dimensions: (sample: 150, dim_0: 4)
 Dimensions without coordinates: sample, dim_0
 Data variables:
-    target (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-    data (sample, dim_0) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
+    target (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+    data (sample, dim_0) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>

 You can see here that our ``samples`` were converted to a dataset of dask
 arrays. The dataset is made of two *dimensions*: ``sample`` and ``dim_0``. We
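
For readers reproducing the doctest above, a minimal sketch of how the ``samples`` list might have been built, assuming scikit-learn's iris dataset as in the surrounding guide (the exact construction lies outside this hunk):

    >>> import bob.pipelines
    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> # wrap each feature row and its label in a Sample; keyword arguments
    >>> # such as ``target`` become metadata and, later, dataset variables
    >>> samples = [
    ...     bob.pipelines.Sample(x, target=y)
    ...     for x, y in zip(iris.data, iris.target)
    ... ]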
......@@ -118,12 +118,12 @@ about ``data`` in our samples:
>>> meta = xr.DataArray(samples[0].data, dims=("feature"))
>>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3, meta=meta)
>>> dataset # doctest: +NORMALIZE_WHITESPACE
<xarray.Dataset>
<xarray.Dataset> Size: 6kB
Dimensions: (sample: 150, feature: 4)
Dimensions without coordinates: sample, feature
Data variables:
target (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
data (sample, feature) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
target (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
data (sample, feature) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>
Now, we want to build a pipeline that instead of numpy arrays, processes this
dataset instead. We can do that with our :any:`DatasetPipeline`. A dataset
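
A hedged sketch of what constructing such a pipeline might look like; ``scaler``, ``pca`` and ``lda`` are assumed here to be plain scikit-learn estimators set up earlier in the guide:

    >>> from sklearn.preprocessing import StandardScaler
    >>> from sklearn.decomposition import PCA
    >>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    >>> scaler, pca = StandardScaler(), PCA(n_components=3)
    >>> lda = LinearDiscriminantAnalysis()
    >>> # DatasetPipeline wraps the estimators so that they consume and
    >>> # produce the dataset's ``data`` variable instead of numpy arrays
    >>> pipeline = bob.pipelines.xr.DatasetPipeline([scaler, pca, lda])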
@@ -170,12 +170,12 @@ output of ``lda.decision_function``.
 >>> ds = pipeline.decision_function(dataset)
 >>> ds # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 5kB
 Dimensions: (sample: 150, c: 3)
 Dimensions without coordinates: sample, c
 Data variables:
-    target (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-    data (sample, c) float64 dask.array<chunksize=(50, 3), meta=np.ndarray>
+    target (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+    data (sample, c) float64 4kB dask.array<chunksize=(50, 3), meta=np.ndarray>

 To get the results as numpy arrays you can call ``.compute()`` on xarray
 or dask objects:
@@ -183,12 +183,12 @@ or dask objects:
 .. doctest::

 >>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 5kB
 Dimensions: (sample: 150, c: 3)
 Dimensions without coordinates: sample, c
 Data variables:
-    target (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-    data (sample, c) float64 28.42 -15.84 -59.68 20.69 ... -57.81 3.79 6.92
+    target (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+    data (sample, c) float64 4kB 28.42 -15.84 -59.68 ... -57.81 3.79 6.92

 Our operations were not lazy here (you can't see in the docs that it was not
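
One way to check the laziness yourself (a sketch, not part of the patched documentation): before ``.compute()`` the variables are backed by dask arrays, afterwards by concrete numpy arrays:

    >>> import dask.array
    >>> import numpy
    >>> # lazy: the ``data`` variable still wraps a dask task graph
    >>> isinstance(ds.data.data, dask.array.Array)
    True
    >>> # computed: the same variable now holds an in-memory numpy array
    >>> isinstance(ds.compute().data.data, numpy.ndarray)
    True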
@@ -222,12 +222,12 @@ For new and unknown dimension sizes use `np.nan`.
 >>> ds = pipeline.fit(dataset).decision_function(dataset)
 >>> ds # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 5kB
 Dimensions: (sample: 150, class: 3)
 Dimensions without coordinates: sample, class
 Data variables:
-    target (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-    data (sample, class) float64 dask.array<chunksize=(50, 3), meta=np.ndarray>
+    target (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+    data (sample, class) float64 4kB dask.array<chunksize=(50, 3), meta=np.ndarray>

 This time nothing was computed. We can get the results by calling
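
For new and unknown dimension sizes the guide says to use `np.nan`; a hedged sketch of how ``output_dims`` entries might declare this (the name ``pca_features`` and the exact dict layout are assumptions about the truncated example, not part of this patch):

    >>> import numpy as np
    >>> pipeline = bob.pipelines.xr.DatasetPipeline(
    ...     [
    ...         scaler,
    ...         # PCA projects each sample onto 3 components, a known size
    ...         dict(estimator=pca, output_dims=[("pca_features", 3)]),
    ...         # the number of classes is not known up front, so use np.nan
    ...         dict(estimator=lda, output_dims=[("class", np.nan)]),
    ...     ]
    ... )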
@@ -236,12 +236,12 @@ This time nothing was computed. We can get the results by calling
 .. doctest::

 >>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 5kB
 Dimensions: (sample: 150, class: 3)
 Dimensions without coordinates: sample, class
 Data variables:
-    target (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-    data (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+    target (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+    data (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92

 >>> ds.data.data.visualize(format="svg") # doctest: +SKIP

 In the visualization of the dask graph below, you can see that dask is only
@@ -274,13 +274,13 @@ features. Let's add the ``key`` metadata to our dataset first:
 >>> meta = xr.DataArray(samples[0].data, dims=("feature"))
 >>> dataset = bob.pipelines.xr.samples_to_dataset(samples, npartitions=3, meta=meta)
 >>> dataset # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 7kB
 Dimensions: (sample: 150, feature: 4)
 Dimensions without coordinates: sample, feature
 Data variables:
-    target (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-    key (sample) int64 dask.array<chunksize=(50,), meta=np.ndarray>
-    data (sample, feature) float64 dask.array<chunksize=(50, 4), meta=np.ndarray>
+    target (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+    key (sample) int64 1kB dask.array<chunksize=(50,), meta=np.ndarray>
+    data (sample, feature) float64 5kB dask.array<chunksize=(50, 4), meta=np.ndarray>

 .. testsetup::
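
A minimal sketch of how the ``key`` metadata could have been attached when the samples were created; using the sample index as the key matches the ``0 1 2 3 ...`` values printed below, but is an assumption about the truncated example:

    >>> samples = [
    ...     # ``key`` uniquely identifies a sample so that checkpointed
    ...     # features written to disk can be matched back to it later
    ...     bob.pipelines.Sample(x, target=y, key=i)
    ...     for i, (x, y) in enumerate(zip(iris.data, iris.target))
    ... ]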
@@ -314,13 +314,13 @@ features:
 >>> ds = pipeline.fit(dataset).decision_function(dataset)
 >>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 6kB
 Dimensions: (sample: 150, class: 3)
 Dimensions without coordinates: sample, class
 Data variables:
-    target (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-    key (sample) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
-    data (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+    target (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+    key (sample) int64 1kB 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
+    data (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92

 Now if you repeat the operations, the checkpoints will be used:
@@ -328,13 +328,13 @@ Now if you repeat the operations, the checkpoints will be used:
 >>> ds = pipeline.fit(dataset).decision_function(dataset)
 >>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 6kB
 Dimensions: (sample: 150, class: 3)
 Dimensions without coordinates: sample, class
 Data variables:
-    target (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-    key (sample) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
-    data (sample, class) float64 28.42 -15.84 -59.68 ... -57.81 3.79 6.92
+    target (sample) int64 1kB 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+    key (sample) int64 1kB 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
+    data (sample, class) float64 4kB 28.42 -15.84 -59.68 ... 3.79 6.92

 >>> ds.data.data.visualize(format="svg") # doctest: +SKIP
@@ -388,13 +388,13 @@ Now in our pipeline, we want to drop ``nan`` samples after PCA transformations:
 ... )
 >>> ds = pipeline.fit(dataset).decision_function(dataset)
 >>> ds.compute() # doctest: +NORMALIZE_WHITESPACE
-<xarray.Dataset>
+<xarray.Dataset> Size: 3kB
 Dimensions: (sample: 75, class: 3)
 Dimensions without coordinates: sample, class
 Data variables:
-    target (sample) int64 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2 2
-    key (sample) int64 1 3 5 7 9 11 13 15 ... 137 139 141 143 145 147 149
-    data (sample, class) float64 21.74 -13.45 -54.81 ... -58.76 4.178 8.07
+    target (sample) int64 600B 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2 2
+    key (sample) int64 600B 1 3 5 7 9 11 13 ... 137 139 141 143 145 147 149
+    data (sample, class) float64 2kB 21.74 -13.45 -54.81 ... 4.178 8.07

 You can see that we have 75 samples now instead of 150 samples. The
 ``dataset_map`` option is generic. You can apply any operation in this function.
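
A hedged sketch of the kind of pipeline entry that produces this behavior; the ``dataset_map`` lambda and the surrounding steps are assumptions about the example truncated in this hunk:

    >>> pipeline = bob.pipelines.xr.DatasetPipeline(
    ...     [
    ...         scaler,
    ...         dict(estimator=pca, output_dims=[("pca_features", 3)]),
    ...         # dataset_map runs an arbitrary function on the dataset
    ...         # between steps; this one drops samples containing nan values
    ...         dict(dataset_map=lambda x: x.persist().dropna("sample")),
    ...         lda,
    ...     ]
    ... )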