@@ -2,24 +2,46 @@ This guide is a step-by-step introduction and advice on how to publish your
reproducible paper at Idiap.
> This document is a **collaborative** effort between researchers at Idiap.
> It is not meant as an exhaustive resource on software development or LaTeX
> writing. It assumes you're proficient in those matters. If something differs
> between this guide and your experience, please modify it accordingly after
> discussing with your teammates.
## Data
If you collected the experimental data mentioned in your paper yourself, you
need to publish the dataset separately, on Idiap’s data-distribution portal
(DDP).
To publish a new dataset on DDP, open a ticket on Idiap’s help-desk, and the
gentle folk in the IT dept. will tell you exactly what to do. Just remember
that the process usually takes at least a week, so try to plan things well in
advance.
Remember to include enough information in the DDP release so that your database
*can* be re-used by someone even if they don't have access to your software
package (e.g., somebody doing experiments using R, Julia or Matlab). This
should include, for example, protocol descriptions and annotations, if these
are required to use your dataset.
## Software Package
Software packages for a paper should live within the [Bob Gitlab
Group](https://gitlab.idiap.ch/bob) and should be named
`bob.paper.conferenceYEAR_subject` (e.g. `bob.paper.icb2018_veinrec`). This
package should include the source code and instructions so that *an
independent researcher* can reproduce the specific results in your paper. A
paper package **is** a valid Python package and, as such, it may be distributed
on [PyPI](https://pypi.python.org).
The code needs to be packaged in such a way that people downloading it can
recreate the **exact software environment** in which you ran your experiments.
If you do everything properly, anyone on the internet will be able to download
your paper package from https://pypi.python.org, unzip it, and execute the
following commands:
After running `buildout`, the user should be able to re-run your experiments,
provided they have access to the data.
### Package Organization
> Note: The explanations in this section assume you are already familiar
> with Python packaging. Otherwise, please start from our [introductory guide](https://www.idiap.ch/software/bob/docs/bob/bob.extension/stable/pure_python.html).
There are several ways of publishing your code. Some people have taken the time
to define a specific structure for organizing the code that makes it easy to
publish it. This structure has been used to publish a lot of code based on Bob.
This is the *typical* structure of a paper package:
```text
bob/
...
...
@@ -54,19 +83,42 @@ setup.py
A quick overview of these files:
* `bob/` is the directory in which you put your source code. This includes
  scripts and whatever support files you may need to implement all resources
  (tables and figures) in your paper. The files should be organized in
  subdirectories [matching your package name](https://www.idiap.ch/software/bob/docs/bob/bob.extension/stable/pure_python.html#anatomy-of-a-package).
* `.gitlab-ci.yml` is the file that controls the CI instructions for testing
  your paper installation and whatever else you deem necessary.
* `COPYING` is the license of your package. Normally, we set this to be GPLv3
  as per Idiap advice. Just copy [this file](http://www.gnu.org/licenses/gpl.txt)
  and name it `COPYING` at the root of your package.
* `MANIFEST.in` should list the non-Python files that you'd like to ship with
  your package.
* `README.rst` contains basic information about your paper, including the
  citations users may need in case they decide to use your publication in
  their own work. It should also include installation instructions for the
  package and, ideally, information on how to re-run your code **and**
  produce the results in your paper.
* `buildout.cfg` contains the basic recipe to create a working environment
  using your paper package.
* `environment.yml` contains the precise list of conda packages required to
  re-build, from scratch, the work environment in which you know the paper will
  successfully run **and** produce the same results you published.
* `requirements.txt` contains the **direct** dependencies of your package
  (everything you `import` in **your** code). You don't need to include
  *indirect* dependencies here.
* `setup.py` corresponds to the Python packaging instructions. It reads
  `requirements.txt` and defines the package name and how to install it. Read
  more about it [here](https://www.idiap.ch/software/bob/docs/bob/bob.extension/stable/pure_python.html#setting-up-your-package).
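The guide does not show `setup.py` itself. As a minimal sketch, assuming a plain setuptools setup (the Bob templates in bob.admin may already provide an equivalent helper), the part that reads `requirements.txt` could look like this. The names `parse_requirements` and `load_requirements` are hypothetical:

```python
import re
from pathlib import Path

# Comment rule used by pip for requirements files: "#" at the start of a line
# or after whitespace starts a comment ("pkg.zip#egg=pkg" is left untouched).
_COMMENT = re.compile(r"(^|\s+)#.*$")


def parse_requirements(text):
    """Return the requirement specifiers listed in a requirements.txt body."""
    requirements = []
    for line in text.splitlines():
        line = _COMMENT.sub("", line).strip()
        if line:  # skip blank and comment-only lines
            requirements.append(line)
    return requirements


def load_requirements(path="requirements.txt"):
    """Read and parse the requirements.txt file that sits next to setup.py."""
    return parse_requirements(Path(path).read_text())
```

In `setup.py` you would then pass `install_requires=load_requirements()` to `setuptools.setup()`, so that `requirements.txt` stays the single place where direct dependencies are listed.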
More complex packaging *may* be required in special cases. For those, please refer to our complete [Bob extension guide](https://www.idiap.ch/software/bob/docs/bob/bob.extension/stable/index.html).
Up-to-date templates for some of the above files may be found in
[bob.admin](https://gitlab.idiap.ch/bob/bob.admin/tree/master/templates). Use
those when in doubt.
### Checking the README file
...
...
@@ -76,12 +128,20 @@ You can check the `README.rst` file for warnings and errors like this:
$ rst2html README.rst > /dev/null
```
This should print any formatting errors the file may have. Fix these
**before** uploading your package to PyPI, or the description there will be
unformatted.
### Continuous Integration
Continuous integration is the ability to test your package every time you
commit something to it (actually, when you push your changes back to Gitlab).
We advise you to create a `.gitlab-ci.yml` file that reproduces your
installation instructions and, at least, checks that the scripts can run. It
does not have to be as sophisticated as the ones we have for most Bob packages,
just functional enough to test the basics. Something along these lines should
do the trick:
```yaml
test:
...
...
@@ -107,34 +167,63 @@ test:
- docker
```
The `tags` section of this YAML file is important as it tells the Gitlab CI
infrastructure where to run your tests. Make sure you go to the "Settings /
CI/CD" page of your software package in Gitlab and enable the corresponding
runners.
### Creating the `environment.yml` file
To make sure that users of your source code can **exactly** reproduce your
published experimental results, they need to work in the **same environment**.
This means that they should be working with the same versions of all the
Python/Bob packages and package dependencies that you used when running your
experiments. An easy way to achieve this is to **freeze** your working
environment into an `environment.yml` file, from which the user can then
re-create the same working environment.
Before we look at how to freeze a working environment, let's first consider how
we would initially create the environment in which we wish to work.
Environment creation is based on conda and can vary depending on which packages
you need. For example, the `bob.paper.isba2018_entropy.env` environment for
the `bob.paper.isba2018_entropy` paper package was created by executing the
following command in the terminal:
To work in this environment, you must then navigate to your working directory
and activate the environment. Using the `bob.paper.isba2018_entropy` paper
package as an example once again, this would be done by executing the following
commands in your terminal:
```sh
$ cd bob.paper.isba2018_entropy
$ source activate bob.paper.isba2018_entropy.env
```
At this point, you are ready to freeze your environment with the following
command:
```sh
$ conda env export > environment.yml
```
Now, open your `environment.yml` file. If it lists `zc.buildout` or
`setuptools`, remove their version numbers so that, if these packages are
upgraded at a later point, the user can still run `buildout` in their
re-created environment. You can also remove any packages from
`environment.yml` that you know **for sure** are **not** needed by your paper
package (if you are not sure, it's best not to remove anything). Finally,
remove the `prefix` entry from your `environment.yml` file, since the user of
your package does not need to know the path to your working directory (their
path will be different anyway).
To make sure your frozen environment works as expected, test it on a different
computer as follows, replacing `bob.paper.isba2018_entropy` with your package
name and `bob.paper.isba2018_entropy.env` with the name of your
previously-created environment:
```sh
$ git clone https://gitlab.idiap.ch/bob/bob.paper.isba2018_entropy # download package from GitLab
...
...
@@ -145,22 +234,44 @@ $ buildout # generate the scripts necessary to run your experiments
$ ./bin/verify.py vera-wld # run your experiments
```
When you run your experiments in the created environment, your results should
be the same as those you originally obtained.
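If your scripts write their final numbers to a file, a small check can make this comparison less error-prone. The sketch below is hypothetical: it assumes you have loaded the published and the re-computed metrics into dictionaries (the metric names are placeholders). It allows a tiny numerical tolerance, since floating-point results may differ in the last digits across machines.

```python
import math


def mismatched_results(expected, obtained, rel_tol=1e-6, abs_tol=1e-9):
    """List the metrics whose re-computed value differs from the published one.

    Both arguments map a metric name (e.g. "EER") to a number.
    """
    problems = []
    for metric, value in expected.items():
        if metric not in obtained:
            problems.append(f"{metric}: missing")
        elif not math.isclose(value, obtained[metric], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{metric}: expected {value}, got {obtained[metric]}")
    return problems
```

An empty list means every published number was reproduced within the chosen tolerance.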
Alternatively, you could simply test that your environment has been correctly
created by incorporating the creation commands into your `.gitlab-ci.yml` file
(see the "Continuous Integration" section, above). Once you have created and
edited your `environment.yml` as explained, commit the changes to Git and push
to your project repository on GitLab. If the pipeline for this commit
succeeds, then your environment creation works as expected.
And that's it! All you need to do now is to include `environment.yml` in your
`MANIFEST.in` file to make sure that your environment file is packaged along
with your source code when creating a PyPI package. Note that it is also a
good idea to ensure that your environment creation and experiments work as
expected when downloading your paper package from PyPI as opposed to cloning it
from GitLab.
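One way to verify this is to look inside the archive built by `python setup.py sdist --formats zip` before uploading it. A minimal sketch, assuming a zip sdist whose members live under a single `<name>-<version>/` folder. The list of required files is an assumption based on the package layout described earlier; adjust it to your package.

```python
import zipfile

# Files expected at the root of a paper package (adjust to your project).
REQUIRED = ("README.rst", "buildout.cfg", "environment.yml", "requirements.txt", "setup.py")


def missing_files(member_names, required=REQUIRED):
    """Return the required files absent from an sdist member listing.

    sdist archives unpack into a single "<name>-<version>/" directory, so the
    first path component is stripped before comparing.
    """
    relative = {name.split("/", 1)[1] for name in member_names if "/" in name}
    return [path for path in required if path not in relative]


def missing_from_sdist(archive_path, required=REQUIRED):
    """Inspect a zip sdist, such as the one created in your dist/ directory."""
    with zipfile.ZipFile(archive_path) as archive:
        return missing_files(archive.namelist(), required)
```

Calling `missing_from_sdist()` on the zip file in `dist/` should return an empty list. If it reports `environment.yml`, add it to `MANIFEST.in` and rebuild.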
### Software Disclosure Agreement
You should make your software package public. This normally requires a
Software Disclosure Agreement between you and Idiap. To kick-start the process,
open a help-desk ticket and go on from there. Include your supervisor in CC on
that ticket, along with all involved partners. This process **can take up to a
couple of weeks**, as it may involve a software review.
### Publishing to PyPI
After your software package has stabilized and has been tested to work, you
can publish it to PyPI. Before doing so, make sure it is public and **read the
section entitled "Software Disclosure Agreement"** above.
We recommend you use [Twine](https://pypi.python.org/pypi/twine) to upload your
software package to PyPI. You may pip-install it into your local conda
development environment. Once the `twine` binary is in place, execute the
following commands:
```sh
# remember to use the Python from your conda environment
...
...
@@ -168,11 +279,16 @@ $ python setup.py sdist --formats zip
$ twine upload dist/*.zip
```
The `twine` command will ask you to enter a username and password for PyPI.
You *should* use our special account for this, so that we can keep track of all
published packages. Ask your colleagues for the details.
Once your package is uploaded to PyPI, you can paste the link from that server
into your article. It should look like this:
https://pypi.python.org/pypi/bob.paper.isba2018-entropy
Do **not** include the version number in the link you paste into your article,
or you won't be able to update the package in case of issues later on.
### Example software packages
...
...
@@ -183,9 +299,16 @@ Do **not** include the version number on the link you paste on your article, or
## Paper (LaTeX) Source Code
The source code for your article should be in the [Biometrics Gitlab
group](https://gitlab.idiap.ch/biometric/). If you don't have permissions to
create a repository, ask someone who does. The Gitlab project for your paper's
LaTeX sources should **not** be made public.
LaTeX source code projects should be named `paper.conferenceYEAR.subject`. For
example, for the software package above, name your LaTeX project
`paper.icb2018.veinrec`. Its contents should be simple and include a
`Makefile` to build the PDF of your paper from the sources.
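The two naming conventions can be checked with regular expressions. This sketch is illustrative only. Its patterns assume a lowercase conference acronym, a four-digit year and a lowercase alphanumeric subject, which is stricter than this guide requires. `follows_convention` and `latex_name_for` are hypothetical helpers.

```python
import re

# Hypothetical patterns for the two conventions described in this guide.
SOFTWARE_NAME = re.compile(r"bob\.paper\.[a-z]+\d{4}_[a-z0-9]+")  # bob.paper.icb2018_veinrec
LATEX_NAME = re.compile(r"paper\.[a-z]+\d{4}\.[a-z0-9]+")  # paper.icb2018.veinrec


def follows_convention(name, pattern=SOFTWARE_NAME):
    """Return True if the whole name matches the given naming pattern."""
    return pattern.fullmatch(name) is not None


def latex_name_for(software_name):
    """Derive the LaTeX project name from a software package name."""
    if not follows_convention(software_name):
        raise ValueError(f"unexpected package name: {software_name!r}")
    venue, subject = software_name[len("bob.paper."):].split("_", 1)
    return f"paper.{venue}.{subject}"
```

For the example above, `latex_name_for("bob.paper.icb2018_veinrec")` gives `paper.icb2018.veinrec`.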
### Examples
...
...
@@ -194,7 +317,8 @@ LaTeX source code projects should be named `paper.conferenceYEAR.subject`. For e
### Continuous Integration
You can set up Gitlab CI to also test the build of your article at every push.
Here is an example YAML file that does the trick:
```yaml
stages:
...
...
@@ -220,20 +344,38 @@ macosx:
- beat-macosx
```
These CI instructions try to build your paper on both Linux and MacOSX-based
installations. They preserve the PDF as a build artifact you can download and
check. The PDF remains available for up to one week after the build ends,
which is also handy for sharing.
Remember to activate the respective runners corresponding to the `tags` above
on your Gitlab project `Settings / CI/CD` page.
### Upload to the Idiap publications website
Your paper **must** be listed on the [Idiap publications
portal](https://publications.idiap.ch). This also lets you list these
contributions in your annual report later, when you have to write it. There
are two occasions on which you must add input to this website:
1. When you **submit** your paper to a conference or journal, create an
   **Idiap-Internal-RR** (Research Report) entry, which remains *private* while
   you wait for the acceptance decision.
2. If and when your paper is *accepted*, create **another** entry on that
   website, which will become public. In this case, don't choose
   **Idiap-Internal-RR** anymore, but the appropriate entry type that makes it
   **public**. Cross-reference the internal research report in that entry.
You may add the URL of your paper's software package on PyPI as a "note" on
the article entry you create on the Idiap website. Remember to add **all**
projects from which your work has received grants in the appropriate form
field.
Be *careful* when uploading your contributions to this website. Public entries
cannot easily be undone, as the system synchronizes automatically with other
publication portals in Switzerland.
Once you have a link on the Idiap publications website, share it with your
supervisor and other parties involved.