diff --git a/doc/install.rst b/doc/install.rst deleted file mode 100644 index 20b0921c3655ca3eb98f683856c38c24d191fc9f..0000000000000000000000000000000000000000 --- a/doc/install.rst +++ /dev/null @@ -1,145 +0,0 @@ - -============== - Installation -============== - -This section includes information for installing ``beat.editor`` a couple different ways. - -Method 1: Virtualbox Image (easiest) -==================================== - -This method is the easiest - one command will generate a new Virtualbox image with ``beat.editor`` installed. - -Requirements ------------- - -#. Working internet connection -#. `Virtualbox <https://www.virtualbox.org/>`_ installed and working (might need configuration) -#. `Packer <https://www.packer.io/>`_ installed -#. About 3G of disk space (at the time of writing, 2.7G needed) -#. Git - -Installation Steps ------------------- - -#. Clone the repository holding the scripts: :: - - $ git clone https://gitlab.idiap.ch/beat/beat.tutorial.git - -#. Checkout the Git branch containing the packer scripts: :: - - $ git checkout june-allies - -#. Change directory to the folder holding the scripts: :: - - $ cd beat.tutorial/packer_scripts - -#. Generate the Virtualbox image: :: - - $ packer-io build ubuntu-18.04-beat.json - -This will generate the OVF file in ``packer_scripts/output-virtualbox-iso``. - -To use this OVF file in the Virtualbox GUI, select "Import Appliance" under the "File" menu in Virtualbox and choose to import the OVF file. - -The username and password for the image are both ``beat``. - -The BEAT commands, ``beat`` and ``beat editor serve``, should be available in the shell in the VM. The Firefox browser is available, as is the Vim, Emacs, Nano, and Gedit editors. - -Method 2: Conda (easyish) -========================= - -This method integrates with one of your Conda environments' packages on your local machine, but has more strict requirements. - -Requirements ------------- - -#. Linux or MacOS (Docker features not available on MacOS for now) -#. Working internet connection -#. Useable Conda setup (see `Bob's documentation on it <https://www.idiap.ch/software/bob/docs/bob/bob/master/install.html>`_) -#. Firefox or Chrome browser -#. Docker installed and available to the current user (not strictly necessary) - -Installation Steps ------------------- - -#. Activate the Conda environment you will be adding ``beat.editor`` to - -#. Due to the current development state of this project you need to add these two Conda channels: :: - - $ conda config --env --add channels https://www.idiap.ch/software/beat/conda/label/beta - $ conda config --env --add channels https://www.idiap.ch/software/beat/conda - -Note: once this package released only the second channel will be required - -#. Install the ``beat.editor`` package: :: - - $ conda install beat.editor - -#. The BEAT commands, ``beat`` and ``beat editor serve``, should be available. - -Method 3: For developers -======================== - -This method is for developers looking to work on ``beat.editor``. - -Requirements ------------- - -#. Linux or MacOS -#. Working internet connection -#. Useable Conda setup (see `Bob's documentation on it <https://www.idiap.ch/software/bob/docs/bob/bob/master/install.html>`_) -#. Up-to-date Firefox and Chrome browsers (both need headless support, a relatively recent feature) -#. Docker installed and available to the current user -#. A relatively recent version of `NodeJS & NPM <http://nodejs.org/>`_. - -Installation Steps ------------------- - -#. Clone the ``beat.editor`` repository: :: - - $ git clone git@gitlab.idiap.ch:beat/beat.editor.git - -#. Clone the ``bob.admin`` repository, necessary for building a development Conda environment: :: - - $ git clone git@gitlab.idiap.ch:bob/bob.admin.git - -#. Use ``bob.admin`` to create a development Conda environment named ``beatedit`` for Python 3.6: :: - - $ conda activate base - $ conda install pyyaml # install a dependency that isn't fetched by default - $ cd beat.editor - $ ../bob.admin/conda/conda-bootstrap.py --overwrite --python=3.6 beatedit - -#. Activate the created environment and use the ``buildout`` command from ``bob.buildout`` in the new environment to generate the executables for ``beat.editor``: :: - - $ conda activate beatedit - $ buildout - -#. Note for development mode: Add the line ``dependent-scripts = true`` then use the ``buildout`` command to ensure to generate all the executables in the ``bin/`` - folder even ``beat`` for ``beat.editor``: :: - - $ buildout - -#. In ``bin/`` you should several new executables for the project. The Python setup part is done. - -#. Go into the root folder for the Javascript and install the dependencies via NPM: :: - - $ cd conda/js - $ npm install - -#. Test the JS to make sure everything is working properly: :: - - $ npm test - -#. Assuming the tests pass, you can start the development webpack server: :: - - $ npm start - -#. In a separate terminal tab/window, go back to the root of the ``beat.editor`` project and run the web server in development mode: :: - - $ ./bin/beat editor serve --dev - -#. You should now be able to go ``localhost:9101`` in your browser to see the ``beat.editor`` web app, now served by the webpack dev server. - -.. automodule:: beat.editor diff --git a/doc/user.rst b/doc/user.rst deleted file mode 100644 index 1787c8dc037f6ef41f1b9598e84ef755793da805..0000000000000000000000000000000000000000 --- a/doc/user.rst +++ /dev/null @@ -1,460 +0,0 @@ - -============ - User Guide -============ - -This section includes the user-facing guide for creating BEAT objects locally using ``beat.editor`` and ``beat.cmdline``, in the form of a tutorial. - -Requirements -============ - -#. A ``beat.editor`` installation: Please follow the Installation guide. -#. BEAT familiarity: This guide assumes you are somewhat familiar with what BEAT is and how it works. Please see the `BEAT User Guide <https://www.beat-eu.org/platform/static/guide/index.html>`_ and refer to it if you come across an unfamiliar BEAT term or concept. -#. A BEAT prefix: The BEAT prefix found in the ``prefix/`` folder in the ``beat.tutorial.prefix`` repository is used in the tutorial. If you are using the VM image, you will find this folder at ``~/beat.tutorial.prefix/prefix/``, otherwise you will need to download it yourself. -#. The following (Bob) packages installed in the same Conda environment as ``beat.editor`` (already installed in the VM): - - ``bob`` - - ``bob.db.iris`` - - ``bob.bio.base`` - - ``bob.bio.face`` - - ``ipdb`` - -The Workflow -============ - -There are a few things you have to do to create BEAT objects locally: - -#. Create & edit the metadata of BEAT objects -#. Edit the Python code for certain types of BEAT objects -#. Test, debug, visualize, and manage your BEAT objects - -Each of these steps is filled by a different tool: - -#. ``beat.editor`` edits metadata through a web app -#. You use your own editor of your choice to edit code -#. ``beat.cmdline`` does "the rest", letting you run & visualize experiments, manage the cache, debug, and much more (see that project's documentation for more) - -You might have a window setup like the following: - -.. image:: ./img/workflow_screenshot.png - -One of the terminals is for running the ``beat.editor`` server; one is for editing code and running ``beat.cmdline`` commands; the browser is for using the ``beat.editor`` web app. - -The Prefix -========== - -This has been referenced above but not explained. The prefix is a folder that holds all of your BEAT object data in a certain format. This folder is commonly named `prefix` but could be named anything. Whenever you run any ``beat.editor`` or ``beat.cmdline`` commands, run it in the parent folder of your prefix. If you don't, the system won't know where your BEAT objects are and will probably not do or show anything. - -A Sanity Check -============== - -Let's make sure your installation is working as intended. Run the following in a terminal (in the parent folder of your prefix!): :: - - $ beat exp list - -This lists all the experiments in your prefix, and should not be empty. - -.. warning:: - - You need your Conda environment active to access these tools & commands. If the ``beat`` command isn't found, make sure to run ``conda activate <environment name>``! - -Now, let's make sure ``beat.editor`` is working. Run the following: :: - - $ beat editor serve - -A new tab should open up in your browser showing this: - -.. image:: ./img/beat_editor_home.png - -The Iris LDA Experiment -======================= - -This tutorial will be based on the classic Iris LDA experiment. We'll start by explaining the problem, designing a solution, and seeing this solution in terms of BEAT. We'll then run the experiment, analyze it, and debug it. After that we start changing parts of the experiment and creating BEAT objects to represent our changes. - -The Problem: What are we doing? -------------------------------- - -Our task will be to discriminate between the 3 types of flowers in Fisher's Iris dataset using LDA. To keep it simple, we will just be discriminating setosa & versicolor flower samples versus virginica flower samples, giving us a 2-class problem. - -Setosa flowers: - -.. image:: ./img/iris_setosa.jpg - -Versicolor flowers: - -.. image:: ./img/iris_versicolor.jpg - -Virginica flowers: - -.. image:: ./img/iris_virginica.jpg - -Each sample in Fisher's Iris dataset is 4 measurements from a flower: - -.. image:: ./img/iris_versicolor_measurements.png - -The dataset therefore looks like the following: - -.. image:: ./img/iris_db_overview.png - -The Goal: What do we want? --------------------------- - -We will get 3 figures of merit from running the experiment: - -* The EER (Equal Error Rate) -* ROC plot -* Scores histogram plot - -To simplify code examples, we will be using `Bob`, a free signal-processing and machine learning toolbox developed by the same group that BEAT is. - -Designing our Experiment outside of BEAT ----------------------------------------- - -Let's design the experiment abstractly, before showing how it is designed out of BEAT objects. - -The first thing we need is the training data for training our LDA machine and the testing data to test the trained model: - -.. image:: ./img/iris_diagram_1.png - -Then we need to train the LDA: - -.. image:: ./img/iris_diagram_2.png - -Then we test the LDA: - -.. image:: ./img/iris_diagram_3.png - -Then we analyze our scores and generate our figures of merit: - -.. image:: ./img/iris_diagram_4.png - -And there is our conceptual experiment diagram! You've probably seen and thought of designs similar to this. Let's see how to split it into BEAT objects. - -The data for BEAT experiments come from `databases`, so we need one of those: - -.. image:: ./img/iris_diagram_database.png - -We need an `algorithm` for training the LDA machine: - -.. image:: ./img/iris_diagram_training.png - -We need another algorithm for testing the model: - -.. image:: ./img/iris_diagram_testing.png - -And we'll need an `analyzer` algorithm for generating our figures of merit: - -.. image:: ./img/iris_diagram_analyzer.png - -In addition to that, we will need a `toolchain` and an `experiment`. This gives us 6 different BEAT objects: - -* Database -* Training algorithm -* Testing algorithm -* Analyzer algorithm -* Toolchain -* Experiment - -Luckily, these are all already created for us! - -Running the Iris LDA Experiment -=============================== - -Now that we understand what we're doing (more or less), let's get our hands dirty running the pre-made Iris LDA experiment. To do so, we'll be using the ``beat`` cli tool from the ``beat.cmdline`` package. - -.. warning:: - Remember to run shell commands in the parent directory of your prefix! - -Let's make sure the experiment we want to run exists: :: - - $ beat exp list - -Make sure the experiment name, ``test/test/iris/1/iris``, is in that list. - -Run the experiment: :: - - $ beat exp run test/test/iris/1/iris - -It should finish successfully and output status remarks ending with a long amount of JSON-like data. Scrolling up to the top of the output, you should see the following: :: - - Index for database iris/1 not found, building it - Index for database iris/1 not found, building it - Running `test/iris_training/1' for block `training_alg' - Start the execution of 'test/iris_training/1' - Running `test/iris_testing/1' for block `testing_alg' - Start the execution of 'test/iris_testing/1' - Running `test/iris_analyzer/1' for block `analyzer' - Start the execution of 'test/iris_analyzer/1' - Results: - { - "eer": 0.019999999552965164, - "roc": { - "data": [ - { - -See all those statements about ``Running `<algorithm name>` for block `<block name>```? Those are letting you know where the execution of the experiment is at. You can see that it first runs the training algorithm, ``test/iris_training/1``, for the ``training_alg`` block. Then it runs the testing algorithm for the testing block, and finally the analyzer algorithm for the analyzer block. The results are printed last, which you can immediately print again by re-running the experiment. - -These block names are the block names from the experiment's toolchain. Our experiment uses the ``test/iris/1`` toolchain, which looks like this: - -.. image:: ./img/iris_toolchain.png - -Now let's generate the ROC and scores distribution plots! Run the following: :: - - $ beat exp plot --show test/test/iris/1/iris - -This plots our experiment's plottable results. The ``--show`` flag tells us to pop up the plots as soon as they are generated, so we don't have to find them in the filesystem to see them. - -You should be seeing two plots, the ROC and scores distribution: - -.. image:: ./img/lda_roc.png - -.. image:: ./img/lda_scores_distribution.png - - -Debugging the Iris LDA Experiment -================================= - -There are a couple things to keep in mind when debugging your experiments: - -#. The BEAT cache system -#. The availability of normal Python debugging strategies - -The Cache System ----------------- - -There is a subfolder in your prefix called ``cache``. This folder holds all the caches for every step of every experiment you have ran - every block, once executed, saves its results in a cache entry. The cache entry for a block is saved in a path made from hashing the metadata & code from that block and every block before it. This means the following: - -* You may re-run an experiment and immediately get its results, since all of the experiment's steps are found in the cache. -* Changing a block in an experiment (whether its a source code file or metadata) will re-run that step and every step that depends on it. -* Changing a database means that all experiments ran using that database cannot use any of their caches. -* Changing an analyzer of an experiment means that only the last step will have to be re-ran. - -.. note:: Please note that the root folder of the database is not a part of information used for producing the hash of the database block. Therefore if the only modification in a database block is changing its root folder, beat will use the previous cach data available. You need to clear the cache manually to make sure that the new root folder is accessable by beat. for more information about the data used for making the hashes of each block please refer to the `hash.py <https://gitlab.idiap.ch/beat/beat.core/blob/master/beat/core/hash.py>`_ from beat.core. - - - -To look at an experiment's caches, use the ``beat exp caches <exp name>`` command. To see our experiment's caches: :: - - $ beat exp caches test/test/iris/1/iris - -To work with caches generally: :: - - $ beat caches - -Using Debugging Strategies --------------------------- - -Let's run a different experiment for this: ``test/test/iris/1/error``. This experiment purposely has a bug in its analyzer algorithm Python file: :: - - Skipping execution of `test/iris_training/1' for block `training_alg' - outputs exist - Skipping execution of `test/iris_testing/1' for block `testing_alg' - outputs exist - Running `test/iris_analyzer_error/1' for block `analyzer' - Start the execution of 'test/iris_analyzer_error/1' - Block did not execute properly - outputs were reset - Standard output: - - Standard error: - - Captured user error: - File "test/iris_analyzer_error/1.py", line 55, in process - far32 = nump.float32(far) - NameError: name 'nump' is not defined - Captured system error: - - -Let's edit that Python file: :: - - $ beat alg edit test/iris_analyzer_error/1 - -And uncomment line 53: :: - - #print(nump) - -.. note:: If you don't want to edit using the ``beat alg edit`` command (e.g., an external editor), you can find the path of the file to edit by using ``beat alg path`` instead of ``beat alg edit`` and copying the file path. - -Save the file, exit the editor, and re-run the experiment again, to get the following message: :: - - ... - Captured user error: - File "test/iris_analyzer_error/1.py", line 53, in process - print(nump) - NameError: name 'nump' is not defined - Captured system error: - -Now we know for sure that "nump" isn't defined. To make very sure, let's inspect it with ``ipdb``. Edit the file again with: :: - - $ beat alg edit test/iris_analyzer_error/1 - -Recomment line 53 and uncomment line 54: :: - - #import ipdb;ipdb.set_trace() - -Run the experiment again to be put into an ipdb prompt when the block executes! You can play around in here like normal, and make sure that "nump" was just a mispelling of "numpy". - - -The Iris Means Experiment -========================= - -Let's start learning how to use ``beat.editor``! We will start off by copying our experiment ``test/test/iris/1/iris`` and changing the LDA algorithm to a more simple means-based algorithm: - -#. In the training step, we will calculate the mean measurement of setosa and versicolor flower samples (averaging all our "negative samples"). -#. In the testing step, we will compare all our samples to this mean sample by calculating the distance from the current sample to the mean sample. - -.. warning:: Make sure the ``beat editor serve`` server is running in a terminal tab in the parent directory of your prefix! - -Creating the Experiment ------------------------ - -In the BEAT editor browser tab, click the "experiments" tab on the top navigation bar. This will show you a list of experiments, including the Iris LDA experiment ``test/test/iris/1/iris``. - -Click the `Clone` button next to the Iris LDA experiment. This will pop up a window for creating your new experiment, with the "User", "Toolchain", and "Name" fields already pre-filled with the values from the cloned Iris LDA experiment. Note that it says "Invalid Name" at the top, because the name is already taken. - -Change the "User" field from "test" to "tutorial" (or your own name), and change the "Name" field from "iris" to "means". Click the `Create` button to create your new experiment! You should see your new Iris Means experiment at the top of the experiment list. - -Click on your Iris Means experiment's name to go into the Experiment Editor for that experiment. We are going to change this copied experiment from using LDA as its algorithm to using our means algorithm. These two algorithms are already created - ``test/means_training/1`` and ``test/means_testing/1``. - -.. note:: Please see the "Experiment Editor" documentation section for information on what constitutes the editor. - -In the toolchain window, click on the `training_alg` block to open the block editor for that block. Under the toolchain window, change the selected algorithm for the `training_alg` block under the "Algorithm" select box from ``test/iris_training/1`` to ``test/means_training/1``. - -In the toolchain window, click on the `testing_alg` block to open the block editor for that block. Under the toolchain window, change the selected algorithm for the `testing_alg` block under the "Algorithm" select box from ``test/iris_testing/1`` to ``test/means_testing/1``. - -Click the `Save Changes` button at the top to save your changes to your Iris Means experiment. We have finished our editing - now to run it! - -Running & Analyzing the Experiment ----------------------------------- - -Back in your terminal, check to make sure the new experiment is in your prefix: :: - - $ beat exp list - -You should see your experiment (``tutorial/test/iris/1/means``). Now run it like we did earlier: :: - - $ beat exp run tutorial/test/iris/1/means - -You should see output similar to the output you saw for the Iris LDA experiment. Let's plot the ROC and scores distribution: :: - - $ beat exp plot --show test/test/iris/1/iris - -You should see the following two plots: - -.. image:: ./img/means_roc.png - -.. image:: ./img/means_scores_distribution.png - -The Advanced Iris Experiment -============================ - -Many experiments will use a preprocessing step. Right now, our Iris experiments don't have one. Let's add one! - -Our preprocessing algorithm will be very simple (and not particularly useful) - round each measurement in a sample to the nearest whole number. - -The dataset, currently: - -.. image:: ./img/iris_db_before.png - -The dataset, after being preprocessed: - -.. image:: ./img/iris_db_after.png - -Everything else will be the same as the Iris LDA experiment. - -Changing the Toolchain ----------------------- - -The first thing we need to do is make a toolchain for this experiment with blocks for the preprocessor. We copy our Iris toolchain, ``test/iris/1``: - -.. image:: ./img/iris_toolchain.png - -And edit our copy to look like: - -.. image:: ./img/iris_advanced_toolchain.png - -Go to your ``beat.editor`` browser tab. Click on the `toolchains` button at the top to go to your toolchains list. You should see the ``test/iris/1`` toolchain and a couple others. - -Do the same we did for the experiment: click the `Clone` button and change the "User" & "Name" fields from "test" and "iris" to "tutorial" and "iris_advanced", respectively. Click `Create` to save the toolchain, and click on the new toolchain's name to open up the Toolchain Editor for this object. - -.. note:: Please see the "Toolchain Editor" documentation section for information on what constitutes the editor. - -Now we will add the preprocessor blocks to the toolchain. Do the following steps: - -#. Scroll down to fit the toolchain window editor in your browser window. Make sure that you can see the toolchain blocks. If you can't, click on the `Fit` button in the top-right menu in the editor to fit the blocks to your screen. -#. We need to make space for the new blocks: Shift over the last three blocks (`training_alg`, `testing_alg`, and `analyzer`) to make space for the preprocessor blocks. Left-click and drag to area-select all three of these blocks. Left-click and drag on the grey rectangle encompassing these three blocks to the right to move your selection to the right. Click anywhere outside of the grey rectangle to deselect the blocks. -#. We need to remove the connections that need the preprocessing step: Right click on the connection going from `training_data`'s "measurements" output to `training_alg`'s "measurements" input. Select `Delete Connection` to remove the connection. Do the same for the other "measurements" connection for the testing data. -#. We need to create the preprocessor blocks: Right-click between the `training_data` and `training_alg` blocks to open the block insertion menu. Click `Create Block Here` to insert a new block at your cursor. Add another block between the `testing_data` and `testing_alg` blocks. -#. We need to edit the new blocks to change their names and add inputs & outputs: Click on the new block between the "training" blocks to open the Block Editor modal. Change the name from "block" to "pre_training". Click `Add Input` and `Add Output` - rename both of these to "measurements". Click the "X" button in the top right, press the "Esc" button on your keyboard, or click out of the modal to close the editor. Do the same for the other new block, but name it "pre_testing". -#. Now that we have our preprocessor blocks, we need to connect the blocks to the rest of the toolchain: Click and drag on the black square next to the "measurements" output of the `training_data` block to start creating a connection. Drag and release this connection on the black square next to `pre_training`'s "measurements" input. Create another connection from `pre_training`'s "measurements" output to `training_alg`'s "measurements" input. Do the same for the test data preprocessor block. -#. Make sure the synchronization channels are properly set: The `pre_training` block should be green (synchronized to the `training_data` channel), and the `pre_testing` block should be pink (synchronized to the `testing_data` channel). If not, click on the blocks and change the "Channel" to the appropriate channel. -#. You should have the same toolchain as shown above! Now we can create our preprocessor algorithm and use it in a experiment. - -Creating the Preprocessor Algorithm ------------------------------------ - -We need an algorithm to fill in our preprocessors blocks in our toolchain. Let's create a new algorithm. - -Click on `algorithms` at the top of your ``beat.editor`` browser tab to go to the list of your algorithms in your prefix. You should see the algorithms you've been using as well as others. - -Click the `New` button in the top right of the list to pop up the modal for creating objects. Fill in the "User" and "Name" fields with "tutorial" and "iris_preprocessor". - -Click `Create` to create the new algorithm, ``tutorial/iris_preprocessor/1``, at the top of your algorithms list. Click on the new algorithm's name to enter the Algorithm Editor for this algorithm. - -.. note:: Please see the "Algorithm Editor" documentation section for information on what constitutes the editor. - -Our preprocessor algorithm takes in a sample (set of 4 measurements), rounds the measurements, and returns the rounded sample. This means that the algorithm will have 1 group with 1 input and 1 output. Do the following steps: - -#. Click the `New Group` button at the bottom of the page to add a new group, if one doesn't already exist. -#. Click the `New Input` and `New Output` buttons if the group doesn't already have any. -#. Rename the input and the output to "measurements". -#. The algorithm metadata is complete! Click `Save Changes` at the top to save the new algorithm. Click the `Generate Python File` to generate a basic python file for this algorithm based off a template. -#. Back in the shell, run the following to start editing this python file: :: - - $ beat alg edit tutorial/iris_advanced/1 - -#. Add the following code to the end of the ``process`` method, before the return statement: :: - - # get the array of measurements - measurements = inputs['measurements'].data.value - # round each measurement - processed = [round(m) for m in measurements] - # output the new array - outputs['measurements'].write({ - 'value': processed - }) - -#. Save and exit your editor. Now the algorithm is done! Time to create the experiment and see how useful this preprocessor is. - -Creating the Iris Advanced Experiment -------------------------------------- - -Click the `experiments` tab to go to your experiments list. Click `New` to create a new experiment: - -#. Fill in the "Name" field with "tutorial" -#. Select your new toolchain (``tutorial/iris_advanced/1``) in the "Toolchain" field -#. Fill in the "Name" field with "iris" -#. Click `Create` to create the experiment - -Click the new experiment's name to enter the Experiment Editor for this experiment. - -Before, we just changed an experiment. Now, we have to fill in an entire experiment from scratch. This means that we have to assign the correct dataset & algorithm for each block in the toolchain. - -Refer to the Iris LDA experiment for the correct datasets & algorithms for all blocks except the preprocessor blocks, `pre_training` and `pre_testing`. For these two, choose the algorithm we just created, ``tutorial/iris_preprocessor/1``. - -Now save the experiment, and run it! Back in the terminal, check to make sure the experiment exists: :: - - $ beat exp list - -Run the experiment: :: - - $ beat exp run tutorial/tutorial/iris_advanced/1/iris - -Plot the experiment: :: - - $ beat exp plot --show tutorial/tutorial/iris_advanced/1/iris - -You should see the following two diagrams: - -.. image:: ./img/adv_roc.png - -.. image:: ./img/adv_scores_distribution.png - - -.. automodule:: beat.editor