This MR fixes and improves the evaluation routines in these ways:
Be more thorough with information saved after evaluation - we now save textual representations of all performance and histogram figures together with the evaluation results in the JSON file
Refactor plot generation - histogram computation and plot creation are now fully decoupled. Generating plots is optional and no longer tied to the run_binary() function.
Add option to not plot in the evaluate script
Add option to control binning in the evaluate script (relates to issue #64 (closed))
Solve binning issue with large datasets (closes #64 (closed))
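
For reviewers, the separation described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this MR: the function names (`histogram`, `evaluate`, `plot_histogram`), the `bins` parameter, and the JSON layout are all hypothetical, and the histogram is computed in plain Python so the evaluation step has no plotting dependency.

```python
import json


def histogram(scores, bins=10, value_range=(0.0, 1.0)):
    """Count scores into equal-width bins; returns plain lists (JSON-friendly)."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    for score in scores:
        if not low <= score <= high:
            continue  # ignore out-of-range values
        # The top edge is inclusive, so clamp it into the last bin.
        index = min(int((score - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return {"counts": counts, "edges": edges}


def evaluate(scores, labels, threshold=0.5, bins=10):
    """Hypothetical evaluation step: metrics plus histogram data, no plotting."""
    predictions = [s >= threshold for s in scores]
    correct = sum(p == bool(l) for p, l in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "histogram": histogram(scores, bins=bins),
    }


def plot_histogram(hist, path):
    """Optional and separate: draw a figure from already-computed data."""
    import matplotlib  # imported lazily so evaluation works without it
    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    edges, counts = hist["edges"], hist["counts"]
    fig, ax = plt.subplots()
    ax.bar(edges[:-1], counts, width=edges[1] - edges[0], align="edge")
    fig.savefig(path)
    plt.close(fig)


# Evaluation produces everything needed for the JSON file; plotting is opt-in.
results = evaluate([0.1, 0.4, 0.6, 0.9, 1.0], [0, 0, 1, 1, 1], bins=4)
serialized = json.dumps(results, indent=2)
```

Because the histogram counts and edges are stored as text in the JSON output, a figure can be regenerated later from the saved file by calling something like `plot_histogram(results["histogram"], "hist.png")`, without rerunning the evaluation.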