Fixed issue with --directory flag in evaluate.py

12 jobs for evaluate_directory in 16 minutes and 9 seconds (queued for 6 seconds)