[script.evaluate] Fix second-annotator comparisons with a key-resemblance check; add assertions in engine.evaluator to ensure proper comparisons