[engine.*trainer] Use non-blocking CPU->GPU data transfers so host-to-device copies run asynchronously
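
For illustration, here is a minimal sketch of what a non-blocking host-to-device copy looks like in PyTorch. It is not the trainer's actual code: the helper name `to_device` and the batch keys `image` and `label` are hypothetical, and the batch is assumed to be a flat dict of tensors. Note that `non_blocking=True` only lets the copy overlap with host work when the source tensor lives in pinned (page-locked) memory; otherwise the copy behaves synchronously.

```python
import torch


def to_device(batch, device):
    """Copy every tensor in a dict batch to `device` without blocking the host.

    Hypothetical helper for illustration; assumes `batch` maps str -> Tensor.
    """
    return {k: v.to(device, non_blocking=True) for k, v in batch.items()}


# Fall back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch = {
    "image": torch.randn(4, 3, 8, 8),
    "label": torch.tensor([0, 1, 2, 3]),
}

if device.type == "cuda":
    # Pinned host memory is what allows the copy to be truly asynchronous.
    batch = {k: v.pin_memory() for k, v in batch.items()}

moved = to_device(batch, device)

if device.type == "cuda":
    # Only needed when the host must wait for the copies (e.g. for timing);
    # kernels queued on the same stream are already ordered after the copy.
    torch.cuda.synchronize()
```

With a `DataLoader`, passing `pin_memory=True` typically serves the same purpose as the explicit `pin_memory()` call above.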