Prefetch inefficiency

Our current prefetching is quite inefficient, mainly because of Python's global interpreter lock (GIL).
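For context, here is a minimal sketch of the kind of thread-based prefetching that runs into this problem. It assumes a plain Python producer thread filling a bounded queue; `prefetch` and `load_samples` are made-up names for illustration, not our actual code. As long as the loading step is pure Python and CPU-bound, the producer thread has to hold the GIL while it works, so it mostly takes turns with the consumer instead of running in parallel with it.

```python
import queue
import threading


def prefetch(generator, buffer_size=4):
    """Yield items from `generator`, producing them in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        # Runs in a separate thread, but still needs the GIL for Python code.
        for item in generator:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has an item ready
        if item is sentinel:
            return
        yield item


def load_samples(n):
    # Hypothetical stand-in for pure-Python, CPU-bound preprocessing.
    for i in range(n):
        yield sum(j * j for j in range(1000)) + i


samples = list(prefetch(load_samples(5)))
```

The sketch does not forward exceptions from the producer thread, so a real implementation would need to handle that as well.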

If anyone is interested, please follow the TensorFlow issue below (or even contribute to it):

https://github.com/tensorflow/tensorflow/issues/7951

@pkorshunov @heusch @amohammadi