Most of the inefficiency came from the fact that we were creating a Dask array with each sample as a separate chunk.
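To give a rough sense of why per-sample chunking hurts, here is a minimal sketch that only counts chunks along the sample axis. It does not call Dask. The helper `count_chunks`, the sample count, and the batch size are hypothetical values chosen for illustration, not taken from the actual code. Since each chunk in a Dask array becomes at least one task in the graph, the chunk count is a reasonable proxy for scheduling overhead.

```python
def count_chunks(n_samples: int, samples_per_chunk: int) -> int:
    """Return how many chunks are needed along the sample axis.

    Hypothetical helper for illustration; not part of Dask.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if samples_per_chunk < 1:
        raise ValueError("samples_per_chunk must be at least 1")
    # Ceiling division using integers only, so large counts stay exact.
    return -(-n_samples // samples_per_chunk)


# Assumed sizes, for illustration only.
n_samples = 10_000

# One sample per chunk: as many chunks (and at least as many tasks) as samples.
per_sample_chunks = count_chunks(n_samples, 1)

# Grouping samples into larger chunks cuts the chunk count proportionally.
batched_chunks = count_chunks(n_samples, 1_000)

print(f"per-sample chunks: {per_sample_chunks}")
print(f"batched chunks:    {batched_chunks}")
```

With these assumed numbers, grouping 1,000 samples per chunk reduces the chunk count by three orders of magnitude. The right chunk size in practice depends on the memory taken up by each sample and on the available workers.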