SampleBatch design issues
Hi,
Although `SampleBatch` brings convenience and efficiency, it forces us to develop transformers that are compatible with it. Imagine the simple transformer below:
```python
class FakeTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + 1

    def _more_tags(self):
        return {"stateless": True, "requires_fit": False}
```
I can easily use it with numpy arrays as input:

```python
transformer = FakeTransformer()
X = np.zeros(shape=(3, 160, 160))
transformed_X = transformer.transform(X)
```
However, I run into problems once I wrap it as a sample:

```python
sample = Sample(X)
transformer_sample = wrap(["sample"], transformer)
my_beautiful_sample = [s.data for s in transformer_sample.transform([sample])]
# THIS DOESN'T WORK
```
With this wrap, the input `X` of `FakeTransformer.transform` will be a `SampleBatch` and not a numpy array. Hence, I can't do `X + 1`.
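To make the failure concrete, here is a minimal, self-contained sketch. `FakeBatch` is a hypothetical stand-in I wrote for this illustration, not the real `SampleBatch` class; it only mimics the relevant property, namely being an iterable wrapper with no arithmetic operators:

```python
import numpy as np


class FakeBatch:
    """Hypothetical stand-in for SampleBatch: wraps a list of arrays
    and is iterable, but defines no arithmetic operators."""

    def __init__(self, samples):
        self._samples = samples

    def __iter__(self):
        return iter(self._samples)


batch = FakeBatch([np.zeros((2, 2)), np.zeros((2, 2))])

try:
    batch + 1  # fails: the wrapper has no __add__
except TypeError as e:
    print("X + 1 fails:", e)

# Converting back to an ndarray restores normal array semantics.
X = np.asarray(list(batch))
print((X + 1).sum())  # 8.0
```

This is exactly the situation the transformer ends up in: the wrapper is perfectly iterable, but any code written against ndarray semantics breaks.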
I can work around this issue in my transformer by doing this:

```python
def transform(self, X):
    X = np.asarray(X)
    return X + 1
```
However, this is a blocker if we want to use estimators developed by people outside of our circle, since we can't patch their code.
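One way to avoid patching third-party code is an external adapter that does the coercion itself. A minimal sketch, assuming nothing beyond numpy (the `AsArrayAdapter` name is my own, and the sklearn mixins from above are omitted for brevity):

```python
import numpy as np


class ThirdPartyTransformer:
    """Stand-in for an estimator we cannot modify; assumes X is an ndarray."""

    def transform(self, X):
        return X + 1


class AsArrayAdapter:
    """Hypothetical adapter: coerces batch-like input to a numpy array
    before delegating, so the wrapped estimator never sees the batch type."""

    def __init__(self, estimator):
        self.estimator = estimator

    def transform(self, X):
        return self.estimator.transform(np.asarray(X))


# A plain list of arrays stands in for the batch-like input here.
batch = [np.zeros((3, 3)), np.zeros((3, 3))]
out = AsArrayAdapter(ThirdPartyTransformer()).transform(batch)
print(out.shape)  # (2, 3, 3)
```

This keeps the coercion in one place instead of duplicating it in every transformer, though it sidesteps rather than answers the encapsulation question below.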
Do you think it is sensible to have `X` wrapped as a `SampleBatch` once `SampleTransform` is used? It breaks encapsulation.
Thanks