Closed
Issue created May 29, 2020 by Tiago de Freitas Pereira (@tiago.pereira), Owner

SampleBatch design issues

Hi,

Although SampleBatch brings convenience and efficiency, it forces us to develop transformers that are compatible with it.

Imagine the simple transformer below:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class FakeTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # plain element-wise operation on a numpy array
        return X + 1

    def _more_tags(self):
        return {"stateless": True, "requires_fit": False}

I can easily use it with numpy arrays as input.

    transformer = FakeTransformer()
    X = np.zeros(shape=(3, 160, 160))    
    transformed_X = transformer.transform(X)

However, I run into problems once I wrap the data as a Sample and the transformer with the sample wrapper:

    from bob.pipelines import Sample, wrap

    sample = Sample(X)
    transformer_sample = wrap(["sample"], transformer)
    my_beautiful_sample = [s.data for s in transformer_sample.transform([sample])]
    # THIS DOESN'T WORK: transform receives a SampleBatch, not a numpy array

With this wrapping, the input X of FakeTransformer.transform is a SampleBatch rather than a numpy array, so X + 1 fails.
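
If I understand the sample wrapper correctly, it does roughly the following (a simplified sketch of my understanding, not the actual implementation; names are approximate):

    def wrapped_transform(estimator, samples):
        # the wrapper hands the estimator a SampleBatch built from the samples,
        # never the raw numpy arrays themselves
        X = SampleBatch(samples)
        transformed = estimator.transform(X)
        return [Sample(data, parent=sample) for data, sample in zip(transformed, samples)]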

I can work around this in my own transformer by converting explicitly:

    def transform(self, X):
        # convert the incoming SampleBatch back to a plain numpy array
        X = np.asarray(X)
        return X + 1

However, this is a blocker if we want to reuse estimators developed by people outside of our circle, since we cannot patch their transform methods like that.
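
Concretely, to use such a third-party estimator untouched, we would need adapter boilerplate along these lines (just a sketch; NdarrayAdapter and ThirdPartyTransformer are hypothetical names, not part of bob.pipelines):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class NdarrayAdapter(TransformerMixin, BaseEstimator):
        """Hypothetical adapter: converts the incoming SampleBatch to a plain
        numpy array before delegating to an unmodified third-party estimator."""

        def __init__(self, estimator):
            self.estimator = estimator

        def fit(self, X, y=None):
            self.estimator.fit(np.asarray(X), y)
            return self

        def transform(self, X):
            return self.estimator.transform(np.asarray(X))

    # usage: the third-party transformer never has to know about SampleBatch
    transformer_sample = wrap(["sample"], NdarrayAdapter(ThirdPartyTransformer()))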

Do you think it is sensible to have X wrapped as a SampleBatch once SampleTransform is used? It breaks encapsulation.

Thanks
