Closed
Open
Opened May 29, 2020 by Tiago de Freitas Pereira@tiago.pereira

SampleBatch design issues

Hi,

Although SampleBatch brings convenience and efficiency, it forces us to develop transformers that are compatible with it.

Imagine the simple transformer below:

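import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class FakeTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        # Nothing to learn: the transformer is stateless.
        return self

    def transform(self, X):
        # Relies on X supporting arithmetic, as a numpy array does.
        return X + 1

    def _more_tags(self):
        return {"stateless": True, "requires_fit": False}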

I can easily use it with NumPy arrays as input:

    transformer = FakeTransformer()
    X = np.zeros(shape=(3, 160, 160))
    transformed_X = transformer.transform(X)

However, I run into problems once I wrap it to operate on samples:

    sample = Sample(X)
    transformer_sample = wrap(["sample"], transformer)
    my_beautiful_sample = [s.data for s in transformer_sample.transform([sample])]
    # THIS DOESN'T WORK

With this wrap, the input X of FakeTransformer.transform is a SampleBatch rather than a NumPy array, so X + 1 fails.

I can work around this inside my own transformer by converting the input first:

    def transform(self, X):
        X = np.asarray(X)
        return X + 1

However, this is a blocker if we want to use estimators developed by other people outside of our circle.
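
To make the concern concrete, here is a minimal sketch of what we would have to do for every external estimator: wrap it in a thin adapter that converts the input to an array before delegating. This is only an illustration, not something bob.pipelines provides. `ListBatch` is a hypothetical stand-in for SampleBatch (a sequence-like container without arithmetic), and `PlusOne` and `AsArrayAdapter` are made-up names as well.

```python
import numpy as np


class ListBatch:
    """Hypothetical stand-in for SampleBatch: sequence-like, no arithmetic."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() turn the batch into a regular ndarray.
        return np.asarray(self._items, dtype=dtype)


class PlusOne:
    """Stand-in for a third-party estimator that expects an ndarray."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + 1


class AsArrayAdapter:
    """Hypothetical adapter: converts X to an ndarray, then delegates."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(np.asarray(X), y)
        return self

    def transform(self, X):
        return self.estimator.transform(np.asarray(X))


batch = ListBatch([np.zeros((2, 2)) for _ in range(3)])

# Passing the batch directly fails, because ListBatch has no __add__.
try:
    PlusOne().transform(batch)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# Going through the adapter works, since the estimator sees an ndarray.
result = AsArrayAdapter(PlusOne()).transform(batch)
```

It works, but the conversion now lives in extra glue code around each estimator instead of being handled once by the wrapper, which is exactly the burden I would like to avoid.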

Do you think it is sensible to have X wrapped as a SampleBatch once SampleTransform is used? It breaks encapsulation.

Thanks

Reference: bob/bob.pipelines#18