
Adding tags to transformers to differentiate between annotators, preprocessors, and extractors.

Using scikit-learn's transformer API for our classes was a good idea, but the transformers that we implement still differ in how they need to be applied. I suggest adding estimator tags to our transformers (https://scikit-learn.org/stable/developers/develop.html#estimator-tags) so that we can differentiate them programmatically. For example, we can have:

from sklearn.base import BaseEstimator

class Preprocessor(BaseEstimator):

    def _more_tags(self):
        # Custom tag that marks which kind of transformer this is
        return {'bob_transformer': 'preprocessor'}

that would allow:

preprocessor = wrap(["sample"], preprocessor)

to implicitly mean:

transform_extra_arguments = (("annotations", "annotations"),)
preprocessor = wrap(["sample"], preprocessor, transform_extra_arguments=transform_extra_arguments)

Similarly, wrapping an annotator would imply sample.annotations = annotator(sample.data) instead of the usual sample.data = transformer(sample.data).
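To make the idea concrete, here is a minimal sketch of how a wrapper could dispatch on the tag. It is only an illustration under several assumptions: get_bob_tag and apply_to_sample are hypothetical helpers (they are not part of our wrap function or of scikit-learn), the 'annotator' tag value and the 'transformer' default are made up for the example, the toy classes work on one sample at a time, and the tag is read directly from _more_tags() rather than through scikit-learn's own tag resolution.

```python
from types import SimpleNamespace

# Tag key from the proposal above; the "annotator" value and the
# "transformer" default are assumptions made for this sketch.
TAG_KEY = "bob_transformer"


def get_bob_tag(transformer, default="transformer"):
    """Hypothetical helper: read the tag directly from _more_tags()."""
    more_tags = getattr(transformer, "_more_tags", None)
    tags = more_tags() if callable(more_tags) else {}
    return tags.get(TAG_KEY, default)


def apply_to_sample(transformer, sample):
    """Hypothetical helper: apply a transformer to one sample based on its tag."""
    tag = get_bob_tag(transformer)
    if tag == "annotator":
        # Annotators fill sample.annotations and leave sample.data untouched.
        sample.annotations = transformer.transform(sample.data)
    elif tag == "preprocessor":
        # Preprocessors implicitly receive the annotations as an extra argument.
        sample.data = transformer.transform(
            sample.data, annotations=sample.annotations
        )
    else:
        # Usual behaviour: sample.data = transformer(sample.data)
        sample.data = transformer.transform(sample.data)
    return sample


class Annotator:
    """Toy annotator that marks a region of interest in the data."""

    def _more_tags(self):
        return {TAG_KEY: "annotator"}

    def transform(self, data):
        return {"start": 1, "stop": 3}


class Preprocessor:
    """Toy preprocessor that crops the data using the annotations."""

    def _more_tags(self):
        return {TAG_KEY: "preprocessor"}

    def transform(self, data, annotations=None):
        return data[annotations["start"]:annotations["stop"]]


sample = SimpleNamespace(data=[10, 20, 30, 40], annotations=None)
apply_to_sample(Annotator(), sample)
apply_to_sample(Preprocessor(), sample)
print(sample.annotations, sample.data)
```

With this kind of dispatch, the caller would not need to spell out transform_extra_arguments for every preprocessor; the tag would carry that information.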

What do you think? Does it make sense?

Edited by Amir MOHAMMADI