Feature Request: PandasFeatureUnion #69

Closed
@mattayes

Hi folks:

An issue I have with scikit-learn's FeatureUnion is that you can't make it return a DataFrame (unlike with regular transformers). It would be nice to see a variant of FeatureUnion that worked nicely with Pandas workflows.

Here's a prototype of what I'm thinking:

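import pandas as pd
from sklearn.pipeline import FeatureUnion


class PandasFeatureUnion(FeatureUnion):
    """FeatureUnion which returns a DataFrame."""
    
    def _to_dataframe(self, X):
        # Assumes every transformer emits exactly one column, so the
        # transformer names can double as column labels.
        columns = [name for (name, _) in self.transformer_list]
        return pd.DataFrame(X, columns=columns)
    
    def transform(self, X):
        result = super().transform(X)
        return self._to_dataframe(result)
    
    def fit_transform(self, X, y=None, **fit_params):
        # Accept y and fit params so this still works inside a Pipeline.
        result = super().fit_transform(X, y, **fit_params)
        return self._to_dataframe(result)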

You could imagine using it in a manner similar to this example. I see this as a complement to the existing DataFrameMapper.

My example above doesn't handle indexes yet, and I'd love some advice on how to implement that (ideally without having to rewrite most of FeatureUnion). Here are the concerns I have right now:

  • Should all the indexes have to match up?
  • If not, how should joins be handled?
  • Should PandasFeatureUnion accept an ignore_index=True argument?
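
To make the three questions above concrete, here is a rough sketch of how the per-transformer outputs might be combined if each transformer already returned a DataFrame. The helper name `concat_outputs` and its `join`, `ignore_index`, and `require_same_index` parameters are hypothetical and not part of scikit-learn or sklearn-pandas; only `pd.concat`, `reset_index`, and `Index.equals` are real pandas calls. It is one possible design, not a settled answer.

```python
import pandas as pd


def concat_outputs(frames, join="outer", ignore_index=False,
                   require_same_index=False):
    """Combine per-transformer DataFrames column-wise (hypothetical helper).

    join               -- "inner" or "outer", passed through to pd.concat
    ignore_index       -- drop row labels and align rows by position
    require_same_index -- raise unless every frame has an identical index
    """
    frames = list(frames)
    if ignore_index:
        # Positional alignment only makes sense for equal-length outputs.
        if len({len(frame) for frame in frames}) > 1:
            raise ValueError("all outputs must have the same number of rows")
        # pd.concat(..., axis=1, ignore_index=True) would reset the *column*
        # labels, so the row index is reset explicitly instead.
        frames = [frame.reset_index(drop=True) for frame in frames]
    elif require_same_index:
        first = frames[0].index
        for frame in frames[1:]:
            if not frame.index.equals(first):
                raise ValueError("transformer outputs have different indexes")
    # Rows are aligned on the index; `join` decides what happens to labels
    # that are missing from some of the outputs.
    return pd.concat(frames, axis=1, join=join)


a = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
b = pd.DataFrame({"y": [4.0, 5.0]}, index=[20, 30])

inner = concat_outputs([a, b], join="inner")  # keeps only labels 20 and 30
outer = concat_outputs([a, b], join="outer")  # keeps all labels, y is NaN at 10
```

With `join="inner"`, rows that any transformer dropped disappear from the result; with `join="outer"`, they stay and the gaps are filled with NaN. Which of these should be the default is exactly the open question.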
