Closed
Hi folks:
An issue I have with scikit-learn's FeatureUnion is that you can't make it return a DataFrame (unlike with regular transformers). It would be nice to see a variant of FeatureUnion that worked nicely with pandas workflows.
Here's a prototype of what I'm thinking:
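    import pandas as pd
    from sklearn.pipeline import FeatureUnion

    class PandasFeatureUnion(FeatureUnion):
        """FeatureUnion which returns a DataFrame."""

        def _to_dataframe(self, X):
            # Assumes each transformer emits exactly one column.
            columns = [name for (name, _) in self.transformer_list]
            return pd.DataFrame(X, columns=columns)

        def transform(self, X):
            result = super().transform(X)
            return self._to_dataframe(result)

        def fit_transform(self, X, y=None, **fit_params):
            # Accept y and fit params so the union also works inside a Pipeline.
            result = super().fit_transform(X, y, **fit_params)
            return self._to_dataframe(result)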
You could imagine using it in a manner similar to this example. I see this as a complement to the existing DataFrameMapper.
My example above doesn't handle indexes yet, and I'd love some advice on how to implement that (ideally without having to rewrite most of FeatureUnion). Here are some concerns I have now:
- Should all the indexes have to match up?
- If not, how should joins be handled?
- Should PandasFeatureUnion accept an ignore_index=True argument?