Fix for DataFrames with MultiIndex columns #166

Merged
merged 4 commits into from
Aug 15, 2018

kristofve
Contributor

Calling fit_transform() on a DataFrameMapper with a DataFrame that has a multi-level column index often raises the following error:
TypeError: sequence item 0: expected str instance, tuple found
I fixed this by converting the column-name tuples to str. This change also fixes the related #143.
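
For readers unfamiliar with the error, here is a minimal sketch of the underlying cause using pandas only. It does not reproduce the DataFrameMapper internals; the "_" separator and the variable names are assumptions chosen for illustration. It shows that joining MultiIndex column labels (which are tuples) fails, and that converting each label to str first avoids the failure.

```python
import pandas as pd

# DataFrame with a two-level column index; each column label is a tuple.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
df = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

labels = list(df.columns)  # [("a", "x"), ("a", "y")]

# Joining tuple labels directly fails, because str.join only accepts strings.
try:
    "_".join(labels)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting each label to str first makes the join work
# (illustrative only; the separator is an assumption).
joined = "_".join(str(label) for label in labels)
```

Note that the joined name contains the tuple's repr, which is why the output column names differ from the original labels.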

Along with this change I have created a test and some fixtures to work with MultiIndex-column DataFrames. Tox tests pass for all supplied virtualenv configurations, but fail in an unrelated place for the latest pandas version (I believe this is being fixed elsewhere).

A minor issue remains: because of the tuple-to-string conversion, the column names of the transformed DataFrame are no longer the same as in the original DataFrame. Fixing the error had a higher priority in my own use case, but I am still thinking of a way to keep the names the same where possible without breaking other cases (i.e. a simple eval won't cut it).
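
To illustrate why evaluating the stringified names back is not enough, here is a hedged sketch. The helper `try_restore` is hypothetical and not part of this PR or the library; it uses `ast.literal_eval` as a safer stand-in for a plain eval. Tuple labels round-trip, but a column whose original name is the string "1" would come back as an integer.

```python
import ast


def try_restore(name):
    """Hypothetical helper: try to turn a stringified label back into the original."""
    try:
        return ast.literal_eval(name)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. a plain column name), so keep it unchanged.
        return name


restored_tuple = try_restore(str(("a", "x")))  # tuple label round-trips
restored_plain = try_restore("age")            # plain string label is kept as-is
restored_digit = try_restore("1")              # string label "1" becomes the int 1
```

The last case is the kind of breakage a naive round-trip introduces: the restored label no longer matches a column that was originally named with a string.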

@dukebody dukebody merged commit 7fdc39a into scikit-learn-contrib:master Aug 15, 2018
@dukebody
Collaborator

Thanks @kristofve91 !

@kristofve kristofve deleted the multiindex-fix branch August 20, 2018 13:53