Skip to content

BUG: Fix json_normalize throwing TypeError (#21536) #21540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 22, 2018

Conversation

vuminhle
Copy link
Contributor

@vuminhle vuminhle commented Jun 19, 2018

Fix json_normalize throwing TypeError with array of values and record_prefix (#21536)

@vuminhle vuminhle force-pushed the fix-json-normalize-type-error branch from 035af20 to cc4d5b2 Compare June 19, 2018 11:17
@codecov
Copy link

codecov bot commented Jun 19, 2018

Codecov Report

Merging #21540 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21540      +/-   ##
==========================================
- Coverage   91.91%    91.9%   -0.02%     
==========================================
  Files         153      153              
  Lines       49546    49549       +3     
==========================================
- Hits        45542    45539       -3     
- Misses       4004     4010       +6
Flag Coverage Δ
#multiple 90.3% <100%> (-0.02%) ⬇️
#single 41.78% <0%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/normalize.py 96.87% <100%> (ø) ⬆️
pandas/util/testing.py 85.27% <0%> (-0.7%) ⬇️
pandas/core/arrays/categorical.py 95.63% <0%> (-0.06%) ⬇️
pandas/core/indexing.py 93.37% <0%> (-0.05%) ⬇️
pandas/tseries/offsets.py 97.16% <0%> (-0.04%) ⬇️
pandas/core/indexes/multi.py 94.97% <0%> (-0.01%) ⬇️
pandas/core/indexes/datetimes.py 95.66% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 91.24% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.62% <0%> (ø) ⬆️
pandas/core/sorting.py 98.2% <0%> (ø) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fbb683...55e1022. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment can you add a note in 0.23.2, bug fixes in io section.

@@ -123,6 +123,12 @@ def test_simple_normalize_with_separator(self, deep_nested):
'country', 'states_name']).sort_values()
assert result.columns.sort_values().equals(expected)

def test_value_array_record_prefix(self):
#GH 21536
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after the #

@@ -259,7 +259,8 @@ def _recursive_extract(data, path, seen_meta, level=0):
result = DataFrame(records)

if record_prefix is not None:
result.rename(columns=lambda x: record_prefix + x, inplace=True)
result.rename(columns=lambda x: "{p}{c}".format(p=record_prefix, c=x),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change to

result = result.rename(...)

we don't use inplace in library code

@jreback jreback added Bug IO JSON read_json, to_json, json_normalize labels Jun 19, 2018
@vuminhle
Copy link
Contributor Author

@jreback : How do I add a note? Thanks.

@WillAyd
Copy link
Member

WillAyd commented Jun 19, 2018

@vuminhle you can add a note to doc/source/whatsnew/v0.23.2.txt. I think this is fine under "Bug Fixes" -> "Other"

@jreback
Copy link
Contributor

jreback commented Jun 19, 2018

should be in IO

@vuminhle
Copy link
Contributor Author

@WillAyd @jreback I added a note. Thanks for the instruction.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update the docstring for this function and add an example of what it does?

# GH 21536
result = json_normalize({'A': [1, 2]}, 'A', record_prefix='Prefix.')
expected = DataFrame([[1], [2]], columns=['Prefix.0'])
tm.assert_frame_equal(result.reindex_like(expected), expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think reindex_like is required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Don't need it. I followed the code in test_simple_normalize_with_separator.
Btw, do we need docstring for test functions? None of the functions in this file has docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope not for test functions. I was referring to the docstring for json_normalize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. Done. Thanks.

@@ -65,7 +65,7 @@ Bug Fixes
**I/O**

- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`)
-
- Bug in :func:`json_normalize` where passing an array of values and a `record_prefix` would raise a `TypeError` (:issue:`21536`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double backticks on record_prefix and TypeError

Copy link
Contributor

@jreback jreback Jun 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the issue that this is for non-string columns? say this instead, the fact that it raised the TypeError is the bug here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change to double backticks.
The issue is not because the column is non-string; it is because the array contains values (i.e., not objects or arrays). Suggestions are welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how so? before concatting integers and strings would fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood. When you mentioned columns, I thought they were the data columns, not the column names. Yes. The bug is caused by concatenating non-string column names, which happens when we pass an array of values (integers, strings, etc.). I can't think of other paths to trigger the bug.

My note explains how one can reproduce the bug. Looks like you want it to describe the reason.
I'm fine with either. How about this?

  • Bug in :func:json_normalize that caused it to incorrectly raise an error when concatenating non-string columns (:issue:21536)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closer, though a user won't know what concatenating is (in this context), maybe just say something like

bug in formatting the record_prefix with integer columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Bug in :func:json_normalize when formatting the record_prefix with integer columns (:issue:21536)

@@ -65,7 +65,7 @@ Bug Fixes
**I/O**

- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`)
-
- Bug in :func:`json_normalize` where passing an array of values and a `record_prefix` would raise a `TypeError` (:issue:`21536`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closer, though a user won't know what concatenating is (in this context), maybe just say something like

bug in formatting the record_prefix with integer columns

@jreback jreback added this to the 0.23.2 milestone Jun 22, 2018
@jreback jreback merged commit 5fdaa97 into pandas-dev:master Jun 22, 2018
@jreback
Copy link
Contributor

jreback commented Jun 22, 2018

thanks @vuminhle

jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

json_normalize throws TypeError with array of values and record_prefix
4 participants