-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix json_normalize throwing TypeError (#21536) #21540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix json_normalize throwing TypeError (#21536) #21540
Conversation
035af20
to
cc4d5b2
Compare
Codecov Report
@@ Coverage Diff @@
## master #21540 +/- ##
==========================================
- Coverage 91.91% 91.9% -0.02%
==========================================
Files 153 153
Lines 49546 49549 +3
==========================================
- Hits 45542 45539 -3
- Misses 4004 4010 +6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment can you add a note in 0.23.2, bug fixes in io section.
@@ -123,6 +123,12 @@ def test_simple_normalize_with_separator(self, deep_nested): | |||
'country', 'states_name']).sort_values() | |||
assert result.columns.sort_values().equals(expected) | |||
|
|||
def test_value_array_record_prefix(self): | |||
#GH 21536 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after the #
pandas/io/json/normalize.py
Outdated
@@ -259,7 +259,8 @@ def _recursive_extract(data, path, seen_meta, level=0): | |||
result = DataFrame(records) | |||
|
|||
if record_prefix is not None: | |||
result.rename(columns=lambda x: record_prefix + x, inplace=True) | |||
result.rename(columns=lambda x: "{p}{c}".format(p=record_prefix, c=x), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change to
result = result.rename(...)
we don't use inplace
in library code
@jreback : How do I add a note? Thanks. |
@vuminhle you can add a note to |
should be in IO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update the docstring for this function and add an example of what it does?
# GH 21536 | ||
result = json_normalize({'A': [1, 2]}, 'A', record_prefix='Prefix.') | ||
expected = DataFrame([[1], [2]], columns=['Prefix.0']) | ||
tm.assert_frame_equal(result.reindex_like(expected), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think reindex_like
is required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Don't need it. I followed the code in test_simple_normalize_with_separator
.
Btw, do we need docstring for test functions? None of the functions in this file has docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope not for test functions. I was referring to the docstring for json_normalize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay. Done. Thanks.
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -65,7 +65,7 @@ Bug Fixes | |||
**I/O** | |||
|
|||
- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`) | |||
- | |||
- Bug in :func:`json_normalize` where passing an array of values and a `record_prefix` would raise a `TypeError` (:issue:`21536`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double backticks on record_prefix
and TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the issue that this is for non-string columns? say this instead, the fact that it raised the TypeError
is the bug here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change to double backticks.
The issue is not because the column is non-string; it is because the array contains values (i.e., not objects or arrays). Suggestions are welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how so? before concatting integers and strings would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood. When you mentioned columns, I thought they were the data columns, not the column names. Yes. The bug is caused by concatenating non-string column names, which happens when we pass an array of values (integers, strings, etc.). I can't think of other paths to trigger the bug.
My note explains how one can reproduce the bug. Looks like you want it to describe the reason.
I'm fine with either. How about this?
- Bug in :func:
json_normalize
that caused it to incorrectly raise an error when concatenating non-string columns (:issue:21536
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closer, though a user won't know what concatenating is (in this context), maybe just say something like
bug in formatting the record_prefix
with integer columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Bug in :func:
json_normalize
when formatting therecord_prefix
with integer columns (:issue:21536
)
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -65,7 +65,7 @@ Bug Fixes | |||
**I/O** | |||
|
|||
- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`) | |||
- | |||
- Bug in :func:`json_normalize` where passing an array of values and a `record_prefix` would raise a `TypeError` (:issue:`21536`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closer, though a user won't know what concatenating is (in this context), maybe just say something like
bug in formatting the record_prefix
with integer columns
thanks @vuminhle |
Fix json_normalize throwing TypeError with array of values and record_prefix (#21536)
TypeError
with array of values andrecord_prefix
#21536git diff upstream/master -u -- "*.py" | flake8 --diff