ENH: Preserve Series index on json_normalize
#57422
@@ -893,4 +901,5 @@ def test_series_non_zero_index(self):
"elements.c": [np.nan, np.nan, 3.0],
}
)
expected.index = [1, 2, 3]
Nit: Could you put this in the `expected = DataFrame(...` call above?
idx = [7, 8]
series = Series(state_data, index=idx)
result = json_normalize(series)
assert (result.index == idx).all()
Could you make `idx` a `pandas.Index` and then use `tm.assert_index_equal(result.index, idx)`?
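As a rough illustration of why the stricter helper is preferable, the sketch below compares the two checks. It uses the public `pandas.testing.assert_index_equal`, which is the same function the test suite reaches through its `tm` alias; the float index is a made-up counterexample and is not part of the PR.

```python
import pandas as pd
from pandas.testing import assert_index_equal

idx = pd.Index([7, 8])            # integer index, as in the test under review
float_idx = pd.Index([7.0, 8.0])  # hypothetical result: same values, float dtype

# An elementwise comparison only looks at the values, so it passes.
elementwise_ok = bool((float_idx == idx).all())

# assert_index_equal also compares the kind of index, so it rejects the float one.
try:
    assert_index_equal(float_idx, idx)
    strict_ok = True
except AssertionError:
    strict_ok = False

# Two genuinely identical indexes pass without raising.
assert_index_equal(pd.Index([7, 8]), idx)
```

Here `elementwise_ok` ends up true while `strict_ok` ends up false, which is the kind of mismatch `(result.index == idx).all()` would let through.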
result = json_normalize(series)
assert (result.index == idx).all()
result = json_normalize(series, "counties")
assert (result.index == np.array(idx).repeat([3, 2])).all()
Similar comment to the one above
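For the `record_path` case, the expected index repeats each original label once per nested record. A minimal sketch of building that expectation as a `pandas.Index` follows; the counts `[3, 2]` are taken from the `repeat([3, 2])` call in the diff and presumably correspond to the number of `counties` entries per row, which is an assumption here.

```python
import numpy as np
import pandas as pd

idx = pd.Index([7, 8])
counts = [3, 2]  # assumed number of nested records for each input row

# Index.repeat repeats each label by its matching count and returns an Index,
# which can be handed directly to tm.assert_index_equal.
expected = idx.repeat(counts)

# The NumPy form used in the diff yields the same labels as a plain array.
expected_np = np.array([7, 8]).repeat(counts)
```

Both forms produce the labels 7, 7, 7, 8, 8; only the first is already an `Index`.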
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -31,6 +31,7 @@ Other enhancements
- :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
- :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
- Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via ``pat`` parameter (:issue:`51748`)
- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)
- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)
- Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)
Could you also update the `data` argument in the `json_normalize` docstring and add an example?
Thanks @nworb-cire
Hi @mroeschke @nworb-cire, I just saw via email that the original issue had been closed (@nworb-cire, thanks for the fix!). The code looks valid and follows the principle of least surprise that the index is retained when However, that is a breaking change for existing pandas code in the wild that might rely on `json_normalize` always returning a fresh 0:N index. I think the patch notes should flag the change in behaviour more strongly than they currently do. @mroeschke, I assume that the breaking change is okay as part of pandas 3.0? If the pandas guidelines do not allow breaking changes, we could add an extra argument to the function so that the default stays the former (weird) behaviour and users can opt in to the new functionality.
Correct. I am not sure if adding an extra argument to get the old behavior is entirely needed as a user can get the old behavior by calling
The only reason to add the extra arg would be to make the code backwards compatible. If we are happy to make a breaking change then there is no need for it.
* Preserve index on json_normalize
* Update unit tests
* Update release notes
* Pass linter
* Set index in constructor
* Use tm assert_index_equal in unit test
* Update docstring and examples
* Update release notes

* note breaking change in json_normalize retaining index
  For context: #51542 & #57422
* Update doc/source/whatsnew/v3.0.0.rst
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Passing a Series as input to `json_normalize` already worked before this change but was undocumented as a feature, and it had the issue linked above: the index of the input Series was not preserved.
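A minimal sketch of the behaviour under discussion, using made-up records rather than the test suite's `state_data`. What `result.index` contains depends on the installed pandas version: with this change it is expected to match the input Series index, while earlier versions return a fresh 0..N-1 index. Calling `reset_index(drop=True)` afterwards is one way (an assumption of this sketch, not something stated in the thread) to get a fresh index on either version.

```python
import pandas as pd

# Hypothetical records, for illustration only.
records = [
    {"state": "Florida", "counties": [{"name": "Dade", "population": 12345}]},
    {"state": "Ohio", "counties": [{"name": "Summit", "population": 1234}]},
]
series = pd.Series(records, index=pd.Index([7, 8]))

# With this change the result is expected to carry the labels 7 and 8;
# older pandas versions return the labels 0 and 1 instead.
result = pd.json_normalize(series)

# Dropping the index gives a fresh 0..N-1 index regardless of version.
fresh = result.reset_index(drop=True)
```

Code that depends on positional labels can apply the `reset_index(drop=True)` step explicitly rather than relying on either default.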