ENH: Preserve Series index on json_normalize #57422

Merged
merged 8 commits into pandas-dev:main on Feb 14, 2024

Conversation

nworb-cire
Contributor

Passing a Series to json_normalize already worked but was undocumented as a feature, and it had the issue linked above: the index of the input Series was not preserved.

@@ -893,4 +901,5 @@ def test_series_non_zero_index(self):
"elements.c": [np.nan, np.nan, 3.0],
}
)
expected.index = [1, 2, 3]
Member

Nit: Could you put this in the expected = DataFrame(... call above?

idx = [7, 8]
series = Series(state_data, index=idx)
result = json_normalize(series)
assert (result.index == idx).all()
Member

Could you make idx a pandas.Index and then use

tm.assert_index_equal(result.index, idx)
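The review does not say why `tm.assert_index_equal` is preferred. One plausible reason, sketched below, is that an element-wise `==` only compares values, while the index assertion also checks type information. The sketch uses the public `pandas.testing.assert_index_equal` rather than the internal `tm` alias used in the pandas test suite, and the index values are illustrative only.

```python
import pandas as pd
from pandas.testing import assert_index_equal

# Two indexes with equal values but different dtypes (int64 vs float64).
expected = pd.Index([7, 8])
same_values_other_dtype = pd.Index([7.0, 8.0])

# The element-wise comparison only looks at values, so it passes.
loose_check = bool((same_values_other_dtype == expected).all())

# assert_index_equal also compares dtype information, so it fails here.
try:
    assert_index_equal(same_values_other_dtype, expected)
    strict_check_passed = True
except AssertionError:
    strict_check_passed = False

print(loose_check, strict_check_passed)
```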

result = json_normalize(series)
assert (result.index == idx).all()
result = json_normalize(series, "counties")
assert (result.index == np.array(idx).repeat([3, 2])).all()
Member

Similar comment to the one above
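For readers following the `record_path` test above, the expected index comes from repeating each input label once per nested record. A minimal sketch of that arithmetic, assuming (as the repeat counts `[3, 2]` suggest) that the first record has three nested entries and the second has two:

```python
import numpy as np

idx = [7, 8]        # index labels of the input Series
counts = [3, 2]     # assumed number of nested records per input row

# Each label is repeated once for every nested record it produces.
expected_index = np.array(idx).repeat(counts)
print(expected_index.tolist())  # [7, 7, 7, 8, 8]
```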

@@ -31,6 +31,7 @@ Other enhancements
- :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
- :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
- Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via ``pat`` parameter (:issue:`51748`)
- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)
Member

@mroeschke mroeschke Feb 14, 2024

Suggested change
- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)
- Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)

Member

@mroeschke mroeschke left a comment

Could you also update the json_normalize docstring data argument and an example?

@mroeschke mroeschke added the IO JSON read_json, to_json, json_normalize label Feb 14, 2024
@mroeschke mroeschke added this to the 3.0 milestone Feb 14, 2024
@mroeschke mroeschke merged commit 81e3e0f into pandas-dev:main Feb 14, 2024
@mroeschke
Member

Thanks @nworb-cire

@nworb-cire nworb-cire deleted the json_normalize branch February 14, 2024 21:13
@JMBurley
Contributor

JMBurley commented Feb 14, 2024

Hi @mroeschke @nworb-cire, I just saw via email that the original issue had been closed. (@nworb-cire thanks for the fix!)

The code looks valid and follows the principle-of-least-surprise that the index is retained when json_normalize input is a series.

However, that is a breaking change for existing pandas code in the wild that might rely on json_normalize always returning a fresh 0:N index.

I think the patch notes should alert users to the change in behaviour more strongly than they currently do.

@mroeschke I assume that the breaking change is okay as part of pandas 3.0? If pandas guidelines are not okay with breaking changes we could patch in an extra arg for the function such that the default behaviour is the former (weird) behaviour but users can toggle on the new functionality.

@mroeschke
Member

I assume that the breaking change is okay as part of pandas 3.0?

Correct. I am not sure if adding an extra argument to get the old behavior is entirely needed as a user can get the old behavior by calling reset_index before using json_normalize, but a doc example showing that would be welcome.
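A minimal sketch of the workaround described above, using made-up records. It assumes `drop=True` is what is intended, since `Series.reset_index()` without it returns a DataFrame with the old index as a column rather than a Series:

```python
import pandas as pd

# Hypothetical input: a Series of dicts with a non-default index.
records = [
    {"id": 1, "info": {"name": "a"}},
    {"id": 2, "info": {"name": "b"}},
]
series = pd.Series(records, index=[7, 8])

# With this PR (pandas 3.0 milestone) the result keeps the labels 7 and 8;
# earlier versions return a fresh 0..N-1 index here.
preserved = pd.json_normalize(series)

# Dropping the index first gives a 0..N-1 result regardless of version.
legacy = pd.json_normalize(series.reset_index(drop=True))

print(list(legacy.index))    # [0, 1]
print(list(legacy.columns))  # ['id', 'info.name']
```

Calling `reset_index(drop=True)` on the result of `json_normalize` instead should give the same frame; which side to reset is a matter of taste.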

@JMBurley
Contributor

I am not sure if adding an extra argument to get the old behavior is entirely needed as a user can get the old behavior by calling reset_index before using json_normalize,

The only reason to add the extra arg would be to make the code backwards compatible. If we are happy to make a breaking change then there is no need for it.

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* Preserve index on json_normalize

* Update unit tests

* Update release notes

* Pass linter

* Set index in constructor

* Use tm assert_index_equal in unit test

* Update docstring and examples

* Update release notes
mroeschke added a commit that referenced this pull request Jun 27, 2024
* note breaking change in json_normalize retaining index

For context: #51452 & #57422

* Update doc/source/whatsnew/v3.0.0.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Labels
IO JSON read_json, to_json, json_normalize
Development

Successfully merging this pull request may close these issues.

pd.json_normalize doesn't return data with index from series
3 participants