ENH: Preserve Series index on `json_normalize` #57422

nworb-cire · 2024-02-14T18:06:34Z

Closes #51452 (pd.json_normalize doesn't return data with index from series)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Passing a Series to json_normalize already worked but was undocumented as a feature, and, as described in the issue linked above, the result did not preserve the index of the input Series.
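To illustrate the behaviour described above, here is a minimal sketch. The records and the index labels 7 and 8 are made up for illustration, and the resulting index depends on whether the installed pandas includes this change.

```python
import pandas as pd

# Hypothetical records; the index labels 7 and 8 are arbitrary.
records = [
    {"a": 1, "b": {"c": 2}},
    {"a": 3, "b": {"c": 4}},
]
series = pd.Series(records, index=[7, 8])

result = pd.json_normalize(series)

# Nested keys are flattened into dotted column names.
print(list(result.columns))

# With this change the index is expected to be [7, 8];
# releases without it return a fresh 0..N-1 index instead.
print(list(result.index))
```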

mroeschke · 2024-02-14T19:54:10Z

pandas/tests/io/json/test_normalize.py

@@ -893,4 +901,5 @@ def test_series_non_zero_index(self):
                "elements.c": [np.nan, np.nan, 3.0],
            }
        )
+        expected.index = [1, 2, 3]


Nit: Could you put this in the expected = DataFrame(... call above?

mroeschke · 2024-02-14T19:55:21Z

pandas/tests/io/json/test_normalize.py

+        idx = [7, 8]
+        series = Series(state_data, index=idx)
+        result = json_normalize(series)
+        assert (result.index == idx).all()


Could you make idx a pandas.Index and then use

tm.assert_index_equal(result.index, idx)
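A short sketch of why the stricter helper may be preferable. In the pandas test suite `tm` refers to the internal `pandas._testing` module; the example below uses the public `pandas.testing.assert_index_equal` instead, and the two indexes are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_index_equal

expected = pd.Index([7, 8])
actual = pd.Index([7.0, 8.0])  # same values, different dtype

# The element-wise comparison only looks at values, so it passes.
loose_equal = bool((actual == expected).all())

# assert_index_equal also compares dtype (and names) by default,
# so the mismatch is reported as an AssertionError.
try:
    assert_index_equal(actual, expected)
    strict_equal = True
except AssertionError:
    strict_equal = False

print(loose_equal, strict_equal)
```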

mroeschke · 2024-02-14T19:55:30Z

pandas/tests/io/json/test_normalize.py

+        result = json_normalize(series)
+        assert (result.index == idx).all()
+        result = json_normalize(series, "counties")
+        assert (result.index == np.array(idx).repeat([3, 2])).all()


Similar comment to the one above
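For the record_path case, the expected index can be built directly on a pandas.Index rather than through np.array. A minimal sketch, assuming the two input rows contain 3 and 2 nested records as in the test above:

```python
import pandas as pd

idx = pd.Index([7, 8])
counts = [3, 2]  # assumed number of nested records per input row

# Each label is repeated once per nested record of its row.
expected_index = idx.repeat(counts)
print(expected_index.tolist())  # [7, 7, 7, 8, 8]
```

The result is itself an Index, so it can be passed straight to tm.assert_index_equal.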

mroeschke · 2024-02-14T19:56:54Z

doc/source/whatsnew/v3.0.0.rst

@@ -31,6 +31,7 @@ Other enhancements
 - :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
 - :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
 - Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via ``pat`` parameter (:issue:`51748`)
+- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)


Suggested change

- Support passing a ``Series`` input to :func:`normalize_json` (:issue:`51452`)

- Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)

mroeschke

Could you also update the data argument in the json_normalize docstring and add an example?

mroeschke · 2024-02-14T21:13:02Z

Thanks @nworb-cire

JMBurley · 2024-02-14T21:28:05Z

Hi @mroeschke @nworb-cire, I just saw via email that the original issue had been closed (@nworb-cire, thanks for the fix!).

The code looks valid and follows the principle-of-least-surprise that the index is retained when json_normalize input is a series.

However, this is a breaking change for existing pandas code in the wild that might rely on json_normalize always returning a fresh 0:N index.

I think the patch notes should flag the change in behaviour more strongly than they currently do.

@mroeschke I assume that the breaking change is okay as part of pandas 3.0? If pandas guidelines are not okay with breaking changes we could patch in an extra arg for the function such that the default behaviour is the former (weird) behaviour but users can toggle on the new functionality.

mroeschke · 2024-02-14T21:44:56Z

I assume that the breaking change is okay as part of pandas 3.0?

Correct. I am not sure if adding an extra argument to get the old behavior is entirely needed as a user can get the old behavior by calling reset_index before using json_normalize, but a doc example showing that would be welcome.
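A sketch of the workaround mentioned above: dropping the Series index with reset_index before normalizing yields the old fresh 0..N-1 index. The records and index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical records with a non-default index.
records = [
    {"a": 1, "b": {"c": 2}},
    {"a": 3, "b": {"c": 4}},
]
series = pd.Series(records, index=[7, 8])

# drop=True discards the old labels instead of turning them into a column.
flat = pd.json_normalize(series.reset_index(drop=True))
print(flat.index.tolist())  # [0, 1]
```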

JMBurley · 2024-02-14T22:27:32Z

I am not sure if adding an extra argument to get the old behavior is entirely needed as a user can get the old behavior by calling reset_index before using json_normalize,

The only reason to add the extra arg would be to make the code backwards compatible. If we are happy to make a breaking change then there is no need for it.

* Preserve index on json_normalize
* Update unit tests
* Update release notes
* Pass linter
* Set index in constructor
* Use tm assert_index_equal in unit test
* Update docstring and examples
* Update release notes

* note breaking change in json_normalize retaining index

  For context: #51542 & #57422

* Update doc/source/whatsnew/v3.0.0.rst
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

nworb-cire added 4 commits February 14, 2024 09:25

Preserve index on json_normalize

ecf941b

Update unit tests

f5cc0f6

Update release notes

ec6aef2

Pass linter

decdc23

mroeschke reviewed Feb 14, 2024

mroeschke added the "IO JSON" (read_json, to_json, json_normalize) label Feb 14, 2024

nworb-cire added 4 commits February 14, 2024 13:02

Set index in constructor

a326898

Use tm assert_index_equal in unit test

8461a3d

Update docstring and examples

5ad9bf6

Update release notes

12d7178

mroeschke approved these changes Feb 14, 2024

mroeschke added this to the 3.0 milestone Feb 14, 2024

mroeschke merged commit 81e3e0f into pandas-dev:main Feb 14, 2024

nworb-cire deleted the json_normalize branch February 14, 2024 21:13

JMBurley mentioned this pull request Jun 27, 2024

DOC: json_normalize breaking changes in pandas 3.0.0 #59127

Merged

1 task
