BUG: json_normalize prefixed multi-char sep to all keys (#43851)

debnathshoham · web-flow · commit f9e4394d5435 · 2021-10-03T09:50:20.000-04:00
diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst
@@ -463,6 +463,7 @@ I/O
 - Bug in unpickling a :class:`Index` with object dtype incorrectly inferring numeric dtypes (:issue:`43188`)
 - Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raising uncontrolled ``IndexError`` (:issue:`43102`)
 - Bug in :func:`read_csv`, changed exception class when expecting a file path name or file-like object from ``OSError`` to ``TypeError`` (:issue:`43366`)
+- Bug in :func:`json_normalize` where multi-character ``sep`` parameter is incorrectly prefixed to every key (:issue:`43831`)
 - Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`)
 -
 
diff --git a/pandas/io/json/_normalize.py b/pandas/io/json/_normalize.py
@@ -148,8 +148,9 @@ def _normalise_json(
             _normalise_json(
                 data=value,
                 # to avoid adding the separator to the start of every key
+                # GH#43831 avoid adding key if key_string blank
                 key_string=new_key
-                if new_key[len(separator) - 1] != separator
+                if new_key[: len(separator)] != separator
                 else new_key[len(separator) :],
                 normalized_dict=normalized_dict,
                 separator=separator,
diff --git a/pandas/tests/io/json/test_normalize.py b/pandas/tests/io/json/test_normalize.py
@@ -220,6 +220,13 @@ def test_simple_normalize_with_separator(self, deep_nested):
         expected = Index(["name", "pop", "country", "states_name"]).sort_values()
         assert result.columns.sort_values().equals(expected)
 
+    def test_normalize_with_multichar_separator(self):
+        # GH #43831
+        data = {"a": [1, 2], "b": {"b_1": 2, "b_2": (3, 4)}}
+        result = json_normalize(data, sep="__")
+        expected = DataFrame([[[1, 2], 2, (3, 4)]], columns=["a", "b__b_1", "b__b_2"])
+        tm.assert_frame_equal(result, expected)
+
     def test_value_array_record_prefix(self):
         # GH 21536
         result = json_normalize({"A": [1, 2]}, "A", record_prefix="Prefix.")

Original file line number	Diff line number	Diff line change
`@@ -463,6 +463,7 @@ I/O`
`463`	`463`	- Bug in unpickling a :class:`Index` with object dtype incorrectly inferring numeric dtypes (:issue:`43188`)
`464`	`464`	- Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raising uncontrolled ``IndexError`` (:issue:`43102`)
`465`	`465`	- Bug in :func:`read_csv`, changed exception class when expecting a file path name or file-like object from ``OSError`` to ``TypeError`` (:issue:`43366`)
	`466`	+- Bug in :func:`json_normalize` where multi-character ``sep`` parameter is incorrectly prefixed to every key (:issue:`43831`)
`466`	`467`	- Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`)
`467`	`468`	`-`
`468`	`469`