MultiIndex - Comparison with Mixed Frequencies (and other FUBAR) #17112

Closed
@jbrockmendel

Setup:

import pandas as pd

index = pd.Index(['PCE']*4, name='Variable')
data = [
    pd.Period('2018Q2'),
    pd.Period('2021', freq='5A-Dec'),
    pd.Period('2026', freq='10A-Dec'),
    pd.Period('2017Q2'),
]
ser = pd.Series(data, index=index, name='Period')

In the real-life version of this issue, 'Period' is a column in a DataFrame and I need to append it as a new level of the index. The snippets here reproduce the problem(s) in both py2 and py3, although for reasons unknown df.set_index('Period', append=True) on the real DataFrame goes through fine in py2.

The large majority of Period values are quarterly-frequency.

py2

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
pandas._libs.period.IncompatibleFrequency: Input has different freq=10A-DEC from Period(freq=Q-DEC)

>>> ser.to_frame()
         Period
Variable       
PCE      2018Q2
PCE        2021
PGDP       2026
PGDP     2017Q2
>>> ser.to_frame().set_index('Period', append=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 310, in __init__
    raise NotImplementedError("> 1 ndim Categorical are not "
NotImplementedError: > 1 ndim Categorical are not supported at this time

No idea why it thinks Categorical is relevant here. That doesn't happen in py3.

For the purposes of sort_values, refusing to sort might make sense. But when all I care about is set_index, I'm pretty indifferent to the ordering.
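
To make the "indifferent to the ordering" point concrete: both set_index tracebacks pass through Categorical(values, ordered=True), and the py3 one continues into factorize(values, sort=True) and safe_sort, which is where the Period comparison blows up. Assigning integer codes to labels only needs hashing and equality, not ordering. The following is a minimal pure-Python sketch of that idea, not pandas code: FreqPeriod and factorize_unsorted are hypothetical names, and FreqPeriod is only a stand-in that mimics a Period raising on mixed-frequency comparison.

```python
class FreqPeriod:
    """Hypothetical stand-in for pd.Period: hashable, but ordering
    comparisons raise when the frequencies differ."""

    def __init__(self, label, freq):
        self.label = label
        self.freq = freq

    def __eq__(self, other):
        if not isinstance(other, FreqPeriod):
            return NotImplemented
        return (self.label, self.freq) == (other.label, other.freq)

    def __hash__(self):
        return hash((self.label, self.freq))

    def __lt__(self, other):
        # Mimic the IncompatibleFrequency behaviour with a plain ValueError.
        if self.freq != other.freq:
            raise ValueError(
                "Input has different freq=%s from FreqPeriod(freq=%s)"
                % (other.freq, self.freq)
            )
        return self.label < other.label

    def __repr__(self):
        return "FreqPeriod(%r, %r)" % (self.label, self.freq)


def factorize_unsorted(values):
    """Return (codes, uniques), with uniques in order of first appearance.

    Only hashing and equality are used, so mixed frequencies are fine.
    """
    positions = {}
    codes = []
    uniques = []
    for value in values:
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


data = [
    FreqPeriod('2018Q2', 'Q-DEC'),
    FreqPeriod('2021', '5A-DEC'),
    FreqPeriod('2026', '10A-DEC'),
    FreqPeriod('2017Q2', 'Q-DEC'),
]

# Works even though sorted(data) would raise on the mixed frequencies.
codes, uniques = factorize_unsorted(data)
```

Here the codes come out in order of first appearance, whereas calling sorted(data) raises as soon as two different frequencies are compared. Whether pandas should build the new level this way for mixed-frequency Periods is a separate design question; the sketch only shows that ordering is not strictly required to build the level.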

py3

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

>>> ser.to_frame().set_index('Period', append=True)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
    assume_unique=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
    sorter = values.argsort()
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

I have no idea what to make of this.

A problem that I have not been able to replicate with a copy/pasteable subset of the data:

>>> mi = pd.MultiIndex.from_arrays([period.index, period])
>>> mi
[... prints roughly what we'd expect...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
    return self._values.shape
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
    return self.values
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
    self._tuples = lib.fast_zip(values)
  File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
ValueError: all arrays must be same length

>>> mi.names
FrozenList(['Variable', None])
>>> mi[0]
('CPROF', 'Period')
>>> mi[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1377, in __getitem__
    if lab[key] == -1:
IndexError: index 1 is out of bounds for axis 0 with size 1

AFAICT it took the name 'Period' and made that the only value in the new level of the MultiIndex. Really no idea what's going on here.
