Description
As the next step of the separation-of-concerns plan (#6744), I'd like to
propose adding a method (or several, actually) to the Index class that
would encapsulate the details of the foo.loc[l1,l2,...] lookup.
Implementation Idea
Roughly, the idea is to make loc's __getitem__ as simple as
    def __getitem__(self, indexer):
        axes = self.obj.axes
        return self.obj.iloc[axes[0].lookup_labels_nd(indexer, axes[1:], typ='loc')]
Not quite, but hopefully you get the point. The default lookup_labels_nd
implementation would then look something like this:
    def lookup_labels_nd(self, indexer, other_axes, typ=None):
        if not isinstance(indexer, tuple):
            return self.lookup_labels(indexer, typ=typ)
        else:
            # ndim mismatch error handling is omitted intentionally
            # indexer[0] belongs to this axis, the rest to the other axes
            return (self.lookup_labels(indexer[0], typ=typ),) + \
                   tuple(ax.lookup_labels(ix, typ=typ)
                         for ax, ix in zip(other_axes, indexer[1:]))
The result should be an object that can be fed to the underlying
BlockManager to perform the requested operation. To support adding
new rows with "setitem", we only need to agree that lookup_labels_nd
will never return negative indices unless they reference newly appended
items along that axis.
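To make the dispatch above concrete, here is a minimal, self-contained sketch. ToyIndex and ToyFrame are hypothetical stand-ins invented for illustration (they are not pandas classes), only scalar labels are handled, and iloc is modelled as a plain method rather than an indexer object.

```python
class ToyIndex:
    """Hypothetical stand-in for Index; not the pandas class."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._positions = {label: i for i, label in enumerate(self.labels)}

    def lookup_labels(self, indexer, typ=None):
        # Translate a single label into its integer position.
        return self._positions[indexer]

    def lookup_labels_nd(self, indexer, other_axes, typ=None):
        if not isinstance(indexer, tuple):
            return self.lookup_labels(indexer, typ=typ)
        if len(indexer) > 1 + len(other_axes):
            raise IndexError("too many indexers")
        # The first element belongs to this axis, the rest to the other axes.
        return (self.lookup_labels(indexer[0], typ=typ),) + tuple(
            ax.lookup_labels(ix, typ=typ)
            for ax, ix in zip(other_axes, indexer[1:])
        )


class ToyFrame:
    """Hypothetical 2-D container with labelled rows and columns."""

    def __init__(self, data, rows, columns):
        self.data = data
        self.axes = [ToyIndex(rows), ToyIndex(columns)]

    def iloc(self, positions):
        # Positional access only; knows nothing about labels.
        row, col = positions
        return self.data[row][col]

    def loc(self, indexer):
        # All label handling is delegated to the index objects.
        axes = self.axes
        return self.iloc(axes[0].lookup_labels_nd(indexer, axes[1:], typ="loc"))


frame = ToyFrame([[1, 2], [3, 4]], rows=["a", "b"], columns=["x", "y"])
value = frame.loc(("b", "x"))  # labels ("b", "x") resolve to positions (1, 0)
```

In this sketch the label-based accessor contains no lookup logic of its own, which is the property the proposal is after.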
This would allow hiding Index-subclass-specific lookup peculiarities
in their respective overrides of lookup_labels_nd and lookup_labels
(proposals for better names are welcome), e.g.:
- looking up str in DatetimeIndex/PeriodIndex
- looking up int in FloatIndex
- looking up per-level slices in MultiIndex
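As an illustration of the first case, a subclass override might look roughly like the sketch below. ToyIndex and ToyDateIndex are hypothetical classes made up for this example, and parsing ISO-formatted strings with date.fromisoformat is an assumption chosen for brevity, not how DatetimeIndex actually resolves strings.

```python
from datetime import date


class ToyIndex:
    """Hypothetical base class: exact-match label lookup only."""

    def __init__(self, labels):
        self._positions = {label: i for i, label in enumerate(labels)}

    def lookup_labels(self, indexer, typ=None):
        # A missing label raises KeyError directly from the dict lookup.
        return self._positions[indexer]


class ToyDateIndex(ToyIndex):
    """Hypothetical subclass that also accepts ISO date strings."""

    def lookup_labels(self, indexer, typ=None):
        # The string special case lives here and nowhere else.
        if isinstance(indexer, str):
            indexer = date.fromisoformat(indexer)
        return super().lookup_labels(indexer, typ=typ)


idx = ToyDateIndex([date(2014, 4, 1), date(2014, 4, 2)])
pos = idx.lookup_labels("2014-04-02")
```

The base class never needs to know that strings are acceptable labels for dates; callers simply invoke lookup_labels on whatever index they hold.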
Benefits
- no more confusing errors caused by a broad try .. except block
  swallowing a logic error, because corner cases will be handled
  precisely where they are needed and nowhere else
- no more relying on isinstance checks and exceptions to look for
  alternative lookup scenarios, which should also improve performance
- the API will provide a contract that is simple to grasp, test,
  benchmark and, eventually, cythonize (as a side effect of this point,
  I'd like to try putting up a wiki page with an indexing API reference)