Closed
Description
See the discussion in #3059 and #3095; also see #1943 and #3102.
This only applies with a non-unique column index
Currently, when duplicate column names span multiple dtypes, there are issues in getting the correct block given a column name.
I think it is possible, though non-trivial, to instead keep a positional map from the frame columns to the BlockManager blocks, which would simplify BlockManager.iget.
The primary motivation is that to_csv currently cannot handle these types of lookups.
It should also eliminate the need for _find_block.
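To make the proposal concrete, here is a rough sketch of what such a positional map could look like. It is plain Python, not the pandas internals: `Block`, `build_positional_map`, `placements` and this `iget` are hypothetical names invented for illustration, and where the placements would actually be stored is left open.

```python
from collections import namedtuple

# Hypothetical stand-in for a BlockManager block: a dtype name, the column
# labels it holds, and one list of values per held column.
Block = namedtuple("Block", ["dtype", "items", "values"])


def build_positional_map(columns, placements):
    """Map each frame column position to (block number, position in block).

    placements[b][j] is the frame position of column j of block b. This is
    an assumed input; the issue does not say where it would come from.
    """
    ref_map = [None] * len(columns)
    for block_no, frame_positions in enumerate(placements):
        for pos_in_block, frame_pos in enumerate(frame_positions):
            ref_map[frame_pos] = (block_no, pos_in_block)
    if any(entry is None for entry in ref_map):
        raise ValueError("some frame columns are not backed by any block")
    return ref_map


def iget(blocks, ref_map, i):
    """Return the values of the i-th frame column without any label lookup."""
    block_no, pos_in_block = ref_map[i]
    return blocks[block_no].values[pos_in_block]


# Three columns, all named 'a', spread over two dtypes.
columns = ["a", "a", "a"]
blocks = [
    Block("float64", ["a", "a"], [[1.0, 2.0], [3.0, 4.0]]),
    Block("int64", ["a"], [[5, 6]]),
]
# The float block backs frame columns 0 and 2; the int block backs column 1.
placements = [[0, 2], [1]]
ref_map = build_positional_map(columns, placements)
```

With this layout, `iget(blocks, ref_map, 1)` goes straight to the int block even though every column is labelled 'a', which is the lookup a label-based search cannot resolve.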
In [6]: df = pd.DataFrame(np.random.randn(8,4))
In [12]: df = pd.DataFrame(np.random.randn(8,4))
In [13]: df._data.blocks[0].ref_locs
Out[13]: array([0, 1, 2, 3])
In [14]: df = pd.DataFrame(np.random.randn(8,4),columns=['a']*4)
In [15]: df._data.blocks[0].ref_locs
---------------------------------------------------------------------------
/mnt/home/jreback/pandas/pandas/core/internals.py in ref_locs(self)
52 def ref_locs(self):
53 if self._ref_locs is None:
---> 54 indexer = self.ref_items.get_indexer(self.items)
55 indexer = com._ensure_platform_int(indexer)
56 if (indexer == -1).any():
/mnt/home/jreback/pandas/pandas/core/index.pyc in get_indexer(self, target, method, limit)
835
836 if not self.is_unique:
--> 837 raise Exception('Reindexing only valid with uniquely valued Index '
838 'objects')
839
Exception: Reindexing only valid with uniquely valued Index objects
This is the root of all evil: the following should raise the same exception as above, but it doesn't, even if I consolidate.
In [16]: df = pd.DataFrame(np.random.randn(8,4))
In [17]: df.columns = ['a']*4
In [18]: df._data.blocks[0].ref_locs
Out[18]: array([0, 1, 2, 3])
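One possible explanation, suggested by the `if self._ref_locs is None` check visible in the traceback, is that positions computed while the columns were still unique are cached and then reused after the rename. The toy model below only illustrates that mechanism. `ToyBlock` and its `rename` method are hypothetical, and whether pandas follows exactly this path is an assumption.

```python
class ToyBlock:
    """Toy model of a block that lazily caches its positions (hypothetical)."""

    def __init__(self, items, ref_items):
        self.items = list(items)
        self.ref_items = list(ref_items)
        self._ref_locs = None

    @property
    def ref_locs(self):
        if self._ref_locs is None:
            # A label-based lookup is only well defined for unique labels.
            if len(set(self.ref_items)) != len(self.ref_items):
                raise ValueError(
                    "Reindexing only valid with uniquely valued Index objects"
                )
            self._ref_locs = [self.ref_items.index(item) for item in self.items]
        return self._ref_locs

    def rename(self, new_ref_items):
        # Relabel using the current positions; the cache is NOT reset here.
        locs = self.ref_locs
        self.ref_items = list(new_ref_items)
        self.items = [self.ref_items[loc] for loc in locs]


# Built directly with duplicate labels: the first lookup fails.
fresh = ToyBlock(["a"] * 4, ["a"] * 4)
try:
    fresh.ref_locs
    fresh_raises = False
except ValueError:
    fresh_raises = True

# Built with unique labels, then renamed: the stale cache hides the problem.
renamed = ToyBlock([0, 1, 2, 3], [0, 1, 2, 3])
renamed.rename(["a"] * 4)
stale_locs = renamed.ref_locs
```

In this model the two construction paths end with identical labels but behave differently, which mirrors the inconsistency between In [15] and In [18].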