form_blocks vs make_block inconsistency #19179

Closed
@jbrockmendel

Taking a cue from #19174 to revisit some logic in core.internals. form_blocks and make_block have some very similar logic. The question here is: are the discrepancies between them intentional?

Taking some liberties to make it more obvious how the logic is shared, the current code looks like:

def make_block(values, placement, klass=None, ndim=None, dtype=None,
               fastpath=False):
    [...]
        dtype = dtype or values.dtype
        vtype = dtype.type

        if isinstance(values, SparseArray):
            block_type = 'sparse'
        elif issubclass(vtype, np.floating):
            block_type = 'float'
        elif (issubclass(vtype, np.integer) and
              issubclass(vtype, np.timedelta64)):
            block_type = 'timedelta'
        elif (issubclass(vtype, np.integer) and
              not issubclass(vtype, np.datetime64)):
            block_type = 'int'
        elif dtype == np.bool_:
            block_type = 'bool'
        elif issubclass(vtype, np.datetime64):
            assert not hasattr(values, 'tz')
            block_type = 'datetime'
        elif is_datetimetz(values):
            block_type = 'datetime_tz'
        elif issubclass(vtype, np.complexfloating):
            block_type = 'complex'
        elif is_categorical(values):
            block_type = 'cat'
        else:
            block_type = 'object'
[...]

def form_blocks(arrays, names, axes):
    [...]
        if is_sparse(v):
            block_type = 'sparse'
        elif issubclass(vtype, np.floating):
            block_type = 'float'
        elif issubclass(vtype, np.complexfloating):
            block_type = 'complex'
        elif issubclass(vtype, np.datetime64):
            assert not is_datetimetz(v)
            block_type = 'datetime'
        elif is_datetimetz(v):
            block_type = 'datetime_tz'
        elif issubclass(vtype, np.integer):
            block_type = 'int'
        elif dtype == np.bool_:
            block_type = 'bool'
        elif is_categorical(v):
            block_type = 'cat'
        else:
            block_type = 'object'

[...]

The two main differences here are 1) is_sparse encompasses slightly more than isinstance(values, SparseArray) and 2) the timedelta case is missing from form_blocks. Anyone know why?
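
To make the second difference concrete, here is a minimal sketch that replays only the NumPy-based branches of the two snippets above, in the order they are quoted. The helper names classify_like_make_block and classify_like_form_blocks are hypothetical and not part of pandas, and the sparse, datetimetz and categorical branches are left out because they need pandas objects. It relies on np.timedelta64 being a subclass of np.integer in NumPy's scalar hierarchy, which is why the form_blocks ordering would label a timedelta dtype as 'int'.

```python
import numpy as np


def classify_like_make_block(dtype):
    """Hypothetical helper: NumPy-only branches in make_block's quoted order."""
    vtype = dtype.type
    if issubclass(vtype, np.floating):
        return 'float'
    elif (issubclass(vtype, np.integer) and
          issubclass(vtype, np.timedelta64)):
        return 'timedelta'
    elif (issubclass(vtype, np.integer) and
          not issubclass(vtype, np.datetime64)):
        return 'int'
    elif dtype == np.bool_:
        return 'bool'
    elif issubclass(vtype, np.datetime64):
        return 'datetime'
    elif issubclass(vtype, np.complexfloating):
        return 'complex'
    return 'object'


def classify_like_form_blocks(dtype):
    """Hypothetical helper: NumPy-only branches in form_blocks' quoted order."""
    vtype = dtype.type
    if issubclass(vtype, np.floating):
        return 'float'
    elif issubclass(vtype, np.complexfloating):
        return 'complex'
    elif issubclass(vtype, np.datetime64):
        return 'datetime'
    elif issubclass(vtype, np.integer):
        # No timedelta branch precedes this one, so timedelta64 lands here.
        return 'int'
    elif dtype == np.bool_:
        return 'bool'
    return 'object'


td = np.dtype('timedelta64[ns]')
print(issubclass(td.type, np.integer))   # True
print(classify_like_make_block(td))      # timedelta
print(classify_like_form_blocks(td))     # int
```

This only compares the branch ordering as quoted; it does not show what pandas does with the resulting group further downstream.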

Labels: Clean, Internals (related to non-user accessible pandas implementation)
