Skip to content

BUG: Group-by on an empty data object dtype loses the index name (cython aggregation is ok) #8093

Closed
@carterk

Description

@carterk

If pd.read_sql is used to load a data frame with the results of an SQL query that returns no results, the columns in the data frame will be of type 'object'. That type cannot be aggregated, so a subsequent group-by operation on that empty data frame will drop all the columns. So instead of 'profit' in the below example being an empty series, an attribute error is thrown because the columns 'revenue' and 'expenses' cannot be found in the data frame.

Two things I can think of that could fix this:

  1. Have pd.read_sql populate the data frame with empty columns of the correct type even if the SQL query returns no results. Then the group-by would not drop the columns because they are of a type that can be aggregated.
  2. Have an option in groupby to not drop columns of types that cannot be aggregated: maybe a drop_non_agg flag. I think not dropping columns of types that cannot be aggregated should be the default behaviour. Columns with data that cannot be aggregated can just be populated with null after a group-by.

I think 1) probably should be implemented, and 2) is kind of a design decision.

You can run this code to reproduce the issue.

import pandas as pd
import sqlite3 as lite
import sys

finance = (
    (2, 132, 65),
    (6, 142, 86),
    (3, 183, 34),
    (3, 147, 46)
)

con = lite.connect('test.db')
cur = con.cursor()

cur.execute("DROP TABLE IF EXISTS finance")
cur.execute("CREATE TABLE finance(day_of_week INT, revenue FLOAT, expenses FLOAT)")
cur.executemany("INSERT INTO finance VALUES(?, ?, ?)", finance)

# remove the 'WHERE' clause, and the error won't be thrown
my_query = '''
    SELECT *
    FROM finance
    WHERE day_of_week = 5
    '''

df = pd.read_sql(my_query, con)

df_gb = df[df.day_of_week == 5].groupby('day_of_week').sum().reset_index()

profit = df_gb.revenue - df_gb.expenses # AttributeError thrown here

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions