mean of int64 results in int64 instead of float64 #11199

Closed
@c-garcia

xref #15091
xref #3707

Dear pandas team

My environment is:

numpy==1.9.3
pandas==0.16.2

With this environment, could bug #10172 have been reintroduced? For instance, take a CSV file (mini.csv) with three int columns, generated by the code below:

import random

def contents(f, sc1=10, sc2=1000, cnt=10000):
    for i in range(cnt):
        zone = random.choice(range(10))
        val1 = random.randint(0, zone * sc1)
        val2 = random.randint(0, zone * sc2)
        f.write("%d,%d,%d\n" % (zone, val1, val2))

with open("mini.csv", "w") as f:
    f.write("zipcode,sqft,price\n")
    contents(f, sc2=1000000, cnt=50000)

Once we load the file into pandas, the following call

import pandas
mini = pandas.read_csv("mini.csv")
mini.groupby('zipcode')[['price']].mean()

results in the price mean being an int64:

          price
zipcode
0              0
1         499960
2        1005490
3        1465088
4        2001135
5        2495200
6        2993253
7        3569320
8        4076548
9        4416133

but if we add an extra column to the selection,

mini.groupby('zipcode')[['price','sqft']].mean()

the price mean is a float64:

                  price       sqft
zipcode
0              0.000000   0.000000
1         499960.563138   5.000400
2        1005490.239062   9.928459
3        1465088.765919  14.922507
4        2001135.222045  20.148962
5        2495200.657097  25.160088
6        2993253.872691  29.624297
7        3569320.017926  35.089428
8        4076548.696706  39.605981
9        4416133.246409  44.851756
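To make the comparison easier to reproduce without the CSV file, here is a minimal sketch that builds a similar frame in memory and checks the result dtypes directly. The names `rng`, `single`, `both` and `workaround` are illustrative, and the random data is not the data from mini.csv. Casting `price` to float64 before grouping is only an assumed workaround, not a confirmed fix, and newer pandas versions may already return float64 in both cases.

```python
import numpy as np
import pandas as pd

# Build a small frame with three integer columns, similar in shape to mini.csv.
rng = np.random.default_rng(0)
n = 1000
mini = pd.DataFrame({
    "zipcode": rng.integers(0, 10, size=n),
    "sqft": rng.integers(0, 100, size=n),
    "price": rng.integers(0, 1_000_000, size=n),
})

# Mean of a single selected column vs. two selected columns.
single = mini.groupby("zipcode")[["price"]].mean()
both = mini.groupby("zipcode")[["price", "sqft"]].mean()

# Inspect the dtypes instead of relying on the printed table.
print("single-column selection:", single["price"].dtype)
print("two-column selection:   ", both["price"].dtype)

# Assumed workaround: cast to float64 before grouping so the mean
# cannot be coerced back to an integer dtype.
workaround = (
    mini.astype({"price": "float64"})
    .groupby("zipcode")[["price"]]
    .mean()
)
print("after casting to float: ", workaround["price"].dtype)
```

If the first two printed dtypes differ, the inconsistency described above is present in the installed version.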

Thanks in advance

Cristobal
