Closed
Dear pandas team
My environment is:
numpy==1.9.3
pandas==0.16.2
Is it possible that bug #10172 has been reintroduced in this environment? For instance, take a CSV file (mini.csv) with three int columns, generated by the code below:
import random

def contents(f, sc1=10, sc2=1000, cnt=10000):
    for i in range(cnt):
        zone = random.choice(range(10))
        val1 = random.randint(0, zone * sc1)
        val2 = random.randint(0, zone * sc2)
        f.write("%d,%d,%d\n" % (zone, val1, val2))

with open("mini.csv", "w") as f:
    f.write("zipcode,sqft,price\n")
    contents(f, sc2=1000000, cnt=50000)
Once we load the file into pandas
import pandas
mini = pandas.read_csv("mini.csv")
mini.groupby('zipcode')[['price']].mean()
results in the price mean being an int64:
           price
zipcode
0              0
1         499960
2        1005490
3        1465088
4        2001135
5        2495200
6        2993253
7        3569320
8        4076548
9        4416133
but if we add an extra column to the selection,
mini.groupby('zipcode')[['price','sqft']].mean()
the price mean is a float64:
                  price       sqft
zipcode
0              0.000000   0.000000
1         499960.563138   5.000400
2        1005490.239062   9.928459
3        1465088.765919  14.922507
4        2001135.222045  20.148962
5        2495200.657097  25.160088
6        2993253.872691  29.624297
7        3569320.017926  35.089428
8        4076548.696706  39.605981
9        4416133.246409  44.851756
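For reference, a possible workaround is sketched below. It assumes the truncation only happens when the selected column is an integer dtype, so casting price to float64 before grouping should keep the fractional part of the mean. The small in-memory frame and its values are made up for illustration and stand in for mini.csv.

```python
import pandas as pd

# Tiny stand-in for mini.csv; the values are illustrative only.
df = pd.DataFrame({
    "zipcode": [1, 1, 2, 2],
    "sqft": [3, 4, 10, 15],
    "price": [100, 101, 200, 203],
})

# Assumed workaround: cast the integer column to float64 before grouping,
# so the aggregated result has no integer dtype to be cast back to.
df["price"] = df["price"].astype("float64")

means = df.groupby("zipcode")[["price"]].mean()

# Inspect the dtype of the single-column result.
print(means.dtypes)
print(means)
```

Checking `means.dtypes` on both the single-column and the two-column selection is also a quick way to confirm whether a given pandas version shows the inconsistency.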
Thanks in advance
Cristobal