Hello there!
I am working with text data, and I read my data in using
full_list = []
for myfile in all_files:
    print("processing " + myfile)
    news = pd.read_csv(myfile, usecols=['FULL_TIMESTAMP', 'HEADLINE'], dtype={'HEADLINE': str})
    full_list.append(news)
data_full = pd.concat(full_list)
As you can see, I make sure that my HEADLINE column is read as str. However, when I type
collapsed = data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))
I get:
  File "<ipython-input-1-8ce0197f52ac>", line 34, in <module>
    collapsed =data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))
  File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2668, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2786, in _aggregate_named
    output = func(group, *args, **kwargs)
  File "<ipython-input-1-8ce0197f52ac>", line 34, in <lambda>
    collapsed = data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))
TypeError: sequence item 21: expected string, float found
To fix the problem, I first need to type
data_full['HEADLINE'] = data_full['HEADLINE'].astype(str)
Is that expected? I thought specifying the dtypes in read_csv was the most robust way to get consistent types in the data. I am still using pandas 0.19.2.
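For reference, here is a minimal reproduction sketch. It assumes the likely cause is an empty HEADLINE field: with dtype str, read_csv still leaves a missing value as NaN (a float), which is what the join then trips over. The sample rows and the way the day column is derived are made up for illustration, since the real files and the real day logic are not shown above. In this pandas version, astype(str) makes the join succeed only because it converts those missing values into the literal string 'nan'; dropping them before joining is one possible alternative.

```python
import io

import pandas as pd

# Hypothetical sample data; the second row has an empty HEADLINE field.
csv_text = (
    "FULL_TIMESTAMP,HEADLINE\n"
    "2017-01-02 09:00,Markets open higher\n"
    "2017-01-02 10:00,\n"
    "2017-01-02 11:00,Oil prices fall\n"
)

news = pd.read_csv(io.StringIO(csv_text),
                   usecols=['FULL_TIMESTAMP', 'HEADLINE'],
                   dtype={'HEADLINE': str})

# The empty field is read as a missing value, not as an empty str.
n_missing = news['HEADLINE'].isnull().sum()
print(n_missing)

# Assumed derivation of 'day' (the original post does not show it).
news['day'] = pd.to_datetime(news['FULL_TIMESTAMP']).dt.date

# Drop missing headlines inside each group so every joined item is a str.
collapsed = news.groupby('day').HEADLINE.agg(lambda x: '| '.join(x.dropna()))
print(collapsed)
```

Whether to drop the missing headlines, or fill them with an empty string via fillna(''), depends on whether the gaps should be visible in the joined text.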
Thanks!