strings not properly detected despite correct dtype in read_csv

Hello there!

I am working with text data, which I read in using:

```
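import pandas as pd

full_list = []

# all_files is the list of CSV paths to load (defined elsewhere)
for myfile in all_files:
    print("processing " + myfile)
    # Read only the two columns of interest and request HEADLINE as str
    news = pd.read_csv(myfile, usecols=['FULL_TIMESTAMP', 'HEADLINE'], dtype={'HEADLINE': str})
    full_list.append(news)

data_full = pd.concat(full_list)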
```

As you can see, I make sure that the `HEADLINE` column is read as `str`. However, when I run

`collapsed = data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))`

I get:


```
  File "<ipython-input-1-8ce0197f52ac>", line 34, in <module>
    collapsed =data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))

  File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2668, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)

  File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2786, in _aggregate_named
    output = func(group, *args, **kwargs)

  File "<ipython-input-1-8ce0197f52ac>", line 34, in <lambda>
    collapsed = data_full.groupby('day').HEADLINE.agg(lambda x: '| '.join(x))

TypeError: sequence item 21: expected string, float found
```

To fix the problem, I first need to run

`data_full['HEADLINE'] = data_full['HEADLINE'].astype(str)`

Is that expected? I thought that specifying `dtype` in `read_csv` was the most robust way to get consistent types in the data. I am still using pandas 0.19.2.
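A possible explanation, offered as an assumption rather than a confirmed diagnosis: `read_csv` represents empty fields as `NaN`, which is a float, even when the column is requested as `str`. That would match the "expected string, float found" message. The sketch below uses a made-up two-column CSV (the `day` values and headlines are hypothetical) to reproduce the situation and shows two ways to join the headlines without casting the whole column.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty HEADLINE field.
csv_text = "day,HEADLINE\n1,first headline\n1,\n2,second headline\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={'HEADLINE': str})

# Despite dtype=str, the empty field is stored as a missing value (NaN).
n_missing = int(df['HEADLINE'].isna().sum())

# Option 1: replace missing headlines with empty strings before joining.
filled = df.assign(HEADLINE=df['HEADLINE'].fillna(''))
joined_fillna = filled.groupby('day')['HEADLINE'].agg(lambda x: '| '.join(x))

# Option 2: drop rows with a missing headline before joining.
kept = df.dropna(subset=['HEADLINE'])
joined_dropna = kept.groupby('day')['HEADLINE'].agg(lambda x: '| '.join(x))

print(n_missing)
print(joined_fillna)
print(joined_dropna)
```

If this is indeed the cause, the `astype(str)` workaround avoids the error because it turns every value into a string, but on this pandas version it also appears to convert missing values into the literal text `nan`, which then ends up inside the joined headlines. Filling or dropping the missing values first avoids that side effect.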

Thanks!

strings not properly detected despite correct dtype in read_csv #16569
