Description
At the moment, with the new sql code, there is a bug in the conversion of invalid column names (trough the function _safe_col_name
). Eg:
In [1]: import sqlalchemy
In [2]: engine = sqlalchemy.create_engine('sqlite:///:memory:')
In [4]: df = pd.DataFrame({'safe_col':[1,2], 'invalid col':[3,4]})
In [7]: pd.io.sql.to_sql(df, 'test', engine)
In [9]: pd.io.sql.read_table('test', engine)
Out[9]:
pandas_index invalid_col safe_col
0 0 None 1
1 1 None 2
I know the reason, and it is easily fixed (the table statement uses the adapted column, but the inserting still the old one, so the data are not inserted). But I was wondering if this is actually necessary?
Because, databases can handle column names with spaces (at least sqlite, mysql and postgresql that I know a little bit), the names are then just quoted and sqlalchemy ensures this is done properly for us.
Often, it is advised to not use spaces (or capital letters) in column names, as it is just tedious to always have to quote them and it is more robust to not have to do that. But it does work, so I would think it is up to the user to provide the column names he/she wants.
So proposal:
- we don't change the column names and leave them as the user provides them
- we could provide a kwarg to do this if the users wants to, but I don't think this is priority at the moment (and I think we should then also convert to lower case, as there is as much reason to do that as to changes spaces to underscores)
PS: the fact there are None
s in the example above and not NaN
s is still another issue I think.