Description
In the "What's New - Other enhancements" section of the 0.18.0 release it says that you can define the environment variable AWS_S3_HOST. Therefore, when setting this environment variable together with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, I expected the following code to work:
import pandas as pd
df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df.to_csv('s3://bucket/key')
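For context, the environment was configured roughly like this before running the script (the endpoint and credentials below are placeholders, not the real values):

import os
# Placeholder values; in practice these come from the shell environment.
# They have to be set before pandas/s3fs creates its S3 client.
os.environ["AWS_S3_HOST"] = "https://s3.internal.example.com"
os.environ["AWS_ACCESS_KEY_ID"] = "my-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "my-secret-key"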
The script crashes with:
Exception ignored in: <bound method S3File.__del__ of <S3File bucket/key>>
Traceback (most recent call last):
  File "/home/xxx/.local/share/virtualenvs/xxx/lib/python3.6/site-packages/s3fs/core.py", line 1518, in __del__
    self.close()
  File "/home/xxx/.local/share/virtualenvs/xxx/lib/python3.6/site-packages/s3fs/core.py", line 1496, in close
    raise_from(IOError('Write failed: %s' % self.path), e)
  File "<string>", line 3, in raise_from
OSError: Write failed: bucket/key
This is the exact same error as when I don't specify the variable AWS_S3_HOST at all.
I managed to get a working solution by interacting directly with s3fs:
import os
import pandas as pd
import s3fs
# Passing the endpoint explicitly works, while AWS_S3_HOST alone does not
fs = s3fs.S3FileSystem(client_kwargs={'endpoint_url': os.environ["AWS_S3_HOST"]})
df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df.to_csv(fs.open('s3://bucket/key', mode='w'))
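Reading the file back through the same filesystem object also honours the custom endpoint, e.g.:

# Round-trip check against the same (placeholder) bucket/key as above
df_roundtrip = pd.read_csv(fs.open('s3://bucket/key', mode='rb'), index_col=0)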
Looks like this feature got (accidentally?) removed in https://github.com/pandas-dev/pandas/commit/dc4b0708f36b971f71890bfdf830d9a5dc019c7b#diff-a37b395bed03f0404dec864a4529c97dL94, when pandas switched from boto to s3fs.
Is it planned to support this environment variable again?
I'm sure that many companies, like the one I work for, host their own S3 server (e.g. with Minio) and don't use Amazon.
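For illustration only, a helper along these lines would restore the old behaviour (the function name is mine, not an existing pandas or s3fs API):

import os
import s3fs

def s3_filesystem_from_env():
    # Illustrative sketch: translate AWS_S3_HOST into the endpoint_url
    # that s3fs/botocore understand, falling back to the AWS defaults
    # when the variable is not set.
    client_kwargs = {}
    host = os.environ.get("AWS_S3_HOST")
    if host:
        client_kwargs["endpoint_url"] = host
    return s3fs.S3FileSystem(client_kwargs=client_kwargs)

fs = s3_filesystem_from_env()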