Our company has a requirement to encrypt all data at rest in S3. Usually, when we upload an S3 object, we do something like:
aws s3 cp a.txt s3://b/test --sse
I am playing with dask.dataframe and want to export one of my datasets to Parquet stored in S3, but I cannot find any option to turn on encryption. Any idea how to apply encryption with dask.dataframe?
This is not currently implemented in s3fs, the backend dask uses to write to S3. It would not be hard to add: include (some of) the following parameters in the constructor of S3FileSystem and pass them through in the small number of calls made on the boto3 S3 client; the parameters could then be supplied via storage_options= when calling to_parquet():
ServerSideEncryption='AES256'|'aws:kms',
SSECustomerAlgorithm='string',
SSECustomerKey='string',
SSEKMSKeyId='string',
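A minimal sketch of how those parameters could be threaded through: collect them once on the filesystem instance, then merge them into the keyword arguments of each client call. The `S3FileSystem` class here is a stand-in stub for illustration, not actual s3fs code.

```python
# Hypothetical sketch: S3FileSystem here is a stub, not the real s3fs class.
class S3FileSystem:
    def __init__(self, **sse_kwargs):
        # e.g. ServerSideEncryption="AES256" or "aws:kms", plus SSEKMSKeyId
        self.sse_kwargs = {k: v for k, v in sse_kwargs.items() if v is not None}

    def _call_s3(self, method_kwargs):
        # In real code this would invoke the boto3 S3 client; here we just
        # return the merged keyword arguments for inspection.
        return {**self.sse_kwargs, **method_kwargs}


fs = S3FileSystem(ServerSideEncryption="aws:kms", SSEKMSKeyId="alias/my-key")
print(fs._call_s3({"Bucket": "b", "Key": "test"}))
# {'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': 'alias/my-key',
#  'Bucket': 'b', 'Key': 'test'}
```

A per-file override would follow the same pattern, with call-level kwargs taking precedence over the instance defaults, as the merge order above already does.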
There should also be an option to set these per file as well as by default on the filesystem instance. Feel free to attempt a PR! Note that SSE is probably not implemented in moto, so testing may be difficult.
Note that in your case, some of these values are probably being read by the aws command from a standard location such as ~/.aws/.
As of now, this is possible:
df.to_csv(s3_path, storage_options={"s3_additional_kwargs":{"ServerSideEncryption": "AES256"}})
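The same storage_options should apply to to_parquet as well, since dask forwards them to the s3fs filesystem constructor either way; s3fs then merges s3_additional_kwargs into its boto3 upload calls. A sketch (the path is a placeholder; the actual write is commented out because it needs AWS credentials):

```python
# s3_additional_kwargs is forwarded by s3fs to the boto3 client on each call.
storage_options = {
    "s3_additional_kwargs": {"ServerSideEncryption": "AES256"},
}

# With credentials configured, the parquet write in the question becomes:
# df.to_parquet("s3://my-bucket/dataset/", storage_options=storage_options)

print(storage_options["s3_additional_kwargs"]["ServerSideEncryption"])
# AES256
```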