S3 Dist CP

Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3/HDFS into S3/HDFS

s3-dist-cp -Dmapreduce.job.reduces=100 \
    -Dfs.s3a.access.key=yourAccessKey \
    -Dfs.s3a.secret.key=yourSecretKey\
    -Dmapreduce.reduce.maxattempts=3 \
    --outputManifest manifest-checkpoint.gz \
    --s3Endpoint=s3.us-east-1.amazonaws.com --src=s3a://source_s3/ --dest=s3a://dest_s3/

yourAccessKey and yourSecretKey should have access to both S3 source and destination.

References: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html

Last updated