Quite often I have to create, terminate and re-create EMR clusters in AWS, and the problem is that s3cmd is not available there by default. Thus, I have to install and configure the tools manually every time I need to download a script or a JAR from S3.
So, I have created a simple to-do list for myself, so that I can just copy-paste the shell commands from it. I hope it helps you, too.
1. SSH into the main cluster machine (there is an "SSH" link on the cluster details page)
2. Install s3cmd. A reminder: the EMR machines run Amazon Linux, which is yum-based, so the Ubuntu apt-get is not available. Furthermore, sudo yum install won't work by default either; we have to add the repository first:
cd /etc/yum.repos.d
sudo wget http://s3tools.org/repo/RHEL_6/s3tools.repo
Now we can:
sudo yum install s3cmd
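Since these steps get repeated on every new cluster, the two commands above can be wrapped into one function that does nothing if s3cmd is already present. This is only a sketch: the function name install_s3cmd is my own invention, and it assumes the same repository URL as above.

```shell
#!/bin/sh
# Sketch: install s3cmd only when it is missing.
# install_s3cmd is a hypothetical helper name, not part of any tool.
install_s3cmd() {
    # Skip everything if s3cmd is already on the PATH.
    if command -v s3cmd >/dev/null 2>&1; then
        echo "s3cmd already installed"
        return 0
    fi
    # Download the repo definition straight into the yum repo directory.
    sudo wget -O /etc/yum.repos.d/s3tools.repo \
        http://s3tools.org/repo/RHEL_6/s3tools.repo
    # -y answers the confirmation prompt automatically.
    sudo yum install -y s3cmd
}
```

Paste the function into the shell and then run "install_s3cmd".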
2.1 Now we have to configure it. I just copy the configuration from one of the other machines we have in EC2. But we have to create the file first:
nano ~/.s3cfg
(Alternatively, "cd ~" first, then "nano .s3cfg")
Now, if you don't have such a file elsewhere, you will have to create it from scratch.
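For the from-scratch case, s3cmd can generate the file interactively with "s3cmd --configure". If you prefer something you can copy-paste, here is a minimal sketch that writes only the two credential settings and leaves everything else at the s3cmd defaults. The function name write_s3cfg and its arguments are hypothetical; I assume a bare file with access_key and secret_key is enough for simple downloads.

```shell
#!/bin/sh
# Sketch: write a minimal s3cmd config file.
# write_s3cfg is a hypothetical helper; arguments are:
#   $1 = path of the config file, $2 = AWS key ID, $3 = secret access key
write_s3cfg() {
    cfg_path="$1"
    key_id="$2"
    secret="$3"
    # Only the credentials are set; all other options stay at their defaults.
    cat > "$cfg_path" <<EOF
[default]
access_key = $key_id
secret_key = $secret
EOF
    # The file contains a secret, so keep it readable by the owner only.
    chmod 600 "$cfg_path"
}
```

Usage would look like: write_s3cfg "$HOME/.s3cfg" "<AWS key ID>" "<Secret Access Key>"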
3. Install aws-cli. You don't always have to do this. For example, my EMR clusters with Hadoop / Spark already had it installed, while the ones with Hadoop / Hive did not.
sudo yum install aws-cli
3.1 As with s3cmd, aws-cli has to be configured. Luckily, the only configuration needed is your AWS credentials:
mkdir ~/.aws
nano ~/.aws/credentials
[default]
aws_access_key_id=<AWS key ID>
aws_secret_access_key=<Secret Access Key>
And voila, both s3cmd and aws-cli are ready. Now you can "s3cmd get..", "s3cmd sync.." and have fun! :)
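As a usage sketch, the download itself can go through whichever tool happens to be present on the cluster, since aws-cli is sometimes already there and s3cmd is not. The function name fetch_from_s3 and the bucket and file names in the usage line are made up for illustration.

```shell
#!/bin/sh
# Sketch: download one object from S3 with s3cmd, or with aws-cli as a fallback.
# fetch_from_s3 is a hypothetical helper; $1 = s3:// URL, $2 = local path
fetch_from_s3() {
    src="$1"
    dest="$2"
    if command -v s3cmd >/dev/null 2>&1; then
        s3cmd get "$src" "$dest"
    elif command -v aws >/dev/null 2>&1; then
        aws s3 cp "$src" "$dest"
    else
        echo "neither s3cmd nor aws is installed" >&2
        return 1
    fi
}
```

For example: fetch_from_s3 "s3://my-example-bucket/jobs/my-job.jar" "./my-job.jar"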