Bootstrap Actions

When you provision an EMR cluster, you can define bootstrap actions to execute before processing data. You might use bootstrap actions to perform tasks such as installing a driver on the cluster or creating directories and setting permissions on them.
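As a minimal sketch of the directory task mentioned above, a bootstrap script might look like the following. The directory path and permission mode are hypothetical placeholders, not values that Transformer or EMR requires; on a real cluster you would likely choose a different location and may need sudo.

```shell
#!/bin/bash
# Sketch only: the path and mode below are assumed placeholders.
set -euo pipefail

# Target directory; override by setting APP_DIR before the script runs.
APP_DIR="${APP_DIR:-/tmp/bootstrap-example/data}"

# Create the directory (and any missing parents), then restrict access
# to the owner and group.
mkdir -p "$APP_DIR"
chmod 750 "$APP_DIR"

echo "created $APP_DIR"
```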

To use bootstrap actions, either define the scripts directly in the cluster configuration properties or specify bootstrap action scripts stored on Amazon S3.

When you specify more than one bootstrap action, list them in the order that you want them to run. If a bootstrap action fails to execute, Transformer cancels the cluster provisioning and stops the pipeline or job.
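Amazon EMR treats a bootstrap action as failed when its script exits with a non-zero status. The following sketch shows one way a script could stop early and report that failure; the require_command helper and the check it performs are hypothetical and only illustrate the pattern.

```shell
#!/bin/bash
# Sketch only: the helper name and the prerequisite check are hypothetical.
# Exit on the first failing command, unset variable, or failed pipeline stage,
# so that the script returns a non-zero status.
set -euo pipefail

# Return non-zero if the named command is not available on the node.
require_command() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "bootstrap prerequisite missing: $1" >&2
    return 1
  fi
}

require_command bash
echo "prerequisites satisfied"
```

Because of set -e, a failed check at the top level of the script ends the script immediately, so later steps do not run against a partially prepared node.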

For more information about EMR bootstrap actions, see the EMR documentation.

Example

You can use a bootstrap action to set up SSH on the provisioned cluster.

First, you generate an SSH key pair. Then, you replace the <PUBLIC_SSH_KEY> placeholder with the public key in the following script, which you enter in the Bootstrap Action Scripts cluster configuration property:
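#!/bin/bash

# Write the public key to a temporary file.
cat > /tmp/authorized_keys << EOF
<PUBLIC_SSH_KEY>
EOF
# Install the key for ec2-user, replacing any existing authorized_keys file,
# then make ec2-user the owner and the only user who can read or write it.
sudo sh -c 'mv /tmp/authorized_keys /home/ec2-user/.ssh/authorized_keys'
sudo sh -c 'chown ec2-user:ec2-user /home/ec2-user/.ssh/authorized_keys'
sudo sh -c 'chmod 600 /home/ec2-user/.ssh/authorized_keys'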

You could alternatively save the script on S3 and specify the script location in the Bootstrap Actions cluster configuration property.
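As a rough sketch of the S3 alternative, the script location is an S3 URI built from a bucket and an object key. The bucket name, key, and local file name below are hypothetical placeholders. The upload uses the AWS CLI command aws s3 cp and runs only when UPLOAD is set to yes, because it needs the AWS CLI and valid credentials.

```shell
#!/bin/bash
# Sketch only: bucket, key, and local file name are assumed placeholders.
set -euo pipefail

BUCKET="${BUCKET:-my-example-bucket}"
KEY="${KEY:-bootstrap/setup-ssh.sh}"
LOCAL_SCRIPT="${LOCAL_SCRIPT:-setup-ssh.sh}"

# The location to specify for the bootstrap action.
SCRIPT_URI="s3://${BUCKET}/${KEY}"

# Upload only when explicitly requested.
if [ "${UPLOAD:-no}" = "yes" ]; then
  aws s3 cp "$LOCAL_SCRIPT" "$SCRIPT_URI"
fi

echo "$SCRIPT_URI"
```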