Using SBT assembly and S3 plugins to automate EMR deployments

I have been working with Apache Spark on Amazon's EMR recently, and it was time-consuming to manually upload my assembly (fat) jars produced by sbt-assembly to S3 every time I wanted to tweak my job. Fortunately, SBT has an S3 plugin that makes uploading to S3 easy.

Installing the plugin

In project/plugins.sbt add:

resolvers += Resolver.url("sbts3 ivy resolver",
  url("https://dl.bintray.com/emersonloureiro/sbt-plugins"))(Resolver.ivyStylePatterns)

addSbtPlugin("cf.janga" % "sbts3" % "0.10.3")

Combining Assembly and S3

Here's how I added a new task, s3Assembly, that runs assembly and then uploads the resulting jar to an S3 location of your choosing.

In build.sbt:

enablePlugins(S3Plugin)

s3Progress in s3Upload := true

mappings in s3Upload := Seq(
  ((assemblyOutputPath in assembly).value.getAbsoluteFile,
    "YOUR-PATH-HERE" + (assemblyJarName in assembly).value)
)

s3Host in s3Upload := 
  "YOUR-BUCKET-HERE.s3-YOUR-REGION.amazonaws.com"

lazy val s3Assembly = TaskKey[Unit]("s3assembly", 
  "assemble and send to s3")

s3Assembly := Def.sequential(assembly, s3Upload).value

Be sure to fill in your bucket name, path, and AWS region.
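For illustration, here is what the settings above might look like with the placeholders filled in. The bucket name my-spark-jobs, the jars/ prefix, and the us-east-1 region are all hypothetical; substitute your own:

```scala
enablePlugins(S3Plugin)

// Print upload progress in the sbt console
s3Progress in s3Upload := true

// Map the assembled fat jar to its destination key in the bucket
// (uploads to s3://my-spark-jobs/jars/<jar-name> in this example)
mappings in s3Upload := Seq(
  ((assemblyOutputPath in assembly).value.getAbsoluteFile,
    "jars/" + (assemblyJarName in assembly).value)
)

// Virtual-hosted-style endpoint for the bucket
s3Host in s3Upload := "my-spark-jobs.s3-us-east-1.amazonaws.com"
```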

And that's it! Just run the s3Assembly task whenever you want to send your jar to S3.
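From the project root, the whole assemble-and-upload cycle is then a single command (shown from a shell; running s3Assembly inside the sbt console works just as well):

```shell
# Build the fat jar and upload it to S3 in one step
sbt s3Assembly
```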

AWS Credentials

Make sure your AWS credentials are configured the same way the aws CLI expects them (e.g. in ~/.aws/credentials, or via the standard AWS environment variables).
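For reference, a minimal ~/.aws/credentials file uses the standard INI layout below; the key values here are placeholders:

```ini
[default]
aws_access_key_id = YOUR-ACCESS-KEY-ID
aws_secret_access_key = YOUR-SECRET-ACCESS-KEY
```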