Using SBT assembly and S3 plugins to automate EMR deployments
I have been working with Apache Spark on Amazon's EMR recently, and manually uploading the assembly (fat) jars produced by sbt-assembly to S3 every time I wanted to tweak my job was time consuming. Fortunately, SBT has an S3 plugin that allows for easy access to S3. Add the following to project/plugins.sbt:
```scala
resolvers += Resolver.url(
  "sbts3 ivy resolver",
  url("https://dl.bintray.com/emersonloureiro/sbt-plugins")
)(Resolver.ivyStylePatterns)

addSbtPlugin("cf.janga" % "sbts3" % "0.10.3")
```
Combining Assembly and S3
Here's how I added a new task, s3Assembly, that depends on assembly, finds your latest jar, and puts it on S3 in a location of your choosing. In build.sbt:
```scala
enablePlugins(S3Plugin)

s3Progress in s3Upload := true

mappings in s3Upload := Seq(
  ((assemblyOutputPath in assembly).value.getAbsoluteFile,
    "YOUR-PATH-HERE" + (assemblyJarName in assembly).value)
)

s3Host in s3Upload := "YOUR-BUCKET-HERE.S3-REGION.amazonaws.com"

lazy val s3Assembly = TaskKey[Unit]("s3assembly", "assemble and send to s3")

s3Assembly := Def.sequential(assembly, s3Upload).value
```
Be sure to fill in your bucket name, path, and AWS region.
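With that in place, a typical build-and-upload run looks like this; the bucket and path below are placeholders you'd swap for your own:

```shell
# Build the fat jar and upload it to S3 in one step
sbt s3Assembly

# Optionally confirm the jar landed where you expect
# (assumes the aws CLI is installed and configured)
aws s3 ls s3://YOUR-BUCKET-HERE/YOUR-PATH-HERE/
```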
And that's it! Just run s3Assembly when you want to send your jar to S3. Make sure you are logged in to AWS using the normal aws CLI tool methods (e.g. aws configure, or the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables).
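To close the loop on the EMR side, you can point a Spark step at the uploaded jar. Here's a sketch using the aws CLI; the cluster ID, bucket, path, jar name, and main class are all placeholder values for illustration:

```shell
# Submit the freshly uploaded jar as a Spark step on an existing EMR cluster.
# j-XXXXXXXXXXXXX, the S3 path, and com.example.MySparkJob are placeholders.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps "Type=Spark,Name=MySparkJob,ActionOnFailure=CONTINUE,\
Args=[--class,com.example.MySparkJob,s3://YOUR-BUCKET-HERE/YOUR-PATH-HERE/my-job-assembly.jar]"
```

This pairs naturally with s3Assembly: one sbt task to build and upload, one CLI call to run the new jar on the cluster.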