Using SBT assembly and S3 plugins to automate EMR deployments
February 06, 2020
I have been working with Apache Spark on Amazon's EMR recently, and it was time-consuming to manually upload the assembly (fat) jars produced by SBT assembly to S3 every time I wanted to make a tweak to my job. Fortunately, SBT has an S3 plugin that allows for easy access to S3.
Installing the plugin
In project/plugins.sbt, add:

```scala
resolvers += Resolver.url(
  "sbts3 ivy resolver",
  url("https://dl.bintray.com/emersonloureiro/sbt-plugins")
)(Resolver.ivyStylePatterns)

addSbtPlugin("cf.janga" % "sbts3" % "0.10.3")
```
Combining Assembly and S3
Here's how I added a new task, s3Assembly, that depends on assembly, finds your latest jar, and puts it on S3 in a location of your choosing.

In build.sbt:
```scala
enablePlugins(S3Plugin)

// Show upload progress in the sbt console
s3Progress in s3Upload := true

// Map the assembled jar to its destination key on S3
mappings in s3Upload := Seq(
  ((assemblyOutputPath in assembly).value.getAbsoluteFile,
   "YOUR-PATH-HERE" + (assemblyJarName in assembly).value)
)

s3Host in s3Upload := "YOUR-BUCKET-HERE.S3-REGION.amazonaws.com"

// New task: build the fat jar, then upload it
lazy val s3Assembly = TaskKey[Unit]("s3assembly", "assemble and send to s3")

s3Assembly := Def.sequential(assembly, s3Upload).value
```
Be sure to fill in your bucket name, key path, and AWS region in the placeholders above.
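For instance, with a hypothetical bucket named my-spark-jobs in us-west-2 and jars stored under a jars/ prefix, those two settings might look like:

```scala
// Hypothetical bucket, region, and path, for illustration only
s3Host in s3Upload := "my-spark-jobs.s3-us-west-2.amazonaws.com"

mappings in s3Upload := Seq(
  ((assemblyOutputPath in assembly).value.getAbsoluteFile,
   "jars/" + (assemblyJarName in assembly).value)  // e.g. jars/myjob-assembly-0.1.jar
)
```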
And that's it! Just run s3Assembly when you want to send your jar to S3.
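From the command line, that looks like this (sbt resolves tasks by the key's label, which was defined above as s3assembly):

```
$ sbt s3assembly
```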
AWS Credentials
Make sure you are logged in to AWS using the normal aws CLI tool methods (e.g. ~/.aws/credentials).
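For reference, a minimal ~/.aws/credentials file has the following shape (values here are placeholders):

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```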