Top 5 AWS Glue Cost Optimization Techniques

Amit Damle
3 min read · Sep 23, 2024


AWS Glue is a fully managed, serverless data integration service that helps ingest, catalog, and prepare data in a no/low-code or code-first manner.
Since Glue is serverless, customers do not have to manage any clusters. When it comes to cost, AWS Glue offers limited optimization levers.
In this blog I describe the top 5 options for optimizing Glue cost.

Based on my experience, here is one important tip for you: if you have a complex job that churns through a large dataset and potentially runs for a couple of hours, the best option is to evaluate EMR Spark for such jobs.

My content focuses on cost optimization via platform tuning rather than code tuning. If you want to tune the code as well, please refer to the following prescriptive guidance on Glue job tuning:

Glue job Performance tuning

Cost Optimization Options —

Option 1: Choose the Latest Glue Version and the Correct Worker Type

Use the latest version of Glue, which provides recent Scala, Spark, and Python versions. At the time of writing this blog, Glue 4.0 with Spark 3.3 was the latest.
Choose the worker type based on your job's profile: if the job shuffles a lot of data, a worker with more resources per node can help; if the job has many small tasks, adding more smaller workers may help instead.
The worker types available with Glue are G.025X (2 vCPU, 4 GB, for streaming jobs), G.1X (4 vCPU, 16 GB), G.2X (8 vCPU, 32 GB), G.4X (16 vCPU, 64 GB), and G.8X (32 vCPU, 128 GB).
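As a minimal sketch, the version and worker type are just job properties you set when creating or updating the job. The job name, IAM role, and script path below are hypothetical placeholders; the actual API call is shown in a comment since it needs AWS credentials.

```python
# Job definition pinning a recent Glue version and an explicit worker type.
# All names, ARNs, and S3 paths are placeholders for illustration.
job_config = {
    "Name": "sales-etl-job",                                   # hypothetical
    "Role": "arn:aws:iam::123456789012:role/GlueJobRole",      # placeholder
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/sales_etl.py",
        "PythonVersion": "3",
    },
    "GlueVersion": "4.0",   # latest at the time of writing (Spark 3.3)
    "WorkerType": "G.2X",   # 8 vCPU / 32 GB; pick based on shuffle/task profile
    "NumberOfWorkers": 10,
}

# With AWS credentials configured, the job would be created with boto3:
#   import boto3
#   boto3.client("glue").create_job(**job_config)
```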

Option 2: Auto-scale Worker Nodes

One of the best options for expediting job execution in Glue is worker auto-scaling; reducing job execution time reduces cost.
As shown in the following diagram, dynamic scaling scales the cluster out when a specific Spark stage needs more processing power and scales it back in when the demand subsides, rather than keeping all nodes allocated for the whole run.
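Auto-scaling is switched on with a job parameter (available on Glue 3.0 and later), and `NumberOfWorkers` then acts as the upper bound rather than a fixed cluster size. A sketch with a hypothetical job name:

```python
# Job definition with Glue Auto Scaling enabled. NumberOfWorkers is the
# *maximum*; Glue scales between that cap and what each Spark stage needs,
# and you are billed only for workers actually used.
autoscaling_job = {
    "Name": "autoscaled-etl-job",   # hypothetical job name
    "GlueVersion": "4.0",           # auto-scaling requires Glue 3.0+
    "WorkerType": "G.1X",
    "NumberOfWorkers": 20,          # upper bound for scale-out
    "DefaultArguments": {
        "--enable-auto-scaling": "true",
    },
}

# Applied via boto3 (requires credentials), e.g. as part of create_job:
#   import boto3
#   boto3.client("glue").create_job(Role="...", Command={...}, **autoscaling_job)
```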

Option 3: Use Flex Execution

If your jobs are not SLA-sensitive, AWS Glue provides the Flex execution option, which is the equivalent of Spot instances: it reduces cost by running the job on spare capacity. When activated, the job run waits until spare capacity is available in the region. This is a good fit for dev/test jobs or jobs with flexible SLAs.
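Flex is selected per run through the execution class on the StartJobRun API. A minimal sketch, with a hypothetical job name:

```python
# Request a job run on the Flex execution class (spare capacity, lower cost).
# Flex applies to Glue 3.0+ Spark jobs; the job name is a placeholder.
run_request = {
    "JobName": "nightly-report-job",   # hypothetical
    "ExecutionClass": "FLEX",          # vs. the default "STANDARD"
}

# With AWS credentials configured:
#   import boto3
#   run_id = boto3.client("glue").start_job_run(**run_request)["JobRunId"]
```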

Option 4: Job Timeouts
Jobs that run for a long time due to script or network issues or data anomalies can cause unexpected cost increases. This can be mitigated by setting a job timeout. By default, a job will run for up to 2,880 minutes (48 hours); depending on your scenario, set the timeout to a sensible value to avoid unnecessary executions that overrun the budgeted cost.
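The timeout is a job-level property expressed in minutes. A sketch of updating an existing job (names, role, and script path are placeholders; UpdateJob also requires the role and command to be restated):

```python
# Cap a job's runtime at 90 minutes instead of the 2,880-minute default,
# so a stuck or runaway job is killed before it overruns the budget.
timeout_update = {
    "JobName": "sales-etl-job",                                 # hypothetical
    "JobUpdate": {
        "Role": "arn:aws:iam::123456789012:role/GlueJobRole",   # placeholder
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/sales_etl.py",
        },
        "Timeout": 90,   # minutes; default is 2880 (48 hours)
    },
}

# With AWS credentials configured:
#   import boto3
#   boto3.client("glue").update_job(**timeout_update)
```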

Option 5: Use of Glue Docker image for Development
AWS Glue provides interactive sessions for quick analysis and development, but continuous use of interactive sessions adds a significant cost overhead. AWS Glue also provides a Docker image that can be downloaded to individual developers' machines for building and testing jobs locally.
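A sketch of the local workflow, assuming the publicly published Glue 4.0 image on Docker Hub; the mounted paths, profile name, and container name are placeholders:

```shell
# Pull the Glue 4.0 libraries image (tag reflects Glue 4.0 at time of writing)
docker pull amazon/aws-glue-libs:glue_libs_4.0.0_image_01

# Mount local AWS credentials and the job scripts, expose the Spark UI port,
# and start an interactive PySpark shell inside the container
docker run -it \
  -v ~/.aws:/home/glue_user/.aws \
  -v "$(pwd)":/home/glue_user/workspace \
  -e AWS_PROFILE=default \
  -p 4040:4040 \
  --name glue_local \
  amazon/aws-glue-libs:glue_libs_4.0.0_image_01 \
  pyspark
```

Iterating inside the container is free of Glue session charges; you only pay for Glue when the finished job actually runs in AWS.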

Conclusion —

Using the options described above, customers can keep the cost of their Glue processing jobs under control.

References —

Working with Spark Jobs

Glue Job Monitoring

Glue Job Local Development

Disclaimer: Ideas / views expressed here are my personal opinions
