Serverless Jobs

Serverless Jobs — a New Paradigm in Cloud

We may ask: why call serverless jobs a new paradigm, when everybody is using cloud services nowadays, and serverless features in particular? Here I am going to share my experience from one of my recent cloud computing projects so you can judge whether this is a new paradigm or not.

What are Serverless Jobs?

  • Serverless computing (or serverless for short) is an execution model where the cloud provider (AWS, Azure, or Google Cloud) is responsible for executing a piece of code by dynamically allocating resources, and charges only for the resources actually used to run the code.
  • A program that solves a problem through independent execution (scheduled or on-demand) is called a job.
  • To combine the two (serverless + job), we need a way to orchestrate the cloud resources and executions, and to deploy the program.

Problem statement

In our projects I came across multiple frameworks, such as Quartz jobs, that relied heavily on dedicated servers (EC2 instances) running 24/7. (We may wonder why we could not simply stop them when not in use; the schedules were framed in such a way that we could not.)

These are the limitations (problems) we faced when using those frameworks:

  • Job-based resource allocation
  • Using a resource only when needed, to avoid upfront cost
  • Job-based programming language support
  • Job-based resource utilization monitoring
  • On-demand execution with a task workflow (load-based distributed execution)

Solutions
The following solutions are based on the projects I have worked on and my experience with the AWS cloud.

System Overview

I used the following AWS services to achieve the serverless jobs paradigm.

  1. EventBridge (scheduling)
  2. Step Functions
  3. Lambda
  4. ECS (Fargate)
  5. ECR
  6. S3
  7. DynamoDB
  8. CloudWatch
  9. API Gateway
  10. Simple Queue Service

Here I will try to justify each resource and explain why we used this stack.

1. EventBridge
The primary element of a job is always scheduling, so we used an EventBridge schedule rule as a resource for each job. It also gives us the flexibility to enable or disable a schedule at any time.
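
As a rough sketch of that setup (the rule name, cron expression, ARNs, and input below are hypothetical, not taken from our project), a per-job schedule rule can be created and pointed at the job workflow with the AWS SDK for Java:

    import software.amazon.awssdk.services.eventbridge.EventBridgeClient;
    import software.amazon.awssdk.services.eventbridge.model.PutRuleRequest;
    import software.amazon.awssdk.services.eventbridge.model.PutTargetsRequest;
    import software.amazon.awssdk.services.eventbridge.model.RuleState;
    import software.amazon.awssdk.services.eventbridge.model.Target;

    public class JobScheduleSetup {
        public static void main(String[] args) {
            EventBridgeClient events = EventBridgeClient.create();

            // Create (or update) the schedule rule for one job; this one runs daily at 02:00 UTC.
            events.putRule(PutRuleRequest.builder()
                    .name("nightly-report-job")                 // hypothetical job name
                    .scheduleExpression("cron(0 2 * * ? *)")
                    .state(RuleState.ENABLED)                   // flip to DISABLED to pause the job at any time
                    .build());

            // Point the rule at the Step Functions state machine that drives the job workflow.
            events.putTargets(PutTargetsRequest.builder()
                    .rule("nightly-report-job")
                    .targets(Target.builder()
                            .id("job-workflow")
                            .arn("arn:aws:states:us-east-1:123456789012:stateMachine:job-workflow") // hypothetical ARN
                            .roleArn("arn:aws:iam::123456789012:role/eventbridge-invoke-sfn")        // hypothetical role
                            .input("{\"jobName\":\"nightly-report\"}")
                            .build())
                    .build());
        }
    }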

2. Step Functions
The heart of the architecture: Step Functions plays the major role in the serverless jobs design. The solutions to all the problem statements above are covered by this single resource.

It connects multiple pieces: the job stack, the audit Lambda, DynamoDB, and notifications (Slack).

We built a workflow customized around the problems below.

Sometimes the jobs have to run one at a time, so we check in DynamoDB whether a trigger entry is already in the PROGRESS state. If so, the workflow cancels the execution; otherwise it executes the trigger.
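
A minimal sketch of that guard, assuming a Triggers table keyed by job name with a status attribute (the real table layout may differ); in our workflow this check runs before the job and its result decides whether to cancel or continue:

    import java.util.Map;

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.QueryRequest;

    public class TriggerGuard {

        private final DynamoDbClient dynamo = DynamoDbClient.create();

        // Returns true if a trigger for this job is already in the PROGRESS state.
        public boolean isAlreadyRunning(String jobName) {
            QueryRequest request = QueryRequest.builder()
                    .tableName("Triggers")                              // assumed table name
                    .keyConditionExpression("jobName = :job")
                    .filterExpression("#st = :progress")
                    .expressionAttributeNames(Map.of("#st", "status"))  // "status" is a reserved word in DynamoDB
                    .expressionAttributeValues(Map.of(
                            ":job", AttributeValue.builder().s(jobName).build(),
                            ":progress", AttributeValue.builder().s("PROGRESS").build()))
                    .build();
            return dynamo.query(request).count() > 0;
        }
    }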

On-demand runs execute jobs with a custom payload. A job has to run both on a schedule and on demand, and it should work with a dynamic payload. Step Functions can pass dynamic input through the workflow, so based on the input the workflow forwards it to the actual job (the job has to read this input).
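
For illustration (the state machine ARN and the payload shape are assumptions), an on-demand run with a custom payload can be started like this:

    import software.amazon.awssdk.services.sfn.SfnClient;
    import software.amazon.awssdk.services.sfn.model.StartExecutionRequest;

    public class OnDemandTrigger {
        public static void main(String[] args) {
            SfnClient sfn = SfnClient.create();

            sfn.startExecution(StartExecutionRequest.builder()
                    .stateMachineArn("arn:aws:states:us-east-1:123456789012:stateMachine:job-workflow")
                    // Dynamic payload: the workflow passes this input through to the actual job,
                    // which has to read it (for example as the Lambda event or a container override).
                    .input("{\"jobName\":\"nightly-report\",\"mode\":\"ON_DEMAND\",\"payload\":{\"customerId\":\"42\"}}")
                    .build());
        }
    }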

Notification is an important feature: whenever a trigger starts, stops, is cancelled, or fails, everything is notified through Slack.

A job trigger can be force-stopped at any time. For instance, if we feel a job is consuming too many resources, we can stop the trigger instantly.
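
A small sketch of such a force stop, assuming the execution ARN is looked up from the trigger entry in DynamoDB:

    import software.amazon.awssdk.services.sfn.SfnClient;
    import software.amazon.awssdk.services.sfn.model.StopExecutionRequest;

    public class ForceStop {
        public static void stop(String executionArn) {
            SfnClient sfn = SfnClient.create();
            sfn.stopExecution(StopExecutionRequest.builder()
                    .executionArn(executionArn)
                    .error("ForcedStop")  // recorded in the execution history
                    .cause("Stopped manually because of high resource utilization")
                    .build());
        }
    }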

Result aggregation is also important: whatever happened inside the job, its output is captured through the workflow and persisted in DynamoDB.
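
As a sketch of persisting that aggregated output on the trigger entry (table, key, and attribute names are assumed for illustration):

    import java.time.Instant;
    import java.util.Map;

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

    public class TriggerResultWriter {

        private final DynamoDbClient dynamo = DynamoDbClient.create();

        public void recordResult(String jobName, String triggerId, String status, String resultJson) {
            dynamo.updateItem(UpdateItemRequest.builder()
                    .tableName("Triggers")  // assumed table name
                    .key(Map.of(
                            "jobName", AttributeValue.builder().s(jobName).build(),
                            "triggerId", AttributeValue.builder().s(triggerId).build()))
                    .updateExpression("SET #st = :status, endTime = :end, resultSummary = :result")
                    .expressionAttributeNames(Map.of("#st", "status"))
                    .expressionAttributeValues(Map.of(
                            ":status", AttributeValue.builder().s(status).build(),
                            ":end", AttributeValue.builder().s(Instant.now().toString()).build(),
                            ":result", AttributeValue.builder().s(resultJson).build()))
                    .build());
        }
    }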

3. Lambda
Everybody expects a serverless job to be a Lambda, and yes, that's correct.

It solves the following problems:

It supports many languages and is preferable for smaller, quicker jobs because of the low cost with second-level pricing.

But it does not cover every case, since Lambda has its own limits on runtime timeout, code bundle size, and memory size.
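
A minimal sketch of a Lambda job handler that reads the dynamic payload forwarded by the workflow and returns an output for the workflow to capture (the event shape here is an assumption):

    import java.util.Map;

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;

    public class ReportJobHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

        @Override
        public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
            // The workflow passes the trigger input straight through, so the job reads it here.
            Object payload = input.get("payload");
            context.getLogger().log("Running job with payload: " + payload);

            // ... actual job logic goes here ...

            // Whatever is returned is captured by the workflow and persisted in DynamoDB.
            return Map.of("status", "SUCCESS", "processedRecords", 0);
        }
    }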

4. ECS (Fargate)

To work around Lambda's limits, we found that the Elastic Container Service has the capability to solve them.

Long-running and memory-heavy jobs are deployed as ECS (Fargate) tasks and run through Step Functions.
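
In our setup the workflow's ECS step launches the task; as a rough equivalent of what that step does (cluster, task definition, and network settings are assumptions), the SDK call looks like this:

    import software.amazon.awssdk.services.ecs.EcsClient;
    import software.amazon.awssdk.services.ecs.model.AssignPublicIp;
    import software.amazon.awssdk.services.ecs.model.AwsVpcConfiguration;
    import software.amazon.awssdk.services.ecs.model.LaunchType;
    import software.amazon.awssdk.services.ecs.model.NetworkConfiguration;
    import software.amazon.awssdk.services.ecs.model.RunTaskRequest;

    public class FargateJobRunner {
        public static void run() {
            EcsClient ecs = EcsClient.create();
            ecs.runTask(RunTaskRequest.builder()
                    .cluster("jobs-cluster")               // assumed cluster name
                    .taskDefinition("long-running-job:1")  // assumed task definition
                    .launchType(LaunchType.FARGATE)
                    .networkConfiguration(NetworkConfiguration.builder()
                            .awsvpcConfiguration(AwsVpcConfiguration.builder()
                                    .subnets("subnet-0example")  // assumed subnet
                                    .assignPublicIp(AssignPublicIp.DISABLED)
                                    .build())
                            .build())
                    .build());
        }
    }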

5. ECR

Elastic Container Registry holds our built solution bundles (container images), especially for ECS jobs and Lambda container jobs.

6. S3

The bucket holds the Lambda handler bundles.

7. DynamoDB

DynamoDB is crucial for this system since it is auto-scalable and natively supports event-driven triggers.

We use the following tables:

  • Jobs (job configuration: job name, type (Lambda/ECS), memory, language, bundle (binary) location)
  • Stacks (where each stack is deployed: Lambda ARN, ECR image, CloudWatch path, ECS task, ECS cluster)
  • Triggers (one entry per job execution: start time, end time, status, success/failure message)

The Jobs table has stream events enabled (CREATE/MODIFY/DELETE). Each event executes a Lambda (a small Java program using the CloudFormation SDK) that builds a CloudFormation template dynamically based on the data, creates/updates/deletes the stack, and updates the stack details in the appropriate table.
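
A minimal sketch of that stack-builder Lambda, assuming the Jobs table is keyed by jobName and leaving the template generation as a placeholder (DynamoDB Streams reports the events as INSERT/MODIFY/REMOVE):

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

    import software.amazon.awssdk.services.cloudformation.CloudFormationClient;
    import software.amazon.awssdk.services.cloudformation.model.Capability;
    import software.amazon.awssdk.services.cloudformation.model.CreateStackRequest;
    import software.amazon.awssdk.services.cloudformation.model.DeleteStackRequest;
    import software.amazon.awssdk.services.cloudformation.model.UpdateStackRequest;

    public class JobStackBuilder implements RequestHandler<DynamodbEvent, Void> {

        private final CloudFormationClient cfn = CloudFormationClient.create();

        @Override
        public Void handleRequest(DynamodbEvent event, Context context) {
            for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
                String jobName = record.getDynamodb().getKeys().get("jobName").getS();  // assumed key attribute
                String stackName = "job-" + jobName;

                switch (record.getEventName()) {
                    case "INSERT":
                        cfn.createStack(CreateStackRequest.builder()
                                .stackName(stackName)
                                .templateBody(buildTemplate(record))
                                .capabilities(Capability.CAPABILITY_IAM)
                                .build());
                        break;
                    case "MODIFY":
                        cfn.updateStack(UpdateStackRequest.builder()
                                .stackName(stackName)
                                .templateBody(buildTemplate(record))
                                .capabilities(Capability.CAPABILITY_IAM)
                                .build());
                        break;
                    case "REMOVE":
                        cfn.deleteStack(DeleteStackRequest.builder().stackName(stackName).build());
                        break;
                }
                // Writing the resulting stack details (Lambda ARN, ECS task, log path) back to
                // the Stacks table is omitted from this sketch.
            }
            return null;
        }

        private String buildTemplate(DynamodbEvent.DynamodbStreamRecord record) {
            // Builds a CloudFormation template (Lambda or ECS task) from the job attributes.
            // Placeholder only; the real template generation is project-specific.
            throw new UnsupportedOperationException("template generation omitted");
        }
    }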

8. CloudWatch

It captures all the logs in a single place, so we can monitor the jobs very easily.

9. API Gateway

On top of the DynamoDB data, we built a simple user interface to check and monitor job/stack/trigger details efficiently. We also added an on-demand job trigger API that connects to the Step Functions workflow and runs jobs instantly with a dynamic payload.

10. Simple Queue Service

Suppose a job suddenly has a huge load to process within a certain time limit; we can split the work, run it in parallel, and complete the triggers.
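
As a sketch of that fan-out (the queue URL and message shape are assumptions), each chunk of work is pushed onto the queue so parallel workers, Lambda or ECS, can drain it and complete the trigger together:

    import java.util.List;

    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

    public class WorkSplitter {

        private static final String QUEUE_URL =
                "https://sqs.us-east-1.amazonaws.com/123456789012/job-work-items";  // assumed queue

        public static void enqueue(List<String> workItems) {
            SqsClient sqs = SqsClient.create();
            for (String item : workItems) {
                // Each message is one chunk of the job; workers poll the queue in parallel.
                sqs.sendMessage(SendMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .messageBody(item)
                        .build());
            }
        }
    }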

Conclusion

We have been running this model for more than a year; almost 10 million executions have completed without any issue. This is why I said earlier that serverless jobs are a new paradigm in the cloud, and we are still adding more jobs to this model.

P.S.
Even though we used all these AWS services, the success of this implementation came from providing a simple UI and CLI plugins to manage them.

Author: Sabarinathan Soundararajan

