This post is a quick guide on how to use AWS Lambda to deploy a standalone task. This is really helpful when we don't want to buy instances for such a task. By task, I mean scripts that scrape data, or a cron job that aggregates data. The best thing about running it on Lambda is that it is way cheaper: the first million Lambda requests each month are free on AWS.
The following steps should get you going. In the example below, I aggregate data from MongoDB and store the output in Redis.
Step 1: Create the function
Log in to the AWS console and create a Lambda function. Think of a Lambda function as the task we want to run.
To run a function, we need a script.
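As a rough idea of the kind of script meant here, the sketch below counts how often each value appears in a list of records and returns the most frequent ones. It is only an illustration: the `topic` field and the `aggregate_trending` name are hypothetical, and the real script would read its records from MongoDB and write the result to Redis instead of working on an in-memory list.

```python
from collections import Counter


def aggregate_trending(records, key="topic", top_n=3):
    """Return the top_n most frequent values of `key` across the records.

    `records` is a plain list of dicts here; in the real task it would be
    the documents fetched from MongoDB (hypothetical field name: "topic").
    """
    counts = Counter(record[key] for record in records if key in record)
    return counts.most_common(top_n)
```

Keeping the aggregation in a plain function like this makes it easy to test locally before it ever touches Lambda.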
The script won't run until all of its dependencies are met. In the data science world, that means pandas, numpy, etc.
We need to provide every dependency the script requires.
Step 2: Dependencies
For managing the dependencies, we can look at the official documentation. In short, we want to dump all the dependencies into a folder.
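One common way to dump dependencies into a folder is pip's `--target` option. The sketch below builds and runs that command from Python; the folder name `package` and the package names in the usage comment are assumptions for illustration, not the exact list used in this post.

```python
import subprocess
import sys


def build_install_command(packages, target_dir):
    """Build the pip command that installs `packages` into `target_dir`."""
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, *packages]


def install_dependencies(packages, target_dir):
    """Run pip; raises CalledProcessError if the installation fails."""
    subprocess.run(build_install_command(packages, target_dir), check=True)


# Example (needs network access, so it is left commented out):
# install_dependencies(["pymongo", "redis"], "package")
```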
Yeah, I didn't include pandas, numpy and scipy, because that is a different case. AWS Lambda runs on its own flavor of Linux, so pandas and numpy builds installed from PyPI on your local machine may not run there, as those libraries rely heavily on compiled (Cython) extensions for optimization.
To use such libraries, we will get to a new Lambda concept: layers.
Step 3: The code
Get your code ready, especially the beginning and the end.
Most standard Python scripts use a config.ini or .env file to get their variable values. Change that to os.environ.get().
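A minimal sketch of reading settings from environment variables is shown below. The variable names (`MONGO_URI`, `REDIS_HOST`, `REDIS_PORT`) and their defaults are assumptions for illustration; use whatever names you configure for your own Lambda function.

```python
import os


def load_settings(env=None):
    """Read connection settings from environment variables.

    `env` defaults to os.environ; passing a dict makes local testing easy.
    All variable names below are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "mongo_uri": env.get("MONGO_URI", "mongodb://localhost:27017"),
        "redis_host": env.get("REDIS_HOST", "localhost"),
        "redis_port": int(env.get("REDIS_PORT", "6379")),
    }
```

Environment variables come back as strings, which is why the port is converted with `int()`.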
Similarly, at the end of the code, the entry point should be a function named lambda_handler.
You could change the function name lambda_handler and the filename, but then you have to change the handler name in the AWS Lambda console too. So, for the sake of laziness, let's stick to the default naming.
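For reference, a Python handler with the default naming looks roughly like the sketch below, saved as `lambda_function.py` so that it matches the console's default handler setting `lambda_function.lambda_handler`. The `run_task` helper and the `items` key in the event are hypothetical stand-ins for the real work.

```python
import json


def run_task(event):
    """Hypothetical stand-in for the real aggregation work."""
    items = event.get("items", [])
    return {"processed": len(items)}


def lambda_handler(event, context):
    """Entry point that Lambda calls with the trigger event and a context object."""
    result = run_task(event or {})
    return {"statusCode": 200, "body": json.dumps(result)}
```

Calling `lambda_handler({"items": [1, 2, 3]}, None)` locally is a quick way to check the script before uploading it.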
Step 4: Shipment
Get the shipment ready.
Here, we zip the packages (the dependencies) and then add our script to the same zip file.
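The same packaging step can be sketched with Python's `zipfile` module. The function and path names are assumptions for illustration; the important detail is that the dependencies and the script both end up at the top level of the archive.

```python
import os
import zipfile


def build_deployment_zip(package_dir, script_path, zip_path):
    """Zip everything under package_dir, then add the script at the archive root.

    zip_path should live outside package_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to package_dir so imports resolve at the root.
                archive.write(full_path, os.path.relpath(full_path, package_dir))
        archive.write(script_path, os.path.basename(script_path))
    return zip_path
```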
Step 5: Layers?
When you scroll down the Lambda function page in your AWS console, you will see the Function code section and an Actions box, which offers options like "upload a .zip file" or "upload a file from Amazon S3".
Once you upload your zip file, you will see your code (unless your package is larger than 10 MB).
There you can save and test the code. But since we need more complex dependencies, we need to know a little more about layers.
Step 5: Yeah, the layers
If your code depends on libraries that have to come from Lambda layers, you can do one of three things.
Link to add a layer to your Lambda function
Note: trending-aggregator is my function name.
Use a standard layer
It's easy, and it's provided by AWS (safe).
On the layer page, choose "Select from list of runtime compatible layers" to see the standard AWS layer names.
Use someone else's layer
Whatever you are going to do, someone has probably already done it, maybe five years ago. So you could use their layer, as long as you have its ARN.
Luckily, someone shared the details in a Stack Overflow question. Cheers.
On the layer page, select "Provide a layer version ARN" and add this ARN: arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1
BOOM, thanks mate.
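It helps to know what a layer version ARN contains, because a layer has to live in the same region as the function that uses it. The sketch below splits an ARN into its parts; `parse_layer_arn` is a hypothetical helper written for this post, not part of any AWS library.

```python
def parse_layer_arn(arn):
    """Split a Lambda layer version ARN into region, account, name and version."""
    parts = arn.split(":")
    is_layer_arn = (
        len(parts) == 8
        and parts[0] == "arn"
        and parts[2] == "lambda"
        and parts[5] == "layer"
    )
    if not is_layer_arn:
        raise ValueError(f"not a Lambda layer version ARN: {arn!r}")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "layer_name": parts[6],
        "version": int(parts[7]),
    }
```

For the ARN used here, the region part is `us-east-1`, so a function in another region would need a copy of the layer published in its own region.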
Use your own layer
Security concerns? To create our own layer, we need another AWS machine (EC2). Log in to an EC2 machine, install the libraries there, and zip the result.
This creates a zip file that includes pandas as installed natively on that weird AWS machine.
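One detail worth knowing when zipping a Python layer: Lambda expects the libraries to sit under a top-level `python/` folder inside the archive. The sketch below builds such a zip from a folder of installed packages; the function name and paths are assumptions for illustration, and the folder is expected to have been filled on the EC2 machine.

```python
import os
import zipfile


def build_layer_zip(site_packages_dir, zip_path):
    """Zip installed packages under a top-level python/ folder, as Lambda layers expect."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(site_packages_dir):
            for name in files:
                full_path = os.path.join(root, name)
                relative = os.path.relpath(full_path, site_packages_dir)
                # Prefix every entry with python/ so Lambda can import it.
                archive.write(full_path, os.path.join("python", relative))
    return zip_path
```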
Here, we set up our AWS credentials. Now we have a zip containing the dependencies, and our AWS account is connected. We followed the approach suggested in one Stack Overflow answer, but it didn't work, so our DevOps engineer found a hack to deploy this as a layer: we first pushed the zip file to an S3 bucket and then published the layer from there.
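The S3 route can be sketched with boto3 roughly as below. This is not the exact procedure our DevOps engineer used; the function name, bucket, key, layer name and the `python3.7` runtime are all assumptions. The clients are passed in as arguments, so you would create them yourself with `boto3.client("s3")` and `boto3.client("lambda")`.

```python
def publish_layer_via_s3(s3_client, lambda_client, zip_path, bucket, key,
                         layer_name, runtime="python3.7"):
    """Upload the layer zip to S3, then publish a layer version from that object.

    Returns the ARN of the new layer version.
    """
    s3_client.upload_file(zip_path, bucket, key)
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=[runtime],
    )
    return response["LayerVersionArn"]


# Example (hypothetical names, requires boto3 and valid credentials):
# import boto3
# arn = publish_layer_via_s3(boto3.client("s3"), boto3.client("lambda"),
#                            "layer.zip", "my-bucket", "layers/pandas.zip",
#                            "my-pandas-layer")
```

The returned ARN is what you would paste into "Provide a layer version ARN" on the layer page.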
Yeah, owning your own layer is hard, but we already have two other alternatives.
Run: now you can test the function. Database dependencies like Redis and MongoDB must be reachable from AWS Lambda, otherwise it won't work. But once they are, it will work and you will see the beauty.
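A quick way to check whether the function can reach a database at all is a plain TCP connection test, which you could call at the top of the handler while debugging. This is only a sketch: `can_connect` is a hypothetical helper, and it only shows that the port is reachable, not that authentication will succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False
```

If this returns False inside Lambda but True on your own machine, the usual suspects are VPC and security group settings.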
You only get billed for the time your function actually runs. You can also limit the resources; see Memory Size and Max Memory Used in the execution report.
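To get a feel for the bill, the compute part is based on configured memory multiplied by billed duration, measured in GB-seconds. The sketch below does that arithmetic. The price per GB-second is deliberately left as a parameter, since it depends on region and changes over time; the function names are my own.

```python
def billed_gb_seconds(memory_mb, billed_duration_ms, invocations=1):
    """Configured memory (GB) x billed duration (s) x number of invocations."""
    return (memory_mb / 1024) * (billed_duration_ms / 1000) * invocations


def estimate_compute_cost(memory_mb, billed_duration_ms, invocations, price_per_gb_second):
    """Rough compute cost; look up the current price for your region."""
    return billed_gb_seconds(memory_mb, billed_duration_ms, invocations) * price_per_gb_second
```

For example, a 512 MB function that runs for 2 seconds, invoked 1,000 times, uses 0.5 x 2 x 1,000 = 1,000 GB-seconds.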
I hope the stats are convincing enough for you to adopt Lambda functions instead of buying bulky instances.