Click Click and create using AWS Lake Formation


We solve business problems on a daily basis, and each problem is different from the next.

If you look at it at a high level, it consists of 3 steps:

  1. Ingest
  2. Transform
  3. Present

But when you go deeper, it’s much more than that.

Consider a case where you need to apply a prediction model to Black Friday sales data.

Now you have to come up with the whole pipeline to do this.

So you would go to the whiteboard and start writing:

  1. The data source: where we are going to ingest the data from
  2. The type of data we are bringing in
  3. Incremental or full loads
  4. Transformation of the data based on business logic
  5. Access to the data sets
  6. Storage of the pre- and post-transformation data
  7. Presentation of the data, including access for different consumers and collaboration between teams
  8. And 100 more steps

By the time you are done, you would have spent a lot of resources and a lot of money building all of this from scratch.

What if we could build the data lake with just the click of a button and take care of all the steps mentioned above?

Enter AWS Lake Formation.

AWS Lake Formation

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.

Source: https://aws.amazon.com/lake-formation

What does it do?

AWS Lake Formation helps you create a data lake the easy way, taking care of the following for you:

  1. Source crawlers: connect to the source, be it an RDBMS, a NoSQL store, or an S3 location.
  2. ETL and data prep: run the ETL workflows.
  3. Data catalog: search for the required set of tables.
  4. Security settings: differentiate sets of users based on their job group.
  5. Access controls: limit user access to the data.

Demo

In this demo we will look at a simple setup that will:

  1. Connect to the source S3 bucket
  2. Crawl the data
  3. Create the database and tables
  4. Grant access to the users

Setting up AWS Lake Formation consists of 3 steps:

  1. Register Amazon S3 storage
  2. Create a database
  3. Grant permissions

1. Set up the S3 storage

Let’s create 2 folders for our data.

landing: where our newly ingested data is placed, as is.

Let’s create 3 different folders based on the datasets we receive.

And register the location with Lake Formation.
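The console registration above can also be scripted with boto3. A minimal sketch, assuming a hypothetical bucket name (`my-datalake-landing`) and that your credentials have Lake Formation admin rights; the request-builder helper is separated out so it can be inspected without touching AWS:

```python
def registration_params(bucket_arn):
    """Build the request body for lakeformation.register_resource."""
    return {
        "ResourceArn": bucket_arn,
        # Let Lake Formation use its service-linked role to read the bucket.
        "UseServiceLinkedRole": True,
    }

def register_location(bucket_arn):
    """Register an S3 location with Lake Formation (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    lf = boto3.client("lakeformation")
    return lf.register_resource(**registration_params(bucket_arn))

# Example (live): register_location("arn:aws:s3:::my-datalake-landing")
```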

2. Let’s create a database where we will store the data:

While you create the database, it will ask for Lake Formation tags, so let’s create one.
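These two console actions map to two API calls. A sketch with boto3, where the database name `salesdb` and the tag key/values are placeholders for this demo:

```python
def database_input(name):
    """DatabaseInput body for glue.create_database."""
    return {"Name": name, "Description": "Database for the data lake demo"}

def lf_tag_params(key, values):
    """Request body for lakeformation.create_lf_tag."""
    return {"TagKey": key, "TagValues": values}

def create_database_and_tag(db_name, tag_key, tag_values):
    """Create the Glue database and an LF-tag (live AWS calls)."""
    import boto3  # deferred: needs credentials and a configured region
    boto3.client("glue").create_database(DatabaseInput=database_input(db_name))
    boto3.client("lakeformation").create_lf_tag(**lf_tag_params(tag_key, tag_values))

# Example (live):
# create_database_and_tag("salesdb", "environment", ["landing", "curated"])
```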

3. The final step is to grant permissions to the user:
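The grant can likewise be scripted. A sketch, assuming a hypothetical IAM user ARN and the table names from this demo:

```python
def select_grant(principal_arn, database, table):
    """Request body for lakeformation.grant_permissions: SELECT on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

def grant_select(principal_arn, database, table):
    """Grant a principal SELECT on a table (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("lakeformation").grant_permissions(
        **select_grant(principal_arn, database, table)
    )

# Example (live):
# grant_select("arn:aws:iam::123456789012:user/analyst", "salesdb", "black_friday_sales")
```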

Now let’s ingest the data:

We will use a crawler to read the S3 bucket and create a table over the landing S3 location:

Let’s run it:
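Creating and starting the crawler from code looks roughly like this; the crawler name, IAM role ARN, and S3 path are placeholders for this demo:

```python
def crawler_params(name, role_arn, database, s3_path):
    """Request body for glue.create_crawler targeting one S3 path."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def create_and_start_crawler(name, role_arn, database, s3_path):
    """Create the crawler and kick it off (live AWS calls)."""
    import boto3  # deferred: needs credentials and a configured region
    glue = boto3.client("glue")
    glue.create_crawler(**crawler_params(name, role_arn, database, s3_path))
    glue.start_crawler(Name=name)  # runs asynchronously

# Example (live):
# create_and_start_crawler("landing-crawler",
#                          "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#                          "salesdb", "s3://my-datalake-landing/landing/")
```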

Once the execution is complete, we can see that the table has been created:

We can view the data:

Permissions

Now you can even grant or revoke permissions for a defined user on the given table:
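Revoking mirrors granting: `lakeformation.revoke_permissions` takes the same shape of request. A sketch with the same placeholder names as before:

```python
def permission_request(principal_arn, database, table, permissions):
    """Shared request body for grant_permissions / revoke_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }

def revoke(principal_arn, database, table, permissions):
    """Revoke the listed permissions from a principal (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("lakeformation").revoke_permissions(
        **permission_request(principal_arn, database, table, permissions)
    )

# Example (live):
# revoke("arn:aws:iam::123456789012:user/analyst", "salesdb",
#        "black_friday_sales", ["SELECT"])
```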

Discoverability

Users can find the required data simply by searching for it.

We need to add a tag in the table properties.
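Tagging a table and searching the catalog can also be done through the API. A sketch, with the tag key and search text as placeholders:

```python
def tag_table_request(database, table, tag_key, tag_values):
    """Request body for lakeformation.add_lf_tags_to_resource."""
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [{"TagKey": tag_key, "TagValues": tag_values}],
    }

def search_tables(text):
    """Free-text search over the Glue Data Catalog (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("glue").search_tables(SearchText=text)["TableList"]

# Example (live):
# boto3.client("lakeformation").add_lf_tags_to_resource(
#     **tag_table_request("salesdb", "black_friday_sales", "environment", ["landing"]))
# search_tables("sales")
```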


Ajith Shetty

Bigdata Engineer — Bigdata, Analytics, Cloud and Infrastructure.


Interested in a weekly newsletter on big data analytics from around the world? Subscribe to my weekly newsletter, Just Enough Data.
