Click Click and create using AWS Lake Formation

Photo by La-Rel Easter on Unsplash

We solve business problems every day, and each problem is different from the last.

At a high level, though, every solution consists of 3 steps:

  1. Ingest
  2. Transform
  3. Present

But when you go deeper, it’s much more than that.

Consider a case where you need to apply a prediction model to Black Friday sales.

Now you have to come up with the whole pipeline to do this.

So you would go to the whiteboard and start writing:

  1. The data source: where we are going to ingest the data from.
  2. The type of data we bring in.
  3. The load type: incremental or full.
  4. Transformation of the data based on business logic.
  5. Access to the data sets.
  6. Storage for the pre- and post-transformation data.
  7. Presentation of the data, including access for different sources and collaboration between teams.
  8. And 100 more steps.
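To make the incremental versus full item a bit more concrete, here is a minimal Python sketch. The `select_rows` helper, the `updated_at` column and the sample rows are all made up for illustration; a real pipeline would read its watermark from wherever it tracks the last successful load.

```python
from datetime import datetime


def select_rows(rows, mode, last_loaded_at=None):
    """Return the rows to ingest for a 'full' or an 'incremental' load."""
    if mode == "full":
        # Full load: take everything, every time.
        return list(rows)
    if mode == "incremental":
        if last_loaded_at is None:
            raise ValueError("an incremental load needs a watermark")
        # Incremental load: only rows changed after the previous load.
        return [row for row in rows if row["updated_at"] > last_loaded_at]
    raise ValueError(f"unknown load mode: {mode}")


# Made-up sample rows, only to show the difference between the two modes.
sales = [
    {"order_id": 1, "updated_at": datetime(2021, 11, 25)},
    {"order_id": 2, "updated_at": datetime(2021, 11, 26)},
]
```

With a watermark of `datetime(2021, 11, 25)`, the incremental mode would pick up only the second order, while the full mode would pick up both.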

By this point you would already have spent a lot of resources, and a lot of money, building all of this from scratch.

What if we could build the data lake with just the click of a button and have all the steps above taken care of?

Enter, AWS Lake Formation.

AWS Lake Formation

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.

Source: https://aws.amazon.com/lake-formation

What does it do

AWS Lake Formation helps you create a data lake the easy way, and takes care of the following for you:

  1. Source Crawler: connects to the source, be it an RDBMS, a NoSQL store or an S3 location.
  2. ETL and Data Prep: runs the ETL workflow.
  3. Data Catalog: lets you search for the tables you need.
  4. Security Settings: differentiates sets of users based on their job group.
  5. Access Controls: limits the access given to users.

Demo

In this demo we will look at a simple setup which will:

  1. Connect to the source S3 location
  2. Crawl the data
  3. Create the database and tables
  4. Provide access to the users

Setting up AWS Lake Formation consists of 3 steps:

  1. Register Amazon S3 Storage
  2. Create a Database
  3. Grant Permissions

Let’s create 2 folders for our data.

landing: this is where our newly ingested data is placed, as is.

Let’s create 3 different folder structures based on the datasets we receive.
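If you prefer a script over the console, the folders could be created with something like the sketch below. It assumes `s3_client` is a boto3 S3 client (`boto3.client("s3")`); the bucket name and the three dataset names are placeholders, not the ones used in this demo.

```python
def folder_keys(prefix, datasets):
    """Build the S3 'folder' keys: the prefix itself plus one sub-folder per dataset."""
    base = prefix.strip("/") + "/"
    return [base] + [f"{base}{name.strip('/')}/" for name in datasets]


def create_folders(s3_client, bucket, keys):
    """Create each key as an empty object so it shows up as a folder in the console."""
    # S3 has no real folders: an empty object whose key ends in "/" is displayed as one.
    for key in keys:
        s3_client.put_object(Bucket=bucket, Key=key)


# Placeholder dataset names, assumed for illustration only.
keys = folder_keys("landing", ["sales", "customers", "products"])
# create_folders(boto3.client("s3"), "my-data-lake-bucket", keys)
```

The actual call is left commented out because it needs AWS credentials and an existing bucket.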

And register the location.
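The same registration can be sketched with boto3's Lake Formation client and its `register_resource` call. The helper name and the bucket below are hypothetical, and using the service-linked role is an assumption; you may want to pass your own `RoleArn` instead.

```python
def register_location(lf_client, bucket, prefix=""):
    """Register an S3 bucket (or a prefix inside it) with Lake Formation."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += "/" + prefix.strip("/")
    # Assumption: let Lake Formation use its service-linked role for this location.
    lf_client.register_resource(ResourceArn=arn, UseServiceLinkedRole=True)
    return arn
```

To try it against a real account, pass `boto3.client("lakeformation")` as `lf_client`, for example `register_location(lf_client, "my-data-lake-bucket", "landing")`.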

2. Let’s create a database where we will store the data:
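Lake Formation databases live in the AWS Glue Data Catalog, so a scripted version of this step would go through the Glue client. This is only a sketch: the database name and location in the usage note are placeholders.

```python
def create_database(glue_client, name, location_uri, description=""):
    """Create a Data Catalog database that points at an S3 location."""
    database_input = {"Name": name, "LocationUri": location_uri}
    if description:
        database_input["Description"] = description
    glue_client.create_database(DatabaseInput=database_input)
    return database_input
```

With `glue_client = boto3.client("glue")`, a call could look like `create_database(glue_client, "sales_db", "s3://my-data-lake-bucket/landing/")`.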

While creating the database, you are asked for Lake Formation tags (LF-Tags), so let’s create one.
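For reference, creating an LF-Tag and attaching it to a database might look like this through the API. The function name, tag key and values are invented for the example; `create_lf_tag` and `add_lf_tags_to_resource` are the Lake Formation client calls it relies on.

```python
def create_and_assign_lf_tag(lf_client, database, tag_key, allowed_values, value):
    """Define an LF-Tag with its allowed values, then attach one value to a database."""
    if value not in allowed_values:
        raise ValueError(f"{value!r} is not one of the allowed values")
    # Step 1: define the tag key and every value it may take.
    lf_client.create_lf_tag(TagKey=tag_key, TagValues=list(allowed_values))
    # Step 2: attach one of those values to the database.
    lf_client.add_lf_tags_to_resource(
        Resource={"Database": {"Name": database}},
        LFTags=[{"TagKey": tag_key, "TagValues": [value]}],
    )
```

A hypothetical call: `create_and_assign_lf_tag(lf_client, "sales_db", "zone", ["landing", "curated"], "landing")`.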

The final step is where we grant permissions to the user:
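In code, a grant on the database is a single `grant_permissions` call on the Lake Formation client. Which permissions to hand out depends on your setup, so the sketch takes them as a parameter; the principal ARN in the usage note is a placeholder.

```python
def grant_database_permissions(lf_client, principal_arn, database, permissions):
    """Grant database-level permissions (for example DESCRIBE) to an IAM user or role."""
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(permissions),
    }
    lf_client.grant_permissions(**request)
    return request
```

For example: `grant_database_permissions(lf_client, "arn:aws:iam::123456789012:user/analyst", "sales_db", ["DESCRIBE"])`, where the account id and user name are made up.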

Now let’s ingest the data:

We will use a crawler to read the S3 bucket and create a table from the data in the landing S3 location:
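The crawler here is an AWS Glue crawler, so a scripted equivalent could use the Glue client's `create_crawler`. Treat this as a sketch: the crawler name, IAM role and S3 path are whatever you choose in your own account.

```python
def create_landing_crawler(glue_client, name, role_arn, database, s3_path):
    """Define a crawler that scans one S3 path and writes tables into a database."""
    glue_client.create_crawler(
        Name=name,
        Role=role_arn,            # IAM role the crawler runs as
        DatabaseName=database,    # where the discovered tables are created
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
```

The role passed as `role_arn` needs read access to the S3 path and permission to create tables in the target database.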

Let’s run it:
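Outside the console, running the crawler means starting it and then polling until it goes back to the `READY` state. A minimal sketch, assuming `glue_client` is `boto3.client("glue")`; the helper name, polling interval and timeout are arbitrary choices.

```python
import time


def run_crawler(glue_client, name, poll_seconds=15, timeout_seconds=900):
    """Start a Glue crawler and wait until it is idle again."""
    glue_client.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue_client.get_crawler(Name=name)["Crawler"]
        # A crawler reports RUNNING, then STOPPING, then READY when it is done.
        if crawler["State"] == "READY":
            # Status of the most recent crawl, if Glue reports one.
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {name} did not finish in {timeout_seconds}s")
```

The return value is the status of the last crawl, so the caller can check whether it actually succeeded before looking for the new table.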

Once the execution is complete we can see that the table has been created:
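You can confirm the same thing from a script by listing the tables in the database. This sketch uses the Glue client's `get_tables` and follows `NextToken` in case there is more than one page; the function name is just illustrative.

```python
def list_table_names(glue_client, database):
    """Return the names of all tables in a Data Catalog database."""
    names, token = [], None
    while True:
        kwargs = {"DatabaseName": database}
        if token:
            kwargs["NextToken"] = token
        page = glue_client.get_tables(**kwargs)
        names.extend(table["Name"] for table in page["TableList"])
        token = page.get("NextToken")
        if not token:
            return names
```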

We can view the data:
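One way to preview the table from code is to run a small query through Amazon Athena. That is an assumption on my side rather than something this demo prescribes, and the helper name, output location and row limit below are placeholders.

```python
import time


def preview_table(athena_client, database, table, output_location, limit=10, poll_seconds=1):
    """Run SELECT * ... LIMIT n in Athena and return the rows as lists of strings."""
    # The names are interpolated directly, so only pass trusted database/table names.
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    query_id = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For a SELECT, the first row holds the column names.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

Here `athena_client` would be `boto3.client("athena")`, and `output_location` an S3 path such as `s3://my-data-lake-bucket/athena-results/` that Athena is allowed to write to.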

Now you can even grant or revoke a permission on the given table for a defined user:
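Through the API, granting and revoking are mirror-image calls, `grant_permissions` and `revoke_permissions`, that take the same arguments. The sketch below toggles `SELECT` on one table; the function name is hypothetical and the principal ARN is whichever user or role you are managing.

```python
def set_table_select(lf_client, principal_arn, database, table, allow=True):
    """Grant (allow=True) or revoke (allow=False) SELECT on a single table."""
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }
    if allow:
        lf_client.grant_permissions(**request)
    else:
        lf_client.revoke_permissions(**request)
    return request
```

Calling it with `allow=False` undoes an earlier grant for the same principal and table.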

Users can then find the data they need simply by searching for it.

For that, we need to add a tag in the table properties.
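A programmatic version of that search could go through the Glue client's `search_tables` call. This is a rough sketch under two assumptions: that the text you search for appears in the table's name, description or properties, and that one page of results is enough (pagination is ignored).

```python
def find_tables(glue_client, text):
    """Search the Data Catalog and return (database, table) pairs that match the text."""
    response = glue_client.search_tables(SearchText=text)
    return [(table["DatabaseName"], table["Name"]) for table in response["TableList"]]
```

With `glue_client = boto3.client("glue")`, something like `find_tables(glue_client, "black_friday")` would return the matching tables, where `black_friday` stands in for whatever tag value you added.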


Ajith Shetty

Bigdata Engineer — Bigdata, Analytics, Cloud and Infrastructure.

