Click Click and create using AWS Lake Formation


We solve business problems on a daily basis, and each problem is different from the next.

If you look at it at a high level, it consists of 3 steps:

  1. Ingest
  2. Transform
  3. Present

But when you go deeper, it’s much more than that.

Consider a case where you need to apply a prediction model to Black Friday sales data.

Now you have to come up with the whole pipeline to do this.

So you would go to the whiteboard and start writing:

  1. The data source: where we are going to ingest the data from
  2. The type of data we are bringing in
  3. Incremental or full loads
  4. Transformation of the data based on business logic
  5. Access to the data sets
  6. Storage of the pre- and post-transformation data
  7. Presentation of the data, including access for different consumers and collaboration between teams
  8. And 100 more steps

By the time you are done, you would have spent a lot of resources and a lot of money building all of this from scratch.

What if we could build the data lake with just the click of a button and take care of all the steps mentioned above?

Enter AWS Lake Formation.

AWS Lake Formation

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.

Source: https://aws.amazon.com/lake-formation

What does it do?

AWS Lake Formation helps you create a data lake the easy way, taking care of the following for you:

  1. Source crawlers: connect to the source, be it an RDBMS, a NoSQL store, or an S3 location.
  2. ETL and data prep: run the ETL workflows.
  3. Data catalog: search for the required set of tables.
  4. Security settings: differentiate sets of users based on their job group.
  5. Access controls: limit user access to the data.

Demo

In this demo we will look at a simple setup that will:

  1. Connect to the source S3 bucket
  2. Crawl the data
  3. Create the database and tables
  4. Grant access to the users

Setting up AWS Lake Formation consists of 3 steps:

  1. Register Amazon S3 storage
  2. Create a database
  3. Grant permissions

1. Set up the S3 storage

Let’s create 2 folders for our data.

landing: where our newly ingested data is placed, as is.

Let’s create 3 different folders based on the datasets we receive.

And register the location with Lake Formation.
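The console registration above can also be scripted with boto3. A minimal sketch, assuming a hypothetical bucket name (`my-datalake-landing`) and that your credentials have Lake Formation admin rights; the request-builder helper is separated out so it can be inspected without touching AWS:

```python
def registration_params(bucket_arn):
    """Build the request body for lakeformation.register_resource."""
    return {
        "ResourceArn": bucket_arn,
        # Let Lake Formation use its service-linked role to read the bucket.
        "UseServiceLinkedRole": True,
    }

def register_location(bucket_arn):
    """Register an S3 location with Lake Formation (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    lf = boto3.client("lakeformation")
    return lf.register_resource(**registration_params(bucket_arn))

# Example (live): register_location("arn:aws:s3:::my-datalake-landing")
```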

2. Let’s create a database where we will store the data:

While you create the database, it will ask for Lake Formation tags, so let’s create one.
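These two console actions map to two API calls. A sketch with boto3, where the database name `salesdb` and the tag key/values are placeholders for this demo:

```python
def database_input(name):
    """DatabaseInput body for glue.create_database."""
    return {"Name": name, "Description": "Database for the data lake demo"}

def lf_tag_params(key, values):
    """Request body for lakeformation.create_lf_tag."""
    return {"TagKey": key, "TagValues": values}

def create_database_and_tag(db_name, tag_key, tag_values):
    """Create the Glue database and an LF-tag (live AWS calls)."""
    import boto3  # deferred: needs credentials and a configured region
    boto3.client("glue").create_database(DatabaseInput=database_input(db_name))
    boto3.client("lakeformation").create_lf_tag(**lf_tag_params(tag_key, tag_values))

# Example (live):
# create_database_and_tag("salesdb", "environment", ["landing", "curated"])
```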

3. The final step is to grant permissions to the user:
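The grant can likewise be scripted. A sketch, assuming a hypothetical IAM user ARN and the table names from this demo:

```python
def select_grant(principal_arn, database, table):
    """Request body for lakeformation.grant_permissions: SELECT on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

def grant_select(principal_arn, database, table):
    """Grant a principal SELECT on a table (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("lakeformation").grant_permissions(
        **select_grant(principal_arn, database, table)
    )

# Example (live):
# grant_select("arn:aws:iam::123456789012:user/analyst", "salesdb", "black_friday_sales")
```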

Now let’s ingest the data:

We will use a crawler to read the S3 bucket and create a table over the landing S3 location:

Let’s run it:
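Creating and starting the crawler from code looks roughly like this; the crawler name, IAM role ARN, and S3 path are placeholders for this demo:

```python
def crawler_params(name, role_arn, database, s3_path):
    """Request body for glue.create_crawler targeting one S3 path."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def create_and_start_crawler(name, role_arn, database, s3_path):
    """Create the crawler and kick it off (live AWS calls)."""
    import boto3  # deferred: needs credentials and a configured region
    glue = boto3.client("glue")
    glue.create_crawler(**crawler_params(name, role_arn, database, s3_path))
    glue.start_crawler(Name=name)  # runs asynchronously

# Example (live):
# create_and_start_crawler("landing-crawler",
#                          "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#                          "salesdb", "s3://my-datalake-landing/landing/")
```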

Once the execution is complete, we can see that the table has been created:

We can view the data:

Permissions

Now you can even grant or revoke permissions for a defined user on the given table:
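Revoking mirrors granting: `lakeformation.revoke_permissions` takes the same shape of request. A sketch with the same placeholder names as before:

```python
def permission_request(principal_arn, database, table, permissions):
    """Shared request body for grant_permissions / revoke_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }

def revoke(principal_arn, database, table, permissions):
    """Revoke the listed permissions from a principal (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("lakeformation").revoke_permissions(
        **permission_request(principal_arn, database, table, permissions)
    )

# Example (live):
# revoke("arn:aws:iam::123456789012:user/analyst", "salesdb",
#        "black_friday_sales", ["SELECT"])
```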

Discoverability

Users can find the required data simply by searching for it.

We need to add a tag in the table properties.
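Tagging a table and searching the catalog can also be done through the API. A sketch, with the tag key and search text as placeholders:

```python
def tag_table_request(database, table, tag_key, tag_values):
    """Request body for lakeformation.add_lf_tags_to_resource."""
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [{"TagKey": tag_key, "TagValues": tag_values}],
    }

def search_tables(text):
    """Free-text search over the Glue Data Catalog (live AWS call)."""
    import boto3  # deferred: needs credentials and a configured region
    return boto3.client("glue").search_tables(SearchText=text)["TableList"]

# Example (live):
# boto3.client("lakeformation").add_lf_tags_to_resource(
#     **tag_table_request("salesdb", "black_friday_sales", "environment", ["landing"]))
# search_tables("sales")
```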


Ajith Shetty

Bigdata Engineer — Bigdata, Analytics, Cloud and Infrastructure.


Interested in a weekly newsletter on big data analytics from around the world? Subscribe to my weekly newsletter, Just Enough Data.
