High-Five to Fivetran
We data engineers have always been referred as somebody who moves the data from one place to another. And frankly we are much more than that.
We do not want to write a pipeline from scratch to read the data, get the delta, create the connector, write to destination and manage them.
We want an automated process. So that we work on business logics.
Challenges in the traditional ETL
The day to job of a data Engineers include:
- Ever changing Schema
Schemas change, and they change way frequently. There is no easier process we can actually track it every time before every ingestion. Its not only tracking the schema change but to apply the same in the destination so that my pipeline doesn’t brake.
2. Sync data sources
We have 100’s of sources and 1000’s of destinations. And we are millions of pipelines. Its nearly impossible to have a single day without your pipeline braking for some or the other reasons. We need to make sure our pipeline runs every single data and my source and destinations are in sync.
3. manual pipeline creation
When the business in need of any dashboard, we go back to the source and write a custom connector and creates a pipeline to pull the data. If business changes the requirement you need change your pipeline as well. It requires a lot of effort and time.
4. Maintenance
It’s not only creating the pipeline but monitoring and maintaining as well. We become the owner of the pipeline once we build it. and we need to make sure it runs every single time.
Many more.
So to spare the pain, Enter, Fivetran.
What is Fivetran
Fivetran is a data pipeline. which synchronises the data sources.
Fivetran centralises data from any data source to any data warehouse.
Fivetran High level features
- Fully managed
Fivetran is fully managed and takes care of all the engineering work. where you are only required to work on the business part of the side.
2. fully managed connectors
Fivetran gives 150+ connectors read the data and to synchronises them.
3. no coding
Its all UI. using just a click of a button we can create the connection between a source and a destination.
4. source change tracking
Fivetran takes care of the source schema changes and make sure it applies to the destination.
5. minimal user intervention
Fivetran is self healing and it requires minimal user interventions.
6. Deployed in cloud
Fivetran is fully managed and deployed in the cloud.
How does the Fivetran takes care of Incremental data
If there are any new rows have been added in the source, the same will be applied in the destination as well.
If there is any update on the column, the same will be updated in the destination as well.
In the case of deletion, the destination row will not be deleted but will be marked as true under a new column “fivetran_deleted” as the deleted column might have the historical information needed by the business.
How does the Fivetran takes care of the schema changes
Fivetran will add the column in the destination if it sees a new column added in the source side.
Fivetran will not delete any column in the destination though its been deleted in the source as that column might have a different requirement for the business.
If there is a change in the column type, the same will be applied in the destination as well.
If a new table is added in the source, the same table will be created in the destination.
If the source table is deleted, the destination table WILL NOT be deleted.
Demo
We will be setting up a connection between Google sheets and the Snowflake
Source: Google sheets
Destination: Snowflkae
- Setup fivetran account
2. create the Snowflake account for free
3. Create the connection in fivetran
In the dashboard select google sheets.
Destination table is the table which will be created upon successful sync if not present.
You need to create a range in the google sheet which will be read.
Share the sheet with the email address provided in the UI.
give the Sheet url.
Once you click on Find Sheet you will see the names range what you have created.
Create destination Snowflake connector
You can make use of the setup guide in the right side to create the required user, database and accesses.
The final UI after successful connection.
Click on Initial Sync.
And, in the destination we can see that the data has been added.
Reference:
Ajith Shetty
Bigdata Engineer — Bigdata, Analytics, Cloud and Infrastructure.
Subscribe✉️ ||More blogs📝||Linked In📊||Profile Page📚||Git Repo👓
Interested in getting the weekly newsletter on the big data analytics around the world, do subscribe to my: Weekly Newsletter Just Enough Data