Pinned · Let’s do it DBT way! · Shift from ETL to ELT. Ever since the beginning of the Big Data bubble, we have all talked only about how we are going to extract the data, transform the data, and load the data. But the problem here is that data and compute are tightly coupled. … · Etl · 7 min read
Pinned · Very much real time by Apache PINOT · When we talk about real-time metrics, underneath it is batch processing, or more specifically micro-batches, in which our queries run and return results. As the data size increases, the time it takes to process the data changes as well. Now the need for realtime has… · Apache Pinot · 8 min read
Published in Analytics Vidhya · Pinned · Spark, Parallelising the Parallel Jobs · What on earth does “Parallelising the parallel jobs” mean? Without going into depth, in layman’s terms, Spark creates the DAG, or lineage, based on the sequence in which we created the RDD and applied transformations and actions. It applies the Catalyst optimiser on the dataframe or dataset to tune your… · Spark · 5 min read
Pinned · Predicate Pushdown is for real?? · Spark is used as an ingestion tool in more and more companies. It is a perfect replacement for commercial applications like Talend or Informatica. Spark can connect to many different source systems, whether standard databases like SQL Server and Oracle or NoSQL databases like Cassandra… · Spark · 4 min read
Published in Analytics Vidhya · Pinned · What’s the buzz about Parquet File format? · Parquet is an efficient columnar file format that supports compression and encoding, which makes it even more performant both in storage and when reading the data. Parquet is a widely used file format in the Hadoop ecosystem, and it is widely received by most of the data… · Spark · 6 min read
Dec 11, 2022 · Hey there, I’m using Mage! · How far we have come already. Data engineering is booming, and Big Data tools have kept growing ever since. Every day new tools are introduced, and existing tools add new features to support all the new use cases we are encountering in 2022 and beyond. … · Mage · 7 min read
Feb 6, 2022 · redash, Re-imagine the Dash-board · Visualisation has already become the hottest trend in 2022, and irrespective of the data segment you live in, you will use a dashboard almost every day. Data engineers, data scientists, data analysts, and data owners all need one thing in common: the dashboard. A dashboard will help you analyse your data. … · Redash · 5 min read
Jan 9, 2022 · Apache Hudi pronounced “hoodie” · Data has become as expensive as oil or gold. More and more companies are now spending millions of dollars on data and making data-driven decisions. Keeping data fresh and up to date has become more and more important. And at the same time the… · Hudi · 5 min read
Jan 1, 2022 · A fresh combination, BI and Metabase · 2021 is already over and we are in 2022, but the need for data has never reduced. To be very clear, we are ever more dependent on data, and this trend is going to be there for a very long time. Data has answered so many business questions in… · Metabase · 6 min read
Dec 12, 2021 · Know your Data using OpenMetadata · We are all aware of the importance of metadata, but we care little about it. We tend to spend most of our time bringing in data or moving it from one place to another. But the real questions are: how well do we know this data? How… · Openmetadata · 5 min read