Pinned · Hey there, I’m using Mage! · How far we have come already. Data engineering is booming, and Big Data tools have kept growing ever since. Every day new tools are introduced, and existing tools add features to support the new use cases we are encountering in 2022 and beyond. … · Mage · 7 min read
Published in SelectFrom · Pinned · ODD Platform: Modern Data Solution · The most underrated topic in the data world is data authenticity. We are asked about the source of the data, how clean it is, whether there was any deviation from the source to the destination, and who the owner is. But these questions are left… · Open Data Discovery · 5 min read
Pinned · Kestra: An Extra powerful Orchestrator · Automation is a key term in every part of software development and engineering, and it is moving at a very fast pace. More and more companies are looking for a single place to incorporate all their requirements. The common requirement of every organization is orchestration and scheduling. Building… · Kestra · 6 min read
Pinned · Let’s do it DBT way! · Shift from ETL to ELT · Ever since the beginning of the Big Data bubble, we have talked only about how we are going to extract, transform, and load the data. But the problem here is that data and compute are tightly coupled. … · Etl · 7 min read
Published in Analytics Vidhya · Pinned · What’s the buzz about Parquet File format? · Parquet is an efficient columnar file format that supports compression and encoding, which makes it even more efficient both in storage and when reading the data. Parquet is widely used in the Hadoop ecosystem and is well received by most of the data… · Spark · 6 min read
Sep 23 · Streaming Swiftly: Exploring Fast API and Kafka Integration · Empowering Data Engineers: Harnessing the power of Kafka and Fast API for Seamless Data Processing. · Just as full-stack engineers are well versed in front-end and back-end application knowledge, data engineers are also expected to have in-depth knowledge of the end-to-end pipeline, which consists of… · Kafka · 4 min read
Published in Data Engineer Things · Aug 12 · Integrating Rust for High-Performance Data Processing in Delta Lake: A Technical Exploration · Utilizing the Delta Lake format outside of the JVM ecosystem with Rust. · Background: Delta Lake is an open-source storage layer on top of the Parquet file format. Adoption of Delta Lake in the data engineering world is ever increasing, mainly because it extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling. Rust is… · Software Development · 5 min read
Published in Analytics Vidhya · Aug 5 · Streams Everywhere - Season 1 Episode 1 · Kafka is an essential tool for almost all data-driven companies. Companies are connecting more and more sources to their platforms and want to stream real-time data for any number of use cases, such as weather reporting, sensor status, real-time tracking, and many more. … · Kafka · 7 min read
Jun 18 · Cube, The Semantic Layer · Data democratization is the term creating a buzz across the business world. Data is no longer restricted to a subset of users or groups such as data engineers, data analysts, or data scientists. According to Collibra, data democratization is when an organization makes data accessible to all employees and stakeholders, and educates them… · Data Democratization · 6 min read
Apr 26 · Promptimize: Step towards the Future · The whole world was taken aback when ChatGPT was launched, and with it so many new possibilities were unlocked. New use cases and innovations have emerged in every part of the organization. The use cases for AI and ChatGPT are many. And every data-driven… · Promptimize · 6 min read