Education
-
2021 - Present
University at Buffalo
M.Sc in Data Science
-
2015 - 2019
GITAM University
B.Tech in Computer Science and Engineering
Experience
-
2022 - 2022
Digital Alpha Platforms
Data Engineer Intern
• Developed and loaded Databricks Delta Lake live tables from external source systems for the ETL process using a multi-hop architecture (bronze, silver, gold), and analyzed the campaign data.
• Performed Spark transformations and actions on the data using the PySpark DataFrame API as part of the ETL process.
• Optimized the performance of Spark jobs, which sped up report generation for campaign data.
• Analyzed and debugged Spark jobs by inspecting the DAG and lineage.
-
2019 - 2021
TATA Consultancy Services
Data Engineer
• Developed data pipelines covering data collection, transformation, and reporting of the processed data to the Enterprise Data Warehouse.
• Configured Sqoop jobs to ingest delta records of sales data from the upstream CRM source into the landing zone of the Hadoop data lake.
• Developed Python scripts for data cleaning and pre-processing as part of data extraction, and integrated them into pipelines.
• Designed scripts for Hive tables following data warehousing schemas, applied extraction logic to the data, and stored the results in their respective supply Hive tables per stakeholders' business requirements. Worked with splittable file formats such as Avro, ORC, and Parquet.
• Optimized Hive table data loads using partitioning, bucketing, and map-side joins, which significantly improved job performance.
• Automated Talend ETL jobs integrating Hadoop components for batch processing of incremental data in HDFS.
• Developed SQL control tables in an Azure database to store metadata for ETL jobs.
• Interpreted statistics for ETL jobs and Hadoop clusters based on the generated logs and the resource utilization stats in Ambari.
• Worked on AzCopy scripts to copy Parquet files from on-premises storage to an Azure cloud container as blobs. Configured the context parameters in the Lookup and Copy data activities of ADF pipelines to load the data into ADLS Gen2 and Azure Synapse.
• Authored multiple technical documents laying out key development tasks and resolutions, which helped with future changes.
• Worked closely with business stakeholders to gather application requirements for the expected structure of data in reports.
Honors & Awards
-
October 23, 2022
Databricks Certified Data Engineer Associate
Earners of the Data Engineer Associate certification have demonstrated an ability to perform basic data engineering tasks using Databricks and its capabilities.
-
Microsoft Certified: Azure Fundamentals
Earners of the Azure Fundamentals certification have demonstrated foundational knowledge of cloud services and how those services are provided with Microsoft Azure.
-
Microsoft Certified: Azure Data Fundamentals
Earners of the Azure Data Fundamentals certification have demonstrated foundational knowledge of core data concepts and how they are implemented using Microsoft Azure data services.