Education
-
2022 - 2023
UNIVERSITY AT BUFFALO SUNY
MASTER'S IN DATA SCIENCE
-
2013 - 2017
CVR COLLEGE OF ENGINEERING
BACHELOR'S
Experience
-
2020 - 2022
ACCENTURE – SIEMENS ENERGY
SENIOR DATA ENGINEER
• Gathered requirements from architects and proposed designs to meet them.
• Analyzed the requirements and framed the business logic for the ETL process.
• Evaluated existing OLAP cubes, analyzed the dependent fact tables, dimension tables, and derived views, and created a star schema data model for them.
• Extracted data from CSV, JSON, and Parquet files in S3 buckets using Python and loaded it into AWS S3 and Snowflake.
• Migrated existing on-premises data to a data lake (S3, Redshift) using data ingestion services such as Apache NiFi, AWS DMS, and Glue.
• Developed PySpark scripts for data transformation, movement, and control (see the PySpark sketch after this list).
• Configured AWS CloudWatch to monitor the data lake.
• Orchestrated data pipelines using AWS Step Functions and gained working knowledge of Apache Airflow.
• Ingested data from Oracle, MS SQL, APIs, and other sources into AWS S3.
• Wrote Redshift queries to create data products and tuned performance with techniques such as distribution keys and Redshift Spectrum (see the Redshift sketch below).
• Wrote Glue jobs for data transformation and ingestion.
• Secured sensitive data in the data lake using masking, hashing, and tokenization.
• Wrote Python scripts to manage metadata throughout the pipeline and created the Glue Data Catalog using AWS Lambda.
• Wrote AWS Lambda functions to extend other AWS services, such as Glue, with custom logic and backend services that operate at AWS scale.
• Created AWS Lambda functions configured to receive events from S3 buckets (see the Lambda sketch below).
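A minimal sketch of the kind of PySpark transformation described above; the bucket paths, column names, and schema are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

# Read raw Parquet landed in S3 (bucket and prefix are placeholders).
raw = spark.read.parquet("s3://example-raw-bucket/orders/")

# Typical transformations: type coercion, filtering, derived columns.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned output back to the curated zone of the data lake.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)
```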
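An illustrative sketch of the Redshift distribution-key tuning mentioned above; the table, columns, and connection details are placeholders, and psycopg2 is used here only as a convenient connector:

```python
import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="example-password"
)

# Hypothetical fact table: DISTKEY co-locates rows that are joined on
# customer_id; SORTKEY lets Redshift prune blocks on date-range scans.
ddl = """
CREATE TABLE fact_sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTKEY (customer_id)
SORTKEY (sale_date);
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)
```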
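A sketch of an S3-triggered Lambda handler along the lines described above, registering newly landed objects as Glue Catalog partitions; the database, table, and key layout are assumptions for illustration:

```python
import json
import urllib.parse
import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Invoked by S3 ObjectCreated events; registers each new object's
    prefix as a partition in a hypothetical Glue Catalog table."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Assumed key layout: "orders/order_date=2021-06-01/part-0000.parquet"
        partition_value = key.split("/")[1].split("=")[1]
        glue.create_partition(
            DatabaseName="example_db",   # placeholder
            TableName="orders",          # placeholder
            PartitionInput={
                "Values": [partition_value],
                "StorageDescriptor": {
                    "Location": f"s3://{bucket}/orders/order_date={partition_value}/",
                    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
                    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
                    "SerdeInfo": {
                        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
                    },
                },
            },
        )
    return {"statusCode": 200, "body": json.dumps("partitions registered")}
```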
-
2018 - 2020
ACCENTURE – MILLERCOORS
DATA ENGINEER
• Implemented PySpark jobs for data transformation on AWS EMR.
• Wrote Python scripts to spin up and terminate EMR clusters on demand to run PySpark jobs (see the EMR sketch after this list).
• Implemented Spark with Python and Spark SQL for faster testing and processing of data from multiple sources.
• Created Airflow scheduling scripts (DAGs) in Python (see the Airflow sketch below).
• Wrote AWS Lambda functions with custom logic to drive AWS services such as EMR.
• Created AWS Lambda functions configured to receive events from S3 buckets.
• Implemented Python scripts to create Glue Data Catalog tables, making S3 data queryable in Athena.
• Wrote Python scripts to parse JSON documents and load the data into S3.
• Cleaned and reshaped data and generated segmented subsets using NumPy and pandas (see the pandas sketch below).
• Designed and implemented Spark-based data loading and aggregation frameworks capable of handling diverse data.
• Designed and deployed rich Tableau visualizations with drill-down and drop-down menus and parameterized views.
• Created stored procedures in PL/SQL.
• Tuned and optimized SQL query performance.
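A sketch of the on-demand EMR automation described above, using boto3 to launch a transient cluster that runs one PySpark step and terminates itself; instance types, roles, and paths are placeholders:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is a placeholder

def start_cluster():
    """Spin up a transient EMR cluster that runs one PySpark step and
    auto-terminates when the step finishes (all names are placeholders)."""
    response = emr.run_job_flow(
        Name="on-demand-pyspark",
        ReleaseLabel="emr-6.2.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the step
        },
        Steps=[{
            "Name": "transform",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/jobs/transform.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return response["JobFlowId"]

def stop_cluster(cluster_id):
    """Explicitly terminate a cluster if it must be stopped early."""
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
```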
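A minimal Airflow DAG sketch of the scheduling scripts mentioned above; the DAG id, schedule, and task body are illustrative:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_transform():
    # Placeholder for the actual job trigger (e.g. submitting to EMR).
    print("running transform")

# A minimal daily-scheduled DAG with retries.
with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    transform = PythonOperator(
        task_id="transform",
        python_callable=run_transform,
    )
```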
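A small sketch of the NumPy/pandas cleaning, reshaping, and segmentation described above; the input file and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract.
df = pd.read_csv("shipments.csv")

# Cleaning: drop duplicates, coerce types, fill gaps.
df = df.drop_duplicates()
df["ship_date"] = pd.to_datetime(df["ship_date"], errors="coerce")
df["volume"] = pd.to_numeric(df["volume"], errors="coerce").fillna(0)

# Reshaping: one row per region, one column per month, total volume.
monthly = (
    df.assign(month=df["ship_date"].dt.to_period("M"))
      .pivot_table(index="region", columns="month", values="volume", aggfunc="sum")
)

# Segmentation: bucket regions into low/mid/high by NumPy quantiles of volume.
totals = monthly.sum(axis=1)
bins = np.quantile(totals, [0.0, 0.5, 0.9, 1.0])
segments = pd.cut(totals, bins=bins, labels=["low", "mid", "high"], include_lowest=True)
```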