2021 - 2021
– Implemented and maintained ETL pipelines with Apache Spark, SQL and Python.
– Used the Spark API on an AWS EMR cluster to run Spark jobs and perform analytics on data in Hive.
– Implemented process improvements (automation, performance tuning, workflow optimization).
– Built periodic batch jobs to load logs from intermediate Hive tables.
– Developed custom SQL scripts and wrote complex search and filter queries to fetch data, then visualized the results in Tableau.
– Analyzed massive and highly complex Hive data sets, performing ad hoc analysis and data manipulation.
– Loaded data from the HANA database into Hive and optimized Hive SQL and Spark jobs in Python.
– Involved in creating Hive tables, writing Hive queries that run as MapReduce jobs, and working with AWS EMR and EC2 for data processing.
(Skills: Python, SQL/SparkSQL, PySpark, Hive, AWS S3, EC2, EMR, Redshift, AWS Glue, Kubernetes, Docker, Tableau)
2019 - 2020
– Implemented a new Payment Utility system on the SAP S/4 HANA platform, working on several REST API modules and OData services, and extensively on the HANA database with AMDPs.
– Used PySpark and Sqoop to extract data from sources such as Hive, Amazon S3 and Snowflake.
– Transformed the data using various DataFrame, RDD and Dataset operations along with Spark SQL.
– Tested different approaches to measure code efficiency in terms of speed, memory usage and data shuffling between partitions.
– Developed an end-to-end automation framework in S/4 HANA to onboard several agencies and departments for Services Australia, reducing processing time from 48 hours (the time to onboard an agency or department manually) to 5 minutes.
– Built a machine learning service to process unstructured input text and detect anomalies in payment data, using SAP AI Core and predictive analysis with Random Decision Trees.
(Skills: Python, Apache Spark, Hive, AWS, RESTful APIs, HANA, SQL, Hadoop, Snowflake, AWS Glue, Sqoop, PySpark, Jira)
2019 - 2019
– Involved in the complete life cycle of a product: design, development, testing, and deployment.
– Wrote Python scripts to automate the data sampling process, and analyzed existing application programs to tune SQL queries using execution plans, Query Analyzer, SQL Profiler and Database Engine Tuning Advisor to improve performance.
– Created RESTful web services to interact with the database layer and the frontend.