• Are you open to working remotely? Yes
  • Locations open to work:
    United States

About me



  • 2023 - 2024

    Graduate Research Assistant

    • Developed a medical report summarizer by fine-tuning the Mistral-7B model with Low-Rank Adaptation (LoRA) on the MIMIC-III dataset on AWS, improving BERTScore by 15%.
    • Collaborated with the team to develop and evaluate a generative AI model, built with a VQ-VAE and a diffusion model on the MIMIC-CXR dataset, that generates text-conditioned, high-quality chest X-ray images.
    • Devised a conversational system component, integrated with a humanoid robot, to assist new students. It combines advanced Retrieval-Augmented Generation (RAG), Whisper speech recognition, and vector databases (FAISS, Pinecone, Milvus) using the LangChain framework and the Llama 2 LLM over university FAQ content.

  • 2023 - 2024

    NLP Intern

    • Developed a hybrid information retrieval component for the Standard Insurance client that improved attribute extraction from insurance schedule documents by 40%, combining keyword search with advanced RAG techniques using LangChain, Llama 2, and GPT-3.5 models on Microsoft Azure.
    • Applied cross-encoder reranking, Reciprocal Rank Fusion (RRF), HyDE, and contextual compression to improve the RAG Triad metrics by 25%.

  • 2021 - 2023

    Senior Data Scientist

    • Led development and deployment of Clinical Language Processing Engine components on AWS with Nashbio and NCCS Oncology data, using Spark NLP and PySpark in Python to extract essential clinical data from text reports and help clinicians quickly understand extensive patient histories.
    • Developed data ingestion pipelines to clean, validate, and transform the clinical data using PySpark, and managed the data with DVC and Git.
    • Built and fine-tuned Named Entity Recognition models with BiLSTM and Clinical BERT on 9,000 annotated oncology clinical reports, achieving a 90% F1 score across 25 entities.
    • Performed iterative text mining on annotated clinical datasets, then built BiLSTM models for relation and assertion extraction, achieving 85% accuracy.
    • Deployed the models to production using MLflow and TensorFlow Serving on AWS.
    • Implemented MLOps practices with MLflow, DVC, and CI/CD pipelines for the NLP engine, improving model management efficiency by 40% and providing logging, robust experiment tracking, and drift measurement.
    • Built Prometheus and Grafana dashboards for model monitoring and observability.
    • Ensured code quality through code reviews, deployed the engine with Docker, and optimized model response times while addressing security vulnerabilities.
    • Collaborated with domain and testing teams to improve the business logic and quality of the engine pipeline.

  • 2017 - 2021

    Data Scientist

    • Led R&D of Trustworthy AI components, including robustness (adversarial attacks and defenses), explainability, and calibration, for assessing non-functional requirements of production-ready AI models (CV and NLP).
    • Spearheaded the application of explainability techniques (XRAI, LIME, SHAP, Integrated Gradients, Grad-CAM) in medical image classification and NLP projects, formulating novel metrics for assessing explanation quality.
    • Raised the accuracy of a patent classification application to 85% by fine-tuning a BERT model on 2,000 drafts, improving intellectual property management.
    • Evaluated the robustness of pre-trained image classification models and fine-tuned medical imaging models against adversarial attacks such as HopSkipJump, Adversarial Patch, and FGSM.
    • Collaborated with the team to develop and deploy a time series forecasting engine with ARIMA, SES, DES, ETS, Prophet, and LSTM-based methods to optimize manufacturing process estimation.
    • Devised a keyword extraction tool using YAKE, RAKE, and noun-phrase techniques, and integrated it as a search component within Django for the Digital Resources Portal, covering 1,000 documents.

  • 2015 - 2017

    Data Science Developer

    • Designed a copyright assessment tool for an internal IP team, using semantic similarity and web scraping to assess plagiarism levels in patent drafts. Used a PySpark environment to optimize model performance on a large input corpus.
    • Implemented a KYC application that extracts profile and tabular information from scanned vendor agreement documents and PDF files using OCR techniques, with 87% accuracy.
    • Derived insights from claims data and automated the claims classifier system for the Human Resources team, using XGBoost to categorize employee claims with nearly 82% accuracy.