Hi, I am Saumya Gupta.

AI Research Scientist


about me

I am Saumya Gupta, an AI researcher with a passion for harnessing cutting-edge technology to solve complex problems. With a Master's degree in Artificial Intelligence from Northeastern University and extensive experience in AI research and development, I specialize in domains such as Natural Language Processing and generative AI.

I am an AI Researcher at Northeastern University’s Institute for Experiential AI, developing biologically-informed Transformer models for RNA splicing prediction. My work combines biological priors with advanced architectures such as Hyena layers, achieving state-of-the-art performance and enabling generalization across complex genomic tasks. I also work on Flow Matching with Re-Basin Canonicalization for neural weight generation, improving model alignment and converging faster than diffusion-based baselines.
Previously at Razorpay and Rebel Foods, I built backend APIs and deployed generative AI systems at scale. My AI journey began with a sign language detection system, inspiring a lifelong passion for human-centered, transformative AI.
Curious and collaborative, I continue to work on LLMs, RAG systems, geometric deep learning, and generative AI models, while mentoring others and delivering impactful AI solutions.

email

gupta.saumy@northeastern.edu

LinkedIn

saumya-gupta-ai

education

2023 - 2025

Master of Artificial Intelligence

Northeastern University

Boston, Massachusetts, USA

GPA: 4.0
Subjects Taken: Algorithms, Natural Language Processing, Foundations of AI, Machine Learning, AI for Human-Computer Interaction, Programming Design Paradigms, Machine Learning Operations

2017 - 2021

Bachelor of Computer Science

Vellore Institute of Technology

Vellore, Tamil Nadu, India

GPA: 3.8
Subjects Taken: Algorithms, Artificial Intelligence, Natural Language Processing, Machine Learning, Computer Vision, Statistics, Calculus, Data Visualization, Databases, Object-Oriented Programming, Graph Theory, Linear Algebra, Operating Systems, Distributed Systems

skills

Python
PyTorch
TensorFlow
Golang
Java
Node.js
JavaScript
C++
Linux Shell Script
PostgreSQL
Neo4j
VectorDB
MongoDB
Docker
Kubernetes
Apache Spark
Apache Airflow
Hadoop
Microsoft Azure
AWS
Machine Learning
Deep Learning
Natural Language Processing
Computer Vision
Large Language Models
Retrieval-Augmented Generation
Data Structures and Algorithms
Data Visualization
Statistics
Calculus
Operating Systems
Database Management Systems
Distributed Systems
DevOps
MLOps
Full Stack Software Engineering

experience

  • July 2024 - Present

    Research Associate, AI Research Co-op

    Institute for Experiential AI, Northeastern University

    Boston, USA

    As an AI Researcher at the Institute for Experiential AI, Northeastern University, I develop biologically-informed Transformers that predict alternative splicing in long pre-mRNA sequences. These models integrate biological priors with architectures such as Hyena layers, outperforming state-of-the-art models and enabling zero- and few-shot generalization. I use multi-GPU distributed training and optimizations such as Flash Attention to speed up training. I adapt self-supervised embeddings to ADAR editing site prediction, demonstrating robust transferability and improved biological interpretability. I also develop Flow Matching with Re-Basin Canonicalization for large-scale neural weight generation, resolving the permutation symmetries of network weights and scaling to 11M parameters with 80% faster convergence than diffusion baselines. Additionally, I apply geometric deep learning to refine flow models, optimizing symmetric latent spaces for uncertainty quantification and more reliable predictions.

  • January - April 2024

    Khoury Graduate Teaching Assistant

    Khoury College of Computer Sciences, Northeastern University

    Boston, USA

    I guided students in Foundations of AI through interactive tutorials that clarified concepts and supported their learning. I also assisted professors with course development and grading, and fostered a collaborative, inclusive environment that encouraged teamwork and helped students excel in AI fundamentals.

  • April 2022 - August 2023

    Full Stack ML Engineer

    Razorpay

    Bangalore, India

    I led Golang development for the Optimizer team (Payments), delivering customer-focused solutions that streamline payment operations. I improved payment gateway configuration through rule-based prioritization, containerized microservices with Docker, and deployed them on Kubernetes using Helm. I also implemented robust integration tests to ensure platform reliability. Notably, I introduced a one-click Paytm wallet feature that boosted user engagement, and mentored an intern who built a chargeback prediction model using an artificial neural network (ANN).

  • July 2021 - April 2022

    Software Development Engineer

    Rebel Foods (formerly Faasos)

    Bangalore, India

    I led back-end development for the In-order team, managing critical microservices for kitchen staff, food orders, and inventory. I designed scalable schemas for storing order and staff data and implemented a key feature enabling validated order cancellations, reducing revenue loss and ensuring system integrity.

  • May 2020 - July 2020

    AI Research Intern

    Indian Institute of Information Technology

    Prayagraj, India

    I conducted a comparative study on GANs for text-to-image synthesis using the Caltech bird dataset, devising a novel evaluation method with t-SNE to analyze overlaps between original and generated images. This research provided nuanced insights into GAN performance and advanced methodologies in AI-driven image synthesis.

  • April 2020 - May 2020

    Computer Vision Intern

    Ocean Energy

    Mumbai, India

    I developed a system for detecting traffic rule violations from video inputs, identifying vehicles breaching traffic signals, extracting license plate data, and converting it to text. The system integrated with a database for organized information storage, enhancing traffic monitoring and enforcement.

conferences and publications

POSTER July 2025

Biologically-informed Transformers for RNA Splicing Prediction

Intelligent Systems for Molecular Biology (ISMB) 2025

Manuscript in preparation for publication

Liverpool, UK
International Conference

INVITED TALK Sept 2025

Biologically-informed Transformers for RNA Splicing Prediction

Mid-Atlantic Splicing Conference

45-minute invited talk jointly organized by Northeastern University, UVA, UNC Chapel Hill, and Harvard Medical School

Virginia, USA
Chief Organiser: Dr. Peter Castaldi

POSTER March 2025

Flow Matching and Re-Basin Canonicalization

Boston Symmetry Day 2025

Submitted for publication and currently under review

Boston, USA
Chief Organiser: Dr. Robin Walters
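Re-Basin Canonicalization rests on a property of neural network weights: permuting the hidden units of a layer, together with the matching columns of the next layer, leaves the network's function unchanged, so many different weight vectors describe the same model. The Python sketch below illustrates that symmetry on a small, hypothetical one-hidden-layer ReLU MLP and maps permuted copies to one representative by sorting hidden units by their bias. That sorting rule is a simple illustrative canonicalization, not the method from the poster; Git Re-Basin-style approaches instead align units to a reference network, for example by solving a linear assignment problem. All function names here are hypothetical.

```python
import math
import random

# Hypothetical one-hidden-layer ReLU MLP: y = W2 @ relu(W1 @ x + b1) + b2
def forward(params, x):
    W1, b1, W2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]

def permute_hidden(params, perm):
    """Reorder hidden units: rows of W1, entries of b1, and the matching columns of W2."""
    W1, b1, W2, b2 = params
    return ([W1[i] for i in perm],
            [b1[i] for i in perm],
            [[row[i] for i in perm] for row in W2],
            list(b2))

def canonicalize(params):
    """Pick one representative per permutation orbit by sorting hidden units by bias, then incoming weights."""
    W1, b1, _, _ = params
    order = sorted(range(len(b1)), key=lambda i: (b1[i], W1[i]))
    return permute_hidden(params, order)

random.seed(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [random.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [random.gauss(0, 1) for _ in range(d_out)]
params = (W1, b1, W2, b2)

shuffled = permute_hidden(params, [3, 0, 4, 1, 2])
x = [0.5, -1.0, 2.0]

# Different weights, same function (up to floating-point summation order)
same_output = all(math.isclose(a, b, abs_tol=1e-12)
                  for a, b in zip(forward(params, x), forward(shuffled, x)))
print(same_output)
# Both copies collapse to the same canonical weights
print(canonicalize(params) == canonicalize(shuffled))
```

One plausible intuition for why this matters: if weights are generated in a canonical form, a generative model does not have to learn every permuted copy of the same network separately.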

projects

Vision Transformer for Lung Cancer Detection

The Lung Cancer Detection from Chest CT Scans project applies a Vision Transformer (ViT) to classify CT images into Normal, Adenocarcinoma, Large Cell Carcinoma, and Squamous Cell Carcinoma. High-resolution 3D scans are preprocessed and augmented to improve model robustness, while the transformer architecture leverages attention mechanisms to capture subtle spatial patterns often missed by convolutional networks. Evaluated using clinically relevant metrics, the model achieves an overall accuracy of 88% and strong F1-scores across cancer subtypes, effectively distinguishing normal tissue from tumors and supporting early-stage lung cancer diagnosis.
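The attention mechanism a ViT relies on is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, applied to embeddings of image patches. The minimal pure-Python sketch below shows a single head on three hypothetical patch embeddings; it omits the learned query/key/value projections, multiple heads, and positional embeddings of a real ViT, and it is not the project's code.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors (one row per patch token)."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is a weighted average of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Hypothetical 4-dimensional embeddings for three image patches
patches = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [1.0, 1.0, 0.0, 0.0]]
# Self-attention without learned projections: Q = K = V = patch embeddings
out, attn = attention(patches, patches, patches)
for row in attn:
    print([round(w, 3) for w in row])
```

Because every patch attends to every other patch, each output mixes information from the whole image, which is the property the project description contrasts with the local receptive fields of convolutions.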

DeclareR - Enhancing Human Interaction and Interpretability in Reinforcement Learning

DeclareR enhances Reinforcement Learning (RL) by integrating human knowledge and improving model interpretability. It uses RLang, a declarative language, to encode partial world knowledge into RL environments. By translating natural language instructions into RLang with LLMs run locally through Ollama, it lets human suggestions guide RL agents before training. It demonstrates improved policy learning and transparency in classic RL environments such as Taxi, Cliff Walking, and Frozen Lake. For interpretability, it uses surrogate models (e.g., decision trees) and tools such as SHAP, LIME, and LLM-generated rationales to provide both global and local insights into the RL agent's decisions. DeclareR bridges the gap between black-box RL agents and human-understandable decision-making.

Scalable and Interpretable Food Domain RAG

This project demonstrates a scalable and explainable Retrieval-Augmented Generation (RAG) system for answering queries related to food, ingredients, and restaurants. It integrates data from web scraping and CSV files, generating intelligent indexes for fast and accurate responses. The system utilizes a cost-effective BGE model for embedding generation, ensuring scalability while minimizing costs. Key features include explainable document selection with citations, Chain-of-Thought reasoning for logical answers, and polite, context-aware responses. It also maintains structured outputs with clear references.
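The retrieval step with citations can be sketched as cosine-similarity search over document embeddings that returns the IDs of the selected documents alongside their text, so the generator can cite them. In this sketch the embeddings are small hand-written vectors standing in for BGE outputs, and the document IDs, texts, and the `retrieve` and `build_prompt` helpers are hypothetical; a real system would embed queries with the same model and search an index rather than scanning linearly.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical corpus: id -> (text, embedding); the vectors stand in for BGE embeddings
corpus = {
    "doc-1": ("Paneer tikka is made with cottage cheese and a yogurt marinade.", [0.9, 0.1, 0.0]),
    "doc-2": ("The restaurant on Main St. is open until 11 pm.", [0.0, 0.2, 0.9]),
    "doc-3": ("Yogurt marinades tenderize paneer and chicken.", [0.6, 0.5, 0.2]),
}

def retrieve(query_embedding, k=2):
    """Return the top-k (doc_id, score, text) triples ranked by cosine similarity."""
    scored = [(doc_id, cosine(query_embedding, emb), text) for doc_id, (text, emb) in corpus.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

def build_prompt(question, hits):
    """Tag each retrieved passage with its ID so the answer can cite it as [doc-id]."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, _, text in hits)
    return f"Answer using only the sources below and cite them by ID.\n{context}\n\nQuestion: {question}"

hits = retrieve([0.85, 0.2, 0.05])
print([doc_id for doc_id, _, _ in hits])
print(build_prompt("What goes into paneer tikka?", hits))
```

Keeping the document ID attached to every retrieved passage is what makes explainable selection possible: the final answer can point back to exactly which sources it used.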

Shroom: Hallucination Detection in LLMs

This project tackles the challenge of detecting hallucinations in LLM outputs: text that is grammatically correct but factually inaccurate. Focusing on definition modeling, paraphrasing, and machine translation, I used clustering with task-specific metrics and trained a Siamese network on BERT embeddings to evaluate LLM outputs. I also experimented with prompting techniques using the LLaMA model. Together, these approaches aim to improve the reliability and trustworthiness of AI-generated text across domains.

Ayurveda RAG: Knowledge Graph RAG

The Ayurveda RAG project combines the ancient wisdom of Ayurveda with the power of modern AI to create an intelligent retrieval-augmented generation system. This system is designed to answer complex Ayurveda-related questions, even those containing Hindi or Sanskrit terms, with accuracy and depth. By leveraging a Neo4j-powered knowledge graph and a FAISS index for efficient retrieval, the project ensures precise, context-aware responses. This innovation bridges traditional healthcare knowledge with cutting-edge AI, making Ayurvedic wisdom more accessible and actionable for a global audience.

Semantic Segmentation of Images: Autonomous Driving Vehicles

Performed semantic segmentation on the Indian Driving Dataset using deep convolutional networks: FCN-8, U-Net, LinkNet, PSPNet, and DeepLabV3+. Analyzed performance on the validation set using mean Intersection over Union (IoU) and mean F1 score as metrics. DeepLabV3+ achieved the best results, with a mean IoU of 0.77 and a mean F1 score of 0.78, and LinkNet also performed strongly.
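Mean IoU and mean F1 are both computed per class from pixel counts of true positives (TP), false positives (FP), and false negatives (FN), then averaged over classes: IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). The sketch below assumes flattened lists of integer class labels and uses a toy label map with illustrative class names; skipping classes absent from both maps is one common convention among several and may differ from how the reported scores were computed.

```python
def per_class_counts(y_true, y_pred, num_classes):
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed
    return tp, fp, fn

def mean_iou_and_f1(y_true, y_pred, num_classes):
    tp, fp, fn = per_class_counts(y_true, y_pred, num_classes)
    ious, f1s = [], []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom == 0:
            continue  # class absent from both label maps: skip it
        ious.append(tp[c] / denom)
        f1s.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    return sum(ious) / len(ious), sum(f1s) / len(f1s)

# Toy 2x4 label maps, flattened row by row (illustrative classes: 0 = road, 1 = vehicle, 2 = sky)
truth = [0, 0, 1, 1, 2, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 0, 0]
miou, mf1 = mean_iou_and_f1(truth, pred, num_classes=3)
print(round(miou, 3), round(mf1, 3))
```

For any single class F1 is never lower than IoU (F1 = 2 IoU / (1 + IoU)), so a mean F1 that sits slightly above the mean IoU, as in the results above, is expected.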

Picto Phrases: Image Caption Generator

I used InceptionV3 to extract image features and a Bidirectional LSTM to generate captions on the Flickr8k dataset. I decoded with beam search using beam widths of 3 and 5 to explore alternative captions and improve the results. The generated captions captured the primary context of the images, achieving a corpus BLEU score of 0.435339.
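Beam search keeps the `beam_width` highest-scoring partial captions at each step (3 and 5 in this project) instead of greedily committing to the single most likely next word. The sketch below runs it over a hypothetical hand-written table of next-word probabilities standing in for the caption model's decoder; scores are summed log-probabilities, and no length normalization is applied.

```python
import math

# Hypothetical next-word distributions keyed by the previous word
NEXT = {
    "<start>": {"a": 0.6, "two": 0.4},
    "a": {"dog": 0.6, "man": 0.4},
    "two": {"dogs": 0.95, "men": 0.05},
    "dog": {"runs": 0.6, "<end>": 0.4},
    "man": {"<end>": 1.0},
    "dogs": {"run": 0.9, "<end>": 0.1},
    "men": {"<end>": 1.0},
    "runs": {"<end>": 1.0},
    "run": {"<end>": 1.0},
}

def beam_search(beam_width, max_len=5):
    beams = [(["<start>"], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, p in NEXT[tokens[-1]].items():
                candidates.append((tokens + [word], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep only the best beam_width candidates; completed captions leave the beam
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == "<end>" else beams).append((tokens, score))
        if not beams:
            break
    best_tokens, _ = max(finished + beams, key=lambda c: c[1])
    return " ".join(t for t in best_tokens if t not in ("<start>", "<end>"))

print(beam_search(1))  # a dog runs  (greedy decoding)
print(beam_search(3))  # two dogs run
```

Greedy decoding picks "a" first and never recovers, while a beam of width 3 keeps "two" alive long enough to find the higher-probability caption (0.342 versus 0.216).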

Sarcasm Detection on News Headlines

I compared Decision Tree, SVM, Random Forest, and a basic LSTM model for sarcasm detection on the Kaggle News Headlines Dataset. The LSTM model achieved 90% accuracy, Random Forest 89%, and SVM 86%. This exploration provided insight into how these models work and how effectively they capture relationships between words in a sequence to identify sarcasm.

Online Job Search Platform

Developed a website with registration for candidates and recruiters. Candidates can upload resumes, which are parsed for skills, education, and experience and stored in the database. Recruiters can log in, post jobs, and view applicants sorted by matching skills. Candidates can filter jobs and view the jobs they applied to and registered companies. Technologies used: HTML, CSS, Bootstrap, JavaScript, jQuery, MongoDB, Node.js. Followed SOLID design principles.
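Sorting applicants by matching skills can be sketched as ranking each candidate by how many of the job's required skills appear in their parsed resume. The original project was built with Node.js and MongoDB; this Python sketch only illustrates the ranking logic, and the `Candidate` fields, the `rank_applicants` helper, and the scoring rule (overlap count, ties broken by name) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    skills: set = field(default_factory=set)  # skills parsed from the uploaded resume

def rank_applicants(required_skills, applicants):
    """Sort applicants by number of matched required skills (descending), then by name."""
    required = {s.lower() for s in required_skills}

    def matched(c):
        # Case-insensitive overlap between the job's requirements and the resume
        return required & {s.lower() for s in c.skills}

    ranked = sorted(applicants, key=lambda c: (-len(matched(c)), c.name))
    return [(c.name, sorted(matched(c))) for c in ranked]

applicants = [
    Candidate("Asha", {"Python", "MongoDB"}),
    Candidate("Ben", {"Node.js", "MongoDB", "JavaScript"}),
    Candidate("Chen", {"Java"}),
]
for name, skills in rank_applicants(["JavaScript", "Node.js", "MongoDB"], applicants):
    print(name, skills)
```

Returning the matched skills alongside each name lets a recruiter see why a candidate ranks where they do, not just the order.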

contact me

gupta.saumy@northeastern.edu