Hi! I'm Pranav, a senior in ECE at BITS Pilani, Goa, India. I'm interested in 3D Computer Vision and 3D Scene Understanding. Currently, I'm at Carnegie Mellon University's Robotics Institute pursuing my undergraduate thesis under the supervision of Dr. Wenshan Wang at AirLab and Dr. Ji Zhang. Here, I have worked on Vision Language Navigation in Outdoor Scenarios and I am currently working on 3D Object Representation Learning.
Since my sophomore year, I have also been working remotely at MARMoT Lab, NUS under the supervision of Prof. Guillaume Sartoretti. Here, I have worked on 3D Scene Graph Generation using VLMs, Next-Best-View Policy for NeRFs (co-supervised by Prof. Marija Popović), and Open-Vocabulary Gaussian Splatting.
At my university, I am grateful to have worked with Prof. Neena Goveas, Prof. Naveen Gupta, Prof. Sarang Dhongdi, and Prof. Tanmay Tulsidas Verlekar.
Beyond academics, I enjoy watching series and movies, travelling, and playing sports. I also love tinkering with drones, and I founded an autonomous drone team in my sophomore year ;)
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena, Nishant Raghuvanshi, Neena Goveas
ECMR 2025 (Oral Presentation)
arXiv
UAV-VLN leverages the common-sense reasoning of LLMs and a vision model for cross-modal grounding to plan context-aware aerial trajectories from free-form natural language instructions.
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Pranav Saxena, Avigyan Bhattacharya, Ji Zhang, Wenshan Wang
Human-aware Embodied AI Workshop @ IROS 2025
arXiv
LLM-RG is a zero-shot hybrid pipeline that leverages Large Language Models for symbolic reasoning and Vision-Language Models for fine-grained attribute extraction to perform robust referential grounding in challenging outdoor driving scenarios.
3D Object Representation Learning
Carnegie Mellon University
Creating a generalized 3D encoder for use in various downstream tasks.
Efficient Open Vocabulary 3D Reconstruction using Gaussian Splatting
In collaboration with NUS and TU Delft
Created a custom autoencoder to generate language features, eliminating the need to train a per-scene autoencoder. Improved efficiency by 2x compared to LangSplat. Currently applying it to downstream robotics tasks.
3D Scene Graph Generation using VLMs
National University of Singapore
Hierarchical, incremental 3D scene graph generation using Vision Language Models in a zero-shot manner. Later applied to a downstream Vision-Language Navigation task.
3D Human Pose Estimation using WiFi signals
BITS Pilani, Goa
Report
Developed a two-stage learning framework for 3D human pose estimation using WiFi signals, incorporating phase sanitization to improve the quality of the channel state information (CSI). Validated on the MM-Fi dataset, with improved accuracy and robustness compared with raw CSI inputs.
Swarm Bots
BITS Pilani, Goa
Poster
Developed an algorithm for swarm robots that combines the Artificial Potential Field and Reciprocal Velocity Obstacles methods to overcome the shortcomings of each.
Stock Price Prediction using Optimal Stopping Theory
BITS Pilani, Goa
Code, Report
Course project for MATH F424 Applied Stochastic Process. Developed a stock price prediction model using Optimal Stopping Theory and Geometric Brownian Motion to maximize returns by determining the optimal time to sell stocks. Backtested the model on 10 years of historical NSE stock data.