Experience

Visiting Researcher

Carnegie Mellon University

Jun 2025 – Present Pittsburgh, Pennsylvania, USA

  • Pursuing my undergraduate thesis at Carnegie Mellon University’s Robotics Institute under the supervision of Dr. Wenshan Wang and Dr. Ji Zhang
  • Working on 3D Object Representation Learning
  • Previously worked on Vision-Language Navigation for Outdoor Scenarios and co-organized the CMU-VLA Challenge

Research Intern

National University of Singapore

Apr 2024 – Present Remote

  • Research intern at MARMoT Lab, NUS under the supervision of Prof. Guillaume Sartoretti
  • Worked on 3D Scene Graph Generation using Vision-Language Models
  • Designed a generalized framework for Open-Vocabulary 3D Gaussian Splatting, eliminating the need for a per-scene autoencoder
  • Developed a Next-Best-View Policy for 3D Reconstruction using Neural Radiance Fields (co-supervised by Prof. Marija Popović)

Publications

UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena, Nishant Raghuvanshi, Neena Goveas
ECMR 2025  (Oral Presentation)
arXiv

UAV-VLN leverages the common-sense reasoning of LLMs and a vision model for cross-modal grounding to plan context-aware aerial trajectories from free-form natural language instructions.

LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Pranav Saxena, Avigyan Bhattacharya, Ji Zhang, Wenshan Wang
Human-aware Embodied AI Workshop @ IROS 2025  
arXiv, Code

A zero-shot hybrid pipeline that leverages Large Language Models for symbolic reasoning and Vision-Language Models for fine-grained attribute extraction to perform robust referential grounding in challenging outdoor driving scenarios.

ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models
Pranav Saxena, Jimmy Chiun
Preprint  
arXiv

A zero-shot framework for incremental 3D scene graph generation using vision-language models, enabling open-vocabulary reasoning and geometric grounding for embodied environments.

Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression
Pranav Saxena
Preprint  
arXiv, Code

A generalized 3D Gaussian Splatting framework that eliminates per-scene training and improves cross-scene generalization via a ScanNet-trained autoencoder for CLIP feature compression. This work was done in collaboration with NUS and TU Delft.

Other Projects

3D Object Representation Learning
Carnegie Mellon University

Developing a generalized 3D encoder for representing 3D objects. The learned embeddings could support downstream tasks such as 3D object segmentation, 3D shape completion, and 3D captioning.

Next-Best-View Policy for NeRFs
National University of Singapore, TU Delft, Purdue University
Code

Developed an attention-based Next-Best-View (NBV) policy for Neural Radiance Fields that selects the most informative viewpoints for 3D reconstruction. A Vision Transformer encodes multi-view image embeddings and candidate poses to predict the PSNR gain (ΔPSNR) of each candidate, supervised with ground truth from pixelNeRF.
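For illustration, a minimal sketch of the selection step: compute PSNR between a reference and a rendered image, then pick the candidate view with the largest predicted gain. The function names, the flat pixel lists, and the gain values are hypothetical and are not taken from the project code; in the project the gains would come from the learned policy.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def select_next_best_view(predicted_gains):
    """Return the candidate view id with the largest predicted PSNR gain."""
    if not predicted_gains:
        raise ValueError("no candidate views")
    return max(predicted_gains, key=predicted_gains.get)


# Hypothetical predicted gains (in dB) for three candidate poses.
gains = {"view_a": 0.4, "view_b": 1.3, "view_c": 0.9}
best = select_next_best_view(gains)
```

A greedy argmax is the simplest selection rule; a real policy might also weigh travel cost between viewpoints.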

3D Human Pose Estimation using WiFi signals
BITS Pilani, Goa
Report

Developed a two-stage learning framework for 3D human pose estimation using WiFi signals, and incorporated phase sanitization to improve the quality of the channel state information (CSI). Validated on the MM-Fi dataset, improving accuracy and robustness over raw CSI inputs.
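As a rough illustration of what phase sanitization can look like, the sketch below unwraps the CSI phase across subcarriers and removes a linear trend, which is one common approach. It is an assumption that this resembles the method used here; the report does not specify it on this page. The function names are hypothetical, and subcarriers are assumed to be evenly spaced with indices 0..n-1.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples."""
    unwrapped = [phases[0]]
    for p in phases[1:]:
        # Wrap the step into [-pi, pi) before accumulating it.
        delta = (p - unwrapped[-1] + math.pi) % (2.0 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


def sanitize_phase(raw_phases):
    """Unwrap, then subtract a linear fit (endpoint slope, mean offset)."""
    n = len(raw_phases)
    if n < 2:
        raise ValueError("need at least two subcarriers")
    phi = unwrap(raw_phases)
    slope = (phi[-1] - phi[0]) / (n - 1)
    detrended = [phi[k] - slope * k for k in range(n)]
    offset = sum(detrended) / n
    return [d - offset for d in detrended]


# A purely linear phase (wrapped into (-pi, pi]) should sanitize to ~0.
raw = [math.atan2(math.sin(0.3 * k + 1.0), math.cos(0.3 * k + 1.0)) for k in range(30)]
cleaned = sanitize_phase(raw)
```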

Artemis
BITS Pilani, Goa
Video

Developed an autonomous drone using ROS2 and MAVROS as part of my competitive autonomous drone team, Team Artemis. Implemented a robust surveying and informative path planning algorithm. The hardware stack includes a Raspberry Pi 5 and a PX4 flight controller.

Swarm Bots
BITS Pilani, Goa
Poster

Developed an algorithm for swarm robots that combines the Artificial Potential Field and Reciprocal Velocity Obstacles methods so that each offsets the shortcomings of the other.
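The potential-field half of such a scheme can be sketched as follows. This is only a textbook Artificial Potential Field velocity in 2D, with a quadratic attractive potential and a standard repulsive potential; the Reciprocal Velocity Obstacles part and the way the two are combined in the project are not shown. The function name and gain values are hypothetical.

```python
import math


def apf_velocity(position, goal, obstacles, k_att=1.0, k_rep=1.0, influence_radius=2.0):
    """Velocity command as the negative gradient of attractive + repulsive potentials."""
    px, py = position
    gx, gy = goal
    # Attractive term: negative gradient of 0.5 * k_att * ||p - goal||^2.
    vx = k_att * (gx - px)
    vy = k_att * (gy - py)
    for ox, oy in obstacles:
        dx, dy = px - ox, py - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence_radius:
            # Negative gradient of 0.5 * k_rep * (1/dist - 1/influence_radius)^2.
            magnitude = k_rep * (1.0 / dist - 1.0 / influence_radius) / dist ** 2
            vx += magnitude * dx / dist
            vy += magnitude * dy / dist
    return vx, vy


# Other robots can be passed as obstacles; here one neighbor sits near the path.
command = apf_velocity((0.0, 0.0), (5.0, 0.0), [(1.0, 0.5)])
```

A pure potential field can stall where attraction and repulsion cancel, which is one reason to pair it with a velocity-space method.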

Stock Price Prediction using Optimal Stopping Theory
BITS Pilani, Goa
Code, Report

Course project for MATH F424 Applied Stochastic Process. Developed a stock price prediction model using Optimal Stopping Theory and Geometric Brownian Motion to maximize returns by determining the optimal time to sell stocks. Backtested the model on 10 years of historical NSE stock data.
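To make the setup concrete, here is a minimal sketch that simulates a Geometric Brownian Motion price path and applies a simple threshold selling rule. The threshold rule is a stand-in for illustration only and is not claimed to be the optimal stopping rule derived in the project; the function names and all parameter values (drift, volatility, threshold) are hypothetical.

```python
import math
import random


def simulate_gbm_path(s0, mu, sigma, horizon_years, n_steps, rng):
    """Simulate one GBM price path using the exact log-normal step."""
    dt = horizon_years / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


def threshold_sale_price(path, threshold):
    """Sell at the first price at or above the threshold, else at the final price."""
    for price in path:
        if price >= threshold:
            return price
    return path[-1]


# One simulated trading year (252 steps) with assumed drift and volatility.
rng = random.Random(0)
path = simulate_gbm_path(100.0, 0.08, 0.2, 1.0, 252, rng)
sale = threshold_sale_price(path, 110.0)
```

Averaging the sale price over many simulated paths for different thresholds gives a simple way to compare candidate stopping rules before backtesting on real data.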