Experience

Visiting Researcher

Carnegie Mellon University

Jun 2025 – Present · Pittsburgh, Pennsylvania, USA

  • Conducting my undergraduate thesis at Carnegie Mellon University's Robotics Institute under the supervision of Dr. Wenshan Wang and Dr. Ji Zhang
  • Worked on Vision-Language Navigation for Outdoor Scenarios
  • Working on 3D Object Representation Learning

Research Intern

National University of Singapore

Apr 2024 – Present · Remote

  • Research intern at MARMoT Lab, NUS, under the supervision of Prof. Guillaume Sartoretti
  • Working on 3D Scene Graph Generation using Vision Language Models
  • Worked on Open-Vocabulary Gaussian Splatting
  • Worked on Next-Best-View Policy for 3D Reconstruction using Neural Radiance Fields (co-supervised by Prof. Marija Popović)

Publications

UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena, Nishant Raghuvanshi, Neena Goveas
ECMR 2025 (Oral Presentation)
arXiv

UAV-VLN leverages the common-sense reasoning of LLMs and a vision model for cross-modal grounding to plan context-aware aerial trajectories from free-form natural language instructions.

LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Pranav Saxena, Avigyan Bhattacharya, Ji Zhang, Wenshan Wang
Human-aware Embodied AI Workshop @ IROS 2025
arXiv

LLM-RG is a zero-shot hybrid pipeline that leverages Large Language Models for symbolic reasoning and Vision-Language Models for fine-grained attribute extraction to perform robust referential grounding in challenging outdoor driving scenarios.

Projects

3D Object Representation Learning
Carnegie Mellon University

Developing a general-purpose 3D encoder for use in various downstream tasks.

Efficient Open Vocabulary 3D Reconstruction using Gaussian Splatting
In collaboration with NUS and TU Delft

Created a custom autoencoder to generate language features, eliminating the need to train a per-scene autoencoder. Improved efficiency by 2x compared to LangSplat. Currently applying it to downstream robotics tasks.

3D Scene Graph Generation using VLMs
National University of Singapore

Hierarchical, incremental 3D scene graph generation using Vision Language Models in a zero-shot manner. Later applied to a downstream Vision-Language Navigation task.

3D Human Pose Estimation using WiFi signals
BITS Pilani, Goa
Report

Developed a two-stage learning framework for 3D human pose estimation using WiFi signals, and incorporated phase sanitization to improve CSI quality. Validated on the MM-Fi dataset, achieving enhanced accuracy and robustness over raw CSI inputs.

Swarm Bots
BITS Pilani, Goa
Poster

Developed an algorithm for swarm robots that combines the Artificial Potential Field and Reciprocal Velocity Obstacles methods to overcome the shortcomings of each.

Stock Price Prediction using Optimal Stopping Theory
BITS Pilani, Goa
Code, Report

Course project for MATH F424 Applied Stochastic Process. Developed a stock price prediction model using Optimal Stopping Theory and Geometric Brownian Motion to maximize returns by determining the optimal time to sell stocks. Backtested the model on 10 years of historical NSE stock data.

Artemis
BITS Pilani, Goa
Video

Developed an autonomous drone using ROS2 and MAVROS as part of my competitive autonomous drone team, Team Artemis. Implemented a robust surveying and informative path planning algorithm. The hardware stack includes a Raspberry Pi 5 and a PX4 flight controller.