[MS Data Science · University of Houston]

Vamshi Krishna Janagama
Data Scientist

Houston, TX
ML · LLMs · Data Engineering
● Open to opportunities

Building AI-powered data systems — from distributed ETL pipelines to RAG infrastructure and predictive ML models. I turn messy data into decisions that scale.


Experience

Where I've worked.

Graduate Data Scientist
Aug 2024 – Present
University of Houston – Athletics & Academic Services · Houston, TX
  • Built Python & SQL ETL pipelines integrating 5+ data sources (100K+ records) for analytics and operational reporting.
  • Designed SQL data models improving query performance by 35% and enabling scalable analytics workflows.
  • Developed Power BI dashboards and automated reporting pipelines for leadership KPI monitoring.
  • Automated preprocessing & feature engineering pipelines, cutting manual data prep time by 30%.
  • Implemented FAISS-based semantic search for improved information retrieval across institutional datasets.
Data Scientist
Apr 2023 – Apr 2024
Key Care Drugs Pvt Limited · Hyderabad, India
  • Built Python & SQL data pipelines with automated validation, reducing data quality issues by 90%.
  • Developed Scikit-learn predictive ML models improving forecasting accuracy by 18% over rule-based baselines.
  • Conducted sales & operational analytics identifying revenue drivers to support pricing and budgeting strategies.
  • Implemented MLflow experiment tracking for model reproducibility and performance monitoring.

Projects

What I've built.

01 · AI Systems
AI Retrieval Pipeline (RAG Infrastructure)
May 2025 – Dec 2025

End-to-end RAG pipeline with document ingestion, embedding generation, and FAISS vector indexing integrated with LLMs for enterprise semantic search.

Retrieval accuracy +22% · Irrelevant results reduced by ~60%
RAG · FAISS · LangChain · LLMs · Vector DBs
02 · Data Engineering
Scalable Distributed Data Engineering Pipeline
Jul 2024 – Oct 2024

Distributed ETL pipelines using PySpark processing 1M+ records. Ensemble ML models for risk prediction with full model explainability and drift monitoring.

AUC-ROC 0.91 on risk prediction · SHAP + drift detection monitoring
PySpark · XGBoost · LightGBM · SHAP
03 · ML Engineering
Multi-Modal Risk Stratification & Anomaly Detection
Nov 2024 – Jan 2025

Re-engineered a Pandas workflow into a distributed PySpark architecture. Added automated anomaly detection and data quality monitoring dashboards to support scalable analytical modeling.

Processing speed 4× faster · Anomaly detection rate 95%+
PySpark · Anomaly Detection · Power BI

Skills

My tech stack.

Programming & Analysis
Python · SQL · Pandas · NumPy · PySpark
Machine Learning
Scikit-learn · XGBoost · LightGBM · PyTorch · MLflow
AI & LLM Systems
RAG · LangChain · FAISS · Vector DBs · LLM Eval
Data Engineering
ETL Pipelines · Apache Spark · PostgreSQL · Data Modeling
Visualization & BI
Power BI · Tableau · KPI Dashboards · EDA
Cloud & DevOps
AWS (EC2/S3/RDS) · Redshift · Git · Jupyter · SHAP

Background

Education & Research.

MS Degree
M.S. Engineering Data Science
University of Houston · Houston, TX
Aug 2024 – May 2026
B.Tech Degree
B.Tech Computer Science (AI)
Amrita Vishwa Vidyapeetham · Amritapuri, India
Jul 2020 – May 2024
Publication · IEEE Xplore 2024
Spatial Analysis-Enhanced Dermatological Image Classification for Paronychia

Deep learning pipeline combining U-Net segmentation and spatial analysis on the DermNet dataset. Evaluated ResNet34, VGG16, DenseNet121, InceptionV3, and EfficientNet with a confidence scoring framework.

97.1% classification accuracy

Contact

Let's connect.

I'm actively looking for full-time Data Scientist and ML Engineer roles. Whether you have a project, a position, or just want to talk data, I'm always down to connect.
