Open to full-time AI / ML Engineer roles · San Francisco, CA

Yajurved Jayavarapu


AI Engineer with 3+ years of experience designing and shipping production LLM systems, RAG pipelines, and agentic AI workflows. From data ingestion and feature engineering to LLM evaluation and cloud-native MLOps, I turn complex AI research into reliable, scalable systems.

3+ Years of Experience

Building AI That Ships

I'm Yajurved Jayavarapu, an AI Engineer based in San Francisco, CA with a Master of Science in Data Science from the University of Alabama at Birmingham. I specialise in LLM systems, RAG architectures, and agentic AI — building production-grade pipelines that go far beyond notebooks.

My work spans Syneos Health, UAB, and Thomson Reuters — where I architected Spark-based ML pipelines, reduced data latency by 35%, cut manual reporting time by 60%, and deployed NLP systems at scale. I hold deep expertise in LangChain, LlamaIndex, LangGraph, vector databases, and AWS cloud-native ML platforms.

3+ Years Experience · 3 Companies · MS Data Science
🏢
Data Scientist · Syneos Health
Jan 2025 – Present · USA
End-to-end ML pipelines · Spark batch processing · Airflow DAGs on AWS · 35% latency reduction
🎓
Data Scientist · University of Alabama at Birmingham
Aug 2023 – Apr 2024 · AL, USA
Real-time sensor pipelines · Anomaly detection · 60% reduction in manual reporting
⚖️
Data Scientist · Thomson Reuters
Aug 2021 – Apr 2023 · India
Spark ML on Hadoop · NLP pipelines · Scikit-learn production workflows

My Skill Set

From LLM orchestration frameworks to distributed data engineering — the full stack of modern AI.

🤖
AI / LLM Systems
LangChain / LangGraph: 93%
RAG & Agentic Pipelines: 95%
LlamaIndex: 88%
OpenAI / Anthropic APIs: 92%
🧠
Machine Learning
Python: 97%
Scikit-learn / Classical ML: 90%
Deep Learning (CNN, RNN): 82%
NLP & Time Series: 88%
⚙️
Data Engineering & MLOps
Apache Spark · Airflow · ETL Pipelines · SQL / PostgreSQL · AWS S3 / EC2 / EMR · AWS Redshift · LangSmith · ChromaDB · FAISS · Vector DBs
📊
Tools & Visualization
Power BI · Tableau · Git · JIRA · Jupyter · R · Prompt Engineering · Function Calling · JSON Schemas

Featured Projects

Production-grade AI systems — from grounded RAG to multi-agent verification.

🔍
LLMs · RAG · LangChain

Grounded RAG Assistant

Built a grounded RAG system that restricts answers to the content of the retrieved documents and refuses safely when they do not support an answer, to minimize hallucinations. Features agent-based query routing, hybrid retrieval (BM25 + dense vectors), metadata-aware filtering, chunk-level traceability, and integration tests for end-to-end LLM validation.

Python · LangChain · ChromaDB · BM25 · OpenAI · LangSmith
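
One common way to combine BM25 and dense-vector results is reciprocal rank fusion. The sketch below is illustrative only: the project description does not say how the two rankings are merged, and the function names, the `k=60` constant, and the chunk ids are all assumptions. Returning `None` when no chunk survives gives the caller a signal to refuse instead of answering without sources.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk ids into a single ranking, best first.

    Each list contributes 1 / (k + rank) per chunk; k=60 is a common default,
    not a value taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def select_context(fused_ids, chunks, top_n=3):
    """Return (chunk_id, text) pairs for the prompt, or None to signal a refusal."""
    selected = [(cid, chunks[cid]) for cid in fused_ids[:top_n] if cid in chunks]
    return selected or None


# Hypothetical rankings from a BM25 index and a dense vector index.
bm25_hits = ["c2", "c1", "c3"]
dense_hits = ["c1", "c4", "c2"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Keeping the chunk id next to each passage is what makes chunk-level traceability possible: the answer can cite exactly the ids that were placed in the prompt.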
Health Analytics · Data Engineering

Apple Watch Health Analytics Dashboard

An end-to-end health data analytics pipeline that ingests Apple Watch metrics, applies time-series analysis and anomaly detection, and surfaces insights through an interactive analytics dashboard. Demonstrated a 40% improvement in data reliability metrics.

Python · Pandas · Time Series · Anomaly Detection · Dashboard
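
A rolling z-score is one simple way to flag anomalies in a metric such as heart rate. This is a minimal sketch under stated assumptions: the project description does not name its detection method, and the window size, threshold, and sample readings below are hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A flat window has no spread, so a z-score is undefined; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical resting heart-rate readings with one spike at index 6.
readings = [60, 62, 61, 63, 62, 61, 120, 62, 61]
flagged = rolling_zscore_anomalies(readings)
```

Because each point is compared only with the readings before it, the same check works on streaming data as well as on exported history.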
🤖
Agentic AI · Multi-Agent · RAG

Verified Agentic RAG System

Multi-agent RAG system with dedicated Planner, Retriever, Answer, and Verifier agents. Implements query decomposition, multi-hop retrieval, citation enforcement, and retry logic using modular state machines and structured JSON outputs.

LangGraph · LlamaIndex · OpenAI · JSON Schemas · Python
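
The Planner, Retriever, Answer, and Verifier flow with retries can be sketched as a small loop over shared state. This is a dependency-free illustration, not the project's LangGraph implementation: the `State` fields, the `run_pipeline` signature, and the answer shape (`{"text": ..., "citations": [...]}`) are assumptions, and the four agents are passed in as plain callables.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    question: str
    sub_queries: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    answer: dict = field(default_factory=dict)
    attempts: int = 0


def citations_are_grounded(answer, evidence):
    """Verifier: require at least one citation, all pointing at retrieved chunks."""
    known = {chunk["id"] for chunk in evidence}
    cited = answer.get("citations", [])
    return bool(cited) and all(c in known for c in cited)


def run_pipeline(question, plan, retrieve, answer,
                 verify=citations_are_grounded, max_attempts=3):
    """Plan once, then retrieve, answer, and verify until verified or out of attempts."""
    state = State(question=question)
    state.sub_queries = plan(question)  # query decomposition
    while state.attempts < max_attempts:
        state.attempts += 1
        # One retrieval per sub-query; the attempt number lets a retriever widen its search.
        state.evidence = [chunk for q in state.sub_queries
                          for chunk in retrieve(q, state.attempts)]
        state.answer = answer(question, state.evidence)
        if verify(state.answer, state.evidence):
            return state
    # Every attempt failed verification: fall back to an uncited refusal.
    state.answer = {"text": "Unable to produce a verified answer.", "citations": []}
    return state
```

Passing the agents in as functions keeps the control flow testable with stubs before any LLM call is wired in.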
MLOps · Spark · NLP

Production NLP & ML Pipelines

Built Spark-based ML pipelines on Hadoop clusters for large-scale text classification and sentiment analysis at Thomson Reuters. Applied cross-validation and robust metrics (AUC, F1, Precision, Recall) with reproducible training and evaluation workflows.

Apache Spark · Scikit-learn · NLP · Hadoop · Python
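
For reference, the metrics named above can be computed by hand for a binary classifier. This is a dependency-free sketch of what precision, recall, F1, and AUC measure; it is not the Spark or scikit-learn code used in the project, and the labels and scores are made-up examples.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and model scores for five documents.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.8, 0.4, 0.2]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In a cross-validated workflow these values would be computed on each held-out fold and then averaged, so that no score comes from data the model was trained on.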
☁️
Data Engineering · AWS · Airflow

Automated ML Workflows on AWS

Orchestrated automated ETL and ML workflows using Airflow DAGs on AWS at Syneos Health. Built reusable ML components consumed by multiple downstream analytics and AI workflows, reducing data processing latency by 35%.

Apache Airflow · AWS S3 · AWS EMR · Redshift · Python
📡
Data Engineering · Real-Time · Dashboards

Real-Time Sensor Analytics Platform

Engineered real-time data ingestion pipelines from multiple sensor and API sources at UAB, improving data accuracy by 25%. Built interactive dashboards backed by standardized datasets, with anomaly detection and a unified analytics layer for downstream modeling.

Python · SQL · Power BI · Anomaly Detection · APIs

Let's Build Something Incredible

Whether you have a full-time AI/ML opportunity, a research collaboration, or want to discuss LLMs and RAG systems — my inbox is always open. Based in San Francisco, open to remote.