- Quantitative Scientist & ML Engineer with a foundation in enterprise software engineering (Siemens) and over a decade of high-impact research at Oxford, KCL, and Manchester.
- Academic Authority: Author of 26 peer-reviewed publications (18 first-author) with an h-index of 13, specialising in medical imaging, pharmaceuticals, and clinical data analysis.
- End-to-End Engineering: I build production-ready AI/ML pipelines that bridge rigorous scientific methodology with modern MLOps (FastAPI, CI/CD, SQL, Streamlit) and cloud-native tools (AWS at a foundational level: SageMaker).
- Versatile Expertise: Project experience spanning clinical imaging, biometrics, banking, and finance. Currently focused on deep learning and applied computer vision.
- Core Languages & Testing: Python (Pandas, NumPy, Seaborn, Matplotlib, Pytest), SQL (MySQL), R, MATLAB, Java (J2EE).
- Machine Learning & Stats: Scikit-learn, XGBoost, PyTorch, Statsmodels, MLflow (Experiment Tracking), Optuna (Tuning).
- MLOps, Cloud & Deployment: FastAPI, Streamlit, Docker, CI/CD (GitHub Actions), AWS (Foundational: Bedrock, SageMaker).
- Scientific Data & Viz: Medical Imaging (DICOM, NIfTI), 4D/Longitudinal Pipelines, Seaborn, Matplotlib.
- Deployed App: https://ml-based-premium-prediction-v1.streamlit.app/
- App Code: https://github.com/Lua-Matlab-Python-R-J2EE/ml-based-premium-prediction
- Tech: Python, Scikit-learn, XGBoost, FastAPI, Streamlit, GitHub Actions.
- Impact: Built a dual-model strategy for 50k records; reduced prediction error by >90%.
- Engineering: Full CI/CD pipeline with automated deployment.
- Tech: Python, PyTorch, Scikit-learn.
- Impact: Benchmarked 9 experimental pipelines for small-sample biometric data with rigorous validation.
- Tech: FastAPI, MySQL, Streamlit.
- Impact: Developed a complete data application for real-time financial tracking and CRUD operations.
- The pinned projects demonstrate practical data science work, including real-world datasets, model development, and deployment. Additional repositories contain EDA and learning exercises.
- https://www.linkedin.com/posts/activity-7402053339540193283-wgZp
- https://www.linkedin.com/posts/activity-7393415445820633088-fMu7
- https://www.linkedin.com/posts/activity-7392617954150080512-vqHj
- https://www.linkedin.com/posts/activity-7391090472037036032-66Iu
Click to expand full project list
- Health Insurance Premium Prediction
  - Deployed App: https://ml-based-premium-prediction-v1.streamlit.app/
  - App Code: https://github.com/Lua-Matlab-Python-R-J2EE/ml-based-premium-prediction
  - Tech: Python | Scikit-learn | XGBoost | Streamlit
  - 50K synthetic records using a dual-model strategy
  - Reduced prediction error by 90%+ through age-based segmentation
  - Production-ready Streamlit app with CI/CD deployment
  - End-to-end ML pipeline from EDA to model deployment
  - Feature engineering using Variance Inflation Factor (VIF) analysis to address multicollinearity
  - Version 2 in progress: improved evaluation rigor, stricter validation boundaries, expanded testing, and enhanced maintainability
- Ombudsman Complaints Forecasting in Python
  Time-series forecasting pipeline using XGBoost and Meta Prophet to predict daily complaint volumes for operational capacity planning and resource triage. Showcases skills in rigorous temporal splitting, data cleaning, feature engineering, and evaluating model constraints in noisy, low-volume operational environments.
- Gait Analysis in Python
  ML analysis of gait biometrics across nine experimental pipelines, showcasing skills in data preprocessing, modeling, cross-validation, oversampling, clustering, and rigorous evaluation. Designed to highlight practical ML abilities, strong methodology, and clear reasoning in small-data, high-dimensional settings.
- Expense Tracking System in Python
  A full-stack expense management application with a FastAPI backend and a Streamlit frontend, featuring real-time analytics and MySQL database integration for efficient personal finance tracking.
- EDA in Banking Domain in Python
  Data analysis for a fictional bank (using 50,000 records) to design and launch a competitive credit card product that aligns with market demands and customer preferences while minimizing failure risk.
- EDA in Hospitality Domain in Python
  Data analysis for a fictional hotel chain to uncover insights and recommend strategies for growth.
- Movies Project in SQL
  A comprehensive SQL reference guide with practical examples covering fundamental to advanced SQL queries. All examples use a movies database schema for real-world learning.
- Lean Body Mass Estimation in R
  Statistical Analysis: Comparison of ten predictive statistical models for estimating lean body mass against dual-energy X-ray absorptiometry (DXA) in older patients using correlation, Bland-Altman plots, and hypothesis testing.
- DCE-MRI Tool in MATLAB
  Scientific Computing: General utility functions written in MATLAB/Octave as part of a software toolkit for analyzing 4-dimensional (4D) dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data.
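
The Bland-Altman comparison mentioned under Lean Body Mass Estimation can be illustrated with a minimal sketch. The project itself is written in R; the Python below is not the repository's code, the function name `bland_altman` is hypothetical, and the sample values are made up purely for illustration. It computes the mean difference (bias) between two paired measurement series and the conventional 95% limits of agreement (bias ± 1.96 × sample standard deviation of the differences).

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for two paired measurement series.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired series must have the same length")
    if len(method_a) < 2:
        raise ValueError("need at least two pairs to estimate the spread")

    # Per-subject difference between the two methods.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)      # systematic offset between the methods
    spread = stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - z * spread, bias + z * spread


# Made-up lean body mass values in kg (illustrative only, not study data).
predicted_kg = [45.0, 52.0, 48.0, 60.0, 55.0]
dxa_kg = [44.0, 50.0, 49.0, 58.0, 54.0]

bias, lower, upper = bland_altman(predicted_kg, dxa_kg)
print(f"bias={bias:.2f} kg, limits of agreement=({lower:.2f}, {upper:.2f}) kg")
```

A predictive model agrees well with the reference method when the bias is close to zero and the limits of agreement are narrow enough to be clinically acceptable; what counts as acceptable is a domain judgment, not something the calculation decides.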
Click to view 20+ completed GitHub Engineering Modules
- Hello GitHub Actions
  Learned the basics of GitHub Actions, including how to automate workflows directly from a repository using YAML configuration files.
- Test with Actions 2
  Practiced configuring and running advanced CI workflows using GitHub Actions, focusing on automated testing and continuous integration best practices.
- Publish Packages
  Used GitHub Actions to build and publish a project as a Docker image.
- Your First Extension for GitHub Copilot
  Built and published a custom extension for GitHub Copilot, extending its coding capabilities to fit specific development needs.
- Getting Started with GitHub Copilot
  Explored GitHub Copilot’s AI-powered code completions, learning how to boost productivity and write code faster.
- Introduction to GitHub
  Covered GitHub essentials: creating repositories, managing files, and collaborating with others on code projects.
- Communicate Using Markdown
  Mastered Markdown syntax to create well-formatted README files, documentation, and collaborative notes.
- GitHub Pages
  Learned to publish and customize personal or project websites directly from GitHub repositories using GitHub Pages.
- Review Pull Requests
  Practiced code review workflows, including providing feedback on pull requests and collaborating with team members to improve code quality.
- Resolve Merge Conflicts
  Learned how to identify, understand, and resolve merge conflicts when working in collaborative repositories.
- Release Based Workflow
  Explored advanced branching and release management strategies to ship project updates in a controlled and organized manner.
- Connect the Dots
  Developed skills in linking issues, pull requests, and commits to streamline project management and maintain clear development history.
- Code with Codespaces
  Learned to set up and use GitHub Codespaces for cloud-based development, enabling instant coding environments in the browser.
- Introduction to Repository Management
  Gained foundational knowledge in managing repository settings, access controls, and collaboration features for effective project organization.

