💭
Looking for DS/ML/AI roles


About me:

  • Quantitative Scientist & ML Engineer with a foundation in enterprise software engineering (Siemens) and over a decade of high-impact research at Oxford, KCL, and Manchester.
  • Academic Authority: Author of 26 peer-reviewed publications (18 first-author) with an h-index of 13, specialising in medical imaging, pharmaceuticals, and clinical data analysis.
  • End-to-End Engineering: I build production-ready AI/ML pipelines that bridge rigorous scientific methodology with modern MLOps (FastAPI, CI/CD, SQL, Streamlit) and cloud-native tools (foundational AWS: SageMaker).
  • Versatile Expertise: Project experience spanning clinical imaging, biometrics, banking, and finance. Currently focused on deep learning and applied computer vision.

Core Tech Stack:

  • Core Languages & Testing: Python (Pandas, NumPy, Seaborn, Matplotlib, Pytest), SQL (MySQL), R, MATLAB, Java (J2EE).
  • Machine Learning & Stats: Scikit-learn, XGBoost, PyTorch, Statsmodels, MLflow (Experiment Tracking), Optuna (Tuning).
  • MLOps, Cloud & Deployment: FastAPI, Streamlit, Docker, CI/CD (GitHub Actions), AWS (Foundational: Bedrock, SageMaker).
  • Scientific Data & Viz: Medical Imaging (DICOM, NIfTI), 4D/Longitudinal Pipelines, Seaborn, Matplotlib.

Featured Projects (Production-Ready)

Health Insurance Premium Prediction

Gait Analysis & Biometrics

  • Tech: Python, PyTorch, Scikit-learn.
  • Impact: Benchmarked 9 experimental pipelines for small-sample biometric data with rigorous validation.
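Rigorous validation on small, imbalanced biometric samples usually means resampling only the training side of each cross-validation fold, so duplicated or synthetic minority examples never leak into the evaluation fold. The sketch below illustrates that idea with plain random oversampling (a simpler stand-in for the SMOTE step named in the gait_analysis description) inside a stratified k-fold loop. The function names and toy labels are hypothetical and not taken from the project.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample(indices, labels, seed=0):
    """Duplicate random minority-class indices until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def cv_splits(labels, k=3, seed=0):
    """Yield (train_indices, test_indices); only the training side is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield oversample(train_idx, labels, seed), sorted(test_idx)


if __name__ == "__main__":
    y = ["healthy"] * 9 + ["impaired"] * 3  # hypothetical imbalanced labels
    for train, test in cv_splits(y, k=3):
        print(Counter(y[i] for i in train), Counter(y[i] for i in test))
```

Each test fold keeps its natural class ratio, which is what a deployed model would face; only the data the model learns from is balanced.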

Full-Stack Expense Tracker

  • Tech: FastAPI, MySQL, Streamlit.
  • Impact: Developed a complete data application for real-time financial tracking and CRUD operations.

How to read this profile:

  • The pinned projects demonstrate practical data science work, including real-world datasets, model development, and deployment. Additional repositories contain EDA and learning exercises.

Selected Technical Posts



Technical Portfolio & Open-Source Projects

  • Health Insurance Premium Prediction (Python 3.10, Streamlit)

    • Deployed App: https://ml-based-premium-prediction-v1.streamlit.app/
    • App Code: https://github.com/Lua-Matlab-Python-R-J2EE/ml-based-premium-prediction
    • Tech: Python | Scikit-learn | XGBoost | Streamlit
    • Dual-model strategy trained on 50K synthetic records
    • Reduced prediction error by 90%+ through age-based segmentation
    • Production-ready Streamlit app with CI/CD deployment
    • End-to-end ML pipeline from EDA to model deployment
    • Feature engineering using Variance Inflation Factor (VIF) analysis to address multicollinearity
    • Version 2 in progress: improved evaluation rigor, stricter validation boundaries, expanded testing, and enhanced maintainability
  • Ombudsman Complaints Forecasting in Python
    Time-series forecasting pipeline using XGBoost and Meta Prophet to predict daily complaint volumes for operational capacity planning and resource triage. Showcases skills in rigorous temporal splitting, data cleaning, feature engineering, and evaluating model constraints in noisy, low-volume operational environments.

  • Gait Analysis in Python
    ML analysis of gait biometrics across nine experimental pipelines, showcasing skills in data preprocessing, modeling, cross-validation, oversampling, clustering, and rigorous evaluation. Designed to highlight practical ML abilities, strong methodology, and clear reasoning in small-data, high-dimensional settings.

  • Expense Tracking System in Python
    A full-stack expense management application with a FastAPI backend, a Streamlit frontend, and MySQL database integration, featuring real-time analytics for efficient personal finance tracking.

  • EDA in Banking Domain in Python
    Data analysis of 50,000 records for an imaginary bank, used to design and launch a competitive credit card product that aligns with market demands and customer preferences while minimizing failure risk.

  • EDA in Hospitality Domain in Python
    Data analysis for an imaginary hotel chain to uncover insights and recommend strategies for growth.

  • Movies Project in SQL
    A comprehensive SQL reference guide with practical examples covering fundamental to advanced SQL queries. All examples use a movies database schema for real-world learning.

  • Lean Body Mass Estimation in R
    Statistical Analysis: Comparison of ten predictive statistical models for estimating lean body mass against dual-energy X-ray absorptiometry (DXA) in older patients using correlation, Bland-Altman plots, and hypothesis testing.

  • DCE-MRI Tool in MATLAB
    Scientific Computing: General utility functions written in MATLAB/Octave as part of a software toolkit for analyzing 4-dimensional (4D) dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data.
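The VIF analysis listed for the premium-prediction project measures how well each feature is explained by the others: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on the remaining features. An equivalent route is to read the diagonal of the inverse of the predictors' correlation matrix. The dependency-free sketch below uses that identity; the function names and toy columns are hypothetical, and any cut-off applied to the result (5 or 10 are common rules of thumb) is a modelling choice rather than the project's documented setting. In practice, statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {feature: VIF} using VIF_j = [R^-1]_jj, where R is the correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


if __name__ == "__main__":
    features = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}  # hypothetical predictors
    print(vif(features))  # corr(x1, x2) = 0.8, so both VIFs are 1 / (1 - 0.64)
```

Features with a high VIF are candidates for removal or combination before fitting a linear model, which is the multicollinearity concern the bullet describes.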

20+ completed GitHub Engineering Modules

Continuous Professional Development

  • Hello GitHub Actions
    Learned the basics of GitHub Actions, including how to automate workflows directly from a repository using YAML configuration files.

  • Test with Actions 2
    Practiced configuring and running advanced CI workflows using GitHub Actions, focusing on automated testing and continuous integration best practices.

  • Publish Packages
    Practiced using GitHub Actions to publish a project as a Docker image.

  • Your First Extension for GitHub Copilot
    Built and published a custom extension for GitHub Copilot, extending its coding capabilities to fit specific development needs.

  • Getting Started with GitHub Copilot
    Explored GitHub Copilot’s AI-powered code completions, learning how to boost productivity and write code faster.

  • Introduction to GitHub
    Covered GitHub essentials: creating repositories, managing files, and collaborating with others on code projects.

  • Communicate Using Markdown
    Mastered Markdown syntax to create well-formatted README files, documentation, and collaborative notes.

  • GitHub Pages
    Learned to publish and customize personal or project websites directly from GitHub repositories using GitHub Pages.

  • Review Pull Requests
    Practiced code review workflows, including providing feedback on pull requests and collaborating with team members to improve code quality.

  • Resolve Merge Conflicts
    Learned how to identify, understand, and resolve merge conflicts when working in collaborative repositories.

  • Release Based Workflow
    Explored advanced branching and release management strategies to ship project updates in a controlled and organized manner.

  • Connect the Dots
    Developed skills in linking issues, pull requests, and commits to streamline project management and maintain clear development history.

  • Code with Codespaces
    Learned to set up and use GitHub Codespaces for cloud-based development, enabling instant coding environments in the browser.

  • Introduction to Repository Management
    Gained foundational knowledge in managing repository settings, access controls, and collaboration features for effective project organization.

Pinned

  1. ml-based-premium-prediction (Public)

    End-to-end ML regression pipeline (FastAPI/Streamlit) for insurance pricing. 90% error reduction via dual-model strategy & VIF analysis. Includes CI/CD.

    Jupyter Notebook

  2. Expense-Tracking-System (Public)

    Full-stack personal finance application. FastAPI backend, MySQL database integration, and Streamlit frontend for real-time data analytics.

    Python

  3. EDA-Banking-Domain (Public)

    Data-driven product design for a 50k record banking dataset. Applied customer segmentation and risk-aware logic to launch competitive credit products.

    Python

  4. EDA-Hospitality-Domain (Public)

    Revenue optimisation and growth strategy analysis for a 150k transaction dataset. Translated complex trends into actionable business insights.

    Python

  5. gait_analysis (Public)

    Supervised & unsupervised ML for biometric classification. Benchmarking 9 pipelines with SMOTE & cross-validation for clinical-grade rigour.

    Jupyter Notebook

  6. LBM-R (Public)

    Statistical benchmarking of 10 predictive models against the DXA gold standard. Features Bland-Altman analysis and hypothesis testing in R.

    R
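The LBM-R comparison against DXA relies on Bland-Altman analysis, which summarises agreement between two measurement methods by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. The project itself is written in R; the sketch below restates the standard calculation in Python purely for illustration, with a hypothetical function name and made-up paired values rather than project data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return bias and 95% limits of agreement for paired measurements.

    Differences are method_a - method_b; stdev() is the sample (n - 1)
    standard deviation. 'means' and 'diffs' are the x and y values of the plot.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return {"bias": bias, "lower": bias - spread, "upper": bias + spread,
            "means": means, "diffs": diffs}


if __name__ == "__main__":
    predicted_lbm = [45.2, 50.1, 38.7, 42.0, 55.3]  # hypothetical model estimates (kg)
    dxa_lbm = [44.0, 51.0, 37.5, 43.1, 54.0]        # hypothetical DXA reference (kg)
    result = bland_altman(predicted_lbm, dxa_lbm)
    print(f"bias={result['bias']:.2f} kg, "
          f"LoA=[{result['lower']:.2f}, {result['upper']:.2f}] kg")
```

Unlike a correlation coefficient, the limits of agreement are expressed in the measurement's own units, which makes them easier to judge against a clinically acceptable error.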