Projects — Rohan Barrowcliff | Data Science Portfolio

Surgery Duration Prediction with AutoML

Problem

Operating room inefficiencies cost the NHS over £400 million annually. Traditional methods of estimating surgery duration are inaccurate (surgeon estimates: 59-70 min mean absolute error (MAE); historical averages: 31-38 min MAE), leading to late starts, cancellations, and wasted capacity. With 7.4 million patients on NHS waiting lists, accurate duration prediction is critical.

Approach / Tools

Evaluated AutoGluon, an automated machine learning (AutoML) framework, for surgery duration prediction using 94,502 elective orthopaedic procedures from East Kent Hospitals NHS Foundation Trust. Compared AutoGluon against linear regression, XGBoost, and neural network baselines under identical preprocessing. Applied SHAP analysis to identify the key drivers of duration and overrun risk.

Results / Insights

AutoGluon achieved an MAE of 15.70 min (a 26% improvement over XGBoost), falling to 11.84 min with extended training (a 46% improvement over surgeon estimates). SHAP analysis identified procedure type, inpatient status, and anaesthetic type as the strongest predictors. These findings support data-driven scheduling optimization without requiring extensive ML expertise.
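For readers less familiar with the metric, the sketch below shows how an MAE and a relative improvement figure of the kind quoted above are typically computed. The durations are hypothetical values chosen for illustration, not project data, and the function names are my own.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted durations (minutes)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to a baseline (0.26 means 26%)."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical durations in minutes, for illustration only.
actual_minutes = [60, 90, 120]
baseline_predictions = [80, 70, 140]  # stand-in for a baseline estimate
model_predictions = [70, 85, 110]     # stand-in for a trained model

baseline_mae = mean_absolute_error(actual_minutes, baseline_predictions)
model_mae = mean_absolute_error(actual_minutes, model_predictions)
improvement = relative_improvement(baseline_mae, model_mae)

print(f"baseline MAE {baseline_mae:.2f} min, model MAE {model_mae:.2f} min, "
      f"improvement {improvement:.0%}")
```

This assumes "improvement" means the relative reduction in MAE against the named baseline, which is the usual reading but is not spelled out in the summary.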

Python, AutoGluon, XGBoost, SHAP, AutoML, Healthcare, NHS

Read Full Case Study View on GitHub

Patient QA System with Retrieval-Augmented Generation

Problem

Existing patient QA systems rely on rigid keyword matching and retrieval-only methods that sacrifice conversational fluency for safety. Academic prototypes using generative AI often hallucinate or lack evaluation metrics, making them unsuitable for deployment in healthcare settings where accuracy is paramount.

Approach / Tools

Developed a Retrieval-Augmented Generation (RAG) pipeline using BioBERT dense embeddings and FAISS vector search to retrieve relevant NHS document chunks. Implemented intelligent chunking strategies, optimized top-k retrieval (k=15), and engineered zero-shot prompts for the Phi-4-mini-instruct LLM. The system was evaluated on 211 expert-validated QA pairs using ROUGE-Lsum precision, BERTScore, and manual assessment of fluency, omissions, and hallucinations.
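The retrieval step can be illustrated with a minimal, dependency-free sketch. It uses brute-force cosine similarity over toy 3-dimensional vectors as a stand-in for BioBERT embeddings and a FAISS index, which perform the same nearest-neighbour lookup at scale. The chunk texts, the prompt template, and the function names are all hypothetical, and whether the project's index used cosine, inner-product, or L2 distance is not stated.

```python
import math


def unit(vector):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def retrieve_top_k(query_vector, chunk_vectors, k):
    """Return indices of the k chunks most similar to the query, best first."""
    q = unit(query_vector)
    scores = [
        (sum(a * b for a, b in zip(q, unit(v))), index)
        for index, v in enumerate(chunk_vectors)
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scores[:k]]


def build_prompt(question, chunks):
    """Assemble a zero-shot prompt from retrieved chunks (hypothetical template)."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy data: placeholder texts and made-up embeddings.
chunk_texts = ["Chunk A text", "Chunk B text", "Chunk C text", "Chunk D text"]
chunk_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vector = [1.0, 0.05, 0.0]

top = retrieve_top_k(query_vector, chunk_vectors, k=2)  # the project reports k=15
prompt = build_prompt("Example question?", [chunk_texts[i] for i in top])
print(top)
print(prompt)
```

The value of k trades recall against prompt length: a larger k makes it more likely that the relevant passage is retrieved, at the cost of a longer context for the LLM to read.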

Results / Insights

Achieved a roughly 40% relative improvement in ROUGE-Lsum precision (0.39 → 0.55) and a BERTScore F1 of 92.6% while maintaining 100% fluency and zero hallucinations. Manual evaluation showed that 68% of responses had no omissions. The project demonstrated that RAG can deliver both high factual accuracy and natural conversational quality, addressing a critical gap in patient-facing health information systems.
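As a rough illustration of the headline metric, the sketch below computes ROUGE-L precision for a single sentence pair: the length of the longest common subsequence (LCS) of tokens divided by the number of tokens in the generated answer. ROUGE-Lsum extends the same idea to multi-sentence texts by splitting them into sentences first; for a single sentence the two should coincide. The whitespace tokenisation is a simplification (library implementations also normalise punctuation and can apply stemming), and the sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_precision(candidate, reference):
    """Share of candidate tokens that appear, in order, in the reference."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0
    return lcs_length(cand_tokens, ref_tokens) / len(cand_tokens)


# Hypothetical sentence pair, for illustration only.
generated = "the clinic opens at nine on weekdays"
reference = "the clinic opens every weekday at nine"
precision = rouge_l_precision(generated, reference)

# Relative change implied by the reported scores (0.39 before, 0.55 after).
relative_gain = (0.55 - 0.39) / 0.39

print(f"ROUGE-L precision: {precision:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

Precision rewards answers that stay close to the reference wording without padding, which is presumably why it was preferred here over recall or F1.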

Python, BioBERT, FAISS, RAG, NLP, LLMs, Healthcare

Read Full Case Study View on GitHub

Multimodal Deep Learning for Emergency Department Triage Acuity Prediction

Problem

Emergency department triage assigns each patient an urgency level using the Emergency Severity Index (ESI, Levels 1-5), which determines how quickly they receive care. Traditional triage prediction models rely solely on structured clinical data, missing the rich contextual information contained in patient chief complaints. Accurate triage prediction can improve patient flow, reduce wait times, and ensure critical patients receive timely care.

Approach / Tools

Developed a multimodal deep learning system combining structured clinical data (vital signs, demographics) with free-text chief complaints using the MIMIC-IV-ED dataset. Encoded chief complaints with MPNet sentence embeddings and designed a dual-branch convolutional neural network: text features processed through Conv1D layers, structured features through fully connected layers, with late fusion for final classification. Deployed as a Streamlit web app with Docker containerization for secure evaluation in Trusted Research Environments.
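The dual-branch design described above might look something like the following PyTorch sketch. Layer sizes, the number of structured features (8), the 768-dimensional text embedding, and the class name are all assumptions for illustration; the project's actual architecture is not given here. The two branch outputs are fused by concatenation just before the classification head, which is one reading of "late fusion".

```python
import torch
from torch import nn


class DualBranchTriageNet(nn.Module):
    """Hypothetical two-branch classifier: Conv1D for text, dense for structured."""

    def __init__(self, n_structured=8, n_classes=5):
        super().__init__()
        # Text branch: treat the sentence embedding as a 1-channel sequence.
        self.text_branch = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(32),  # fixed length regardless of embedding size
            nn.Flatten(),
            nn.Linear(16 * 32, 64),
            nn.ReLU(),
        )
        # Structured branch: vital signs and demographics as a flat vector.
        self.structured_branch = nn.Sequential(
            nn.Linear(n_structured, 32),
            nn.ReLU(),
        )
        # Fusion: concatenate both representations, then classify into ESI levels.
        self.classifier = nn.Linear(64 + 32, n_classes)

    def forward(self, text_embedding, structured):
        text_features = self.text_branch(text_embedding.unsqueeze(1))
        structured_features = self.structured_branch(structured)
        fused = torch.cat([text_features, structured_features], dim=1)
        return self.classifier(fused)  # raw logits, one per acuity level


model = DualBranchTriageNet()
text_batch = torch.randn(4, 768)      # stand-in for MPNet sentence embeddings
structured_batch = torch.randn(4, 8)  # stand-in for scaled clinical features
logits = model(text_batch, structured_batch)
print(logits.shape)
```

Keeping the branches separate until the final layers lets each modality learn its own representation before the model weighs them against each other.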

Results / Insights

Achieved 0.93 micro-average ROC-AUC across all acuity levels, with particularly strong performance on critical patients (ESI Level 1: 0.93 AUC) and less urgent cases (ESI Level 4: 0.91 AUC). The multimodal approach outperformed single-modality baselines, demonstrating the value of combining numerical and textual features. The model provides a foundation for future triage decision support systems while highlighting the importance of interpretability and external validation for clinical deployment.
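For context on the headline number, micro-average ROC-AUC pools every (patient, class) pair into one binary problem before computing the AUC, so frequent acuity levels carry more weight than rare ones. The sketch below is a plain-Python illustration on hypothetical scores for three classes; it is not the project's evaluation code, and the function names are my own.

```python
def binary_auc(labels, scores):
    """ROC-AUC as the chance a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def micro_average_auc(true_classes, class_scores, n_classes):
    """Flatten one-vs-rest labels and scores across classes, then take one AUC."""
    flat_labels, flat_scores = [], []
    for true_class, scores in zip(true_classes, class_scores):
        for c in range(n_classes):
            flat_labels.append(1 if c == true_class else 0)
            flat_scores.append(scores[c])
    return binary_auc(flat_labels, flat_scores)


# Hypothetical predictions: four patients, three classes (indices 0 to 2).
true_classes = [0, 1, 2, 1]
class_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],  # misranked: the true class (1) scores below class 0
]
auc = micro_average_auc(true_classes, class_scores, n_classes=3)
print(f"micro-average AUC: {auc:.5f}")
```

Because of this pooling, a micro-average can sit close to the per-class figures for the largest classes, so per-level AUCs such as those quoted above remain worth reporting alongside it.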

Python, PyTorch, MPNet, Deep Learning, Multimodal AI, MIMIC-IV-ED, Healthcare, Streamlit

Read Full Case Study View on GitHub