Hypertension Risk Prediction

Machine Learning applied to cardiovascular health

  • ROC-AUC: 94.2% ± 0.7% (discriminative power)
  • Sensitivity: 89.3% ± 2.8% (detecting positive cases)
  • Specificity: 91.3% ± 3.1% (identifying healthy cases)
  • F2-Score: 87.0% ± 3.1% (primary clinical metric)

The Problem

Systemic arterial hypertension is one of the greatest public health challenges worldwide. Known as the "silent killer," it often shows no symptoms until significant cardiovascular damage occurs. Early detection through predictive Machine Learning models can help health professionals identify patients at risk, enabling preventive interventions that save lives and reduce treatment costs.

  • 1.28B Adults Affected: hypertension affects approximately 1.28 billion adults worldwide (Source: WHO Global Report 2023 [6])
  • 46% Undiagnosed: nearly half of people with hypertension are unaware of their condition (Source: WHO/Lancet Study 2021 [7])
  • #1 Risk Factor: leading preventable risk factor for cardiovascular disease, causing 10.8 million deaths per year (Source: WHO Fact Sheet 2024 [8])

Methodology

End-to-end Machine Learning pipeline following best practices

1. Exploratory Analysis: descriptive statistics, correlations, outliers, multicollinearity (VIF)

2. Preprocessing: imputation, SMOTE (train only), scaling, stratified split

3. Training: 10 models, 5-fold cross-validation, ensemble methods

4. Optimization: Grid Search, Random Search, clinical threshold analysis

5. Interpretability: feature importance, SHAP values, clinical validation

Methodological Highlights

  • No Data Leakage: SMOTE applied exclusively to the training set
  • Clinical Metrics: F2-Score prioritizing minimization of false negatives
  • Robust Validation: Stratified 5-fold CV with multiple seeds
  • Optimized Threshold: Analysis of different thresholds for clinical contexts

Dataset

Dataset used to train and validate the predictive model

What is a dataset?

A dataset is the collection of information used to "teach" the Machine Learning model. In this project, we use real data from 4,240 patients with clinical and demographic information. A 31% prevalence indicates that about 1 in 3 patients in the study has hypertension, an imbalance handled with a dedicated technique (SMOTE) during training. The dataset is publicly available on Kaggle.
Access the link: Kaggle - hypertension-risk-model-main

4,240 Patients
12 Features
31% Prevalence

Model Variables

Variable | Type | Description
pressao_sistolica | Continuous | Systolic blood pressure (mmHg)
pressao_diastolica | Continuous | Diastolic blood pressure (mmHg)
idade | Continuous | Age in years
imc | Continuous | Body Mass Index (kg/m^2)
colesterol_total | Continuous | Total cholesterol (mg/dL)
glicose | Continuous | Blood glucose (mg/dL)
frequencia_cardiaca | Continuous | Heart rate (bpm)
cigarros_por_dia | Continuous | Cigarettes per day
sexo | Categorical | 0 = Female, 1 = Male
fumante_atualmente | Categorical | 0 = No, 1 = Yes
medicamento_pressao | Categorical | 0 = No, 1 = Yes
diabetes | Categorical | 0 = No, 1 = Yes
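As a small sketch, the schema in the table above can be validated before training. The column names follow the Portuguese variable names from the table; the tiny stand-in DataFrame is only illustrative:

```python
# Sketch: checking that a loaded DataFrame matches the expected 12-column
# schema from the variables table (the real file would come from Kaggle).
import pandas as pd

continuous = ["pressao_sistolica", "pressao_diastolica", "idade", "imc",
              "colesterol_total", "glicose", "frequencia_cardiaca",
              "cigarros_por_dia"]
categorical = ["sexo", "fumante_atualmente", "medicamento_pressao",
               "diabetes"]

def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns that are missing from df."""
    return [c for c in continuous + categorical if c not in df.columns]

# Stand-in frame with all expected columns present:
df = pd.DataFrame({c: [0] for c in continuous + categorical})
print(check_schema(df))  # []
```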

Results

Comparison of 10 models with stratified cross-validation

How to interpret the results?

We tested 10 different Machine Learning algorithms to find the most suitable for predicting hypertension. The Random Forest was selected as the best model because it offers the best balance between correctly detecting hypertensive patients (high sensitivity) and not generating unnecessary false alarms (good specificity).
In a clinical context, the priority is not to miss positive cases (false negatives), so we use F2-Score as the primary decision metric.
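A minimal illustration on toy labels (not the project's data) shows how the F2-Score rewards recall over precision with scikit-learn:

```python
# Sketch: computing F2 on toy predictions. With beta=2, recall
# (sensitivity) counts four times as much as precision in the formula
# F_beta = (1 + b^2) * P * R / (b^2 * P + R).
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # 1 false negative, 1 false positive

f2 = fbeta_score(y_true, y_pred, beta=2)
# Here precision = recall = 0.75, so F2 = 0.75 as well
print(round(f2, 3))  # 0.75
```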

Best Model

Random Forest

  • ROC-AUC: 95.08%
  • F2-Score: 84.84%
  • Recall: 90.46%
  • Precision: 79.89%
  • Hyperparameters: n_estimators=100, max_depth=10, min_samples_split=2
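The selected configuration can be reproduced as a sketch with the hyperparameters reported above; the training data here is synthetic, standing in for the real dataset:

```python
# Sketch: the chosen Random Forest configuration on synthetic data
# mimicking the ~31% class prevalence.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, weights=[0.69, 0.31],
                           random_state=42)

model = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_depth=10,          # limits tree depth to curb overfitting
    min_samples_split=2,   # sklearn default
    random_state=42,
)
model.fit(X, y)
print(model.score(X, y))
```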

  • Gradient Boosting: AUC 95.96%, F2 84.80%
  • XGBoost: AUC 95.34%, F2 84.30%
  • Logistic Regression: AUC 95.47%, F2 82.81%

Metrics Comparison

What it shows: Compares the performance of all tested models using different metrics. Use the dropdown to switch metrics. The taller the bar, the better the model performs on that metric.

ROC Curves

What it shows: The ROC curve illustrates the model's ability to distinguish between patients with and without hypertension. The closer the curve is to the upper-left corner, the better the model. The area under the curve (AUC) summarizes this ability: values close to 100% indicate excellent discrimination.

Confusion Matrix

What it shows: Summarizes the model's correct and incorrect predictions. True Positives (TP): hypertensive patients correctly identified. True Negatives (TN): healthy patients correctly identified. False Negatives (FN): hypertensive patients not detected (most critical in healthcare). False Positives (FP): false alarms in healthy patients.
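A small sketch shows how these four counts are extracted with scikit-learn, whose binary confusion matrix is laid out as [[TN, FP], [FN, TP]]:

```python
# Sketch: unpacking TP/TN/FP/FN from sklearn's confusion matrix
# on toy labels (1 = hypertensive, 0 = healthy).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)  # 3 3 1 1
```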

Multi-metric Radar

What it shows: Comparative visualization of multiple metrics at once for the main models. Each axis represents a different metric. An ideal model would have all metrics close to 100%, forming a large polygon. Useful to identify strengths and weaknesses of each model.

Metrics Glossary

ROC-AUC Measures the model's overall ability to distinguish classes. Values above 90% are considered excellent.
Sensitivity (Recall) Percentage of hypertensive patients correctly detected by the model. High sensitivity = few missed cases.
Specificity Percentage of healthy patients correctly identified. High specificity = few false alarms.
Precision Of patients the model classifies as hypertensive, how many truly are. Measures the reliability of positive alerts.
F2-Score Weighted harmonic mean of precision and recall that gives sensitivity twice the weight of precision. Ideal for clinical settings where missing a positive case is worse than a false alarm.
Accuracy Overall percentage of correct predictions. Simple metric, but can be misleading in imbalanced datasets.
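All of the glossary metrics can be derived from the four confusion-matrix counts; the counts below are illustrative, not the project's results:

```python
# Sketch: the glossary metrics computed directly from TP/TN/FP/FN.
def clinical_metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)              # recall: missed cases hurt this
    specificity = tn / (tn + fp)              # false alarms hurt this
    precision   = tp / (tp + fp)              # reliability of positive alerts
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    # F2: weighted harmonic mean favoring recall (beta = 2)
    f2 = 5 * precision * sensitivity / (4 * precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f2": f2}

m = clinical_metrics(tp=90, tn=180, fp=20, fn=10)
print(m["sensitivity"], m["specificity"])  # 0.9 0.9
```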

Interpretability

Understanding model decisions for clinical validation

Why is interpretability important?

In medical applications, it is not enough for the model to make accurate predictions: it is essential to understand why it reached that conclusion. Interpretability allows clinicians to validate whether the model is using clinically relevant criteria, increasing trust in the system and supporting shared decision-making with the patient.
"Black-box" models that do not explain their decisions are less accepted in clinical practice.

Feature Importance

What it shows: How much each variable contributes to the model's decision. Larger bars indicate greater influence on the prediction. The colors identify the clinical category of each variable. Hover over each bar to see additional details.

  • Systolic Pressure: 45.8% (Blood Pressure)
  • Diastolic Pressure: 26.3% (Blood Pressure)
  • Age: 7.0% (Demographics)
  • BMI: 5.8% (Anthropometrics)
  • Total Cholesterol: 3.3% (Biomarkers)
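A ranking like the one above can be produced from a fitted Random Forest via its `feature_importances_` attribute; the data here is synthetic, so the resulting numbers are illustrative only:

```python
# Sketch: ranking variables by Random Forest importance. Feature names
# follow the variables table; the data is synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

features = ["pressao_sistolica", "pressao_diastolica", "idade", "imc",
            "colesterol_total", "glicose", "frequencia_cardiaca",
            "cigarros_por_dia", "sexo", "fumante_atualmente",
            "medicamento_pressao", "diabetes"]

X, y = make_classification(n_samples=400, n_features=12, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Importances sum to 1.0; sort descending to get the ranking
ranking = sorted(zip(features, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, imp in ranking[:3]:
    print(f"{name}: {imp:.1%}")
```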

Importance by Clinical Category

What it shows: Grouping of variables by clinical category. The size of each circle represents the total contribution of that category to the model's predictions. Note how Blood Pressure dominates, which is aligned with established clinical knowledge about hypertension.

  • Blood Pressure: 36.1%
  • Anthropometrics: 5.8%
  • Demographics: 3.8%
  • Biomarkers: 3.1%

Recommended Clinical Thresholds

What it shows: The threshold (decision cutoff) determines from which probability the model classifies a patient as "at risk." A lower threshold (e.g., 0.30) detects more cases but generates more false alarms. A higher threshold (e.g., 0.80) is more precise but may miss some cases. The choice depends on the clinical context: population screening vs. diagnostic confirmation.

  • Screening, threshold 0.30: sensitivity 95.5%, specificity 86.5% (minimize false negatives)
  • Balanced, threshold 0.50: sensitivity 91.5%, specificity 91.0% (general use)
  • Confirmation, threshold 0.80: sensitivity 79.7%, specificity 95.7% (minimize false positives)
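Applying one of these cutoffs amounts to thresholding the model's predicted probabilities instead of using the default 0.5; a sketch on synthetic data:

```python
# Sketch: custom decision thresholds over predict_proba, mirroring the
# screening (0.30) and confirmation (0.80) scenarios above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=42)
proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

screening    = (proba >= 0.30).astype(int)  # favors sensitivity
confirmation = (proba >= 0.80).astype(int)  # favors specificity

# A lower cutoff can only flag more (or equally many) patients:
print(screening.sum() >= confirmation.sum())  # True
```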

Feature Importance (Interactive Chart)

What it shows: Interactive version of the feature importance ranking. Hover over the bars to see details for each variable, including its clinical category and description. Blood pressure variables dominate, validating established clinical knowledge.

Threshold Analysis

What it shows: How sensitivity and specificity change across thresholds. The dashed vertical lines indicate the three recommended clinical scenarios. Note the trade-off: increasing sensitivity reduces specificity and vice versa. The crossing point represents the "optimal" threshold for balancing both.

Deployment Architecture

Production pipeline on AWS with high availability

Why Cloud Deployment?

A Machine Learning model only creates value when it is available for real use. Deploying on AWS allows the system to be accessed from anywhere, 24/7, with high availability and low latency.
The serverless architecture (no dedicated servers) reduces costs and scales automatically with demand, making the system viable even for institutions with limited resources.

User → CloudFront (CDN + HTTPS) → S3 Bucket (static UI) → API Gateway (HTTP API) → Lambda (Python 3.11) → ML Layer (scikit-learn)

Lambda Function

  • Python 3.11 Runtime
  • 1024 MB memory
  • Random Forest Model
  • Response in <100ms
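A minimal sketch of what the Lambda handler behind /predict might look like; the payload field names, the response shape, and the commented model-loading line are assumptions, not the project's actual code:

```python
# Sketch of an AWS Lambda handler for the /predict endpoint.
import json

# In the real function the model would be loaded once, at cold start,
# outside the handler, e.g.: model = joblib.load("model.pkl")

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    features = body.get("features")
    if features is None:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'features'"})}
    # prob = float(model.predict_proba([features])[0, 1])
    prob = 0.5  # placeholder so the sketch runs without the model artifact
    return {"statusCode": 200,
            "body": json.dumps({"risk_probability": prob})}

resp = lambda_handler({"body": json.dumps({"features": [120, 80]})}, None)
print(resp["statusCode"])  # 200
```

Keeping model loading outside the handler is what makes sub-100ms responses feasible after the first (cold-start) invocation.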

ML Dependencies

  • NumPy 2.2.6
  • Pandas 2.3.3
  • Scikit-learn 1.7.2
  • Imbalanced-learn 0.14

API Gateway

  • HTTP API (REST)
  • CORS enabled
  • Endpoint: /predict
  • Health check: /health
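A sketch of a client payload for the /predict endpoint; the JSON field names mirror the variables table, but the real request contract is defined by the deployed API, and the URL in the comment is a placeholder:

```python
# Sketch: building a /predict request body with one value per model
# variable (illustrative patient values, not real data).
import json

payload = {
    "pressao_sistolica": 145, "pressao_diastolica": 95, "idade": 58,
    "imc": 29.4, "colesterol_total": 230, "glicose": 105,
    "frequencia_cardiaca": 78, "cigarros_por_dia": 0, "sexo": 1,
    "fumante_atualmente": 0, "medicamento_pressao": 0, "diabetes": 0,
}
body = json.dumps(payload)
# e.g. requests.post("https://<api-id>.execute-api.<region>.amazonaws.com/predict", data=body)
print(len(payload))  # 12 features, matching the variables table
```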

CloudFront + S3

  • Global CDN
  • Automatic HTTPS
  • Optimized cache
  • Responsive UI

Team

Advisor

Prof. Dr. Anderson Henrique Rodrigues Ferreira

Advisor and Developer

CEUNSP - Centro Universitario Nossa Senhora do Patrocinio

Student Developers

  • Marcelo V Duarte Colpani
  • Nicolas Souza
  • Rubens Jose Collin
  • Tiago Dias Borges

References

Scientific articles underpinning the methodology used

[1]

M. Kivrak, U. Avci, H. Uzun, and C. Ardic, "The impact of the SMOTE method on machine learning and ensemble learning performance results in addressing class imbalance in data used for predicting total testosterone deficiency in type 2 diabetes patients," Diagnostics, vol. 14, no. 23, Art. no. 2634, Nov. 2024.
Access link: https://doi.org/10.3390/diagnostics14232634

[2]

A. Fernández, S. García, F. Herrera, and N. V. Chawla, "SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary," J. Artif. Intell. Res., vol. 61, pp. 863–905, 2018.
Access link: https://doi.org/10.1613/jair.1.11192

[3]

M. Talebi Moghaddam, Y. Jahani, Z. Arefzadeh, A. Dehghan, M. Khaleghi, M. Sharafi, and G. Nikfar, "Predicting diabetes in adults: Identifying important features in unbalanced data over a 5-year cohort study using machine learning algorithm," BMC Med. Res. Methodol., vol. 24, Art. no. 220, Sep. 2024.
Access link: https://doi.org/10.1186/s12874-024-02341-z

[4]

Y. Li, Y. Yang, P. Song, L. Duan, and R. Ren, "An improved SMOTE algorithm for enhanced imbalanced data classification by expanding sample generation space," Sci. Rep., vol. 15, Art. no. 23521, Jul. 2025.
Access link: https://doi.org/10.1038/s41598-025-09506-w

[5]

J. Zhu, S. Pu, J. He, D. Su, W. Cai, X. Xu, and H. Liu, "Processing imbalanced medical data at the data level with assisted-reproduction data as an example," BioData Mining, vol. 17, Art. no. 29, Sep. 2024.
Access link: https://doi.org/10.1186/s13040-024-00384-y

[6]

World Health Organization, "Global report on hypertension: The race against a silent killer," Geneva: WHO, Sep. 2023.
Access link: https://www.who.int/teams/noncommunicable-diseases/hypertension-report

[7]

NCD Risk Factor Collaboration (NCD-RisC), "Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: a pooled analysis of 1201 population-representative studies with 104 million participants," The Lancet, vol. 398, no. 10304, pp. 957–980, Sep. 2021.
Access link: https://doi.org/10.1016/S0140-6736(21)01330-1

[8]

World Health Organization, "Hypertension: Key facts," WHO Fact Sheets, Mar. 2023. [Online].
Access link: https://www.who.int/news-room/fact-sheets/detail/hypertension