Random forest (RF) algorithm for heart disease prediction

Assegie et al. (2022) applied machine learning models to a clinical heart disease dataset to predict heart disease. They used recursive feature elimination with cross-validation (RFECV) to analyze how much each heart disease feature contributes to the model's output. The dataset was obtained from the University of California Irvine (UCI) Machine Learning Repository. Four machine learning algorithms were compared: support vector machine (SVM), logistic regression (LR), decision tree (DT), and random forest (RF). The results showed that feature quality significantly affected model performance, and RF outperformed the other algorithms, achieving 99.7% prediction accuracy. The advantages of using RF for heart disease prediction include:

  1. Ensemble learning: RF is an ensemble learning algorithm that combines multiple decision trees to improve model accuracy and robustness. This allows RF to handle complex non-linear relationships between features and target variables, making it suitable for predicting heart disease.
  2. Handles noise and outliers: RF is robust to noise and outliers in data, which can occur in real datasets. This makes RF a reliable choice for heart disease prediction, as it can handle variability and uncertainty in the data.
  3. Feature importance: RF provides a measure of feature importance, which can help identify the most relevant features for predicting heart disease. This can be useful for understanding which factors are most strongly associated with heart disease and for guiding further research.
  4. Handling missing values: some RF implementations can handle missing values without a separate imputation step, although others require imputation first. Where it is supported, this can save computational time and resources and avoid the bias that a poorly chosen imputation method can introduce.
  5. High prediction accuracy: as noted above, RF achieved a prediction accuracy of 99.7% in this study, the highest of the four algorithms compared. This demonstrates the effectiveness of RF in this application.
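
The ensemble and feature-importance points above can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the data is a synthetic stand-in generated with make_classification (not the UCI heart disease data), the feature names are placeholders, and the hyperparameters are assumptions chosen only for illustration, so the accuracy it prints says nothing about the 99.7% reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a tabular heart disease dataset (assumption: NOT the UCI data).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# An ensemble of 200 decision trees (the tree count is an illustrative choice).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on the held-out quarter of the data.
test_accuracy = forest.score(X_test, y_test)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

print(f"held-out accuracy: {test_accuracy:.3f}")
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances describe how much each feature helps the trees split the data; they indicate predictive relevance, not causation.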

An empirical study on machine learning algorithms for heart disease prediction

Tsehay Admassu Assegie, Prasanna Kumar Rangarajan, Napa Komal Kumar, Dhamodaran Vigneswari

In recent years, machine learning has been attaining higher precision and accuracy in clinical heart disease dataset classification. However, the literature shows that the quality of the heart disease features used to train a model has a significant impact on the outcome of the predictive model. Thus, this study explores the impact of the quality of heart disease features on the performance of machine learning models for heart disease prediction by employing recursive feature elimination with cross-validation (RFECV). Furthermore, the study explores which heart disease features have a significant effect on model output. The dataset for experimentation is obtained from the University of California Irvine (UCI) machine learning repository. The experiment is implemented using a support vector machine (SVM), logistic regression (LR), decision tree (DT), and random forest (RF), and the performance of the SVM, LR, DT, and RF models is compared. The results indicate that the quality of the features significantly affects the performance of the model. Overall, the experiment shows that RF outperforms the other algorithms. In conclusion, a predictive accuracy of 99.7% is achieved with RF.
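
The RFECV procedure described in the abstract could be sketched roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the paper's pipeline: the data is synthetic (make_classification, not the UCI dataset), the SVM uses a linear kernel because recursive feature elimination needs coefficients or importances to rank features, and the fold count, tree count, and train/test split are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical data (assumption: NOT the UCI dataset).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named in the abstract; all settings are assumptions.
models = {
    "SVM": SVC(kernel="linear"),  # linear kernel exposes coef_ for feature ranking
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Drop one feature per step; cross-validation on the training data
    # picks the number of features to keep.
    selector = RFECV(estimator=model, step=1, cv=cv, scoring="accuracy")
    selector.fit(X_train, y_train)
    # score() reduces X_test to the selected features, then scores the
    # estimator that was refit on those features.
    accuracy = selector.score(X_test, y_test)
    results[name] = (selector.n_features_, accuracy)

for name, (n_kept, accuracy) in results.items():
    print(f"{name}: kept {n_kept} features, held-out accuracy {accuracy:.3f}")
```

Selecting features on the training split only, and scoring on data the selector never saw, keeps the feature selection step from leaking into the reported accuracy.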

By: I. Busthomi