Vol. 2 No. 1 (2024)

ARTICLE

Assessing Data Independence in Sequential Ultrasound Imaging for Machine Learning-Based Liver Disease Diagnosis

Abstract

Machine learning applications in medical imaging demand sufficient and independent data for effective training and testing. However, sequential imaging data, such as time-series images, often exhibit inherent correlations, leading to data interdependence. This study investigates statistical methods to assess the independence of sequential B-mode liver ultrasound images. We analyzed 1,180 ultrasound images containing 5,903 regions of interest from patients with liver fibrosis, steatosis, or normal livers. Texture features extracted from these images were used to train machine learning models for computer-aided diagnosis. The models achieved strong performance, with logistic regression yielding an AUC of 0.928 for binary classification and random forest achieving an AUC of 0.917 for multiclass classification. To evaluate data independence, we applied Jensen-Shannon (JS) divergence, which revealed that images from normal livers were independent, while those from diseased livers showed interdependence. These findings highlight the importance of testing for data independence in sequential medical imaging to ensure model generalizability and prevent data leakage. Such statistical tests can guide the use of same-subject images in training machine learning models for real-world medical imaging applications.
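As a rough illustration of the JS divergence mentioned in the abstract, the following sketch compares the distributions of one texture feature in two images. The abstract does not say which distributions were compared, how they were binned, or what threshold separated independent from interdependent images, so the function names, the equal-width binning, and the sample values here are all assumptions made for illustration only.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram of a non-empty list over [lo, hi].

    The binning scheme is an assumption; the abstract does not specify one.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that out-of-range values fall into the first or last bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-mass terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical values of one texture feature from the regions of interest
# in two consecutive frames of the same subject (not data from the study).
frame_a = [0.12, 0.18, 0.22, 0.31, 0.35]
frame_b = [0.14, 0.19, 0.25, 0.30, 0.38]

p = histogram(frame_a, bins=4, lo=0.0, hi=1.0)
q = histogram(frame_b, bins=4, lo=0.0, hi=1.0)
print(js_divergence(p, q))
```

With base-2 logarithms the value is 0 for identical distributions and 1 for distributions with no overlap. How a given value maps to a decision about independence depends on the test design, which the abstract does not describe.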