A Logistic Regression Implementation of the “ABCD” Method for Identifying Malignant Melanoma
Abstract
This project investigated a way to more accurately assess whether a skin mole is malignant or benign using the ABCD classification system employed in healthcare. To estimate the probability that a mole was malignant, a numerical value was calculated for each of the four criteria. For A (asymmetry), an estimated halfway line was drawn through the image, and each half was filled with simple geometric shapes whose areas could easily be calculated in Microsoft Word; the absolute value of the difference between the areas of the two halves was taken as the value of A. For B (border irregularity), the perimeter P and area T of each mole were substituted into the formula B = P²/(4πT), which equals 1 for a perfect circle and grows as the border becomes more irregular. C (color) was scored on a scale of 1 to 5, where 1 indicated a light, uniform color distribution and 5 an uneven, splotchy, dark one. For D (diameter), the value was taken from the metadata of each image in the ISIC Archive, the source of all images used. In total, 45 training images were used, and 5 separate test images were used to validate the results. Both a linear and a quadratic logistic regression model were fitted to compare how accurately each predicted whether a mole was malignant or benign. The quadratic model was more accurate than the linear model, although both achieved high accuracy: the quadratic model correctly classified 44 of the 45 training images and all 5 test images, while the linear model correctly classified 41 of the 45 training images and 4 of the 5 test images.
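The pipeline described in the abstract (compute A and B, score C, read D, then fit linear and quadratic logistic regression) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's implementation: the function and class names (asymmetry_score, border_irregularity, quadratic_expand, LogisticModel, accuracy) are hypothetical; the "quadratic" model is assumed to mean all squares and pairwise products of A, B, C and D; the models are fitted by gradient descent on standardized features with a 0.5 decision threshold; and the six feature rows are made-up toy values, not ISIC data or the study's results.

```python
import math


def asymmetry_score(half1_shape_areas, half2_shape_areas):
    """A: absolute difference between the total areas of the two halves.

    Each half is approximated by simple shapes; pass the list of their areas.
    """
    return abs(sum(half1_shape_areas) - sum(half2_shape_areas))


def border_irregularity(perimeter, area):
    """B = P^2 / (4*pi*T): exactly 1 for a circle, larger for ragged borders."""
    return perimeter ** 2 / (4 * math.pi * area)


def quadratic_expand(x):
    """Linear terms plus all squares and pairwise products (ASSUMED quadratic form)."""
    quad = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return list(x) + quad


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticModel:
    """Logistic regression fitted by batch gradient descent on mean log-loss.

    Features are standardized with the training means and standard deviations
    because A, B, C and D live on very different scales.
    """

    def __init__(self, quadratic=False, lr=0.1, epochs=3000):
        self.quadratic = quadratic
        self.lr = lr
        self.epochs = epochs

    def _scaled(self, x):
        f = quadratic_expand(x) if self.quadratic else list(x)
        return [(v - mu) / s for v, mu, s in zip(f, self.means, self.stds)]

    def _prob(self, z):
        return sigmoid(self.w[0] + sum(w * v for w, v in zip(self.w[1:], z)))

    def fit(self, X, y):
        raw = [quadratic_expand(x) if self.quadratic else list(x) for x in X]
        cols = list(zip(*raw))
        self.means = [sum(c) / len(c) for c in cols]
        # Constant columns get std 1.0 to avoid dividing by zero.
        self.stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
                     for c, mu in zip(cols, self.means)]
        Z = [self._scaled(x) for x in X]
        self.w = [0.0] * (len(Z[0]) + 1)  # w[0] is the intercept
        m = len(Z)
        for _ in range(self.epochs):
            grad = [0.0] * len(self.w)
            for z, target in zip(Z, y):
                err = self._prob(z) - target  # derivative of log-loss w.r.t. the logit
                grad[0] += err
                for j, v in enumerate(z):
                    grad[j + 1] += err * v
            self.w = [w - self.lr * g / m for w, g in zip(self.w, grad)]
        return self

    def predict_proba(self, x):
        """Estimated probability that a mole with features [A, B, C, D] is malignant."""
        return self._prob(self._scaled(x))


def accuracy(model, X, y, threshold=0.5):
    """Return (number correctly classified, total)."""
    correct = sum((model.predict_proba(x) >= threshold) == bool(t) for x, t in zip(X, y))
    return correct, len(y)


# Toy, made-up [A, B, C, D] rows (NOT ISIC data); 1 = malignant, 0 = benign.
X = [[0.2, 1.1, 1, 3.0], [0.5, 1.3, 2, 4.0], [0.3, 1.2, 1, 3.5],
     [2.5, 2.4, 4, 8.0], [3.0, 2.9, 5, 9.5], [2.2, 2.1, 4, 7.0]]
y = [0, 0, 0, 1, 1, 1]
linear = LogisticModel().fit(X, y)
quad = LogisticModel(quadratic=True).fit(X, y)
print("linear:", accuracy(linear, X, y), "quadratic:", accuracy(quad, X, y))
```

With four features the quadratic expansion has 14 terms instead of 4, so it can bend the decision boundary and fit the training images more closely; held-out test images, like the 5 used here, are what indicate whether that extra flexibility generalizes rather than overfits.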