It’s no secret that darker skin has long been underrepresented in dermatology — most textbooks used to train future physicians feature predominantly white patients, with few examples of what melanoma and other conditions look like in patients with Black and brown skin. As a growing number of companies build AI tools to assess common skin conditions, a recent study found that this skewed data makes them less accurate on darker skin.
Researchers at MIT Media Lab and Scale AI analyzed two widely used dermatology atlases: DermaAmin and Atlas Dermatologico. They labeled more than 16,500 images by Fitzpatrick skin type, a way of classifying skin pigmentation that, while imperfect, is still useful for evaluating algorithmic fairness.
Most publicly available dermatology datasets don’t include any information about skin type, race or ethnicity, making it difficult to quantify just how skewed the data is. According to the researchers’ findings, there were 3.6 times more images of the two lightest skin types than the two darkest skin types.
The disparities are even more stark when looking at specific skin conditions. While all 114 skin conditions were represented in the three lightest skin types, only 89 were represented in the darkest skin type.
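To make the kind of audit described above concrete, here is a minimal sketch in Python of how image counts and condition coverage could be tallied by Fitzpatrick type. The file name and column names ("fitzpatrick_type", "condition") are assumptions for illustration, not the researchers' actual data format.

```python
# Sketch of a representation audit over a labeled dermatology image index.
# Assumes a hypothetical CSV with one row per image, a Fitzpatrick type (1-6),
# and a skin-condition label.
import pandas as pd

df = pd.read_csv("fitzpatrick_labels.csv")  # hypothetical labeled-image index

# Ratio of images showing the two lightest vs. the two darkest skin types
light = df[df["fitzpatrick_type"].isin([1, 2])]
dark = df[df["fitzpatrick_type"].isin([5, 6])]
print(f"light/dark image ratio: {len(light) / len(dark):.1f}")

# Number of distinct skin conditions represented for each skin type
conditions_per_type = df.groupby("fitzpatrick_type")["condition"].nunique()
print(conditions_per_type)
```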
Scale AI CEO Alexandr Wang said the results showed an underrepresentation of dark skin images in online dermatology atlases, which can ultimately lead to inaccurate results if a neural network is trained only on that data.
“Popular dermatology atlases and datasets have more images of people with a light rather than dark skin tone, therefore models trained using these atlases are most accurate on lighter skin types — with models’ accuracy decreasing the darker a skin type is as it moves further away from skin types present in its training data,” he wrote in an email.
The researchers demonstrated this by training a model to classify skin conditions on the dataset and found significant disparities in its ability to correctly diagnose conditions on darker skin tones. They shared their results through the Computer Vision Foundation.
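The evaluation the researchers describe amounts to stratifying a classifier's accuracy by skin type. The sketch below shows one way that could be done in Python; the model, data loader, and label fields are placeholders, not the study's actual code.

```python
# Hedged sketch: compute per-Fitzpatrick-type accuracy for a trained
# skin-condition classifier. Assumes a dataloader yielding
# (images, condition_labels, skin_types) batches.
from collections import defaultdict

import torch


def accuracy_by_skin_type(model, dataloader, device="cpu"):
    """Return {fitzpatrick_type: accuracy} for a condition classifier."""
    correct, total = defaultdict(int), defaultdict(int)
    model.eval()
    with torch.no_grad():
        for images, condition_labels, skin_types in dataloader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            for pred, label, skin in zip(preds, condition_labels, skin_types):
                total[int(skin)] += 1
                correct[int(skin)] += int(pred == label)
    return {skin: correct[skin] / total[skin] for skin in sorted(total)}
```

Comparing the returned accuracies across types 1 through 6 is what surfaces the kind of disparity the study reports.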
The results could have significant implications for companies that are developing tools to help patients or primary care physicians identify different skin conditions. For example, Google recently announced plans for a consumer-facing symptom checker where people can upload an image of their skin and see the three most likely results. But in a published study of its AI model, the darkest skin type wasn’t represented at all in the dataset.
Photo credit: Andrii Shyp, Getty Images