Medical Clustering Utilizing Claims Data

Margret Bjarnadottir, Dimitris Bertsimas

Operations Research Center,

Massachusetts Institute of Technology

Michael A. Kane

MIT Medical Department

J. Christian Kryder, Rudra Pandey

D2Hawkeye

Santosh Vempala

Georgia Institute of Technology

Grant Wang

Electrical Engineering & Computer Science, MIT

Large claims databases, coupled with modern data mining methods, have the potential to address important questions in health care. We explain how statistical clustering, in particular, can be used to predict health care costs. Using claims data for close to 400,000 members over three years, we provide rigorously validated predictions of health care costs in the third year, based on medical and cost data from the first two years. Furthermore, we illustrate through two examples, one involving nonsteroidal anti-inflammatory agents and the other involving estrogen and antidepressants, that our clustering algorithm can lead to the discovery of medical knowledge.