Development and Validation of a Predictive Model to Identify Individuals Likely to Have Undiagnosed Chronic Obstructive Pulmonary Disease Using an Administrative Claims Database

AUTHORS: Chad Moretz, Yunping Zhou, Amol D. Dhamane, Kate Burslem, Kim Saverno, Gagan Jain, Giovanna Devercelli, Shuchita Kaila, Jeffrey J. Ellis, Gemzel Hernandez, Andrew Renda



BACKGROUND: Despite the importance of early detection, delayed diagnosis of chronic obstructive pulmonary disease (COPD) is relatively common. Approximately 12 million people in the United States have undiagnosed COPD. Diagnosis of COPD is essential for the timely implementation of interventions, such as smoking cessation programs, drug therapies, and pulmonary rehabilitation, which are aimed at improving outcomes and slowing disease progression. 

OBJECTIVE: To develop and validate a predictive model to identify patients likely to have undiagnosed COPD using administrative claims data.

METHODS: A predictive model was developed and validated utilizing a retro-spective cohort of patients with and without a COPD diagnosis (cases and controls), aged 40-89, with a minimum of 24 months of continuous health plan enrollment (Medicare Advantage Prescription Drug [MAPD] and commercial plans), and identified between January 1, 2009, and December 31, 2012, using Humana’s claims database. Stratified random sampling based on plan type (commercial or MAPD) and index year was performed to ensure that cases and controls had a similar distribution of these variables. Cases and controls were compared to identify demographic, clinical, and health care resource utilization (HCRU) characteristics associated with a COPD
diagnosis. Stepwise logistic regression (SLR), neural networking, and decision trees were used to develop a series of models. The models were trained, validated, and tested on randomly partitioned subsets of the sample (Training, Validation, and Test data subsets). Measures used to evaluate and compare the models included area under the curve (AUC); index of the receiver operating characteristics (ROC) curve; sensitivity, specificity, positive predictive value (PPV); and negative predictive value (NPV). The optimal model was selected based on AUC index on the Test data subset. 

RESULTS:A total of 50,880 cases and 50,880 controls were included, with MAPD patients comprising 92% of the study population. Compared with controls, cases had a statistically significantly higher comorbidity burden and HCRU (including hospitalizations, emergency room visits, and medical procedures). The optimal predictive model was generated using SLR, which included 34 variables that were statistically significantly associated with a COPD diagnosis. After adjusting for covariates, anticholinergic bronchodilators (OR = 3.336) and tobacco cessation counseling (OR = 2.871) were found to have a large influence on the model. The final predictive model had an AUC of 0.754, sensitivity of 60%, specificity of 78%, PPV of 73%, and an NPV of 66%.  

CONCLUSIONS: This claims-based predictive model provides an acceptable level of accuracy in identifying patients likely to have undiagnosed COPD in a large national health plan. Identification of patients with undiagnosed COPD may enable timely management and lead to improved health outcomes and reduced COPD-related health care expenditures.

Content for class "break" Goes Here