What is Claims Data and Its Advantages and Disadvantages?

Publish Date: August 10, 2023

Claims data is generated whenever healthcare providers submit a request for payment to a health plan, providing information about the interactions between patients, healthcare providers, and insurers. The data typically include:

  • Diagnosis Codes (“ICD-10”): These standardized codes represent patient conditions, aiding trend analysis and resource allocation.
  • Procedure Codes (“CPT”): These uniform codes for medical services facilitate effective billing, resource management, and treatment evaluation.
  • National Provider Identifier (NPI): These unique identifiers ensure accurate reimbursement and transparency for providers in healthcare transactions.
  • Service Dates: Recording when services occur aids in tracking patient care, optimizing workflows, and improving outcomes.
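To make the four elements above concrete, here is a minimal sketch of how a single claim record might be represented. The class name, field names, and sample values are assumptions made for illustration; they do not reflect any particular payer's schema or a standard claim format.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ClaimLine:
    """One simplified claim record; field names are illustrative, not a standard schema."""
    claim_id: str
    patient_id: str
    diagnosis_codes: List[str]   # ICD-10 codes, e.g. "E11.9"
    procedure_codes: List[str]   # CPT codes, e.g. "99213"
    provider_npi: str            # NPIs are 10-digit numbers
    service_date: date

    def has_valid_npi_format(self) -> bool:
        # Format check only: exactly 10 digits. This does not verify the
        # check digit or that the NPI is actually registered.
        return len(self.provider_npi) == 10 and self.provider_npi.isdigit()


example = ClaimLine(
    claim_id="C-0001",           # made-up identifier
    patient_id="P-123",          # made-up identifier
    diagnosis_codes=["E11.9"],
    procedure_codes=["99213"],
    provider_npi="1234567890",   # made-up value
    service_date=date(2023, 8, 10),
)
```

Real claims carry many more fields (payer, place of service, billed and paid amounts), so this should be read as a reduced view rather than a complete record.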

Different Types of Claims Data

Medical claims data comes in two types, open and closed. The two originate from different sources, and each sheds light on different aspects of healthcare utilization.

Open Claims

Open claims are healthcare transactions that are in progress or have not yet reached their final resolution. They cover services that have been provided to a patient but are still being reviewed, adjudicated, or reimbursed.

Open claims are suitable for scenarios where a high volume of patients needs to be captured and a certain degree of missing data is acceptable, such as marketing use cases. Open claims data also tends to be available closer to real time.

They are generated by healthcare facilities that incorporate clinical data, which includes procedures, diagnoses, and encounters.

Closed Claims

Conversely, closed claims are healthcare transactions that have been finalized and resolved. These claims have undergone review, processing, and reimbursement (if applicable), resulting in a determination of payment or denial. Closed claims reflect completed patient encounters and give a comprehensive view of the care provided and its associated costs.

Closed claims are suitable for scenarios where complete capture of claims is required, such as adverse event analysis. This data tends to lag by weeks to months.

They originate from insurers, covering finances, coverage, and claims.

Advantages and Disadvantages of Using Claims Data for Analytics

While the data offers valuable insights, it also presents challenges that must be carefully considered. It’s crucial to weigh the advantages and disadvantages before accessing and using claims data.

Advantages of Claims Data

  • Standardization and Efficiency: Claims data adheres to standardized templates and care codes, facilitating streamlined data processing. This standardized format expedites analysis and ensures cleaner, more consistent results.
  • Large Sample Size: The data produced is naturally extensive, so even rare conditions can yield sizable samples. This volume increases the statistical power of analyses and supports comprehensive research.
  • Patient Tracking Across Providers and Payers: Data can be sourced from payers or providers, allowing for tracking patient journeys across different healthcare settings and insurance coverage. This comprehensive view provides a holistic perspective on patient care.
  • Cost-Effectiveness: The data is readily available and doesn’t necessitate additional data collection efforts. This makes utilizing the data more cost-effective compared to other data collection methods.
  • Structured and De-identifiable: Data is fully structured, simplifying the process of de-identifying patient information for privacy and security purposes.

Disadvantages of Claims Data

  • Lack of Clinical Detail: Claims often lack the granularity of physician notes and other clinical details, making it difficult to correlate claims with specific symptoms or outcomes. This limits the depth of analysis.
  • Data Fragmentation: Integrating data from multiple sources, especially considering different providers and payers, can be challenging due to data fragmentation. This can hinder efforts to create a holistic patient profile.
  • Gaps in Healthcare Journey: Gaps in a patient’s comprehensive healthcare journey may exist, as claims data alone often doesn’t capture every interaction, such as non-billable services or informal care.
  • Limited Utilization Insight: While claims data reveals when care is provided, it doesn’t indicate how often care is utilized or whether it’s appropriate for the patient’s condition.
  • Exclusion of Uninsured and Cash Transactions: Uninsured patients and transactions involving cash payments are often excluded. This exclusion limits the dataset’s representation of the entire patient population.
  • Absence of Certain Healthcare Data: Details like lab results, imaging data, and genetic information are often uncaptured, limiting the ability to explore sub-cohorts of diseases or conditions.

These limitations can be overcome or alleviated by strategically linking to real world data through other sources, such as electronic medical records (EMR), registries, and health data ecosystems. This allows for a more holistic and nuanced understanding of patient care, outcomes, and healthcare utilization.

Using Claims Data

By combining claims data with other real world data, more healthcare research applications become possible. From assessing treatment effectiveness to monitoring disease outbreaks and enhancing drug safety, it enables insights into real-world patient experiences and healthcare practices. Key applications include:

  • Comparative effectiveness research
  • Health economics and outcomes research (HEOR)
  • Disease surveillance and epidemiology
  • Patient adherence and treatment compliance
  • Predictive analytics and risk stratification

Comparative effectiveness research

Claims data allows organizations to evaluate the effectiveness of different healthcare interventions and treatments. By analyzing data from real-world patient populations, organizations can assess how various therapies perform in everyday clinical practice, providing valuable insights into treatment outcomes and patient responses.


Consider a study comparing the effectiveness of different antihypertensive medications in reducing cardiovascular events. Researchers can utilize claims data from a large and diverse patient cohort to determine which medications demonstrate superior outcomes in managing hypertension and preventing adverse cardiovascular events.

Case Study: Understanding clinical and economic outcomes in rheumatoid arthritis patients

Situation: A biopharma company wanted to evaluate clinical and economic outcomes associated with lower disease activity states for patients with rheumatoid arthritis.

Need: The company needed to understand whether disease activity in rheumatoid arthritis was associated with adverse events, specifically hospitalization, emergency department visits, mortality, and medical costs.

Solution: By combining registry and claims data, the biopharma company was able to conduct clinical and economic assessments. This resulted in:

  • Better understanding of how hospitalization rates correlate with disease activity
  • Improved value proposition for payers

Health economics and outcomes research (HEOR)

Policymakers, healthcare providers, and payers can focus on evaluating healthcare costs, resource utilization, and patient outcomes. This field of research informs decision making to improve the overall quality and efficiency of healthcare delivery.


A research study may examine the economic impact of implementing a new diabetes management program. By analyzing claims data, researchers can evaluate the program’s effect on hospital readmission rates, emergency room visits, and overall healthcare expenditures, helping organizations assess its cost-effectiveness and potential benefits.
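One common way to summarize expenditures in an analysis like this is cost per member per month (PMPM): total paid amount divided by enrolled member-months. The sketch below assumes each claim has been reduced to a (patient ID, paid amount) pair; the function name and all numbers are made up for illustration.

```python
from typing import Iterable, Tuple


def cost_pmpm(claims: Iterable[Tuple[str, float]], member_months: int) -> float:
    """Per-member-per-month cost: total paid amount divided by enrolled member-months."""
    if member_months <= 0:
        raise ValueError("member_months must be positive")
    total_paid = sum(amount for _patient_id, amount in claims)
    return total_paid / member_months


# Made-up numbers for illustration only.
before = [("P1", 1200.0), ("P2", 300.0), ("P1", 900.0)]
after = [("P1", 400.0), ("P2", 350.0)]

pmpm_before = cost_pmpm(before, member_months=24)  # 2 members x 12 months
pmpm_after = cost_pmpm(after, member_months=24)
change = pmpm_after - pmpm_before                  # negative means lower cost
```

A simple before/after difference like this does not control for confounding factors such as changes in the enrolled population, so a real evaluation would typically add a comparison group or statistical adjustment.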

Disease surveillance and epidemiology

Organizations can perform disease surveillance and epidemiological studies. Because claims data captures a vast amount of patient information across different geographical regions, it allows organizations to monitor disease prevalence, incidence, and distribution in real-world populations. This helps identify potential health threats and guide public health interventions.


During a flu outbreak, researchers can use claims data to monitor trends in influenza diagnosis and treatment patterns. This information aids in predicting the spread of the disease, identifying high-risk populations, and informing targeted vaccination campaigns to mitigate the impact of the outbreak.
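As a rough illustration of this kind of trend monitoring, the sketch below counts claims that carry an influenza diagnosis by ISO week. It assumes each claim has been reduced to a service date and a list of ICD-10 codes, and it treats any code in categories J09, J10, or J11 as influenza. The function name and sample claims are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple

# ICD-10 categories for influenza.
FLU_PREFIXES = ("J09", "J10", "J11")


def weekly_flu_counts(
    claims: Iterable[Tuple[date, List[str]]]
) -> Dict[Tuple[int, int], int]:
    """Count claims with any influenza diagnosis, keyed by (ISO year, ISO week)."""
    counts: Counter = Counter()
    for service_date, diagnosis_codes in claims:
        if any(code.upper().startswith(FLU_PREFIXES) for code in diagnosis_codes):
            iso = service_date.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Made-up claims for illustration only.
sample = [
    (date(2023, 1, 2), ["J10.1"]),
    (date(2023, 1, 4), ["J11.1", "R05.9"]),
    (date(2023, 1, 10), ["E11.9"]),   # not influenza, ignored
    (date(2023, 1, 11), ["J09.X2"]),
]
weekly = weekly_flu_counts(sample)
```

This counts claims rather than distinct patients, so a patient with several visits in one week is counted more than once; deduplicating by patient would be a natural refinement.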

Case Study: Understanding the long term effectiveness of a new vaccine

Situation: A biopharma company was conducting a national, placebo-controlled Phase III study to assess the safety and effectiveness of a new vaccine in adults 18 years and older.

Need: The company wanted to enable observation of infection rates and adverse events beyond the period of the trial, thus demonstrating long-term effectiveness of their vaccine. In order to do so, the client needed to find and connect to real world data (RWD) sources.

Solution: Through Datavant, the company linked clinical trial data with claims, electronic health records (EHRs), labs, and other sources of health information. They obtained the data necessary to complete the long-term evaluation of the vaccine’s safety and effectiveness.

Patient adherence and treatment compliance

Healthcare organizations can assess patient adherence to prescribed treatments and medications. Understanding patient adherence is crucial for evaluating treatment effectiveness and identifying opportunities to improve healthcare outcomes through better patient engagement and support.


Researchers can use claims data to analyze medication refill patterns and gaps in treatment for chronic conditions like diabetes. By identifying patients with poor adherence, healthcare providers can implement interventions such as patient education, reminders, or personalized care plans to improve treatment compliance and health outcomes.
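One widely used way to quantify refill patterns is the proportion of days covered (PDC): the share of days in an observation period on which the patient had medication on hand. The sketch below is a simplified version that assumes each pharmacy claim provides a fill date and a days-supply value. It does not shift overlapping supply forward when a patient refills early, which fuller implementations often do. The function name and sample fills are illustrative.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def proportion_of_days_covered(
    fills: Iterable[Tuple[date, int]], period_start: date, period_end: date
) -> float:
    """Share of days in [period_start, period_end] with medication on hand."""
    if period_end < period_start:
        raise ValueError("period_end must not precede period_start")
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)  # a set, so overlapping fills count each day once
    period_days = (period_end - period_start).days + 1
    return len(covered) / period_days


# Made-up fills: two 30-day supplies with a gap between them.
fills = [(date(2023, 1, 1), 30), (date(2023, 2, 15), 30)]
pdc = proportion_of_days_covered(fills, date(2023, 1, 1), date(2023, 3, 31))
```

Here 60 of the 90 days in the period are covered, so the PDC is about 0.67. A threshold of 0.8 is often used to classify a patient as adherent, although the appropriate cutoff depends on the therapy.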

Case Study: Understanding patient adherence in infusion therapy

Situation: A pharma company had an infusion therapy in a competitive oncology indication. The company noticed a significant drop in patient adherence.

Need: The company wanted to link patient data with claims and social determinants of health (SDOH) data to understand factors leading to the drop in adherence.

Solution: By linking claims data with patient and SDOH data through Datavant, the company:

  • Identified that traveling to the infusion center every week was a challenge for patients, resulting in patients switching to a new oral therapy
  • Redirected resources to support transportation to the infusion center
  • Modified provider messaging to emphasize the longer-term benefits of their drug

Predictive analytics and risk stratification

Analysis structured around Current Procedural Terminology (CPT) codes is invaluable for understanding medical procedures and interventions.

However, claims data by itself offers limited understanding of patient conditions. To address this, integrating clinical data enriches analyses. Clinical data offers a broader context, detailing patient histories, diagnoses, and treatments, bridging the gap between procedures and individual health profiles.

Health data ecosystems, like Datavant, offer medical record retrieval services which provide a more comprehensive view of patients’ clinical state.


Researchers can use claims data to develop a predictive model for hospital readmission risk among heart failure patients. This model can help healthcare providers proactively identify patients with a higher likelihood of readmission, allowing them to provide targeted care and support to prevent unnecessary hospitalizations.

Case Study: Predicting referral and therapy initiation

Situation: A large biopharma company with marketed immunotherapy was seeking to better understand the detailed treatment history and referral journeys of patients treated with their product.

Need: Their internal data only provided a window into a narrow part of the patient journey. The company needed a longitudinal view to drive therapy participation.

Solution: By linking claims data with first-party data through Datavant, the company was able to:

  • Set predictive alerts for patients that would have a high likelihood of needing their therapy
  • Identify healthcare providers who were more likely to refer patients, increasing engagement with providers

Protecting and Maintaining Privacy Around Claims Data

Privacy and data protection are paramount when working with claims data to ensure confidentiality and comply with ethical guidelines. Here are some measures that should be implemented:

  • De-identification and anonymization: To protect patient privacy, the data should undergo de-identification and anonymization processes, removing personally identifiable information while preserving the data’s integrity for research purposes.
  • Access controls and restricted data sharing: Access should be limited to authorized personnel and organizations. Implementing secure systems with access controls ensures that only approved individuals can access and analyze the data. Data sharing should be conducted through secure channels and agreements that prioritize privacy and compliance.
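To illustrate the first measure, the sketch below drops a few direct identifiers, replaces the patient ID with a keyed hash, and reduces the service date to its year. This is pseudonymization for illustration only. It is an assumption-laden example, not a description of any vendor's method, and it is not sufficient on its own to meet a regulatory de-identification standard. The field names and key are hypothetical.

```python
import hashlib
import hmac
from datetime import date
from typing import Any, Dict

DIRECT_IDENTIFIERS = {"name", "address", "phone"}  # illustrative, not exhaustive


def pseudonymize(record: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Drop direct identifiers, replace patient_id with a keyed hash,
    and keep only the year of the service date."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    )
    out["patient_id"] = token.hexdigest()
    service_date = out.pop("service_date")
    out["service_year"] = service_date.year
    return out


# Made-up record and key for illustration only.
raw = {
    "patient_id": "P-123",
    "name": "Jane Doe",
    "diagnosis_codes": ["E11.9"],
    "service_date": date(2023, 8, 10),
}
safe = pseudonymize(raw, secret_key=b"example-key-do-not-reuse")
```

Because the hash is keyed, the same patient maps to the same token across datasets processed with the same key, which preserves the ability to link records, while anyone without the key cannot recompute the mapping from patient IDs.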

Accessing and Connecting Claims Data

Organizations can access a wealth of claims data through the Datavant ecosystem. Datavant connects disparate data sources, empowering organizations to gain a comprehensive view of patient populations, uncover new insights, and drive better outcomes.

The Datavant ecosystem provides access and connectivity to a variety of data sources.

Claims data has emerged as a powerful resource for real-world research, offering valuable insights into patient outcomes, healthcare utilization, and treatment effectiveness. Datavant empowers data-driven decisions by enabling organizations to securely access claims data.

