AI Price Estimate Methodology
Understanding how we generate AI-powered healthcare price estimates
Last updated January 11, 2025
Introduction
The AI Price Estimate generates upfront cost estimates for healthcare services before an appointment. Patients provide a description of the care they plan to receive and select a provider. The system predicts the CPT billing codes likely to be used for that visit and returns a price estimate based on those codes. When additional context is available—such as patient age, gender, or new versus existing patient status—the system incorporates this information to further refine its predictions.
Our Benchmark: The Pre-Visit Estimate
Administrative staff at provider offices routinely provide price estimates to patients who call ahead. These estimates draw on the information available at the time: the patient's description of their visit, their history with the practice, and the billing codes typically associated with that type of care.
The AI Price Estimate delivers this same capability at scale. Rather than requiring a phone call and manual lookup, patients receive an instant estimate based on the same inputs an experienced administrator would use.
Pre-visit estimates are inherently limited. Clinical details—unexpected findings, additional tests, or the true complexity of a patient's condition—can only be determined once a patient is seen. The AI Price Estimate targets the accuracy of a knowledgeable administrator providing a one-on-one estimate, while recognizing that the final bill may vary based on what occurs during the actual visit.
How It Works
Billing Code Prediction
The AI Price Estimate uses a large language model (LLM) to predict the billing codes associated with a healthcare service. The model takes as input the patient's description of their planned care, the provider's specialty, and the care setting. From this information, it predicts the CPT, HCPCS, or DRG codes that would typically be billed for that type of visit.
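As a minimal sketch of how those inputs might be assembled into a model request (the field names and prompt layout here are illustrative assumptions, not the production format):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EstimateRequest:
    """Inputs described above; optional fields refine the prediction."""
    treatment_description: str
    provider_specialty: str
    care_setting: str
    patient_age: Optional[int] = None
    patient_gender: Optional[str] = None
    new_patient: Optional[bool] = None

def build_prompt(req: EstimateRequest) -> str:
    """Assemble the LLM input from the available request fields."""
    lines = [
        f"Treatment: {req.treatment_description}",
        f"Specialty: {req.provider_specialty}",
        f"Setting: {req.care_setting}",
    ]
    # Optional context is included only when available.
    if req.patient_age is not None:
        lines.append(f"Patient age: {req.patient_age}")
    if req.patient_gender:
        lines.append(f"Patient gender: {req.patient_gender}")
    if req.new_patient is not None:
        lines.append(f"New patient: {'yes' if req.new_patient else 'no'}")
    lines.append("Predict the CPT, HCPCS, or DRG codes likely to be billed.")
    return "\n".join(lines)
```

The model's response would then be parsed into a set of candidate billing codes for the pricing step that follows.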
Pricing Data
Once billing codes are predicted, the system retrieves pricing information from our database. We access this data through both open data sources and third-party data vendors.
The underlying data originates from three primary sources. CMS open data provides pricing information published by the Centers for Medicare & Medicaid Services. Hospital price transparency files are machine-readable files that hospitals are required to publish under federal regulations, containing their standard charges and negotiated rates with insurers. Transparency in coverage files are similar disclosures that health insurers are required to publish, detailing the negotiated rates they have with providers. Together, these sources provide broad coverage of pricing across providers and payers nationwide.
Price Estimate and Benchmarking
The system returns both a price estimate for the selected provider and benchmarking information to provide context. The benchmark represents the median price for each billing code among nearby providers of the same specialty, typically within a 25-mile radius. This allows patients to see not only what they might pay at a specific provider, but also how that price compares to typical rates in their area.
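The benchmark described above reduces to a median over a filtered set of providers. A simplified sketch, assuming a provider record carries its specialty, its distance from the patient, and a code-to-price mapping (the dictionary keys are hypothetical):

```python
from statistics import median
from typing import Optional

def benchmark_price(providers: list[dict], code: str, specialty: str,
                    max_miles: float = 25.0) -> Optional[float]:
    """Median price for `code` among same-specialty providers within radius.

    Each provider dict is assumed to have keys: "specialty",
    "distance_miles", and "prices" (a billing-code -> price mapping).
    Returns None when no nearby provider publishes a price for the code.
    """
    nearby_prices = [
        p["prices"][code]
        for p in providers
        if p["specialty"] == specialty
        and p["distance_miles"] <= max_miles
        and code in p["prices"]
    ]
    return median(nearby_prices) if nearby_prices else None
```

Providers outside the radius or in a different specialty are excluded before the median is taken, so one distant outlier cannot skew the local benchmark.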
Proprietary Evaluation Dataset
Lack of Open Benchmarks
Existing benchmarks for evaluating AI-based medical coding typically rely on the Medical Information Mart for Intensive Care (MIMIC) dataset, which contains intensive care unit and emergency department encounters. These clinical settings represent urgent, unplanned care rather than the shoppable services that patients are able to schedule in advance and seek price estimates for.
Existing benchmarks also focus on post-encounter prediction of billing codes from completed clinical documentation, rather than pre-encounter prediction based on a patient's description of their planned care. There are currently no standardized, open benchmarks for generating pre-encounter price estimates for shoppable healthcare services. To address this gap, we hired subject-matter experts in medical coding and billing to develop a proprietary evaluation dataset that measures the accuracy of the AI Price Estimate.
Dataset Construction
The evaluation dataset is based on the 70 Shoppable Services list published by the Centers for Medicare & Medicaid Services (CMS). These are the healthcare services patients most commonly schedule in advance and seek price estimates for: office visits, laboratory tests, imaging studies, and common procedures.
For each service, we generated multiple test scenarios varying patient demographics (age, gender), new versus existing patient status, provider specialty, care setting, and visit purpose. This produced 210 test cases spanning the range of realistic patient situations.
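One way to picture the scenario generation is a cross product over a few axes of variation for a single service. The axis values below are made up for illustration; the real dataset hand-selects scenarios per service rather than enumerating every combination (210 cases across 70 services averages three per service):

```python
from itertools import product

# Illustrative axes of variation for one shoppable service. The real
# dataset also varies provider specialty, care setting, and visit
# purpose, and samples a subset rather than the full cross product.
ages = [30, 50, 70]
genders = ["male", "female"]
patient_status = ["new", "existing"]

scenarios = [
    {"service": "Basic metabolic panel", "age": a, "gender": g, "status": s}
    for a, g, s in product(ages, genders, patient_status)
]
```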
Example Test Case
Input:
- Treatment Description: Basic metabolic panel
- Patient: 50-year-old male, new patient
- Provider: Family practice, physician office setting
- Visit Purpose: Preventive
Output:
Expected Code(s): CPT 80048
Subject Matter Expert Labeling
Experienced medical coders created the ground truth labels for each test case, mapping the treatment description and patient context to the correct billing code(s). The labeling process followed the same logic an administrator would use when providing a pre-visit estimate, ensuring the evaluation reflects real-world coding decisions rather than theoretical mappings.
Current Performance
We evaluate the AI Price Estimate using standard classification metrics adapted for billing code prediction.
Billing code prediction is a multi-class, multi-label problem. The system must predict both the correct number of codes and the correct codes themselves from thousands of possible billing codes.
We use micro-averaged metrics to evaluate precision, recall, and F1 score. Micro-averaging calculates metrics globally by summing true positives, false positives, and false negatives across all test cases, then applying the standard formulas. This treats every prediction equally regardless of which test case it came from.
The core evaluation metrics we track are:
- Exact Match Rate: The strictest measure. A prediction counts as an exact match only if the system predicts the correct number of codes AND all predicted codes are correct for that scenario. Predicting an extra code or missing one results in a failed match, even if some predictions were correct.
- Micro-Precision: Measures accuracy across all predictions. Of all billing codes predicted by the system across all test cases, what percentage were correct?
- Micro-Recall: Measures completeness across all test cases. Of all billing codes that should have been predicted, what percentage did the system correctly identify?
- Micro-F1: The harmonic mean of micro-precision and micro-recall, providing a single metric that balances both accuracy and completeness.
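The metrics above can be computed directly over parallel lists of predicted and ground-truth code sets. A minimal sketch (the function name and return shape are our own; scikit-learn's micro-averaged metrics would give the same precision, recall, and F1):

```python
def score(predictions: list[set], labels: list[set]) -> dict:
    """Exact match rate and micro-averaged precision/recall/F1.

    `predictions` and `labels` are parallel lists of billing-code sets,
    one pair per test case.
    """
    # Exact match: the predicted set equals the ground-truth set.
    exact = sum(p == t for p, t in zip(predictions, labels))
    # Micro-averaging: sum TP/FP/FN globally across all test cases.
    tp = sum(len(p & t) for p, t in zip(predictions, labels))
    fp = sum(len(p - t) for p, t in zip(predictions, labels))
    fn = sum(len(t - p) for p, t in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "exact_match_rate": exact / len(labels),
        "micro_precision": precision,
        "micro_recall": recall,
        "micro_f1": f1,
    }
```

Note how an extra predicted code costs a false positive (hurting precision) and breaks the exact match, while a missed code costs a false negative (hurting recall).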
Results
| Metric | Score |
|---|---|
| Exact Match Rate | 81.4% |
| Micro-Precision | 80.7% |
| Micro-Recall | 83.8% |
| Micro-F1 | 82.2% |
Evaluated on 210 test cases across 70 healthcare service categories.
Limitations
Pre-Encounter Uncertainty
Price estimates generated before a visit are inherently limited by the information available at that time. Clinical details discovered during the appointment—unexpected findings, additional tests, or greater complexity than anticipated—can change what is ultimately billed. This is a fundamental constraint of any pre-visit estimate, whether it comes from an AI system or an experienced administrator, and it is why the final bill may differ from even an accurate estimate.
Ambiguous Service Categories
Certain billing codes differ by technical details that cannot be determined from a treatment description alone. For example, laboratory tests may have separate codes for manual versus automated processing, or for tests with and without specific components. In these cases, the correct code depends on how the provider's lab performs the test—information not available when generating a pre-visit estimate. Our evaluation shows lower accuracy for these inherently ambiguous categories.
Pricing Data Coverage
Healthcare pricing data comes from multiple sources with varying levels of completeness. While federal transparency requirements have significantly expanded the availability of pricing information, not all provider and payer combinations have published rates. In cases where specific pricing data is unavailable, estimates may be based on regional benchmarks rather than provider-specific rates.
We recommend that patients use the AI Price Estimate as a starting point and verify pricing directly with their provider's office before scheduling care. This allows patients to confirm the estimate and discuss any additional costs that may apply to their specific situation.
References
- American Medical Association - Necessity for and Limitations of Price Transparency in American Health Care
- Arrow, Kenneth - Uncertainty and the Welfare Economics of Medical Care (1963)
- Centers for Medicare & Medicaid Services - 70 Shoppable Services List
- Centers for Medicare & Medicaid Services - Good Faith Estimate Fact Sheet
- Centers for Medicare & Medicaid Services - Health Plan Price Transparency
- Centers for Medicare & Medicaid Services - Hospital Price Transparency
- Centers for Medicare & Medicaid Services - No Surprises Act
- PhysioNet - MIMIC-IV Dataset
- scikit-learn - F1 Score and Micro-Averaged Metrics
- scikit-learn - Multiclass and Multioutput Algorithms
- The Hill - Americans Overwhelmingly Favor Health Care Price Transparency