What is HCC?

In a nutshell, Hierarchical Condition Category (HCC) coding is used to reimburse healthcare providers for managing the care of their Medicare population through a risk adjustment model. HCC codes are used to calculate a Risk Adjustment Factor (RAF) score, which determines the rate of reimbursement based on evidence of specific diagnosed conditions and associated treatment.

An HCC coder will typically audit clinical charts to identify this evidence and compile the supporting documentation to justify the scoring for eligible members. Since clinical charts can be hundreds of pages long, it is easy to imagine human error creeping into the search for supporting documentation. For large healthcare providers, the cumulative effect of that error can mean the loss of millions of dollars annually. Providers often turn to outside vendors to help manage the workload, perhaps in addition to employing a small team of HCC coders.

Use the Search

Since HCC coding is inherently an information retrieval task, we propose using search and NLP to augment the process with a human-in-the-loop application that surfaces the most relevant parts of clinical charts, leaving the auditor to simply verify the findings. Here, we share a high-level view of our approach to applying AI in developing our HCC coder workflow support application.

Game Plan:

1. Get Text

We have built large text corpora of clinical charts and medical notes with data partners. Many of these documents arrive as images, so we convert them to text to get a format useful for analysis. This includes faxed attachments, which introduce an appreciable OCR error rate. Consider using the open-source Tesseract OCR engine.
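A minimal sketch of this step, assuming the pytesseract wrapper and Pillow are installed alongside the Tesseract binary. The function names and the cleanup rules are illustrative choices, not a description of our production pipeline.

```python
import re


def ocr_page(image_path):
    """Run Tesseract on one scanned page and return the raw text.

    Assumes pytesseract, Pillow, and the Tesseract binary are installed.
    They are imported lazily so the cleanup helper below works without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def clean_ocr_text(raw):
    """Light cleanup: rejoin words hyphenated across lines, collapse whitespace."""
    text = re.sub(r"-\s*\n\s*", "", raw)  # "medica-\ntion" -> "medication"
    text = re.sub(r"\s+", " ", text)      # newlines and runs of spaces -> one space
    return text.strip()
```

Cleanup like this only handles layout noise. Character-level mistakes such as 'flll' for 'fill' are left in place on purpose, since the language model in the next step is meant to absorb them.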

2. Language Models

We expect to recognize the similarities between terms like 'inhaler' and the slang 'puffs', some acronyms, and even common OCR mistakes such as 'fill' and 'flll'. To pull this off, we train models like Word2Vec to build an efficient numerical representation of our text, one that captures similarity in meaning through the strength of statistical association between the contexts in which terms appear. Gensim offers a nice implementation.
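As a rough sketch of what training might look like, assuming gensim 4.x (where the dimension argument is named `vector_size`). The tokenizer and the hyperparameters here are placeholder choices for illustration, not tuned values.

```python
import re


def tokenize(note):
    """Lowercase a note and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


def train_embeddings(notes, vector_size=100):
    """Train a skip-gram Word2Vec model on an iterable of note strings.

    Assumes gensim 4.x is installed; imported lazily so tokenize() can be
    used on its own.
    """
    from gensim.models import Word2Vec

    sentences = [tokenize(n) for n in notes]
    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        window=5,
        min_count=1,  # keep rare tokens such as OCR misspellings
        sg=1,         # skip-gram rather than CBOW
        workers=1,
    )
```

With a trained model, `model.wv.most_similar("inhaler", topn=5)` lists the nearest terms in the embedding space. Whether 'puffs' or an OCR variant shows up there depends entirely on the corpus.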

3. Medical Jargon

To strengthen known associations of arcane acronyms and the like, we add mappings that bring medical domain knowledge into our language model. The UMLS databases are valuable here.
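One way such a mapping could be built, sketched under the assumption that you have a licensed copy of the UMLS Metathesaurus file MRCONSO.RRF. In that file's pipe-delimited layout, field 0 is the concept identifier (CUI), field 1 the language, and field 14 the term string. The function name is hypothetical.

```python
from collections import defaultdict


def load_synonyms(rows):
    """Group English term strings by UMLS concept identifier (CUI).

    `rows` is an iterable of pipe-delimited MRCONSO.RRF lines.
    """
    synonyms = defaultdict(set)
    for row in rows:
        fields = row.rstrip("\n").split("|")
        # Skip malformed rows and non-English entries.
        if len(fields) > 14 and fields[1] == "ENG":
            synonyms[fields[0]].add(fields[14].lower())
    return dict(synonyms)
```

Terms that share a CUI can then be treated as equivalent when expanding queries, or mapped to a single token before training embeddings.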

4. Building a Training Corpus

Using historical HCC codes and clinical charts, we want to associate each document with all HCC codes found for that member. Not every document will reference supporting evidence, so consider this an example of weak labeling.

Text mining algorithms can have difficulty with large texts. We assume that there is some relevant text snippet that represents the documentation/evidence we seek to present to an HCC coder for approval or denial.
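The two ideas above, weak labels and short snippets, can be sketched together as follows. The window sizes, the record layout, and the names `split_into_snippets` and `weak_label` are assumptions made for this example.

```python
def split_into_snippets(text, size=50, overlap=10):
    """Split a document into overlapping windows of `size` whitespace tokens."""
    tokens = text.split()
    step = size - overlap  # assumes overlap < size
    snippets = []
    for start in range(0, len(tokens), step):
        snippets.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return snippets


def weak_label(documents, codes_by_member):
    """Attach every historical HCC code for a member to each of their snippets.

    `documents` is a list of (member_id, text) pairs and `codes_by_member`
    maps member_id to a collection of code strings.
    """
    examples = []
    for member_id, text in documents:
        codes = sorted(codes_by_member.get(member_id, []))
        for snippet in split_into_snippets(text):
            examples.append({"member": member_id, "snippet": snippet, "codes": codes})
    return examples
```

Most snippets labeled this way will not actually contain evidence for their codes. That noise is the price of weak labeling, and it is what the later search and classification steps are meant to cut through.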

5. Breaking It Down

Through simple text mining ideas, we can find terms (including synonyms), diagnosis codes, prescription terms, and other language associated with each particular HCC code. However, once negation, family history, or other context is taken into account, we cannot simply code depression because of the presence of terms like 'MDD'. We let search bring us high recall by expanding queries liberally to include related terms. Some analysis helps to decide which terms may be discarded. Elasticsearch's percolate API or tools like Spark help to speed this up.
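To make the query expansion concrete, here is a small sketch that builds an ordinary Elasticsearch bool query from a seed term and its related terms. It uses a plain search body rather than the percolate API, and the field name `text` and the `related` map are assumptions for illustration.

```python
def expand_terms(seed_terms, related):
    """Return the seed terms plus their related terms, in order, without duplicates."""
    expanded = []
    for term in seed_terms:
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, field="text"):
    """Build an Elasticsearch bool query that matches any of the phrases."""
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: t}} for t in terms],
                "minimum_should_match": 1,
            }
        }
    }
```

In practice the `related` map might come from the nearest neighbors in the embedding space or from the domain mappings in step 3. A query this liberal will also match negated and family-history mentions, which is acceptable at this stage because the goal is recall.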

6. Training a Classifier

To get better performance than keyword search, we take our labeled corpus and train a text classifier. Our word embeddings help here to regularize the data and reduce dimensionality. Simple RNNs and LSTMs prove powerful in differentiating positive references from extraneous ones like 'History of Depression: Check Y/N'. After the high-recall queries, we let the neural net filter out irrelevant samples based on additional learned context to improve precision.
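A minimal sketch of such a classifier, assuming PyTorch. The layer sizes, the padding scheme, and the names `encode` and `build_classifier` are illustrative and should not be read as the architecture we shipped.

```python
def encode(tokens, vocab, max_len=8):
    """Map tokens to integer ids (0 = padding, 1 = unknown) and right-pad to max_len."""
    ids = [vocab.get(t, 1) for t in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


def build_classifier(vocab_size, embed_dim=100, hidden_dim=64):
    """Return a small LSTM binary classifier. Assumes PyTorch is installed."""
    import torch.nn as nn

    class SnippetClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, 1)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)     # (batch, seq, embed_dim)
            _, (hidden, _) = self.lstm(embedded)     # hidden: (1, batch, hidden_dim)
            return self.out(hidden[-1]).squeeze(-1)  # one logit per snippet

    return SnippetClassifier()
```

The logits can be trained against the weak labels with `nn.BCEWithLogitsLoss`, and pretrained word vectors can be loaded into the embedding layer with `nn.Embedding.from_pretrained`. Note that with right-padding the final hidden state also passes over the padding positions, so a real implementation would likely pack the sequences or pad on the left.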

7. Ship it

After attaining reasonable text classification performance, we want to wrap the ranked suggestions in a simple web app. From a workflow standpoint, an HCC coder will review the elements of a clinical chart for which our text scraping and classifying models have high confidence, ultimately keeping a human in the driver's seat, since compliance dictates a low error rate. Think Tinder-style swipe interface!
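The logic behind such a review queue can be sketched in a few lines. The 0.8 threshold, the `score` field, and the function names are assumptions for this example, and the web layer is left out.

```python
def build_review_queue(suggestions, threshold=0.8):
    """Keep suggestions at or above the threshold, highest confidence first."""
    confident = [s for s in suggestions if s["score"] >= threshold]
    return sorted(confident, key=lambda s: s["score"], reverse=True)


def record_decision(suggestion, approved):
    """Return a copy of the suggestion with the coder's yes/no decision attached."""
    return {**suggestion, "approved": bool(approved)}
```

Each swipe then reduces to one call to `record_decision`, and the recorded decisions could later serve as cleaner labels than the weak ones used for the first round of training.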

In the end, we have a fast application, built on knowledge learned from a large clinical text corpus, that expedites the information retrieval process through a simple yes/no decision-making interface for HCC coders. Full-text search with robust query expansion, using statistical associations and domain knowledge, provides high recall. A deep learning classifier trained on weakly labeled data provides greater precision. Keep a lookout for Ongi.io!