Why do we need natural language processing in medicine: modern challenges and tasks

According to The Huffington Post, 80% of electronic medical record data is stored in unstructured form, the so-called “text bubble”. And it is not only EMR data that is stored as text: so is a large amount of other medical information, including scientific articles, clinical guidelines, and descriptions of diseases and complaints. Even when the data in these documents is partially structured, there are no generally accepted formats for storing it.

Extracting useful knowledge from the “text bubble” is difficult: the simplest algorithms can check a document for the occurrence of certain words or phrases, but that is not enough, because a doctor always needs details. It is not enough to know that the patient has a fever; the doctor also needs to understand its dynamics, for example, “the temperature rises to 39 °C in the evenings and has lasted for four days”.

Natural language processing (NLP) technologies help solve this problem. Below, we describe how NLP simplifies the work of doctors: we will talk about recognizing speech and texts full of medical terms, and about assistance in making clinical decisions.


“DOC+ doctors will politely deal with your ailment”

What is NLP?


In fact, the history of NLP began in the early days of artificial intelligence as a science. In “Computing Machinery and Intelligence”, Alan Turing named a machine's ability to communicate with people as the criterion of its “intelligence”. Today this remains an important, but far from the only, task that developers of NLP systems solve.

NLP combines a number of technologies (some of them mathematically very distant from one another) that solve algorithmic problems related to processing natural human language:

  • Extracting facts from text (from a simple search for stop words to full syntactic parsing);
  • Speech recognition and speech-to-text conversion;
  • Text classification;
  • Text or speech generation;
  • Machine translation;
  • Sentiment analysis (including opinion mining);
  • and so on.

In science fiction, a single supercomputer is often able to do all of the above. In the cult film "2001: A Space Odyssey", HAL 9000 recognized human speech and visual images and conversed in ordinary language. In practice, all these tasks are highly specialized, and separate algorithms solve them.

And these algorithms (and the technologies behind them) keep progressing. For example, the NLP area "closest" to ordinary users, speech recognition, was based on hidden Markov models only a few years ago. These models broke human speech into small components, identified phonemes, performed statistical analysis, and produced the most likely transcript in text form. Today, developers much more often use neural networks, in particular recurrent neural networks and their variants, such as long short-term memory (LSTM) networks.
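
To make the recurrent approach more concrete, here is a minimal sketch (not tied to any product mentioned in this article) of an LSTM acoustic model trained with the CTC loss in PyTorch; the feature dimensionality, alphabet size, and batch are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SpeechRecognizer(nn.Module):
    """Bidirectional LSTM over acoustic features, trained with CTC."""
    def __init__(self, n_features=80, n_chars=33, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        # +1 output class for the CTC "blank" symbol
        self.proj = nn.Linear(2 * hidden, n_chars + 1)

    def forward(self, features):              # features: (batch, time, n_features)
        out, _ = self.lstm(features)
        return self.proj(out).log_softmax(dim=-1)

model = SpeechRecognizer()
ctc_loss = nn.CTCLoss(blank=0)

# Toy batch: 2 utterances of 100 feature frames, transcripts of 20 characters
feats = torch.randn(2, 100, 80)
targets = torch.randint(1, 34, (2, 20))       # character ids, 0 is reserved for blank
log_probs = model(feats).transpose(0, 1)      # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.tensor([100, 100]),
                target_lengths=torch.tensor([20, 20]))
loss.backward()
```

The CTC loss lets the network learn the alignment between audio frames and characters on its own, which is what makes end-to-end recognizers of this kind practical.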

Today, NLP systems are used more and more often: we talk to Siri, communicate with the Google Assistant (Android's speech recognition uses exactly this LSTM-with-CTC approach) and with car infotainment systems; smart algorithms protect our mail from spam, news aggregators select articles likely to interest us, and search engines find the information we need for almost any query.

What tasks does NLP solve in medicine?


However, NLP systems are useful not only in modern gadgets and online applications. They have been deployed in individual hospitals and medical universities since the early 1990s.

The first NLP application developed at the University of Utah at that time was the Special Purpose Radiology Understanding System (SPRUS), built for a clinic in Salt Lake City. The tool used information from an expert system that matches symptoms to appropriate diagnoses, and it parsed free-text radiology reports (medical protocols interpreting X-ray images).

The program applied a semantic parsing technique based on looking words up in a thesaurus. The thesaurus was automatically populated from the knowledge base of the diagnostic expert system by a specially developed compiler.
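
As a toy illustration of the idea (the dictionary below is invented for the example and has nothing to do with the actual SPRUS vocabulary), thesaurus-based lookup boils down to matching words in a report against a term-to-concept dictionary:

```python
# Toy thesaurus: surface term found in a report -> normalized concept.
THESAURUS = {
    "infiltrate": "lung infiltrate",
    "opacity": "pulmonary opacity",
    "cardiomegaly": "enlarged heart",
}

def find_concepts(report_text):
    """Return the concepts whose terms occur in the report."""
    tokens = report_text.lower().split()
    return {t: THESAURUS[t] for t in tokens if t in THESAURUS}

print(find_concepts("Right lower lobe infiltrate with mild cardiomegaly"))
# {'infiltrate': 'lung infiltrate', 'cardiomegaly': 'enlarged heart'}
```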

Since then, the capabilities of NLP and machine learning in medicine have advanced considerably: today the technology simplifies doctors' work with electronic medical records and reduces the frequency of clinical errors by "assisting" in medical decision making.

Simplifying work with electronic medical records (EMRs)


Electronic medical records, or EMRs, are the counterparts of the paper charts we are all familiar with. The purpose of an electronic record is to simplify document flow and reduce the amount of paperwork. We discussed what EMRs are and how they help control the quality of medical care in more detail in one of our previous posts.

Although the introduction of EMRs has made it easier for doctors to work with documents, filling in the records still takes time. According to a study published in Computers Informatics Nursing in 2012, nurses in US hospitals spend about 19% of their working time filling out electronic records.

Yes, that is only a fifth of the working day, but even this figure can be reduced, and the freed-up resources used to care for patients. According to Nuance Communications president Joe Petro, NLP technology will make this possible.

In 2009, Nuance surveyed thousands of US physicians about natural language processing technologies. According to the results, 94% of the doctors surveyed called the introduction of EMRs with NLP an important driver of the quality of medical care.

An example of this approach is the service used by the Hudson Valley Heart Center in Poughkeepsie. Using a solution from Nuance Communications, the hospital's nurses dictate extracts from a patient's medical history, record the results of physical examinations, and document the course of the disease. The application automatically updates the entries in the hospital's EMR system.

Similar solutions are being implemented in Russia. For example, in 2016 the Speech Technology Center began developing Voice2Med, a system for recognizing medical speech and reducing the time spent on filling out reports and medical records. According to the Ministry of Labor and Social Protection of the Russian Federation, this currently takes up half of a doctor's working time.

Our NLP Solutions


The key task of NLP in medicine is extracting data from text, and that is what we at DOC+ focus on. Our machine learning team consists of six people, two of whom work exclusively on NLP. At DOC+, NLP is used to annotate the records on which the EMR quality control system is trained (we wrote about it in a previous article).

Our anamnesis bot, which streamlines online consultations, is built on the same system. The bot works online: it asks the patient to describe their complaints in free form, extracts the symptoms from the text, and reports them to the doctor. Thanks to this, the specialist begins the telemedicine consultation already prepared (we will describe the anamnesis bot in more detail in future posts).
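
In pseudocode terms, the bot's flow can be sketched as follows; extract_symptoms() here is a hypothetical keyword-based stand-in for the real NLP component, not the actual DOC+ implementation.

```python
def extract_symptoms(text):
    """Toy keyword-based stand-in for the real NLP symptom extractor."""
    known = ["headache", "fever", "cough", "dizziness"]
    return [s for s in known if s in text.lower()]

def anamnesis_bot(patient_message):
    """Collect a free-form complaint and prepare a summary for the doctor."""
    symptoms = extract_symptoms(patient_message)
    summary = ("Patient reports: " + ", ".join(symptoms)
               if symptoms else "No known symptoms detected")
    return {"raw_complaint": patient_message,
            "symptoms": symptoms,
            "summary_for_doctor": summary}

print(anamnesis_bot("I have had a headache and a fever since yesterday"))
```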

Features of developing NLP systems

There are several difficulties in developing such systems. The first is that simple, widespread algorithms and approaches are not enough when working with text. Services that scan a text for certain words and count how often they appear in order to assess their "importance" give very limited results in medicine.

When making a diagnosis, it is important for the doctor not only to know that the person had a particular symptom, but also to understand its dynamics and parameters: localization, type of pain, exact values of measured indicators, and so on. Therefore, working with medical texts requires more complex algorithms that extract not just individual words but complete facts about various complaints and symptoms.

For example, from the text "On February 18 I got a headache on the left side, and by the evening my temperature rose to 39. The next day the area of the headache grew, and there was no dizziness", the system must extract structured information about three symptoms:

  1. Headache: appeared on 18.02; localization: left side; dynamics: 19.02, the affected area grew.
  2. Temperature: 18.02; value: 39 degrees.
  3. Dizziness: symptom absent.
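
The target of such extraction is a structured record rather than a bag of keywords. Here is a minimal sketch of what that output could look like, with hypothetical field names rather than the actual DOC+ data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Symptom:
    name: str
    present: bool
    onset: Optional[str] = None         # when the symptom appeared
    localization: Optional[str] = None  # e.g. "left side"
    value: Optional[str] = None         # measurable value, e.g. temperature
    dynamics: Optional[str] = None      # how the symptom changed over time

# The three facts the system should produce from the example above
extracted = [
    Symptom("headache", True, onset="18.02", localization="left side",
            dynamics="19.02: affected area grew"),
    Symptom("temperature", True, onset="18.02", value="39"),
    Symptom("dizziness", False),
]
```
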
The second feature is that text processing tools need additional tuning to work with highly specialized material. For example, we had to "tweak" the spell checker ourselves, since none of the solutions on the market met our requirements.

Off-the-shelf spell checkers corrected the word «кашель» (cough) to «капель» (dripping), because they had been trained on texts without medical terminology. So we retrained the system on a corpus of medical articles. Small refinements like this to classical algorithms have to be made all the time.
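
A toy sketch of why the training corpus matters, in the spirit of a classic frequency-based corrector (the two corpora below are invented one-liners): a word unknown to the dictionary is replaced by the most frequent known word one edit away, so on a general-language corpus «кашель» drifts to «капель», while a medical corpus keeps it intact.

```python
import re
from collections import Counter

ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

def words(text):
    return re.findall(r"[а-яё]+", text.lower())

def edits1(word):
    """All strings one deletion, substitution or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + replaces + inserts)

def correct(word, freq):
    """Keep known words; otherwise pick the most frequent word one edit away."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq] or [word]
    return max(candidates, key=freq.get)

general_freq = Counter(words("весна за окном звенит капель и тает снег"))
medical_freq = Counter(words("пациент жалуется на сухой кашель и высокую температуру"))

print(correct("кашель", general_freq))  # -> "капель": the word is unknown to a general corpus
print(correct("кашель", medical_freq))  # -> "кашель": the medical corpus knows the word
```
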
What our NLP system can do

The solution we have developed now recognizes 400 terms: symptoms, diagnoses, drug names, and so on. Moreover, for most symptoms the system can extract additional properties: localization (abdominal pain to the right of the navel), type (wet cough), color (clear sputum), the presence of complications, and the values of measurable parameters (temperature, blood pressure).

In addition, it can extract time references and link them to symptoms, correct typos, and handle different ways of describing the same facts.

Clinical decision support (CDS)


Clinical decision support (CDS) systems provide automated help to doctors in making a diagnosis, prescribing treatment, determining drug dosages, and so on. NLP systems make it possible to obtain the medical information needed for this: they draw it from scientific papers, test results, medical reference books, and even the patient's own words.

One such solution was developed at IBM: the question-answering system DeepQA, which powers the IBM Watson supercomputer. In this case, Watson acts as an "NLP search engine" over large databases: it processes doctors' questions and gives them a specific answer rather than simply displaying Internet search results. This same technology allowed Watson to win at Jeopardy! (the American progenitor of the Russian quiz show "Svoya Igra").

Another example of such technologies is the NLP system created by a team of scientists led by Dr. Harvey J. Murff at Vanderbilt University Medical Center. The developers taught the algorithm to analyze electronic patient records and identify complications that had developed after surgery.

The NLP processor indexed the entries in medical records using a scheme based on SNOMED-CT, a systematized, machine-processable nomenclature of medical terms. The output was an XML file with a "marked-up" patient record. Experiments showed that the program correctly categorized most complications: for example, renal failure was correctly detected in 82% of cases and postoperative myocardial infarction in 91% of cases.
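
The indexing step can be pictured as a simple mapping from phrases found in a note to concept codes, serialized as XML. The sketch below is purely illustrative: the dictionary is tiny and the codes are placeholders, not verified SNOMED CT identifiers.

```python
import xml.etree.ElementTree as ET

# Placeholder concept dictionary; the codes are NOT real SNOMED CT identifiers.
CONCEPTS = {
    "renal failure": "SCT-PLACEHOLDER-1",
    "myocardial infarction": "SCT-PLACEHOLDER-2",
}

def index_record(patient_id, text):
    """Mark up a free-text note with the concepts found in it."""
    root = ET.Element("record", attrib={"patient": patient_id})
    lowered = text.lower()
    for phrase, code in CONCEPTS.items():
        if phrase in lowered:
            ET.SubElement(root, "concept", attrib={"code": code, "term": phrase})
    return ET.tostring(root, encoding="unicode")

note = "Postoperative course complicated by acute renal failure."
print(index_record("12345", note))
# <record patient="12345"><concept code="SCT-PLACEHOLDER-1" term="renal failure" /></record>
```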

DOC+ also has its own analogue of CDS: every action the doctor takes in the application is accompanied by hints, although for now they are generated by classical rule-based algorithms, without machine learning or NLP. We are working on a new generation of CDS that will read the patient's entire case history in natural language and use it to prompt the doctor.
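
To show what "classical rule-based" means here, a minimal sketch of this style of hint logic (the rules and field names are illustrative examples, not DOC+ clinical content):

```python
def cds_hints(patient):
    """Return rule-based hints for the doctor; the rules below are illustrative."""
    hints = []
    prescriptions = set(patient.get("prescriptions", []))
    allergies = set(patient.get("allergies", []))
    if patient.get("age", 0) < 12 and "aspirin" in prescriptions:
        hints.append("Aspirin is generally avoided in children under 12.")
    if "penicillin" in allergies and "amoxicillin" in prescriptions:
        hints.append("Patient reports a penicillin allergy; amoxicillin belongs to the same class.")
    return hints

print(cds_hints({"age": 7, "prescriptions": ["aspirin"], "allergies": []}))
# ['Aspirin is generally avoided in children under 12.']
```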

Further development of NLP systems


NLP systems will make it possible to work not only with medical records, but also with scientific articles and medical standards. Medicine has accumulated vast experience, summarized in clinical guidelines, scientific papers, and other textual sources. It is logical to use this data to train artificial intelligence systems alongside the records of real patients, thereby building a structured body of medical knowledge meant to be used not by people but by algorithms.

The advantage of such NLP systems is that the results of their work are often easier to interpret, that is, to link to specific sources. In general, the interpretability of machine learning results is far from a trivial question, and it matters both to the scientific community as a whole (the leading international machine learning conference, ICML, regularly devotes a separate workshop to it) and to developers, especially in projects related to evidence-based medicine. For us, the interpretability requirement makes the task of improving our NLP system even harder (and more interesting).

NLP is a promising field that will take the quality of medical care to a new level. We plan to keep actively developing these technologies and will continue to write about our work in this blog.



Additional reading: useful articles from our blog "Just ask":