The Science and Hard Work of Building and Maintaining a KOL Database
Much has been written about key opinion leaders (KOLs) and the value they bring, not only to drug development and commercialization in the pharmaceutical industry but also to companies developing medical devices and diagnostics.
Medical Science Liaisons (MSLs) are charged with identifying and engaging the most influential KOLs for their drug or disease area – a formidable challenge in a market as large and diverse as the US healthcare market. While experience and a network are helpful and important, nothing beats rigorous analysis based on a rich trove of data.
Find the Needle in the Haystack
The basis for finding the best KOLs for the job is data. With approximately 1 million active physicians in over 5,500 hospitals, plus physicians and researchers in universities and research centers in the US, identifying the one best suited for a particular situation is like finding the proverbial needle in the haystack.
The first challenge is collecting the names and details of all relevant experts. It sounds like a time-consuming job that is never done, and it is exactly that. Information needs to be pulled from university, hospital and other websites, and while this process can be automated, it is only a fraction of the real work. The data obtained from web crawlers is messy: entries contain typos, people are listed with two different institutions or departments, and the labeling and formatting are inconsistent, to name but a few problems. Building and maintaining such a database requires resources that most companies don’t have.
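To make the cleanup problem concrete, here is a minimal Python sketch of one small step: standardizing crawled rows so that differently formatted entries for the same name collapse into a single record with all listed affiliations. The function names and sample rows are made up for illustration, and this is not a description of any particular production pipeline. It handles only spacing and capitalization; it does not correct typos, and matching on the name alone cannot tell two different people with the same name apart.

```python
import re
from collections import defaultdict


def normalize(text):
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def merge_records(records):
    """Group (name, affiliation) rows by normalized name.

    Returns a dict mapping each normalized name to a sorted list of
    its distinct normalized affiliations.
    """
    merged = defaultdict(set)
    for name, affiliation in records:
        merged[normalize(name)].add(normalize(affiliation))
    return {name: sorted(affs) for name, affs in merged.items()}


# Hypothetical crawler output: the same person formatted three ways,
# listed with two different institutions.
crawled = [
    ("Jennifer  Smith", "General Hospital, Cardiology"),
    ("jennifer smith ", "General Hospital,  Cardiology"),
    ("Jennifer Smith", "State University, Dept. of Medicine"),
]

people = merge_records(crawled)
```

In this sketch the three rows collapse into one entry with two affiliations, which still leaves a person to decide which affiliation is current.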
In addition, knowing the names and affiliations of the physicians and other healthcare professionals does not provide any information about their reputation, their ability to educate their peers and their reach – in short, it says nothing about which physician would be the most qualified KOL. For that, a whole host of other information is required: what and where these experts publish, which conferences they attend and speak at, which clinical trials they have been involved in, what grants they have secured, what patents they have authored, what books they have published, and what payments they have received from other companies.
This type of information is even harder to obtain and curate than the records of people because it comes from disparate sources, changes constantly and needs updating as close to real-time as possible.
The H1 Solution
H1 has built the databases so our customers don’t have to. Called “Ada” and “Curie”, our two databases contain comprehensive, up-to-date records of the people and organizations driving medicine (Ada) and information and analytics around thought leaders and their accomplishments (Curie).
Ada contains information on 1.3 million physicians and other experts in the top 5,000 hospitals plus the leading research universities and research centers in the US. The information is cleaned up, standardized and updated on a monthly basis.
Curie contains comprehensive information about the activities of these experts: a staggering 750 million activities, adding up to about 1 billion data points, which are updated almost in real time.
But the hardest task is yet to come: matching activities to the correct person. Here is an example: scientific journals often list names in a “J. Smith” type format. Among the approximately 80,000 to 100,000 people in the US with that name, the probability that two are doctors in the same discipline is high. So, who published that article? And what about people who are sometimes referred to as J. Smith, but also as Jen Smith and Jennifer Smith?
H1 uses machine learning algorithms to disambiguate the data. These algorithms combine name frequency analysis, subject matter, co-author networks and other information to match each activity to the right person.
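The idea of combining several signals can be illustrated with a simple hand-weighted score. The Python sketch below is an assumption-laden toy, not H1's algorithm: the field names (`byline`, `topics`, `coauthors`), the equal weights and the sample data are all hypothetical, and it leaves out name frequency analysis and any learned model. It first checks whether a byline such as “J. Smith” could refer to a candidate at all, then ranks the compatible candidates by overlap in subject matter and co-authors.

```python
def name_compatible(byline, full_name):
    """True if a byline like 'J. Smith' or 'Jen Smith' could be full_name."""
    b_parts = byline.replace(".", "").lower().split()
    f_parts = full_name.replace(".", "").lower().split()
    if not b_parts or not f_parts or b_parts[-1] != f_parts[-1]:
        return False  # surnames must match exactly in this sketch
    b_first, f_first = b_parts[0], f_parts[0]
    # Accept an initial or short form in either direction.
    return f_first.startswith(b_first) or b_first.startswith(f_first)


def jaccard(a, b):
    """Share of items the two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def score(article, candidate, w_topic=0.5, w_coauthor=0.5):
    """Weighted overlap score; 0.0 if the name cannot match."""
    if not name_compatible(article["byline"], candidate["name"]):
        return 0.0
    topic = jaccard(article["topics"], candidate["topics"])
    network = jaccard(article["coauthors"], candidate["coauthors"])
    return w_topic * topic + w_coauthor * network


def best_match(article, candidates):
    """Return the candidate with the highest score."""
    return max(candidates, key=lambda c: score(article, c))


# Hypothetical article and candidate profiles.
article = {
    "byline": "J. Smith",
    "topics": ["cardiology", "heart failure"],
    "coauthors": ["A. Lee", "R. Patel"],
}
candidates = [
    {"name": "Jennifer Smith",
     "topics": ["cardiology", "heart failure", "hypertension"],
     "coauthors": ["A. Lee", "R. Patel", "M. Chen"]},
    {"name": "John Smith",
     "topics": ["oncology"],
     "coauthors": ["T. Brown"]},
    {"name": "Jane Doe",
     "topics": ["cardiology", "heart failure"],
     "coauthors": ["A. Lee", "R. Patel"]},
]

match = best_match(article, candidates)
```

Both Smiths pass the name check, but only the cardiologist shares topics and co-authors with the article, so she ranks first. Jane Doe scores zero despite identical topics because her name cannot match the byline.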
The computing power needed to perform these tasks is staggering: at H1, hundreds of computers process and analyze data, and a team of specialists hand-curates data that the algorithms can’t handle, such as associating the names of drugs with their generic versions or linking biomarkers with the diseases they are associated with.
Identifying the most qualified KOL starts with comprehensive, up-to-date databases like Ada and Curie. Asking the KOL database the right questions is the next step in finding that elusive needle in the haystack.