The Evidence Base and PicnicHealth put Data Completeness In Focus

The Evidence Base and PicnicHealth partnered to put Data Completeness In Focus. Dan Drozd, MD, PicnicHealth’s Chief Medical Officer, sat down with Darcy Hodge, Editor of The Evidence Base, to talk about the importance of data completeness, unstructured data and additional factors that impact the generation of high-quality real-world evidence. Listen to the podcast, part of the In Focus feature, by visiting The Evidence Base. A transcript of the interview appears below.


Darcy Hodge:

Hello and welcome to our latest podcast episode brought to you by The Evidence Base, giving you the latest insights and opinions surrounding real world evidence, health economics and more. I am Darcy Hodge, editor of The Evidence Base, and I will be your host for today. This podcast will focus on data completeness, both how we can achieve this and how this benefits real world evidence. I am joined today by Dan Drozd, who will share his expertise as Chief Medical Officer at PicnicHealth on the issues surrounding data completeness and what could be done to resolve them, as well as artificial intelligence and machine learning techniques and even a peek into the future of RWE data generation. Dan, it’s great that we could have you on the podcast today.


Dan Drozd:

Thanks Darcy. I really appreciate the opportunity to speak with you and your listeners.


Darcy Hodge:

To begin, could you introduce yourself and PicnicHealth to our listeners?


Dan Drozd:

Absolutely. My name is Dan Drozd. I am an infectious disease physician, informaticist and epidemiologist by training, and I have spent most of the last 15 years tackling the challenging problem of how we can integrate disparate data sources, primarily electronic health records, to create high quality, fit-for-use data that can be used to answer important clinical research questions.


I will tell you a little bit about PicnicHealth at the top. So we are a patient-centric real world data company. We believe fundamentally that patients ought to have the right to access and control their own healthcare data and through our platform we empower them to both gain access to this data and guide their own clinical care as well as share their de-identified data with our research partners including major academic institutions and bio-pharmaceutical companies.


Darcy Hodge:

Great, so can you define for our listeners what is data completeness and why is it so important to real world evidence?


Dan Drozd:

It’s a really important question, and when I think about this, I really at a basic level break this down into two parts. The first is thinking about the breadth of data that we are able to capture. This is really over what period of time do we have data for a patient, and from how many different care sites, physicians et cetera can we collect that data. And then the second is really looking at the depth of that data. What kind of data are you able to structure, and in the end how confident can a researcher be that a particular variable reflects whatever happened to that patient in the real world?


I think that a lot of real world data sources are doing an okay job in at least one of these data domains. Claims data, for example, provides good information about billable information for patients’ interactions with the healthcare system in a confined window of time, but you can’t tell, for example, the results of the patient’s lab values or a physician’s assessment of disease progression, treatment response et cetera. And other sources like registries might give you access to some important test results and assessments but miss key data that happened outside the range of where they are capturing data from. I think one of the unique things about PicnicHealth is that we aim to collect all the patient’s medical records from anywhere that the patient happens to be seen in the US, and that we really take that data in any form that facilities are able to provide it to us. That importantly includes data from both structured or codified versions of patients’ medical records as well as unstructured or largely narrative text sections of patients’ records.


Darcy Hodge:

So following on from that, what is unstructured data and why is it important compared to structured data?


Dan Drozd:

Yeah, so I will start with structured data because I think that one is a little bit easier for people to wrap their heads around. So really this is data that is already codified. You can think of it as data that exists within some sort of table within the electronic health record system. So an example might be a list of ICD-10 codes for problems that a patient has or a particular set of lab test results. In contrast, unstructured data is really everything else in the patient’s records. So it tends to be data that comes from narrative text sections of patients’ notes and reports, but also includes things like raw imaging files and DICOM images. From a practical perspective, what does this mean? Think, for example, about the signs and symptoms that the patient might come see a provider for. Those are all things that are only going to be captured in narrative text or unstructured sections of patients’ records. If you think about results from an echocardiogram, an ejection fraction for example, again that is data that only comes from those unstructured sections of records. Or something like tumor response in a patient from a radiology report: is the patient’s tumor getting larger or smaller on whatever therapy the patient happens to be on? I think really it’s our ability to dive into this unstructured data that is for us in many ways a differentiating factor, and I think it allows clinical researchers to start to answer many of the sorts of questions that have traditionally relied only on registries or randomized controlled trials.


Darcy Hodge:

Perfect, following on from that, how does PicnicHealth define quality and how is your viewpoint informed by regulatory bodies?


Dan Drozd:

Great question. So, when we start thinking about quality we really look to external benchmarks and frameworks that have been established, and there are a number of these. The one that we’ve built on or leaned on most heavily is one outlined by the Duke-Margolis Center, which has worked hand-in-hand with the FDA in understanding and defining data quality standards for real world data.


The FDA often will use the term fit-for-use, and I think it’s a really important term because it acknowledges that one dataset may be appropriate for answering a particular research question but may not be appropriate for answering some other research questions. And so it really is by building off of this framework, and working directly with our partners in understanding what their particular research questions of interest are, that we determine whether our data ends up being fit for use for answering a particular question.


As we take a step back, I think there are a couple of broad categories that we think of. One is data relevancy and the second is, as we mentioned, data quality. I think they go hand-in-hand in many cases in defining this concept of fit-for-use. The former is much more about ensuring that the population of interest is representative, so that the people in our cohorts look like the people in the real world that partners are interested in answering questions for. And the latter has to do with data accuracy; completeness overall, and there are a number of facets of completeness that we alluded to before; data provenance, so how can I tell where a particular piece of data came from; and then really the provision of clear documentation and processing rules, so anytime a piece of source data goes through some kind of transformation in our pipeline, the ability to document that.


And then we generalize certain components of that framework and incorporate those more broadly, so even outside the context of a particular research question, our entire data processing pipeline is instrumented and provides full provenance. So in each phase of abstraction, we have built-in data quality checks, including things like intra- and inter-rater agreement, outlier detection and then a series of higher level checks, particularly for derived variables, that rely on going back to the actual records and assessing that the variables that are derived out of our system reflect what was captured within the treating physician’s notes.


Darcy Hodge:

Makes sense. I suppose going a little bit wider than that, can you share some recent industry successes and challenges concerning artificial intelligence and machine learning techniques?


Dan Drozd:

Yeah, I think it’s a good question. I’ll answer this question a little bit personally. One of the things that impressed me most about PicnicHealth, when I was thinking about joining about a year and a half ago, was that our approach overall to artificial intelligence and machine learning is both technically sophisticated and extremely realistic. But I think it’s fair to acknowledge that within the realm of clinical research, the gains of machine learning and artificial intelligence have been more modest than they have been in some other areas of the healthcare system, including things like clinical decision support and other back office operations.


I think one thing that we realize and acknowledge is that it’s really essential for real world data sources to provide full transparency into their processes and models, and that the idea of having a black box that some data gets fed into and then spits out a result isn’t something that’s going to be satisfactory for regulators without a very clear series of validation studies across multiple populations. This is the reason why the way that we leverage machine learning is to do it in the context of what we call human-in-the-loop review.


So this basically means that we leverage this technology to make predictions about important clinical concepts and then have those predicted concepts reviewed by trained chart abstractors, ultimately by people, because that provides us with that additional transparency, as well as that additional safety check on the data, to ensure that the data coming out of our pipeline is as high quality as possible. So I think, overall, this is still a nascent area, one where the ground rules and standards haven’t been fully elucidated and described, and where we’re really looking to both push the boundaries and take a very pragmatic approach that acknowledges the overall regulatory landscape in which we sit.


Darcy Hodge:

Interesting, and are changes within the healthcare industry necessary to improve data completeness strategies?


Dan Drozd:

I think they are, and I think that we are very slowly seeing some of those changes take hold. We are certainly big advocates for making patients’ data more accessible and available to them, as I mentioned earlier, and really giving them a much easier path to being able to control and access their own data. I think that’s really the first step to improving completeness of patients’ data and honestly a big reason that I came to work at PicnicHealth. That said, we also realize that healthcare providers and institutions have important responsibilities for safeguarding patient data and privacy, and so this is a challenging area, and one where I think we are continuing to move more and more in the direction where patients will serve as a sort of hub for facilitating access to their data.


As a physician I know how frustrating it can be, both to providers and to patients, not to have access to records from outside institutions. It leads to a huge amount of waste in our system, and it leads at times to both too much care and poor care. Health information exchanges I think are an exciting set of facilities and technologies that could really begin to advance data sharing within our ecosystem, but they’re not perfect. And I’ll share just a brief personal anecdote along these lines. My step-dad is a liver transplant patient; he had a liver transplant about twelve years ago. During that period of time, he’s moved states, he’s been hospitalized a couple of times, and in many cases I’ve had to serve as his health information exchange, and that to me is simply uncomfortable. We really do need to be able to put data in the hands of patients. It’s one of the things that motivates me every day and certainly one of the things that I love about what I do. I know that we can do better on that front by empowering patients to control their own data. It’ll lead to better patient care, it’ll lead to better clinical research, and it’s something that motivates me as I get up and go to work every day.


Darcy Hodge:

Yeah, I mean your personal anecdote touched on it again a little bit. Can you explain some practical benefits for data completeness for patients and their outcomes?


Dan Drozd:

Yeah, I think, put simply, patients can’t receive the best possible care if their providers don’t have access to relevant pieces of their history. I’m an infectious disease physician by training, as I mentioned. A big part, clinically, of what infectious disease physicians do is understanding what antibiotics to give patients who are critically ill, for example in septic shock. The mortality rate for septic shock is about 40%, and usually providers who are seeing patients in septic shock provide what we call broad spectrum antibiotics. These are antibiotics that tend to kill most bacteria, but the key here really is most. No antibiotic kills all bacteria; we in fact wouldn’t want an antibiotic to kill all the bacteria. And so if, as a physician, I had a patient who I knew had a history, for example, of multi-drug resistant bacteria or prior infections, it would be essential for me to have access to the records in order to make the correct decision about what antibiotic to give them.


And very bluntly, a patient’s chances of living are significantly higher if that correct choice is made, and, outside of direct patient outcomes, the side effects of giving incorrect antibiotics can also be dramatically reduced. I think from the patient perspective, we as providers hear a lot of frustration from patients, understandably, about having to tell their stories over and over again. I can’t tell you how many times I’ve heard from patients, you know, “Doc, it’s all in the records” or “I just told this to the person who was in the room twenty minutes ago.” Many times, that is the sort of burden that we can help remove from patients by simply having access to patients’ records as treating providers. So I think there are a number of ways in which data completeness is super important, not only to researchers in terms of understanding outcomes, but also to patients, both in terms of the burden that they carry and in ensuring that their providers can provide the best possible care to them.


Darcy Hodge:

Coming off that, it really does sound like data completeness will help patients. So then to close, just as a general question: how do you see real world evidence generation developing over the next 5 to 10 years? Is there anything hindering this?


Dan Drozd:

I think it’s a super dynamic field, and I think there’s been a lot of buzz, obviously, particularly over the last several years, about the potential for real world data, and I think it’s very important that we separate some of that buzz from the reality. And the first thing that I always tell people is, it’s very clear to me that real world data is not a replacement for standardized, randomized controlled trials, for example.


Really it's rarely a replacement for those. I think synthetic control arm trials are one possible exception to that. There have, however, been significant statistical advances over the last number of years, in terms of study design, methods et cetera, that can support causal inference, or the ability to say with more confidence that a particular treatment has led to a particular outcome. And so I think there is a huge space that real world data has the potential to fill, answering questions that otherwise would not be answered, questions that no one is going to run a randomized controlled trial to answer for one reason or another.


From the industry side, I expect to continue to see development and refinement of how to incorporate holistic real world evidence strategies into the entire product development life cycle. We’ve seen a lot of flux and shifting in organizational structures over the past couple of years as companies have worked on how to most effectively incorporate RWD into their development life cycles. I think RWD provides an excellent opportunity to understand how treatments impact diverse subpopulations of patients, often patients excluded from clinical trials for one reason or another, and to help build value stories for payers and regulators as well. And I think that we've seen a lot of interest in linking traditional data sources, things like electronic health records, with more novel data sources. This is an area we're particularly active in, including in terms of involving patients more directly through patient reported outcomes and involving patients throughout the entire life cycle of the research process.


Darcy Hodge:

Great. Thank you Dan for your insightful answers. It was a real pleasure to talk to you today.


Dan Drozd:

Thank you so much. I appreciate the opportunity to speak with you and your listeners as well Darcy.


Darcy Hodge:

Okay. So, with that, to our audience: thank you for listening to this podcast, and special thanks to our guest Dan for his involvement today. If you're interested in finding out more about data completeness, I recommend our In Focus on the topic, sponsored by PicnicHealth, over at www.evidencebaseonline.com. You can listen to more podcasts in our dedicated website section. Thank you for listening and goodbye.


