4 Emerging Strategies to Advance Big Data Analytics in Healthcare

Researchers are developing new strategies to overcome key barriers hindering the use of big data analytics in healthcare.


By Jessica Kent

While the potential for big data analytics in healthcare has been well-documented in countless studies, the possible risks that could come from using these tools have received just as much attention.

Big data analytics technologies have demonstrated their promise in enhancing multiple areas of care, from medical imaging and chronic disease management to population health and precision medicine. These algorithms could increase the efficiency of care delivery, reduce administrative burdens, and accelerate disease diagnosis.

Yet for all the good these tools could achieve, the harm they could cause is nearly as great.

Concerns about data access and collection, implicit and explicit bias, and issues with patient and provider trust in analytics technologies have hindered the use of these tools in everyday healthcare delivery.

Healthcare researchers and provider organizations are working to find solutions to these issues, facilitating the use of big data analytics in clinical care for better quality and outcomes.

Providing comprehensive, quality training data

In healthcare, it’s widely understood that the success of big data analytics tools depends on the quality of the data used to train them. Algorithms trained on inaccurate, poor-quality data will yield erroneous results, leading to inadequate care delivery.

However, obtaining quality training data is a difficult, time-intensive effort, leaving many organizations without the resources to build effective models.

Researchers across the industry are working to overcome this challenge. In 2019, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) developed an automated system that can gather more data from images used to train machine learning models, synthesizing a massive dataset of distinct training examples.

The dataset can be used to improve the training of machine learning models, enabling them to detect anatomical structures in new scans.

“We’re hoping this will make image segmentation more accessible in realistic situations where you don’t have a lot of training data,” said Amy Zhao, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and CSAIL.

“In our approach, you can learn to mimic the variations in unlabeled scans to intelligently synthesize a large dataset to train your network.”
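
A minimal sketch of this synthesis idea is below. It is a loose illustration, not CSAIL’s actual code: the transform functions are hypothetical stand-ins for the deformation and appearance models the team learns from unlabeled scans.

```python
# Sketch: turn one labeled scan (an "atlas") plus transforms learned from
# unlabeled scans into many labeled training examples. The transform
# callables here are assumed, hypothetical helpers.

def synthesize_examples(atlas_image, atlas_labels,
                        spatial_transforms, appearance_transforms):
    """Pair each learned spatial deformation with a learned appearance
    (intensity) change to mint a new (image, labels) example."""
    examples = []
    for warp in spatial_transforms:            # deformations from unlabeled scans
        for restyle in appearance_transforms:  # intensity maps from unlabeled scans
            image = restyle(warp(atlas_image))  # move anatomy, then change appearance
            labels = warp(atlas_labels)         # labels follow the same deformation
            examples.append((image, labels))
    return examples
```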

The COVID-19 pandemic has also prompted healthcare leaders to develop quality, clean datasets for algorithm development. In March 2020, the White House Office of Science and Technology Policy issued a call to action for experts to build AI tools that can be applied to a new COVID-19 dataset.

The dataset is an extensive machine-readable coronavirus literature collection, including over 29,000 articles.

“It’s difficult for people to manually go through more than 20,000 articles and synthesize their findings. Recent advances in technology can be helpful here,” said Anthony Goldbloom, co-founder and chief executive officer at Kaggle, a machine learning and data science community owned by Google Cloud.

“We’re putting machine readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19.”

Eliminating bias in data and algorithms

As healthcare organizations become increasingly reliant on analytics algorithms to help them make care decisions, it’s critical that these tools are free of implicit or explicit bias that could further drive health inequities.

With the existing disparities that pervade the healthcare industry, developing flawless, bias-free algorithms is often challenging. In a 2019 study, researchers from the University of California, Berkeley discovered racial bias in a predictive analytics platform used to refer high-risk patients to care management programs.

“Algorithms can do terrible things, or algorithms can do wonderful things. Which one of those things they do is basically up to us,” said Ziad Obermeyer, acting associate professor of health policy and management at UC Berkeley and lead author of the study. “We make so many choices when we train an algorithm that feel technical and small. But these choices make the difference between an algorithm that’s good or bad, biased or unbiased.”

To remove bias from big data analytics tools, developers can work with experts and end users to understand which clinical measures matter most to providers, Philip Thomas, PhD, MS, assistant professor at the College of Information and Computer Sciences at the University of Massachusetts Amherst, told HealthITAnalytics.

“We’re not promoting how to balance accuracy versus discrimination. We’re not saying what the right definitions of fair or safe are. Our goal is to let the person that’s an expert in that field decide,” he said.

While communicating with providers and end users during algorithm development is extremely important, this step is often only half the battle. Collecting the high-quality data needed to develop unbiased analytics tools is a time-consuming, difficult task.

To accelerate this process, researchers at Columbia University have developed a machine learning algorithm that identifies and predicts differences in adverse drug effects between men and women by analyzing 50 years’ worth of reports in an FDA database.

“Essentially the idea is to correct for gender biases before you do any other statistical analysis by building a balanced subset of patients with equal parts men and women for each drug,” said Payal Chandak, a senior biomedical informatics major at Columbia University and co-author of the paper.
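
As a rough illustration of the balancing step Chandak describes, the pandas sketch below builds an equal-parts subset per drug. The column names and sex codes are assumptions, not the study’s actual schema.

```python
import pandas as pd

def balance_by_sex(reports: pd.DataFrame) -> pd.DataFrame:
    """Build a subset of adverse-event reports with equal numbers of men
    and women for every drug, before any downstream statistical analysis.
    Assumes hypothetical columns 'drug' and 'sex' (coded 'M'/'F')."""
    balanced = []
    for _, drug_reports in reports.groupby("drug"):
        men = drug_reports[drug_reports["sex"] == "M"]
        women = drug_reports[drug_reports["sex"] == "F"]
        n = min(len(men), len(women))  # size of the smaller group
        if n == 0:
            continue  # skip drugs reported for only one sex
        balanced.append(men.sample(n, random_state=0))    # downsample men
        balanced.append(women.sample(n, random_state=0))  # downsample women
    return pd.concat(balanced, ignore_index=True)
```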

Developing quality tools while preserving patient privacy

In algorithm development, the question of data privacy and security is high on the list of concerns. Legal, privacy, and cultural obstacles can keep researchers from accessing the large, diverse data sets needed to train analytics technologies.

Recently, a team from the University of Iowa (UI) set out to develop a solution to this problem. With a $1 million grant from the National Science Foundation (NSF), UI researchers will create a machine learning platform to train algorithms with data from around the world.

The group will develop a decentralized, asynchronous solution called ImagiQ, which relies on an ecosystem of machine learning models so that institutions can select models that work best for their populations. Organizations will be able to upload and share the models, not patient data, with each other.

“Traditional methods of machine learning require a centralized database where patient data can be directly accessed for training a machine learning model,” said Stephen Baek, assistant professor of industrial and systems engineering at UI.

“Such methods are impacted by practical issues such as patient privacy, information security, data ownership, and the burden on hospitals which must create and maintain these centralized databases.”

Researchers from the Perelman School of Medicine at the University of Pennsylvania also recently developed a solution to protect patient confidentiality. In a study published in Scientific Reports, the team described a new technique that enables clinicians to train machine learning models while preserving patient data privacy.

Using an emerging approach called federated learning, clinicians could train an algorithm across multiple decentralized devices or servers holding local data samples without exchanging them.

“The more data the computational model sees, the better it learns the problem, and the better it can address the question that it was designed to answer,” said senior author Spyridon Bakas, PhD, an instructor of Radiology and Pathology & Laboratory Medicine in the Perelman School of Medicine at the University of Pennsylvania.

“Traditionally, machine learning has used data from a single institution, and then it became apparent that those models do not perform or generalize well on data from other institutions.”
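
In its simplest form, federated averaging, the technique can be sketched as follows. This is a generic illustration under stated assumptions, not the Penn team’s implementation: the local model is a toy least-squares learner, and each site shares only its weights with the coordinator.

```python
import numpy as np

def local_update(weights, X, y, lr=0.01):
    """One round of local training: a single gradient step of
    least-squares regression, standing in for a real clinical model."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights, sites):
    """sites: list of (X, y) arrays held locally.
    Only model weights leave each site; raw patient data never does."""
    updates, sizes = [], []
    for X, y in sites:
        updates.append(local_update(global_weights.copy(), X, y))
        sizes.append(len(y))
    sizes = np.asarray(sizes, dtype=float)
    # Average the locally trained weights, weighted by local sample count
    return np.average(updates, axis=0, weights=sizes / sizes.sum())
```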

Ensuring providers trust and support analytics tools

Just as it’s essential for patients to trust that analytics algorithms can keep their data safe, it’s crucial for providers to trust that these tools can deliver accurate, actionable information.

In a recent report, the American Hospital Association (AHA) noted that one way health systems could secure provider trust in these tools is to use AI to manage unsustainable workloads.

Additionally, leaders could leverage AI tools to augment clinical decision-making at the point of care, AHA stated. Allowing providers to review and refine AI tools could also help ensure clinicians are on board with the technology.

Researchers from MIT’s CSAIL have also worked to increase providers’ trust in analytics tools. A team recently developed a machine learning tool that can adapt when and how often it defers to human experts based on factors such as the expert’s availability and level of experience.

“There are many obstacles that understandably prohibit full automation in clinical settings, including issues of trust and accountability,” said David Sontag, the Von Helmholtz Associate Professor of Medical Engineering in the Department of Electrical Engineering and Computer Science.

“We hope that our method will inspire machine learning practitioners to get more creative in integrating real-time human expertise into their algorithms.”
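
A minimal sketch of the deferral idea follows. The thresholding rule and its parameters are invented for illustration; the CSAIL system learns when to defer rather than applying fixed rules like these.

```python
import numpy as np

def predict_or_defer(model_probs, expert_available, expert_experience,
                     base_threshold=0.85):
    """Route a case to the human expert when the model is unsure.
    model_probs: class probabilities for one case.
    Returns ('model', label) or ('expert', None)."""
    confidence = float(np.max(model_probs))
    # Defer more readily when an experienced expert is on hand,
    # and less readily when no expert is available at all.
    threshold = base_threshold if expert_available else 0.5
    threshold += 0.05 * expert_experience  # e.g. experience in {0, 1, 2}
    if confidence >= threshold:
        return "model", int(np.argmax(model_probs))
    return "expert", None
```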

With healthcare organizations increasingly leveraging big data analytics tools for enhanced insights and streamlined care processes, overcoming issues of bias, privacy and security, and user trust will be critical for the successful use of these models in clinical care.

As research continues to evolve around AI, machine learning, and other analytics algorithms, the industry will keep refining these tools for improved patient care.