Part 3 | Garbage In, Failure Out: The High Price of Low-Quality Data in Drug Development

In drug development, data is currency. It drives decisions at every stage, from early research to clinical trials and regulatory filings. However, not all data is created equal, and the price of low-quality data, whether flawed, incomplete, or poorly curated, can be substantial.

In the first two blogs of this series, we defined what high-quality data looks like and why public biomedical datasets require expert curation to become fit for purpose. Now, we look at the flip side: what happens when data falls short of those standards?

Poor data quality slows down programs, obscures insights, leads to misguided decisions, and can even result in costly failures. Yet, the true costs are often hidden, buried in delays, rework, and missed opportunities. In this post, we examine the tangible and intangible costs of poor data in drug development and why investing in data quality upfront isn’t only good science but also good business.

Subjective Risk Assessments: A Hidden Driver of Costly Decisions in Drug Development

One of the most pervasive and underestimated effects of poor data is its impact on risk assessment. When risk assessments rest on incomplete, inconsistent, or subjective inputs, the resulting decisions carry very real and very expensive consequences.

Consequence #1: Misallocation of R&D Resources

When internal teams rely on flawed assumptions or historical biases rather than objective, data-driven insights, resources can be steered toward the wrong projects. Promising early-stage assets may be deprioritized, while high-risk candidates continue to absorb budgets, time, and talent.

A 2024 review by the American Society for Clinical Pharmacology highlights how cognitive biases such as overconfidence, anchoring, and availability bias can systematically distort R&D prioritization, leading to inefficient budget allocation and unrealized therapeutic potential.

Consequence #2: Failure to Prioritize Promising Assets and Overconfidence in Others

In a crowded pipeline, accurate probability of technical and regulatory success (PTRS) assessments are essential for rational decision-making. If these scores are generated without standardized methodologies or comparative benchmarks, they can obscure the true potential of under-the-radar candidates.
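To make the methodology point concrete, here is a minimal sketch using the common convention of modeling PTRS as the product of phase-transition probabilities. The transition rates below are hypothetical placeholders, not benchmarks from any real dataset or model:

```python
# Illustrative sketch: PTRS as a product of phase-transition probabilities.
# All transition rates below are hypothetical placeholders.

def ptrs(transition_probs: list[float]) -> float:
    """Probability of technical and regulatory success: the product of the
    probabilities of clearing each remaining development stage."""
    result = 1.0
    for p in transition_probs:
        result *= p
    return result

# A subjective internal estimate vs. a benchmark-anchored estimate for the
# same Phase II asset (Phase II -> Phase III -> regulatory approval).
subjective = ptrs([0.60, 0.75, 0.95])   # optimistic internal assumptions
benchmarked = ptrs([0.35, 0.60, 0.85])  # anchored to historical transition rates

print(f"Subjective PTRS:  {subjective:.1%}")   # ~42.8%
print(f"Benchmarked PTRS: {benchmarked:.1%}")  # ~17.9%
```

Even modest optimism at each stage compounds multiplicatively, so small per-phase biases can open a wide gap in the final score.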

In one internal review, our team at Intelligencia AI compared subjective PTRS estimates against externally validated assessments using our Portfolio Optimizer. The discrepancy exceeded 35% in multiple programs, highlighting how overconfidence in unvalidated internal models can lead to systemic misjudgments with significant financial implications.

Consequence #3: Costly Phase III Drug Failures

Phase III trials are resource-intensive and high-stakes. Advancing assets into late-stage development based on overly optimistic or biased risk assessments significantly increases the likelihood of failure, incurring costs of hundreds of millions per program. These late-stage failures are not just scientific setbacks; they’re financial and strategic disasters that ripple through entire portfolios.

A 2020 study of late-stage monoclonal antibody drugs showed that failures happen for a variety of reasons. Some failures were “unavoidable due to the lack of adequate science,” while others failed because they were advanced to Phase III despite weak Phase II data. Lampalizumab, a monoclonal antibody in the ophthalmology space, is a notable example.

The drug failed in Phase III due to “insufficiently stringent phase II trial design, incomplete understanding of disease pathway, and retrospective subgroup analysis from phase II results that led to false biomarker identification,” highlighting that advancing based on incomplete or biased risk assessments can result in very costly late-stage failures.

Consequence #4: Inaccurate NPV Models and Flawed BD&L Decisions

Poor-quality data doesn't stop at internal pipeline decisions. It skews Net Present Value (NPV) calculations, affects partnering strategies, and distorts business development and licensing (BD&L) choices. Deals built on flawed data get made, or passed on, for all the wrong reasons.
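To see how a biased probability input propagates into valuation, consider a deliberately simplified risk-adjusted NPV (often called eNPV) sketch. Real BD&L models stage costs by phase and model launch curves in far more detail; every figure below is a hypothetical placeholder:

```python
# Simplified risk-adjusted NPV (eNPV) sketch. All cash flows, the discount
# rate, and both PTRS values are hypothetical placeholders.

def npv(cash_flows: list[float], rate: float) -> float:
    """Discount a series of annual cash flows (years 1..n) to present value."""
    return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))

def enpv(ptrs: float, dev_costs: float, revenues: list[float], rate: float) -> float:
    """Expected NPV: success-weighted revenues minus committed development costs."""
    return ptrs * npv(revenues, rate) - dev_costs

revenues = [50, 150, 300, 300, 250]  # $M per year if the asset reaches market
dev_costs = 120                      # $M of committed development spend
rate = 0.10

print(f"eNPV at subjective PTRS 43%:  ${enpv(0.43, dev_costs, revenues, rate):,.0f}M")
print(f"eNPV at benchmarked PTRS 18%: ${enpv(0.18, dev_costs, revenues, rate):,.0f}M")
```

The same asset swings from a comfortable go (roughly $205M) to a marginal call (under $20M) purely on the quality of the probability input. That is how flawed data becomes a flawed deal.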

Each of the four issues outlined points to a shared root cause: decision-making based on data that lacks objectivity, consistency, completeness, granularity, or contextual rigor. They reveal a pattern of systemic risk introduced when organizations rely on unstructured, biased, or incomplete data to drive multi-million (or billion) dollar decisions.

The price of low-quality data doesn’t just slow progress; it distorts judgment at every level:

  • R&D teams may chase the wrong targets or prematurely shelve high-potential assets.
  • Portfolio strategists may greenlight candidates that look promising on paper but are doomed by hidden risks.
  • Clinical teams may be pushed into costly Phase III trials with false confidence and no fallback in case of failure.
  • BD&L and finance leads may overvalue assets, misprice partnerships, or walk away from opportunities not because the science is flawed, but because the model was based on low-quality data.

The damage caused by these poor decisions, driven by bad data, is rarely isolated. One poor call can reverberate across an entire pipeline, resulting in wasted money, lost time, eroded trust among healthcare providers, patients, and investors, and, worst of all, delayed or foregone therapies for those in need.

The Surprisingly Pervasive Use of Low-Quality Data in Drug Development

Poor data management in the pharmaceutical industry isn’t an isolated issue. Across the industry, companies continue to rely on ad hoc systems and siloed processes, resulting in inefficiencies, increased risk, and strategic blind spots.

A survey we conducted last year at several industry events revealed that more than three-quarters of professionals identified either the lack of access to in-depth, timely data or the absence of objective, data-driven processes as their primary challenge when performing portfolio risk assessments.

A survey of 400 industry professionals by Aspen Technology revealed that overall, 48% of pharma companies and 53% of larger firms (≥ $1 billion revenue) reported that data silos severely hindered cross-functional collaboration.

Further evidence highlights the price of low-quality data. An Oracle/Pharma Intelligence survey of clinical researchers worldwide found that 57% believe clinical data issues delayed trial completion.

When Poor Curation Undermines Good Data

Even when data is sourced from high-quality studies or authoritative databases, inadequate curation can make it unfit for purpose. Inconsistent labels, missing metadata, and incompatible formats can all render otherwise valuable data misleading or unusable for drug development.
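As a concrete, if toy, illustration of what curation involves, the sketch below harmonizes free-text assay labels and mixed units so that records from different sources become comparable. The vocabulary mapping, field names, and values are invented for this example and are not drawn from ChEMBL or any specific database:

```python
import pandas as pd

# Toy curation step: harmonize free-text assay labels and mixed units so
# records from different sources become comparable. The mapping, column
# names, and values are illustrative only.
LABEL_MAP = {
    "ic50": "IC50", "IC-50": "IC50", "Ic50 (nM)": "IC50",
    "ec50": "EC50", "EC-50": "EC50",
}

raw = pd.DataFrame({
    "assay_type": ["ic50", "IC-50", "ec50", "Ic50 (nM)"],
    "value": [12.0, 0.015, 3.4, 40.0],
    "unit": ["nM", "uM", "nM", "nM"],
})

curated = raw.copy()
curated["assay_type"] = curated["assay_type"].map(LABEL_MAP)

# Normalize everything to nanomolar.
is_um = curated["unit"].str.lower().eq("um")
curated.loc[is_um, "value"] *= 1000
curated.loc[is_um, "unit"] = "nM"

print(curated)
```

Without the unit normalization step, the first two IC50 records above would appear to differ 800-fold for no biological reason: exactly the kind of silent inconsistency that makes otherwise good data misleading.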

Beyond that toy case, two real-world examples underscore the importance of curation:

  • A study on antiviral compound data revealed that inconsistent ontology and poor annotation in public databases like ChEMBL significantly limited their usability. After applying systematic curation, researchers increased assay coverage by 2.5x, highlighting how much signal can be lost without proper curation.
  • An evaluation of data entry in clinical studies found error rates as high as 27.8%. These issues led to downstream delays in analysis and interpretation, requiring time-consuming re-validation before results could be trusted.

These findings highlight the risks that poorly curated data pose for decision-making in drug development.

What Should We Take Away From All of This?

Drug development is already a high-stakes endeavor. Developers cannot afford to compound that risk with poor-quality data. Whether it’s a flawed asset assessment, a missed partnership, or a late-stage failure, the cost of poor data can cascade across portfolios, delay much-needed therapies, and have a major financial and strategic impact on the asset developer. As the volume and complexity of biomedical data continue to grow, so does the need for consistent, curated, and contextualized inputs. Fixing broken data pipelines, eliminating silos, and investing in expert curation are not optional upgrades but necessities.

AI is poised to transform drug development, but its effectiveness is only as good as the data it uses. With bad data, it's a clear case of “garbage in, failure out.”

Ensuring data is clean, complete, and fit for decision-making takes time and expertise, but the cost of skipping that work is far higher. Fortunately, drug developers don’t have to tackle this challenge alone. Companies like Intelligencia AI specialize in collecting, cleaning, and curating biomedical data to support confident, data-driven decisions.

Read part 1 and part 2 of our blog series, and get in touch to learn how we support your peers in their decision-making.

De-risk clinical development and enhance decision-making with accurate, AI-driven probability of success.
