The Good, The Bad and The Ugly UX Research

Why some research reduces uncertainty, some produces weak learning, and some creates dangerous certainty.

The Problem Isn’t the Research. It’s the Reasoning.

The deck looked exactly like research is supposed to look.

Ten interviews. A color-coded affinity map. A journey map with friction points highlighted in orange. Quotes from real participants, formatted in italics across several slides. And at the end, a clear recommendation: simplify the onboarding experience.

The work had been done. The presentation was professional. The room was nodding.

Then someone asked a question.

Not a hostile question — just a genuine one: Which specific finding led to this recommendation? What pattern in the evidence connects these quotes to this conclusion?

The room paused.

The researcher pointed to a quote on slide seven. The PM mentioned something a user had said in one of the interviews. The designer referenced a friction point on the journey map. But none of these answers quite assembled into a path. They were observations, fragments of conversations, moments that had felt meaningful during the research but became harder to trace when someone asked them to account for themselves.

The interviews had happened. The report existed. The recommendation was real. What was less visible was the reasoning connecting all three.

This is not an unusual situation. Most teams working with research have been in that room — either asking the question or struggling to answer it. The artifacts — affinity maps, quote slides, journey maps, structured reports — create a convincing impression of rigor. What they cannot guarantee is the actual analytical work underneath: moving from what participants said, through what the evidence suggests, toward something the team now understands that it did not understand before.

Research artifacts are evidence of activity. They are not evidence of understanding. That distinction matters more than it might appear. The confidence a team places in a research-backed recommendation is only as reliable as the reasoning behind it. When that reasoning cannot be traced, the confidence is not grounded in what the research demonstrated. It is grounded in the impression that research was done.

In most product organizations, that impression is enough to move forward.

It probably should not be.

The Ugly UX Research

When investigation becomes subordinate to validation

There is a version of this problem that is easy to describe: research conducted too quickly, with insufficient rigor, producing findings too generic to inform a real decision. It generates weak recommendations and conclusions that could have been reached without the research at all.

But there is another version that is harder to see — and considerably more consequential.

Not because it is more dramatic. Because it looks less like failure.

The most dangerous research failures are not cases where research was missing. They are cases where research was present, completed, and presented with conviction — and where the conclusions the team drew from it could not honestly be supported by the evidence that supposedly produced them. The team did not lack research. What they lacked was something more specific: an investigation genuinely open to finding something unexpected.

Ugly UX Research is what happens when the investigative process becomes subordinate to validation.

This is not a story about dishonesty. The researchers are often skilled. The participants are real. The interviews happened. The quotes are accurate. The report reflects what occurred.

The problem is structural: the investigation was not equally capable of producing all possible conclusions. And that changes everything the research can honestly claim.

Most product organizations do not enter research with a blank hypothesis. They enter with a direction. A redesign ready for testing. A feature that leadership has already committed to building. A strategic bet the team has been working toward for months. These are not unusual conditions — they are the normal operating context of almost every product team that commissions research.

The question is what happens to the investigation when those conditions are present.

Consider a team preparing to launch a redesigned onboarding flow. The brief they write begins: “We want to validate the redesign before launch.” The word “validate” is doing considerable work in that sentence. Validation implies that the conclusion has already been reached and that what remains is confirmation. The research is not being asked to investigate whether the redesign solves the right problem, or what users are actually experiencing when they encounter the product for the first time. It is being asked to confirm that the solution is ready.

That framing shapes the entire study. It influences which questions the researcher asks, how participants are introduced to the stimulus, and the interpretive lens applied when reviewing responses. Sometimes it operates through the specific wording of individual questions. A participant asked “how much easier did you find this version?” is being guided toward evaluating a predetermined variable. Their answer is genuine — they are not performing for the researcher. But the investigative frame has already narrowed the range of findings the research is capable of producing. The participant can describe how they experienced the redesign. They have no way of indicating that the redesign addresses the wrong problem, because that question was never put within reach.

The answer was real. The frame was directional.

Other times, the constraint operates not through framing but through scope.

A study evaluates a specific feature: how clearly it communicates value, how easily users navigate it, how it performs against an alternative. The research is well-designed within those boundaries. But there is a question it cannot answer — one its scope excludes by definition: whether the feature addresses the right problem at all. The investigation is capable of producing findings about the feature. It is structurally incapable of producing a finding that questions whether the feature, in its current form, should exist.

This rarely feels like a constraint. It feels like focus. Practical scoping. And often it is both simultaneously. The difficulty is when scoping functions as a perimeter around the organization’s preferred conclusion, systematically excluding the kinds of findings that could disrupt it.

Some teams introduce a different kind of structural limit before the investigation even begins: a success threshold.

“If six of eight participants respond positively, we proceed.” The threshold is established before a single interview is conducted. The research is no longer an inquiry into whether the decision makes sense; it is a gate to be crossed. When the count reaches the threshold, the confidence that follows feels earned. But the investigation that produced it was not designed to produce anything else.

Contradictory evidence introduces another dynamic worth examining carefully.

In a ten-person study, three participants express confusion with a flow that the remaining seven navigate without difficulty. During the debrief, someone observes: “Those three might not be representative of our core user. They are probably less technically experienced.” The suggestion is accepted. The evidence is reclassified as an outlier.

The dismissal may be entirely correct. Not all contradictory signals deserve equal weight. But the process by which the evidence became an outlier — the speed, the relief that seven participants had confirmed expectations — is worth examining. Findings that align with an existing direction tend to be accepted without much scrutiny; findings that create friction face a higher standard. That asymmetry, accumulated over a project, shapes what the final report is capable of saying.

The subtlest form of this dynamic requires no bad actors and no explicit dismissal.

When a stakeholder leans forward at a finding — when they say “that’s exactly what we expected” and ask follow-up questions — researchers and analysts notice. When a finding produces the opposite reaction, that too is registered. Over time, in organizations where research is expected to support rather than challenge, people learn — usually without realizing it — to produce findings that are received well. Not by fabricating data, but by emphasizing certain aspects of it. By framing interpretations in language that fits the preferred direction. By not pressing on evidence that would create friction with an existing decision.

The research remains real throughout. The selection happening inside it is invisible to nearly everyone — including, sometimes, the people doing the selecting.

What all of these dynamics share is the same underlying failure: the investigation loses its capacity to find something the organization did not already expect.

This is what separates Ugly research from research that is merely weak.

Weak research leaves a team with uncertain information. The findings feel soft. The team may proceed anyway, but the uncertainty remains present somewhere in the decision. They know, at some level, that what they are acting on is incomplete.

Ugly research does not leave the team uncertain. It leaves them confident.

The confidence is not manufactured. It is derived — derived from the genuine process of interviewing real users, collecting real responses, and producing a real synthesis. The problem is that the investigative process was structured to confirm, not to investigate. The conclusion the team reached was one the research could only have supported, never challenged.

Bad research may leave a team uncertain about what users actually think.

Ugly research gives a team the certainty of people who have done their homework, built on an investigation that was never open to a different answer.

Ugly research is the most consequential failure, but it is not the one product teams encounter most often. The more frequent problem is quieter, and in many ways more forgivable: considerable activity producing very little understanding.

The Bad UX Research

When observation never becomes understanding

The team that produces Bad UX Research is not operating under a predetermined conclusion. It is not constrained by what the organization has already decided, or organized around an answer that preceded the investigation. In most cases, it is genuinely curious. The brief was written in good faith. The researcher sat through hours of conversations, took careful notes, identified recurring themes, and produced a structured report.

The problem is not motivation. It is not carelessness or negligence. It is something more specific — and more difficult to address precisely because it goes unnoticed: the team completed the activity of research without completing the analytical work that research is supposed to perform. Data was collected. Observations were organized. A report was written. But the work of making sense of what the data actually suggests — finding the pattern, understanding why users behave that way, and connecting it to the decision the team needed to make — was either abbreviated or never seriously attempted.

The team finishes with more information. They do not finish with more clarity.

Bad UX Research generates data but fails to produce a clearer diagnosis.

One of the most common ways this happens is visible in how the final synthesis is constructed.

A team conducts eight interviews to understand how users experience their product. The researcher organizes the report around the topics covered in each session: onboarding, the main dashboard, the notification system, pricing. Each section summarizes what participants said. The onboarding section notes that users found the initial steps manageable but became uncertain at the account setup stage. The dashboard section notes that users mentioned feeling overwhelmed by the number of options available.

The report is accurate. It describes what happened in the interviews. But look at what the synthesis has not produced. It has not explained why users become uncertain at account setup — whether they lack information, encounter an unexpected decision point, or find a gap between what they expected the product to require and what it actually does. It has not examined what “feeling overwhelmed” actually means in context — whether the problem is volume, visual hierarchy, unclear labeling, or a mismatch between the dashboard’s structure and the user’s mental model of what the product is for.

The synthesis mirrored the discussion guide. The analysis stopped at description. What came out of the research was an organized account of what users said. Not an explanation of what their experience means.

This failure is often preceded by a particular kind of brief.

The research objective is stated as something like: “We want to understand our users better.” The intent is genuine. The investment is real. But the objective contains no specific uncertainty to resolve, no decision the research is supposed to inform, no competing hypotheses to examine.

Without a specific question, research collects information in all directions. It returns with observations about onboarding and pricing and navigation and feature preferences. None of these observations are wrong. But none of them were gathered in response to a question that needed answering — which means the team has no clear way to evaluate what the research has actually produced. The findings accumulate. The synthesis grows. And at the end, the team knows more about what their users said, without being meaningfully closer to understanding what to do next.

Research that begins without a specific decision to inform tends to return with information in all directions. What it produces is coverage, not clarity.

Sometimes the problem is not in the objective but in the sequence.

A team decides to run a usability test or a round of discovery interviews before defining what product decision the study is meant to inform. Without a defined target, the observations that emerge have no place to land. They are cited in the report and available to reference, but they are not tied to any decision they could inform.

Method chosen before question produces data before purpose. The research happened. The thinking that should have preceded it did not.

The most compressed version of this failure appears in a pattern that runs through many research reports: the direct move from user observation to product recommendation with nothing in between.

A participant says the dashboard feels overwhelming. The report recommends simplifying the dashboard.

The leap is short enough to feel self-evident. But consider what is missing between those two points. There is no examination of what kind of overwhelm is occurring: whether it is a question of density, hierarchy, labeling, or a mismatch between what the user expects the dashboard to do and what it actually offers. There is no hypothesis about mechanism: is the user overwhelmed because there is too much information, because the most relevant information cannot be located, or because the interface assigns equal visual weight to unequal priorities? There is no implication for product: does simplification mean reducing features, restructuring information architecture, redesigning the visual system, or rethinking what the dashboard is fundamentally for?

The observation pointed in a direction. The recommendation followed. What connected them was not analysis — it was assumption dressed as conclusion.

Bad research does not bend that connection. It simply never builds it.

What all of these failures share is a confusion that is easy to fall into precisely because research produces so many visible outputs. There are notes, recordings, transcripts, themes, affinity maps, quotes, and summaries. The evidence of work is everywhere. And because the work is real, it becomes easy to treat the output of that work as equivalent to understanding.

But a recurring topic is not necessarily a pattern. A quote is not an explanation. A theme is not a diagnosis.

Description and diagnosis are different things. Most Bad UX Research produces accurate description. What it fails to produce is the step that comes after — where description becomes interpretation, where observation becomes hypothesis, where “here is what users said” becomes “here is what we now understand about the experience behind what they said.”

The Missing Middle

The reasoning that lives between evidence and decision

Here is what makes this particularly difficult to detect: Bad and Ugly research often look, at the surface level, almost identical to research that works.

The same methods. The same deliverables. The same vocabulary.

A team can conduct ten interviews and produce a thematic report organized by session topic. Another team can conduct ten interviews and produce something substantially different — not because they used a different method, or recruited more participants, or wrote a longer report, but because of what they did with the material once it was collected. A usability test can produce a list of observed friction points, or a set of hypotheses about why those friction points exist and what they mean for the product.

The artifact — the report, the journey map, the quote deck, the presentation — does not tell you which of these things happened. Neither does the method. Neither does the sample size. Neither does the sophistication of the final deliverable.

Research quality is not visible in the output. It lives somewhere else.

When you look back at the examples in this article — the validation study whose scope excluded the most important question, the brief that positioned research as confirmation rather than inquiry, the synthesis that organized observations without interpreting them, the recommendation that followed an observation without passing through analysis — something was absent from all of them.

Not interviews. Not participants. Not effort. Not a report.

What was absent was a specific kind of work: converting what was observed into something the team can reason from. Identifying what the evidence, across participants, actually supports as a pattern. Forming an interpretation of what that pattern suggests about the user’s experience. Developing a hypothesis — held carefully, not asserted — about what mechanism might explain the behavior. And translating that hypothesis into implications specific enough to inform a product decision.

In Ugly research, this work was present but distorted — bent under the weight of a conclusion the organization had already reached, selecting what confirmed and minimizing what contradicted, until the interpretation pointed only in one direction.

In Bad research, this work was absent — the investigation moved from collection to report without passing through analysis, or passed through it so quickly that nothing of substance remained on the other side.

In both cases, the team received something that research produced. What they did not receive is what research is actually capable of producing.

Call it the middle. The reasoning that lives between evidence and decision.

There is a difference between research that skips the middle, research that bends it, and research that makes it visible.

That difference is where research quality actually lives. Making the middle visible is precisely what Good UX Research does when it works.

The Good UX Research

Making interpretation visible

What does it look like when a team builds the middle and makes it visible?

It does not always look different from the outside — the same interview length, the same report format, often the same methods. What differs is what happens between collection and conclusion, and whether that process is visible enough for someone else to evaluate, challenge, or build on.

Good research is not primarily a methodological achievement. It is a reasoning one. The team did not just produce findings. They produced an argument that can be examined.

Consider a product team grappling with a familiar problem: users are hesitating on the pricing page. The drop-off is real. The data is unambiguous. The team knows something is happening at that moment in the funnel — they do not yet know what.

A less careful approach might test alternative page designs. A somewhat better one might run interviews asking users how they experience pricing. Both produce signal. Neither is oriented toward the specific uncertainty the team actually needs to resolve.

A team working with Good UX Research begins somewhere different. They begin by naming what they do not know.

They are not sure whether users hesitate because the pricing structure is confusing, because the price point feels high relative to perceived value, or because something in the experience makes the commitment feel riskier than it should. These are different problems. They may require different solutions. The investigation begins with that named uncertainty — and is designed to test all three possibilities, not to confirm one of them.

What the research finds, across multiple sessions, is worth looking at carefully.

Users are not confused about the plans. When asked, they can articulate the differences between tiers. They understand what each includes. The comprehension is there.

What they do instead of choosing is something the team did not fully anticipate. They return to the cancellation terms. They look for guarantees. They read FAQ entries about billing. They search for reviews or third-party assessments. They ask — aloud or through their behavior — what happens if they choose wrong.

These are the observed signals. Not what users said they felt, but what they did, repeatedly, across participants.

The evidence pattern that emerges is specific. The hesitation does not appear to track uncertainty about the plans. It tracks something else: a series of behaviors oriented not toward understanding the product but toward understanding the consequences of committing to it.

From that pattern, the team forms an interpretation. Not a conclusion — an interpretation. The hesitation may be less about plan comprehension and more about perceived commitment risk: the experience of standing at the edge of a decision where the cost of being wrong feels uncertain, and the signals that normally reduce that uncertainty — clear reversibility, evidence of safety, visible trust — are not sufficiently present at the moment of choice.

That interpretation is held as a hypothesis: supported by the evidence pattern, not confirmed by it. Other explanations remain possible, and the team says so.

The product implication is not “redesign the pricing page” but something more specific: before changing pricing structures or simplifying the plan comparison, the team may need to examine how the experience communicates reversibility and decision safety — whether earlier moments in the flow could reduce perceived risk before a user arrives at pricing at all.

The team can now trace the path from symptom to signal to pattern to interpretation to implication. Every step can be challenged. If someone disagrees with the interpretation, they can point to where and why. If new evidence contradicts the hypothesis, the team knows exactly what to revise.

This is the middle made visible.
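One way to keep that path inspectable is to write it down as a record with one field per step. The sketch below is only an illustration of the idea, not a method the article prescribes: the `ReasoningTrace` class, its field names, and the `missing_steps` helper are all hypothetical, and the field values paraphrase the pricing example above. The article does not list the specific observations that would make the team revise its hypothesis, so that field is left empty here rather than filled with invented content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasoningTrace:
    """Hypothetical record of one path from symptom to product implication."""
    symptom: str
    signals: List[str]       # observed behaviors, before interpretation
    pattern: str             # what the signals support when taken together
    interpretation: str      # held as a hypothesis, not a conclusion
    implication: str         # specific enough to inform a decision
    would_revise_if: List[str] = field(default_factory=list)

    def missing_steps(self) -> List[str]:
        """Return the names of the steps that were left empty."""
        steps = {
            "symptom": self.symptom,
            "signals": self.signals,
            "pattern": self.pattern,
            "interpretation": self.interpretation,
            "implication": self.implication,
            "would_revise_if": self.would_revise_if,
        }
        return [name for name, value in steps.items() if not value]


pricing = ReasoningTrace(
    symptom="Users hesitate on the pricing page",
    signals=[
        "Return to the cancellation terms",
        "Look for guarantees",
        "Read FAQ entries about billing",
        "Search for reviews or third-party assessments",
    ],
    pattern="Behaviors point toward the consequences of committing, "
            "not toward understanding the plans",
    interpretation="Hesitation may reflect perceived commitment risk "
                   "more than plan comprehension",
    implication="Examine how the experience communicates reversibility "
                "and decision safety",
    would_revise_if=[],  # not specified in the article; left empty on purpose
)

print(pricing.missing_steps())  # ['would_revise_if']
```

A record like this does no analysis on its own. Its only use is that an empty field is hard to overlook, and anyone who disagrees can name the step they are disputing.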

What strikes most product teams when they encounter this kind of research is not how different it looks but how different it feels to use. There is something solid to push against. The interpretation can be questioned. The reasoning can be stress-tested. The team can ask: what would have to be true for this interpretation to be wrong? And in being able to ask that question, they learn something about how much confidence the evidence actually warrants.

Good research does not remove interpretation. It makes interpretation visible.

Bad research also contains interpretation — it is just hidden, living in the gap between observation and recommendation, never made explicit. It does not produce fewer judgments than Good research. It produces judgments that cannot be evaluated, because the team does not know where they were made.

Good research does not eliminate uncertainty. It clarifies where uncertainty still remains. After the pricing investigation, the team is not certain that perceived commitment risk is the right diagnosis. They are specific about what they believe and why, and they are specific about what would cause them to revise it. That is a fundamentally different position to reason from: more honest than false certainty, more useful than vague uncertainty.

Good research does not pretend the researcher’s perspective is absent from the analysis. It requires, instead, that interpretive choices be made visible — so that those choices can be evaluated for their quality rather than simply absorbed as findings.

The standard Good research sets for itself is not objectivity. It is traceability.

None of this requires a large study, a long timeline, or an expensive process. Five interviews that produce a visible, challengeable middle are more analytically valuable than twenty that collapse into a thematic report.

The question that determines research quality is not method, sample size, or report sophistication. It is whether the team can trace the reasoning between evidence and decision. If they can, the research is doing its job. If they cannot, something in the middle was either missing or bent — and the confidence the team places in their conclusions is, to that degree, unsupported.

A Simple Test for Research Quality

Six questions to evaluate any piece of research

These questions can be asked before acting on any piece of research — before committing to a redesign, a pricing decision, a roadmap priority, or a bet on product direction. They are not designed to evaluate the researcher’s competence. They evaluate whether the reasoning between evidence and decision is solid enough to act on.

The first is the most fundamental: what specific uncertainty was this research designed to address? If the answer is vague — to understand users better, to get a sense of what people think — the research could not have produced specific understanding. The middle had nowhere particular to go.

The second separates raw material from analysis: what did users actually say or do, before any interpretation was applied? The observation and the interpretation built on it need to be distinguishable. If they cannot be separated, the team is not evaluating evidence — they are accepting a conclusion that was never made visible.

The third asks about the step most often skipped: what pattern does the evidence support? Not which topics came up across interviews — but what the evidence, taken together, actually suggests about why users behave as they do.

The fourth names what is easy to conceal: what remains an interpretation? Somewhere between the pattern and the recommendation, a judgment was made. That judgment should be identifiable as a judgment — as something the evidence supports, not something it proves. If everything in the report reads as established fact, the interpretation has likely been presented with more certainty than the evidence warrants.

The fifth reconnects research to its purpose: how does this interpretation connect to the specific decision the team needs to make? Good research is tethered to a real product question. If the interpretation is interesting but does not connect clearly to what the team is trying to figure out, something was missing in how the investigation was framed.

And the sixth is perhaps the most diagnostic of all: what evidence would cause us to revise this conclusion? A team that cannot answer this question easily — that cannot imagine what observation would change their interpretation — should ask whether the interpretation was formed from the evidence or whether the evidence was organized to support it.

No single question guarantees anything. Together, they reveal quickly whether the middle was built, whether it is visible, and whether it can be challenged. That is the standard.
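For teams that want to run the six questions as a literal checklist, a minimal sketch might look like the following. Everything here is an assumption made for illustration: the names `SIX_QUESTIONS`, `VAGUE_ANSWERS`, and `unanswered` are hypothetical, and the vagueness check only recognizes the two vague objectives quoted in the first question above. It cannot judge whether an answer is good, only whether one was given at all.

```python
from typing import Dict, List

# The six questions, in the order they appear in the text above.
SIX_QUESTIONS = [
    "What specific uncertainty was this research designed to address?",
    "What did users actually say or do, before any interpretation was applied?",
    "What pattern does the evidence support?",
    "What remains an interpretation?",
    "How does this interpretation connect to the specific decision "
    "the team needs to make?",
    "What evidence would cause us to revise this conclusion?",
]

# Assumption: only the two vague objectives quoted in the text are flagged.
VAGUE_ANSWERS = {
    "to understand users better",
    "to get a sense of what people think",
}


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return the questions with no answer, or with a known vague answer."""
    gaps = []
    for question in SIX_QUESTIONS:
        answer = answers.get(question, "").strip()
        if not answer or answer.lower().rstrip(".") in VAGUE_ANSWERS:
            gaps.append(question)
    return gaps


answers = {
    SIX_QUESTIONS[0]: "To understand users better.",
    SIX_QUESTIONS[1]: "Users returned to the cancellation terms "
                      "before choosing a plan.",
}
gaps = unanswered(answers)
print(len(gaps))  # 5
```

In this example only the second question has a usable answer, so five gaps are reported: the vague objective plus the four questions nobody addressed.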

The Difference Isn’t the Method

Return to the room where this article began.

The deck is on the screen. Ten interviews. The affinity map. The journey map. The highlighted quotes. The recommendation: simplify the onboarding experience.

And the question: What pattern connects the evidence to this conclusion?

In the scenario that opened this article, nobody could trace that path. Not because the research was fraudulent, and not because the team was careless. Because the middle, the interpretive work between what users said and what the team concluded, was never made visible enough to be examined.

Now imagine a different version of that room.

Someone asks the same question. And the researcher can answer it.

Not by pointing to a quote. By tracing a path: here is what we observed across participants; here is the pattern that emerged; here is the interpretation we formed, and why we believe it is the most consistent explanation for what we saw; here is the product implication we drew from it; and here is what would cause us to revise it.

The team may disagree. Someone may propose an alternative explanation that fits the evidence equally well. Someone may identify an observation that was underweighted. These are not failures of the research. They are the research working as it is supposed to work — producing a visible argument that can be evaluated, challenged, and improved.

That capacity to be examined is not a secondary feature of good research. It is the point.

Research quality, in the end, is not determined by the methods used or the artifacts produced. It is determined by whether the middle was built honestly, made visible, and left open to scrutiny.

Bad research skips the middle, moving from observation to recommendation as if the path between them were obvious, leaving the team with data they cannot truly reason from, and decisions they cannot fully defend.

Ugly research fills it — but with something organized around an answer that already existed, leaving the team with confidence they have not honestly earned.

Good research builds the middle carefully, exposes it to view, and holds its conclusions with exactly as much certainty as the evidence supports — no more, and no less.

A product team that can ask whether the middle was built — and find out — has changed its relationship to what research is actually for.

The interviews happened. The report exists. The reasoning is visible.

That is the difference.