Data Quality is an important pillar of Market Research (MR). The goal of MR is to help users, brands, researchers, and other stakeholders achieve their goals. If the quality of the underlying data is poor, then the results and the business outcomes built on them will be poor as well.

As such, it is imperative that Data Quality be treated as core to the industry. Data Quality is as much a question of practical reality as it is a question of ethics. Tia Maurer of P&G recently publicly shared, via the CASE initiative, a few real-life stories of multi-million-dollar investments which were undermined by poor data quality.

In many ways, Data Quality is an abstract topic. Just as ideas, opinions, and predictions are abstract and cannot simply be labelled ‘good’ or ‘bad’, data quality is contextual too.

We will most probably all agree on what bad data quality is. I would define it as a situation in which the subject (i.e. respondent or data source) is uninterested or disengaged in the research and is taking the survey purely for some alternative reason. Objectively, we can also agree that any gathering of ‘opinions’ through click farms, bots, and other poor sources is bad data quality.

Subjectively, we can also agree that anyone who takes a survey because they were forced into it, felt obligated for some reason, or wanted only the incentive is likely to give poor data, whether deliberately or situationally. In short, bad data quality arises in situations where the subject (i.e. respondent or data source) is disengaged from the research and is participating purely for some alternative reason.

When an issue is both objective and subjective, we need to fight fraud on multiple fronts. While we can all agree on what bad data quality is, it is harder to agree on what good data quality is. However, we can take some specific steps in that direction. Typically, a combination of the steps below gets you a good result – ‘good result’ meaning truthful and usable research data.

  1. We definitely want to get rid of bots, bad actors, fraudsters, dupes and professional survey takers. This should be Step 1.
  2. Good survey design – as much as the ‘supply side’ is responsible for Data Quality, the researchers, designers and programmers are too.

As a practical matter, in running regression and correlation analysis on respondents who were scrubbed out of surveys, we have found that bots actually “performed better” than respondents who were disengaged or simply frustrated with the experience.
In other words, if respondents are frustrated with the survey, the data will not be good, regardless of what they are paid.

  3. Red herrings, attention checks and other types of in-survey quality analysis are also powerful. The simple checks that most designers are used to take us far (a minimal sketch follows this list), and there are also a few initiatives coming through which use machine learning (ML) to ‘cross reference’ answers within a survey, or even across surveys, to establish consistency.
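
To make those “simple checks” concrete, here is a minimal, illustrative sketch of a rule-based response screen in Python. It is not drawn from any specific platform: the field names (respondent_id, duration_seconds, answers, red_herring_answer) and the thresholds are assumptions chosen purely for illustration, and a real implementation would tune them per study and combine them with the ML-based consistency checks mentioned above.

```python
# Minimal sketch of a rule-based response-quality screen.
# Field names and thresholds are hypothetical, for illustration only.

from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    duration_seconds: float      # total time spent on the survey
    answers: dict[str, int]      # question id -> chosen scale point (e.g. 1-5)
    red_herring_answer: int      # answer to a planted "select option 2" item


def quality_flags(resp: Response,
                  expected_red_herring: int = 2,
                  min_duration: float = 120.0) -> list[str]:
    """Return the reasons a response looks low quality (empty list = passes)."""
    flags = []

    # 1. Attention check: the red herring item has exactly one correct answer.
    if resp.red_herring_answer != expected_red_herring:
        flags.append("failed_attention_check")

    # 2. Speeding: finishing far faster than a plausible reading pace.
    if resp.duration_seconds < min_duration:
        flags.append("speeder")

    # 3. Straightlining: identical answers across every scale question.
    values = list(resp.answers.values())
    if len(values) > 3 and len(set(values)) == 1:
        flags.append("straightliner")

    return flags


if __name__ == "__main__":
    sample = Response(
        respondent_id="r-001",
        duration_seconds=45.0,
        answers={"q1": 3, "q2": 3, "q3": 3, "q4": 3},
        red_herring_answer=5,
    )
    print(quality_flags(sample))
    # ['failed_attention_check', 'speeder', 'straightliner']
```

Checks like these are deliberately cheap and transparent; the point is not that any one rule is decisive, but that a handful of them, applied consistently, removes the most obviously disengaged responses before deeper analysis.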

The next topic that gets debated is pricing and incentivization. First, let us talk about incentivization itself. Incentivization is hugely important in research, and in life for that matter. After all, we all do our best work when we have an interest in the subject AND when there is a fair incentive for that interest. In fact, I would argue that most researchers and supply partners (digital or physical) would be willing to allocate more of the research spend to more engaged respondents and data subjects. So, in my opinion, incentives must continue to be part of the story here. I myself take part in surveys, especially B2B surveys, because I want to share my opinions, because I am curious about what researchers are interested in and thinking about, and because many companies pay what I find to be a fair dollar amount for a response.

With regard to the effect of incentivization on research, there are a few competing ideas about what creates bad vs. good data. On one hand, there is a theory that higher payouts attract fraudsters and professional survey takers. The other theory is that humans are motivated by payouts, and that attention is not a commodity but an asset that needs to be paid for; as such, we get better responses when the human being is attentive.

Both are true. The question that is simple to pose, but difficult to answer, is this: how do we keep the fraudsters and bots away, while incentivizing the good, engaged folks to share thoughtful answers with us?

That is the million-dollar question, or should we say, the ten-dollar question.

Vignesh Krishnan

Sample Chain