
Data Fusion Techniques: Breaking down the fusion confusion

25 January 2022

By Johnny Caldwell, FMRS, Senior Director – Partnerships EMEA & US

It might sound a bit scary to the uninitiated, a technique reserved for data scientists and the Googles and Metas of the analytics world. Quite simply, though, in quantitative research we fuse data all the time.

Think how frequently we apply weighting to our data sets using accepted variables, segmentation or cluster analysis in order to produce more accurate, representative outputs.

A popular and commonplace example is when we apply nationally representative (nat rep) weighting to a project's results. The specific market's nat rep figures, the weighting matrix, are accepted as true because they are derived from official Office for National Statistics figures. The matrix is applied and we re-weight the demographic groups under-represented in our sample in order to produce a truly representative final data set.
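The re-weighting step above can be sketched in a few lines. This is a minimal illustration of post-stratification weighting; the demographic groups, sample counts and population shares below are entirely hypothetical, not real nat rep figures.

```python
# Hypothetical survey: respondents per age group (the sample skews old).
sample = {"18-34": 150, "35-54": 300, "55+": 550}

# Hypothetical "nat rep" target shares for the same groups (must sum to 1).
nat_rep = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

total = sum(sample.values())

# Weight = target share / achieved share. Under-represented groups get a
# weight above 1, over-represented groups a weight below 1, so the
# weighted data set matches the target population profile.
weights = {group: nat_rep[group] / (sample[group] / total) for group in sample}
```

In this sketch the 18-34 group, at 15% of the sample against a 30% target, receives a weight of 2.0, while the over-sampled 55+ group is scaled down. Production weighting systems typically use raking (iterative proportional fitting) across several variables at once, but the principle is the same.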

Consequently, Data Fusion can be explained as the integration of multiple data sources in order to produce more accurate and consistent outputs than any individual source would provide on its own.

Looking beyond the original data

If we go up a level, we might want to start layering the data and enhancing it with other useful information obtained outside of the original survey.

Obviously, if you own or have access to a traditional double opt-in panel, this is in many ways easily achievable: simply build a data bank of behaviours on the audience as a whole, or a subset of it. However, the sample has to live in a well-curated traditional panel environment; programmatic or river sampling, by their transitory nature, just won't work.

Think of the vast number of screening questions traditional panels ask. These are asked not just during the initial recruitment process but continuously throughout the life of a panellist, in order to establish incidence rates and improve targeting. Because we know exactly who each individual respondent is, and we're not revealing any PII (Personally Identifiable Information), this information can easily be appended to survey data to enhance the output.

The data sets are endless

We're not just talking about demographics here, although many clients do request that these be supplied, allowing for a shorter survey and consequently more time to concentrate on subject-specific questioning.

The options are numerous and depending on exactly what the panel owner has previously asked they can include:

  • Psychographic and lifestyle data
  • Service and product ownership
  • Environmental and ethical opinion
  • Political persuasion and future voting intention, etc.

Up another level, and we can source more passive categories of data from our panel audience by utilising a fair data exchange methodology. Completely GDPR compliant, and strictly with the full permission of our respondents, there already exist many technologies that allow us to obtain browsing, app-usage, transactional and geo-tracking data which, again, because we know exactly who the originator is, can be appended to any related survey data.

But what happens if the data is from an external source?

That is, data not produced by the same individuals who answered our survey. Fear not: as long as we have a decent number of identical data variable keys, such as full demographics or psychographics, the supplementary information can be fused with our original survey data set and still produce extremely useful, actionable insights that might not previously have been possible to obtain.

A great example of this is if you were lucky enough to obtain current ‘Touchpoints’ data produced by the IPA (The Institute of Practitioners in Advertising). This consumer behaviour database was created to meet the needs of the communications industry and offers unique insights into daily life and media usage across the United Kingdom.

The beauty of a data set like this is that it is designed by market research practitioners, so it contains many of the familiar data keys common to the majority of online surveys, meaning the two cohorts can easily be fused to produce a resource rich in granular detail. Also take a look at the TGI Survey (Target Group Index), originally created in 1969 by BMRB and now facilitated by Kantar.
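Fusing records that come from different individuals, as described above, is often done by statistical matching: each survey respondent is paired with the most similar record in the external source on the shared key variables, and that record's extra information is appended. The sketch below assumes only a crude distance on two hypothetical keys (age and region); the field names and figures are invented for illustration.

```python
# Hypothetical survey records with shared key variables (age, region).
survey = [
    {"id": 1, "age": 25, "region": "North", "answer": "Brand A"},
    {"id": 2, "age": 52, "region": "South", "answer": "Brand B"},
]

# Hypothetical external source (e.g. a media-usage database) holding the
# same keys plus an extra variable we want to fuse in.
external = [
    {"age": 27, "region": "North", "media_hours": 4.5},
    {"age": 50, "region": "South", "media_hours": 2.0},
    {"age": 30, "region": "South", "media_hours": 3.0},
]

def distance(resp, rec):
    """Crude dissimilarity: age gap, plus a large penalty if regions differ."""
    return abs(resp["age"] - rec["age"]) + (0 if resp["region"] == rec["region"] else 100)

# Append each respondent's nearest external record ("hot deck" style match).
fused = [
    {**resp, "media_hours": min(external, key=lambda rec: distance(resp, rec))["media_hours"]}
    for resp in survey
]
```

Real fusion exercises use many more key variables and more principled distance measures, but the mechanics are the same: the more identical keys the two data sets share, the more trustworthy the match.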

Tackling the Big Data beast

Now, up yet another level, things start to become a bit more problematic and we have to take a completely different view. Let's look at Big Data and how it can provide a more sophisticated model by pulling in intelligence derived from more disparate origins.

The subject, like the data, is vast and far too intricate for me to give it any explanatory justice here. We are producing more of this type of data than ever and it’s growing day by day.

However, it very rarely comes with a neat set of variables that market researchers can work with; we don't often see full demographics. More likely the data is unstructured, with no defining rules (a tweet, a photo, a video). This is where AI, machine learning and similar technologies come in, to help us understand exactly what it means.

Having said that, there are a host of other Big Data sources that do offer some limited degree of structure. Analysed on their own, with the right systems in place, you are looking for patterns in the connections that form, made reliable by the sheer volume of information to hand.

You can create at least some accord by sourcing this type of semi-structured Big Data. You might be able to get hold of wider, higher-level identifiers such as regional postcodes, location by Output Area or Super Output Area, voting history, product purchases, geo-location and so on. The fact that the data is so vast means its veracity is statistically more reliable, and you can at least fuse it with your own survey data and see what it throws up.
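When the only shared identifier is a coarse one, the fusion is simply a join on that key. A minimal sketch, assuming a hypothetical big-data aggregate keyed on postcode district (the codes and figures are invented):

```python
# Hypothetical survey records carrying a coarse geographic key.
survey = [
    {"id": 1, "postcode_district": "M1", "answer": "Yes"},
    {"id": 2, "postcode_district": "LS2", "answer": "No"},
]

# Hypothetical aggregate derived from a big-data source (e.g. average
# weekly footfall from geo-location data), keyed on the same district.
big_data = {
    "M1": {"avg_footfall": 12000},
    "LS2": {"avg_footfall": 8000},
}

# Left join: append the aggregate where the key matches; respondents in
# districts the big-data source doesn't cover keep their survey fields only.
enriched = [{**row, **big_data.get(row["postcode_district"], {})} for row in survey]
```

Because the key is coarse, every respondent in a district inherits the same appended values; the insight comes from the contrast between survey answers and area-level behaviour, not from individual-level precision.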

It’s always better to reverse engineer rather than retrofit

Try finding an interesting Big Data source you'd like to use before writing your questionnaire, then include the relevant data key variables across the board for an easier match.

Additionally, depending on what relational database platforms and systems you have to hand, start small, ideally with the most recently generated batch of data.

That all said, always try to obtain the most up-to-date data set you can get hold of, and always be aware of the four V's of data science: Volume (what's the scale?), Velocity (is it real time?), Veracity (how trustworthy is it?) and Variety (how many variables?).
