Responses figures, also referred to as "sample sizes", can provide valuable insight into the robustness of your audiences and analyses. Generally speaking, larger sample sizes are more robust than smaller ones, but there are other factors to be mindful of too (more on that below).
When building audiences, it’s important to keep an eye on the sample size to ensure you're not relying on just a small handful of respondents to make big strategic decisions.
What’s the minimum sample size I can use?
There are no “golden” rules when it comes to sample sizes because it all depends on the data being considered and the size of the real-world population being represented. Additionally, it’s important to differentiate between the size of the sample that saw a question and the number of people within that sample who selected a particular option within it. For example, if 1,000 respondents answer a question and only 50 select one of the options within it, it doesn’t mean that the result for this option isn’t robust. Rather, it suggests that it’s a low-incidence behavior that isn’t particularly popular or established. The result in itself is robust because it is based on all 1,000 respondents, but it might not be advisable to undertake detailed profiling of the 50 respondents who selected this option.
When analyzing behaviors at a country level, most statisticians would likely agree that you'd want no fewer than 1,000 respondents as an overall sample size for robust results. When looking at audiences within a country, we say 100 is a suitable minimum. We'd always advocate using more wherever possible, but anywhere between 100 and 1,000 can produce results that are meaningful; the higher you go within this range, the more robust your results will be.
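To put rough numbers on the guidance above, here is a minimal Python sketch using the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level; the function name margin_of_error is hypothetical, and this is not how the platform itself calculates anything. Survey data that is weighted will typically have somewhat wider margins, so treat the output as illustrative only.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p observed among n respondents.

    Assumes simple random sampling; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# The example from the text: 50 of 1,000 respondents select an option.
base = 1000
selected = 50
incidence = selected / base
print(f"Incidence: {incidence:.1%}, margin: +/- {margin_of_error(incidence, base):.1%}")

# Profiling only those 50 respondents means a base of 50.
# p=0.5 is the worst case, where the margin is at its widest.
print(f"Profiling base of {selected}: +/- {margin_of_error(0.5, selected):.1%}")

# How the worst-case margin narrows across the 100 to 1,000 range.
for n in (100, 250, 500, 1000):
    print(f"n={n}: +/- {margin_of_error(0.5, n):.1%}")
```

Under these assumptions, the worst-case margin is roughly 10 percentage points at a sample of 100 and roughly 3 points at 1,000. The 5% incidence figure is estimated from the full 1,000, so it is far tighter than anything profiled within the 50 respondents who selected the option.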
Why’s the sample size for my audience so small and how can I make it bigger?
If the sample size for your audience is too small, ask yourself the following:
How many waves do I have selected? Users often select the most recent wave and conduct their whole analysis using just one wave of data. However, in most cases, we’d recommend using a full year of data. For quarterly datasets like GWI Core and GWI USA, this means selecting the last four waves. When working with particularly niche audiences, using two or even three years of data is acceptable.
Are the datapoints I’m using present across all selected waves? New questions may not yet have a full year of data behind them, making it harder to achieve a robust sample size. If this is the case, check the legacy folder to see if the question has a suitable predecessor that can be used alongside it.
Are the datapoints I’m using present across all selected markets? Some questions aren’t asked in all markets. A simple remedy here can be excluding those markets from your analysis and selecting more waves. Alternatively, you can look for a similar relevant question that is asked in the missing markets and add it to your audience. For example, in GWI Core, although we don’t ask about automotive brands in Ghana, Kenya, Morocco and Nigeria, we do ask about vehicle fuel type. Therefore, “Tesla” could be combined with “Electric” to create an audience that can be used across all markets.
Am I using AND and OR effectively? People don’t always need to meet both of the criteria you have joined by an AND to be relevant to your analysis; in such instances, consider using an OR instead. Alternatively, if you have multiple criteria joined by an AND, consider using the “at least” function to capture people who meet at least a set number of criteria instead.
Have I tried using a combination of behavioral and attitudinal datapoints? Supplementing a behavioral datapoint with an attitudinal one can be an effective way of broadening your audience; for example, social media users who follow politicians OR social media users who list politics as an interest.
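The effect of AND, OR and "at least" on audience size can be sketched with Python sets. Everything here is hypothetical: the respondent IDs, the three criteria and the at_least helper are invented for illustration and do not reflect how the platform works internally.

```python
from collections import Counter

# Hypothetical respondent IDs that match each criterion.
follows_politicians = {1, 2, 3, 4}
interested_in_politics = {3, 4, 5, 6, 7}
watches_political_news = {4, 7, 8}
criteria = [follows_politicians, interested_in_politics, watches_political_news]

# AND: respondents must match every criterion (set intersection).
and_audience = set.intersection(*criteria)

# OR: respondents must match any one criterion (set union).
or_audience = set.union(*criteria)


def at_least(criteria_sets, k):
    """Return respondents who match at least k of the given criteria."""
    counts = Counter(r for matched in criteria_sets for r in matched)
    return {r for r, hits in counts.items() if hits >= k}


print("AND:", len(and_audience))
print("At least 2 of 3:", len(at_least(criteria, 2)))
print("OR:", len(or_audience))
```

With this made-up data, AND yields the smallest audience, OR the largest, and "at least 2 of 3" sits in between, which is why loosening the logic is a quick way to grow a sample that is too small.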