Common data analysis mistakes - and how to avoid them

  • April 13, 2021

Data doesn’t lie, but people make mistakes. Analysts are only human, so they are susceptible to errors and bias that can lead to misleading conclusions. Given the weight now placed on data-driven insights, it’s vital to minimise potential errors as much as possible. Here are a few common challenges analysts face, and how to avoid making costly mistakes.

Data validation

Once you have been working as a data analyst for some time, it’s easy to forget the basics. Validating your data should never be rushed, because there is nothing worse than conducting your entire analysis, only to find the numbers do not add up. This is especially important if someone else has worked on the dataset before passing it over to you — make sure you request all the information you need, such as summary reports. A row count and key column summaries are usually a good way to start off data validation, but the exact steps will depend on your dataset. The most important thing is to have a clear plan of how you will validate your data before you even touch it — and continue to validate your data each time you make a major change to ensure the results are as you expected. Documenting your steps in this way will save you countless hours of trying to find the moment your analysis went wrong.
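
As a rough illustration of the row count and key column summary idea, here is a minimal Python sketch. The `summarise` helper, the `orders` records and the `amount` column are all hypothetical and invented for this example; the checks you write down will depend on your own dataset.

```python
def summarise(rows, numeric_columns):
    """Return the row count plus basic summaries for each numeric column."""
    summary = {"row_count": len(rows)}
    for col in numeric_columns:
        # Treat None as a missing value and leave it out of the summaries.
        values = [r[col] for r in rows if r.get(col) is not None]
        summary[col] = {
            "missing": len(rows) - len(values),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
            "total": sum(values),
        }
    return summary


# Hypothetical sample data: four order records, one with no amount.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 35.5},
    {"order_id": 3, "amount": None},
    {"order_id": 4, "amount": 12.5},
]

# Summarise before touching the data.
before = summarise(orders, ["amount"])

# A "major change": drop the rows that have no amount.
cleaned = [r for r in orders if r["amount"] is not None]
after = summarise(cleaned, ["amount"])

# Expectations decided in advance: only the rows with a missing amount
# disappear, and the total amount is unchanged.
assert after["row_count"] == before["row_count"] - before["amount"]["missing"]
assert after["amount"]["total"] == before["amount"]["total"]
```

Re-running the same summary after each major change gives you a simple record of when the numbers last matched your expectations.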

Theory overload

Many early career data analysts spend too much time on theory, overloading on trendy statistics and algorithms. While it’s important to keep learning the latest industry techniques, this can be a waste of time if you’re not applying them, because it’s extremely difficult to master new skills without consistent practice. Accept that you cannot learn everything and prioritise techniques that will provide you with the highest return on time investment for the projects you are currently working on. It’s much better to master a smaller number of useful everyday techniques than to have a broad but vague theoretical understanding of advanced concepts that you rarely use.

Understanding your data

Many data analysts are overly focused on applying the model, and as a result they rush some of the early stages of analysis. However, taking the time to understand the data by exploring it and applying visualisation techniques can improve the quality of your models, so it’s not a step that should be overlooked. You will also be more likely to catch unusual distributions of data, outliers, inconsistencies and/or missing values. At the most extreme end, failing to understand your data could bias the model and lead to an incorrect conclusion. This conclusion could then be used to guide future business decisions, so it could become a very costly mistake.
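
A quick profile of a column can surface missing values and outliers before any modelling starts. The sketch below is only one possible approach: it uses the interquartile range rule (flagging values more than 1.5 times the IQR beyond the quartiles), which is a common convention rather than the only option. The `profile` helper and the `delivery_days` values are hypothetical.

```python
from statistics import quantiles


def profile(values):
    """Count missing values and flag outliers using the 1.5 x IQR rule."""
    present = [v for v in values if v is not None]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "missing": len(values) - len(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical column: delivery times in days, with one gap and one suspect value.
delivery_days = [12.0, 14.0, 13.5, 15.0, 250.0, None, 13.0, 14.5]

result = profile(delivery_days)
print(result["missing"], result["outliers"])
```

A flagged value is a prompt to investigate, not proof of an error: whether 250 days is a typo or a genuine delay is something only knowledge of the data can settle.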

Looking beyond the data

If you really want to establish yourself as someone who provides cutting-edge insights for their business, then you need to be more than just an analyst. Data doesn’t exist in a vacuum, and the conclusions you draw should take context into account. To improve your insights, try to understand key industry-specific concepts, research the wider market and stay aware of trends and developments in your area. This will improve the way you communicate the importance of your analysis to business stakeholders and help you stand out from the competition. In addition, industry knowledge can help you avoid drawing incorrect conclusions from your data. For example, ‘correlation is not causation’ is a guiding principle of statistics: you cannot assume that two correlated variables are causally related. Take this very simple example: an analyst notices a correlation between ice cream sales and sunburn and incorrectly concludes that sunburn drives ice cream sales. They lack the industry-specific knowledge to realise that hot weather is driving both trends. In practice, the factors driving industry trends are many and complex, which only makes mistakes of this nature easier to make.
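
The ice cream example can be reproduced with synthetic data. In the sketch below, every number is invented for illustration: temperature drives both ice cream sales and sunburn cases, and neither affects the other. The raw correlation between the two is still strong, and it largely disappears once temperature is controlled for with a partial correlation, which is one standard way to check for a confounder of this kind.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (all coefficients are made up): temperature is the hidden driver.
rng = random.Random(42)
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [5 * t + rng.gauss(0, 10) for t in temperature]
sunburn = [2 * t + rng.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, sunburn)
controlled = partial_correlation(ice_cream, sunburn, temperature)
print(f"raw correlation: {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

In real data the confounder is rarely this obvious, or even measured, which is why knowing your industry matters as much as knowing the statistics.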