Errors in the data

Before a consultant can start on serious analysis of your data, it is vital to be confident that the data are "clean". There is often some confusion about what this means.

Errors can arise from:

  • data entry mistakes
  • deviations from a study protocol

Data entry errors might arise from a simple slip in typing that can be picked up by careful checking once the data are entered. Gross errors can be easy to detect. However, other errors may not be so apparent. You might record the results from one person (or case) against the next person in the data file, as people have done; this is easy to do if you are transcribing results from one source to another. You might record the results for one variable in the column meant for another variable, or even record the same variable in two different columns. These errors can be, and have been, identified once data analysis is underway; however, it is better to avoid them by checking your data entry as you proceed.
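Some of these checks can be automated as you enter data. The following is a minimal sketch in Python, not a prescribed procedure. The column names (height_cm, weight_kg, weight_repeat_kg), the plausible range for height, and the three helper functions are all hypothetical and chosen only for illustration. It flags gross out-of-range values, rows whose measurements exactly repeat the previous row (a possible sign of results entered against the wrong person), and pairs of columns that hold identical values throughout.

```python
# Illustrative checks for common data entry errors.
# All column names and ranges here are assumptions, not part of any real study.

def out_of_range(rows, column, low, high):
    """Return indices of rows whose value in `column` lies outside [low, high]."""
    return [i for i, row in enumerate(rows) if not (low <= row[column] <= high)]

def repeated_consecutive_rows(rows, columns):
    """Return indices of rows whose values in `columns` exactly repeat the previous row."""
    return [
        i for i in range(1, len(rows))
        if all(rows[i][c] == rows[i - 1][c] for c in columns)
    ]

def identical_columns(rows, columns):
    """Return pairs of columns that hold the same value in every row."""
    pairs = []
    for a_idx, a in enumerate(columns):
        for b in columns[a_idx + 1:]:
            # Require at least one row so an empty data set flags nothing.
            if rows and all(row[a] == row[b] for row in rows):
                pairs.append((a, b))
    return pairs

# A small made-up data set containing one instance of each problem.
rows = [
    {"id": 1, "height_cm": 172.0, "weight_kg": 68.5, "weight_repeat_kg": 68.5},
    {"id": 2, "height_cm": 165.0, "weight_kg": 59.0, "weight_repeat_kg": 59.0},
    {"id": 3, "height_cm": 165.0, "weight_kg": 59.0, "weight_repeat_kg": 59.0},
    {"id": 4, "height_cm": 1810.0, "weight_kg": 80.2, "weight_repeat_kg": 80.2},
]

measured = ["height_cm", "weight_kg", "weight_repeat_kg"]

gross_errors = out_of_range(rows, "height_cm", 100, 220)
possible_shifts = repeated_consecutive_rows(rows, ["height_cm", "weight_kg"])
duplicate_columns = identical_columns(rows, measured)

print("Out-of-range heights at rows:", gross_errors)
print("Rows repeating the previous row:", possible_shifts)
print("Columns with identical contents:", duplicate_columns)
```

A flagged row or column is only a prompt to go back to the original source; two people can genuinely share the same measurements, so none of these checks proves an error on its own.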

Sometimes errors arise from known deviations from a study protocol. A biological sample might have become contaminated, for example. Such cases are generally not included in the data set.