Collecting Data

Collecting data for analysis is more than a statistical process. All the math in the world will not compensate for a poor understanding of the behavior of the process you are trying to measure. Not everything is settled in numbers. Some things will be discovered in context. For example, “We really have problems when it is raining.”

As a result, a data collection plan should embody four qualities that make the collected data as useful as possible. These qualities concern the data’s ability to represent the process’ performance.


  • There must be sufficient data to see the process’ behavior.
  • The data must be relevant.
  • The data must be representative of the process’ normal operating conditions.
  • The data must be contextual.



There must be sufficient observations to see patterns of variation and shifting central tendency in the process’ output. As part of building a data collection plan, the team will seek to understand the process’ history so that all expected sources of variation are captured.

Consideration must also be given to the size of the performance gap that the team is trying to measure. As the gap gets smaller, the number of samples needed to measure it with statistical confidence increases.
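The relationship between gap size and sample count can be illustrated with a rough sketch. It assumes a two-sided comparison of two group means using the normal approximation, with equal standard deviation in both groups. The function name, the 5% significance level, the 80% power, and the example values are illustrative assumptions, not part of any particular data collection plan.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(gap, sigma, alpha=0.05, power=0.80):
    """Approximate observations needed in each of two groups to detect a
    difference in means of size `gap` (two-sided z-test, equal variances).

    gap, sigma, alpha, and power are assumed inputs chosen by the team.
    """
    if gap <= 0 or sigma <= 0:
        raise ValueError("gap and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / gap) ** 2)


# Hypothetical example: cycle times with a standard deviation of 30 seconds
for gap in (20, 10, 5):
    print(gap, samples_per_group(gap, sigma=30))
```

Because the required count scales with the inverse square of the gap, halving the gap roughly quadruples the number of observations needed.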


The data must be relevant to the problem that is being investigated. For example, if a process associated with back injuries is being analyzed, data regarding the availability of safety glasses will likely not be relevant. The central question or objective behind the data collection plan will point to what data needs to be collected.

The data must also be relevant to an important business metric. Since data collection is an expensive process, the project team should take care to verify the relevance of the data that they want to collect. The buy-in of stakeholders and process owners will waver if they discover that the team’s focus has drifted away from the core of the project.


The data must represent the entire range of actual operating conditions of the process. For example, if checkout cycle times are being studied, the data must be representative of all levels of customer loading.

 Operating conditions can include a multitude of factors. Some examples are the time of day, sales or promotions, experience of employees, changes in process inputs, and so forth. The smart project team will brainstorm a list of the potential factors that must be considered when building the data collection plan.
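One way a team might check a brainstormed factor list against the data already in hand is sketched below. The factor names, levels, and observations are hypothetical, and the helper `missing_conditions` is an illustrative name, not an established tool. It simply lists the combinations of operating conditions for which no data has been collected yet.

```python
from itertools import product


def missing_conditions(observations, factors):
    """Return factor-level combinations that have no observations.

    observations: iterable of dicts, one per recorded data point
    factors: dict mapping factor name -> levels expected in normal operation
    """
    names = list(factors)
    seen = {tuple(obs[name] for name in names) for obs in observations}
    all_combos = product(*(factors[name] for name in names))
    return [combo for combo in all_combos if combo not in seen]


# Hypothetical factors and levels for a checkout study
factors = {"time_of_day": ["morning", "evening"], "promotion": [False, True]}

# Hypothetical observations collected so far
observations = [
    {"time_of_day": "morning", "promotion": False, "cycle_time_s": 95},
    {"time_of_day": "evening", "promotion": False, "cycle_time_s": 140},
    {"time_of_day": "evening", "promotion": True, "cycle_time_s": 180},
]

print(missing_conditions(observations, factors))
```

In this made-up data set, no observations exist for mornings during a promotion, so that condition would need to be added to the collection plan.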


 Contextual information pertains to conditions that surround, but are not part of, the process and can affect its performance. By collecting this information, we add relevance to the data. For example, if the checkout cycle time was longer than usual on a given day, you may also wish to know how many cashiers were on duty, what the customers were buying, and weather conditions. This sheds light on how the process behaves under various conditions.
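As a small sketch of what recording context alongside a measurement could look like, the example below stores each checkout observation with its surrounding conditions and then summarizes cycle time by one of them. The record layout, field names, and values are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class CheckoutObservation:
    cycle_time_s: float    # the process measurement itself
    cashiers_on_duty: int  # context: staffing at the time
    weather: str           # context: conditions outside the process


def mean_cycle_time_by(observations, context_field):
    """Average cycle time for each level of one contextual field."""
    groups = defaultdict(list)
    for obs in observations:
        groups[getattr(obs, context_field)].append(obs.cycle_time_s)
    return {level: mean(times) for level, times in groups.items()}


# Hypothetical observations
observations = [
    CheckoutObservation(100.0, 4, "clear"),
    CheckoutObservation(120.0, 4, "clear"),
    CheckoutObservation(170.0, 2, "rain"),
    CheckoutObservation(190.0, 2, "rain"),
]

print(mean_cycle_time_by(observations, "weather"))
```

A summary like this only suggests where to look. In the made-up data above, rainy days also had fewer cashiers on duty, so the two conditions cannot be separated without more observations.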


To keep costs down and improve the storytelling ability of the data, a comprehensive data collection plan is needed. Process owner participation will improve the quality of the plan. Owners of peripheral processes can also make a valuable contribution: because they are not directly involved in the process improvement effort, they can see the forest where the team sees only the trees.
