The Most Frequent Customer Question

"How much training data do I need?" – we hear this question in virtually every project. There is no universal answer, because the required amount depends on the product, the defect types and the variance of the process. After more than 200 completed projects, however, we can give reliable recommendations.

Ground Rule: Quality Over Quantity

More data only helps if it reflects the actual variance of your production. 500 samples from a single shift with identical machine parameters are worth less than 200 samples distributed across different batches, shifts and temperature profiles, because only the latter show the classifier what normal variation looks like.

Recommended Minimum Quantities

  • OK samples: At least 200, ideally 500+, distributed across at least 3 production lots
  • NOK (not OK) samples: At least 30 per defect class, ideally 50+, covering a representative range of defect severities
  • Borderline cases: 10–20 samples from the gray zone between OK and NOK significantly improve selectivity
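The minimum quantities above lend themselves to an automated check before each training round. The following Python sketch assumes a hypothetical `Sample` record holding a label, an optional defect class and a lot identifier. The label strings and the defect class names ("crack", "rattle", "squeak") are illustrative only and do not represent a fixed format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical record for one labeled recording; field names are illustrative.
@dataclass
class Sample:
    label: str                   # "OK", "NOK" or "BORDERLINE"
    defect_class: Optional[str]  # defect class for NOK samples, otherwise None
    lot: str                     # production lot the part came from


# Minimums from the list above (the "ideally" targets are 500+ OK and 50+ NOK per class).
MIN_OK = 200
MIN_OK_LOTS = 3
MIN_NOK_PER_CLASS = 30
MIN_BORDERLINE = 10


def check_minimums(samples: List[Sample], expected_classes: Iterable[str] = ()) -> List[str]:
    """Return a list of shortfalls; an empty list means every minimum is met."""
    problems = []

    ok = [s for s in samples if s.label == "OK"]
    if len(ok) < MIN_OK:
        problems.append(f"only {len(ok)} OK samples (need >= {MIN_OK})")
    ok_lots = {s.lot for s in ok}
    if len(ok_lots) < MIN_OK_LOTS:
        problems.append(f"OK samples cover {len(ok_lots)} lots (need >= {MIN_OK_LOTS})")

    # Count NOK samples per defect class; expected classes without any samples count as 0.
    nok_counts = Counter(s.defect_class for s in samples if s.label == "NOK")
    for defect in sorted(set(expected_classes) | set(nok_counts), key=str):
        n = nok_counts[defect]
        if n < MIN_NOK_PER_CLASS:
            problems.append(
                f"defect class {defect!r}: {n} NOK samples (need >= {MIN_NOK_PER_CLASS})"
            )

    n_borderline = sum(1 for s in samples if s.label == "BORDERLINE")
    if n_borderline < MIN_BORDERLINE:
        problems.append(f"only {n_borderline} borderline samples (recommended 10-20)")

    return problems


# Example: enough OK and borderline data, but two defect classes are under-represented.
samples = [Sample("OK", None, f"lot{i % 3}") for i in range(250)]
samples += [Sample("NOK", "crack", "lot1") for _ in range(40)]
samples += [Sample("NOK", "rattle", "lot2") for _ in range(12)]
samples += [Sample("BORDERLINE", None, "lot0") for _ in range(15)]

for problem in check_minimums(samples, expected_classes={"crack", "rattle", "squeak"}):
    print(problem)
# defect class 'rattle': 12 NOK samples (need >= 30)
# defect class 'squeak': 0 NOK samples (need >= 30)
```

Passing the list of expected defect classes matters: a class that has no NOK samples at all would otherwise never show up in the counts and would silently pass the check.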

Data Collection Across Multiple Shifts

Manufacturing processes are subject to systematic variations: tool wear, temperature drift and batch-to-batch differences in raw material. A robust classifier must learn to tolerate this variance instead of mistaking it for defects. Our recommendation:

  • Collect data over at least 5 production days
  • Include early, late and night shifts
  • Document material batch changes and record them as metadata
  • Account for seasonal effects (temperature, humidity)
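A similar check can confirm that a collection campaign covers these recommendations. This Python sketch assumes each recording is a dict with a `timestamp`, a `shift` name and the hypothetical metadata fields `batch`, `temperature_c` and `humidity_pct`. The field names and shift labels are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Dict, List

REQUIRED_SHIFTS = {"early", "late", "night"}
MIN_DAYS = 5
# Hypothetical metadata fields: material batch plus ambient conditions for seasonal effects.
REQUIRED_METADATA = ("batch", "temperature_c", "humidity_pct")


def collection_gaps(recordings: List[Dict]) -> List[str]:
    """List gaps in a collection campaign; an empty list means the plan is covered."""
    gaps = []

    # Distinct calendar dates of the recordings
    # (a night-shift recording made after midnight counts toward the next date).
    days = {r["timestamp"].date() for r in recordings}
    if len(days) < MIN_DAYS:
        gaps.append(f"data from {len(days)} production days (need >= {MIN_DAYS})")

    missing_shifts = REQUIRED_SHIFTS - {r["shift"] for r in recordings}
    if missing_shifts:
        gaps.append("missing shifts: " + ", ".join(sorted(missing_shifts)))

    # Every recording should carry its batch and ambient conditions as metadata.
    for key in REQUIRED_METADATA:
        n_missing = sum(1 for r in recordings if r.get(key) is None)
        if n_missing:
            gaps.append(f"{key!r} missing in {n_missing} recording(s)")

    return gaps


recordings = [
    {"timestamp": datetime(2024, 3, day, 8), "shift": "early",
     "batch": "B1", "temperature_c": 21.0, "humidity_pct": 40.0}
    for day in (4, 5, 6)
]
recordings.append({"timestamp": datetime(2024, 3, 6, 15), "shift": "late",
                   "batch": "B2", "temperature_c": 23.5, "humidity_pct": None})
print(collection_gaps(recordings))
# ['data from 3 production days (need >= 5)', 'missing shifts: night', "'humidity_pct' missing in 1 recording(s)"]
```

The check only ensures that temperature and humidity are recorded. Covering seasonal effects themselves typically still means repeating collection at different times of the year.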

Labeling Strategy

Label quality is at least as important as data volume. Implement a four-eyes principle: two independent inspectors evaluate each specimen, and if their verdicts differ, a third expert decides. This takes more time up front but keeps systematic labeling errors out of the model.
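The four-eyes workflow can be sketched in Python as follows. `resolve_labels` and `cohens_kappa` are hypothetical helpers, and the `expert` callback stands in for however the third expert's decision is collected in practice. Cohen's kappa is not part of the procedure described above. It is added here as one common, chance-corrected way to track how consistently the two inspectors agree.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def resolve_labels(first: Sequence[str], second: Sequence[str],
                   expert: Callable[[int], str]) -> Tuple[List[str], List[int]]:
    """Four-eyes labeling: keep agreed labels, let a third expert decide disagreements.

    Returns the final labels and the indices of the disputed specimens.
    """
    if len(first) != len(second):
        raise ValueError("both inspectors must label the same specimens")
    final, disputed = [], []
    for i, (a, b) in enumerate(zip(first, second)):
        if a == b:
            final.append(a)
        else:
            disputed.append(i)
            final.append(expert(i))  # third expert's decision for specimen i
    return final, disputed


def cohens_kappa(first: Sequence[str], second: Sequence[str]) -> float:
    """Chance-corrected agreement between the two inspectors (1.0 = perfect)."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two equally long, non-empty label lists")
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    count_a, count_b = Counter(first), Counter(second)
    # Probability that both inspectors pick the same class purely by chance.
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both inspectors used one and the same class for everything
    return (p_observed - p_chance) / (1 - p_chance)


inspector_1 = ["OK", "OK", "NOK", "OK"]
inspector_2 = ["OK", "NOK", "NOK", "OK"]
labels, disputed = resolve_labels(inspector_1, inspector_2, expert=lambda i: "NOK")
print(labels, disputed)  # ['OK', 'NOK', 'NOK', 'OK'] [1]
print(cohens_kappa(inspector_1, inspector_2))  # 0.5
```

A low kappa or a long list of disputed specimens may indicate that the OK/NOK criteria are not yet defined sharply enough for the inspectors to apply them consistently.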

Conclusion

Good training data is the foundation for reliable acoustic classifiers. Invest time in a thoughtful collection strategy – the return comes quickly through lower pseudo-scrap rates (good parts wrongly rejected) and higher detection reliability in series testing.