The Most Frequent Customer Question
"How much training data do I need?" – we hear this question in virtually every project. The answer is inherently complex, but after more than 200 completed projects we can provide reliable recommendations.
Ground Rule: Quality Over Quantity
More data only helps if it represents the real variance of your production process. For example, 500 samples from a single shift with identical machine parameters are worth less than 200 samples distributed across different batches, shifts and temperature profiles.
Recommended Minimum Quantities
- OK samples: At least 200, ideally 500+, distributed across at least 3 production lots
- NOK samples: At least 30 per defect class, ideally 50+, with representative severity distribution
- Borderline cases: 10–20 samples in the gray zone between OK and NOK noticeably sharpen the classifier's separation of the two classes
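The minimum quantities above can be checked automatically before training starts. The following is a minimal sketch, not a definitive implementation: the `Sample` record, the label names "OK", "NOK" and "BORDERLINE", and the `check_minimums` function are assumptions made for illustration. Only the thresholds come from the list above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the recommended minimum quantities above.
MIN_OK = 200
MIN_OK_LOTS = 3
MIN_NOK_PER_CLASS = 30
MIN_BORDERLINE = 10


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; field names are assumptions."""
    label: str                          # "OK", "NOK" or "BORDERLINE"
    lot: str                            # production lot identifier
    defect_class: Optional[str] = None  # only set for NOK samples


def check_minimums(samples, expected_defect_classes=()):
    """Return a list of shortfalls; an empty list means all minimums are met."""
    issues = []

    ok = [s for s in samples if s.label == "OK"]
    if len(ok) < MIN_OK:
        issues.append(f"OK samples: {len(ok)} < {MIN_OK}")

    ok_lots = {s.lot for s in ok}
    if len(ok_lots) < MIN_OK_LOTS:
        issues.append(f"OK lots: {len(ok_lots)} < {MIN_OK_LOTS}")

    # Count NOK samples per defect class, including expected classes
    # for which no sample has been collected yet.
    nok_counts = Counter(s.defect_class for s in samples if s.label == "NOK")
    classes = set(nok_counts) | set(expected_defect_classes)
    for defect in sorted(classes, key=str):
        n = nok_counts[defect]
        if n < MIN_NOK_PER_CLASS:
            issues.append(f"NOK class '{defect}': {n} < {MIN_NOK_PER_CLASS}")

    borderline = sum(1 for s in samples if s.label == "BORDERLINE")
    if borderline < MIN_BORDERLINE:
        issues.append(f"Borderline samples: {borderline} < {MIN_BORDERLINE}")

    return issues
```

A check like this only counts samples. It cannot tell whether the NOK samples cover a representative severity distribution, which still needs a review by someone who knows the process.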
Data Collection Across Multiple Shifts
Manufacturing processes are subject to systematic variations: tool wear, temperature drift, batch differences in raw material. A robust classifier must see this variance during training. Our recommendation:
- Collect data over at least 5 production days
- Include early, late and night shifts
- Document material batch changes and record them as metadata
- Account for seasonal effects (temperature, humidity)
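Whether a collection campaign actually meets these recommendations can be verified from the recorded metadata. The sketch below assumes each recording is stored as a dictionary with the hypothetical keys `day`, `shift` and `material_batch`, and that shifts are named "early", "late" and "night". Adapt the names to your own metadata schema.

```python
from datetime import date

REQUIRED_SHIFTS = {"early", "late", "night"}
MIN_PRODUCTION_DAYS = 5


def coverage_report(records):
    """Summarize how well the recordings cover days, shifts and batches.

    Each record is assumed to be a dict with the keys
    'day' (datetime.date), 'shift' (str) and 'material_batch' (str).
    """
    days = {r["day"] for r in records}
    shifts = {r["shift"] for r in records}
    without_batch = sum(1 for r in records if not r.get("material_batch"))
    return {
        "production_days": len(days),
        "enough_days": len(days) >= MIN_PRODUCTION_DAYS,
        "missing_shifts": sorted(REQUIRED_SHIFTS - shifts),
        "records_without_batch": without_batch,
        # Months give a rough indication of seasonal coverage.
        "months_covered": sorted({d.month for d in days}),
    }
```

If `months_covered` contains only one or two months, seasonal effects such as temperature and humidity are probably underrepresented, and a follow-up collection later in the year is worth planning.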
Labeling Strategy
Label quality is at least as important as data volume. Apply the four-eyes principle: two inspectors evaluate each specimen independently of each other. If they disagree, a third expert decides. This costs more time initially but prevents systematic labeling errors from being learned by the model.
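The decision rule described above can be written down in a few lines. This is a minimal sketch with hypothetical function names. Note that the third expert's decision is final here, as described above, rather than being one vote in a majority.

```python
def resolve_label(first, second, third=None):
    """Return the final label for one specimen under the four-eyes principle.

    first, second: labels from the two independent inspectors.
    third: decision of the third expert, only needed on disagreement.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("Inspectors disagree: a third expert decision is required")
    return third


def disagreement_rate(label_pairs):
    """Fraction of specimens on which the two inspectors disagreed."""
    if not label_pairs:
        return 0.0
    disagreements = sum(1 for a, b in label_pairs if a != b)
    return disagreements / len(label_pairs)
```

Tracking the disagreement rate over time is a useful side effect: a rising rate may indicate that the inspection criteria are ambiguous and need to be clarified before more data is labeled.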
Conclusion
Good training data is the foundation for reliable acoustic classifiers. Invest time in a thoughtful collection strategy: the return comes quickly through lower pseudo-scrap rates (good parts rejected as defective) and higher detection reliability in series testing.