"We want AI in acoustic inspection – how many training samples do we need?" We answer this question weekly. Honest answer: fewer than textbooks say – if you do it right.
Common misconceptions
- "More data is always better." Wrong. Poorly structured data degrades the model.
- "We need thousands of NOK parts." In reality you rarely have that many – and do not need them.
- "AI tunes itself automatically." Only if the data foundation is right.
Recommended minimum sample sizes
| Model class | OK samples | NOK samples | Note |
|---|---|---|---|
| Classical threshold | 30–100 | 5–20 | for setting tolerance limits |
| One-class (anomaly) | 200–500 | 0 (or few) | most common |
| Binary classifier | 500–1,500 | 100–500 | good balance |
| Multiclass defect model | 1,000–3,000 | 50–200 per class | defect type diagnosis |
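As a rough illustration, the table can be turned into a quick check on an existing dataset. The sketch below uses the lower bound of each range as the minimum. The dictionary keys and the function name `check_sample_counts` are assumptions made for this example, not part of any particular tool.

```python
# Minimum counts taken from the lower bound of each range in the table above.
# Keys and function name are illustrative assumptions.
MIN_SAMPLES = {
    "classical_threshold": {"ok": 30, "nok": 5},
    "one_class": {"ok": 200, "nok": 0},
    "binary_classifier": {"ok": 500, "nok": 100},
    "multiclass": {"ok": 1000, "nok_per_class": 50},
}


def check_sample_counts(model_class, n_ok, nok_counts):
    """Return a list of shortfalls; an empty list means the minimum is met.

    nok_counts maps a defect class name to its number of NOK samples.
    """
    req = MIN_SAMPLES[model_class]
    problems = []
    if n_ok < req["ok"]:
        problems.append(f"OK samples: have {n_ok}, need at least {req['ok']}")
    if "nok_per_class" in req:
        # Multiclass model: every defect class needs its own minimum.
        if not nok_counts:
            problems.append("no defect classes provided")
        for name, count in sorted(nok_counts.items()):
            if count < req["nok_per_class"]:
                problems.append(
                    f"NOK class '{name}': have {count}, "
                    f"need at least {req['nok_per_class']}"
                )
    else:
        # All other model classes: only the NOK total matters.
        total = sum(nok_counts.values())
        if total < req["nok"]:
            problems.append(f"NOK samples: have {total}, need at least {req['nok']}")
    return problems
```

Calling `check_sample_counts("one_class", 250, {})` returns an empty list, because an anomaly model needs no NOK parts at all. Meeting these counts is only a starting point: the principles below decide whether the samples are actually useful.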
Five principles
1. Multiple shifts and days: acoustic signals vary with temperature, hall noise, operator and tool wear. Include at least 3–5 different shifts in the training data.
2. Cover sensor variability: if two sensors run in parallel, train on data from both. If the sensor is replaceable, train on the spare type as well.
3. Provoke NOK parts: reduce the tempering temperature, leave tools dull, vary the rpm. Combined with a few real claim parts, this builds a robust defect set.
4. Manual edge case labelling: the most important 5 % of samples are borderline cases. Have an acoustics specialist and a part specialist label them together.
5. Holdout for validation: set aside 20 % of the data, keep it out of training and use it only for model evaluation.
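Principles 1 and 5 are easy to check in code. The following is a minimal sketch using only the Python standard library. It assumes each measurement is a dictionary with the hypothetical keys `date`, `shift` and `label`. The split is stratified by label so that the few NOK parts end up in both the training set and the holdout set.

```python
import random
from collections import defaultdict


def count_shifts(records):
    """Number of distinct (date, shift) combinations in the data."""
    return len({(r["date"], r["shift"]) for r in records})


def stratified_holdout(records, holdout_fraction=0.2, seed=0):
    """Split records into (train, holdout) with a similar label ratio in both."""
    by_label = defaultdict(list)
    for r in records:
        by_label[r["label"]].append(r)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_holdout = round(len(group) * holdout_fraction)
        holdout.extend(group[:n_holdout])
        train.extend(group[n_holdout:])
    return train, holdout
```

If `count_shifts` returns fewer than 3, the dataset does not yet meet principle 1. A purely random split is the simplest option. If the same part is measured several times, splitting by part ID instead may be the safer choice, so that no part appears on both sides.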
Data collection tips
- Store every measurement with part ID, timestamp, sensor ID, operator, process parameters.
- Archive raw signals – not just features. You will want them later.
- Tag OK parts with variant metadata (colour, batch, supplier) for future root-cause analysis.
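One possible way to make these tips concrete is a fixed record structure that is written alongside every measurement. The sketch below is an assumption about how such a record could look. The class name `MeasurementRecord` and all field names are hypothetical and should be adapted to your own system.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MeasurementRecord:
    """Metadata stored with every measurement (field names are illustrative)."""

    part_id: str
    timestamp: str            # ISO 8601 string, e.g. from datetime.isoformat()
    sensor_id: str
    operator: str
    process_parameters: dict  # e.g. rpm, temperature
    raw_signal_path: str      # where the archived raw signal is stored
    label: str = "OK"
    variant: dict = field(default_factory=dict)  # colour, batch, supplier

    def to_json_line(self):
        """Serialise the record as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the path to the raw signal rather than only the extracted features means features can be recomputed later, when the model or the preprocessing changes.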