"We want AI in acoustic inspection – how many training samples do we need?" We answer this question weekly. Honest answer: fewer than textbooks say – if you do it right.

Common misconceptions

  • "More data is always better." Wrong. Poorly structured data degrades the model.
  • "We need thousands of NOK parts." In reality you rarely have that many – and do not need them.
  • "AI tunes itself automatically." Only if the data foundation is right.

Recommended minimum sample sizes

Model class              | OK samples  | NOK samples      | Note
Classical threshold      | 30–100      | 5–20             | tolerance build
One-class (anomaly)      | 200–500     | 0 (or few)       | most common
Binary classifier        | 500–1,500   | 100–500          | good balance
Multiclass defect model  | 1,000–3,000 | 50–200 per class | defect type diagnosis
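
As a rough illustration, the table can be turned into a quick check of how many samples a project is still missing. This is only a sketch: it assumes the lower bound of each range is the minimum, and the names MIN_SAMPLES and missing_samples are made up for this example.

```python
# Lower bounds of the ranges in the table above.
# Assumption: the lower bound of each range is treated as the minimum.
# All names here are hypothetical and only serve the illustration.
MIN_SAMPLES = {
    "classical_threshold": {"ok": 30, "nok": 5},
    "one_class": {"ok": 200, "nok": 0},
    "binary_classifier": {"ok": 500, "nok": 100},
    "multiclass": {"ok": 1000, "nok": 50},  # NOK count is per defect class
}


def missing_samples(model_class, n_ok, n_nok):
    """Return how many OK and NOK samples are still missing (0 if enough)."""
    minimum = MIN_SAMPLES[model_class]
    return {
        "ok": max(0, minimum["ok"] - n_ok),
        "nok": max(0, minimum["nok"] - n_nok),
    }


# Example: 420 OK and 130 NOK parts collected for a binary classifier.
print(missing_samples("binary_classifier", n_ok=420, n_nok=130))
# {'ok': 80, 'nok': 0}
```

For the multiclass model, pass the count of the rarest defect class as n_nok, since the minimum applies per class.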

Five principles

1. Multiple shifts and days: acoustic signals vary with temperature, hall noise, operator and tool wear. Include at least 3–5 different shifts in the training data.

2. Cover sensor variability: if two sensors run in parallel, train on data from both. If the sensor is replaceable, train on the spare type as well.

3. Provoke NOK parts: reduce the tempering temperature, leave tools dull, vary the rpm. Combined with a few real claim parts, this builds a robust defect set.

4. Manual edge case labelling: the most important 5 % are borderline cases. An acoustics specialist and a part specialist should label them together.

5. Holdout for validation: keep 20 % of the data out of training and use it only for model evaluation.
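
Principles 1 and 5 can be sketched in a few lines of standard-library Python. The record fields (part_id, label, date, shift) and both function names are assumptions made for this illustration, not a prescribed data format. The split is stratified by label so that the few NOK parts are represented in the holdout as well.

```python
import random
from collections import defaultdict


def shift_coverage(records, min_shifts=3):
    """True if the records span at least `min_shifts` distinct (date, shift) pairs."""
    shifts = {(r["date"], r["shift"]) for r in records}
    return len(shifts) >= min_shifts


def stratified_holdout(records, holdout_fraction=0.2, seed=0):
    """Split records into (train, holdout), keeping the label ratio in both."""
    by_label = defaultdict(list)
    for r in records:
        by_label[r["label"]].append(r)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # At least one sample per label goes to the holdout.
        n_holdout = max(1, round(len(group) * holdout_fraction))
        holdout.extend(group[:n_holdout])
        train.extend(group[n_holdout:])
    return train, holdout


# Hypothetical data set: 100 OK and 10 NOK parts over several shifts.
records = [
    {
        "part_id": i,
        "label": "NOK" if i % 11 == 0 else "OK",
        "date": f"2024-01-0{1 + i % 2}",
        "shift": ("early", "late", "night")[i % 3],
    }
    for i in range(110)
]

train, holdout = stratified_holdout(records)
print(len(train), len(holdout))  # 88 22
print(shift_coverage(train))     # True
```

A random split like this is the simplest option. If parts from the same shift are strongly correlated, holding out entire shifts gives a stricter evaluation.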

Data collection tips

  • Store every measurement with part ID, timestamp, sensor ID, operator, process parameters.
  • Archive raw signals – not just features. You will want them later.
  • Tag OK parts with variant metadata (colour, batch, supplier) for future root-cause analysis.
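
One way to make these tips concrete is a fixed record structure that every measurement must fill in. The sketch below is an assumption about how such a record could look. The class name, field names, file path and example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Measurement:
    """One acoustic measurement with the metadata listed above (hypothetical schema)."""
    part_id: str
    timestamp: datetime
    sensor_id: str
    operator: str
    process_parameters: dict
    raw_signal_path: str  # location of the archived raw signal, not just features
    label: str = "OK"
    variant: dict = field(default_factory=dict)  # e.g. colour, batch, supplier

    def to_json(self):
        """Serialise the record; the timestamp is stored as an ISO 8601 string."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        return json.dumps(record, sort_keys=True)


# Example record with made-up values.
m = Measurement(
    part_id="P-000123",
    timestamp=datetime(2024, 1, 15, 6, 30, tzinfo=timezone.utc),
    sensor_id="S1",
    operator="op-07",
    process_parameters={"rpm": 1450, "temperature_c": 21.5},
    raw_signal_path="raw/2024-01-15/P-000123_S1.wav",
    variant={"colour": "black", "batch": "B42", "supplier": "A"},
)
print(m.to_json())
```

Storing the path to the raw signal next to the metadata keeps features reproducible when the feature extraction changes later.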