MLPC 2026 · Task 3

Data Exploration Report

Sound event detection for smart homes — verifying annotations, quantifying agreement, characterizing the label and feature space, and identifying biases for the modeling phase.

3,656 recordings · 168,239 segments · 15 classes · 47 features · 369 devices

Team A-C · Julian Schmidt · Paul Breburda · GitHub

What we did
  1. Verified annotations in Label Studio — spectrogram playback, side-by-side annotator comparison.
  2. Applied boundary rules (±1 s tolerance) to accept, fix, or reject each labeled region.
  3. Measured inter-annotator agreement via segment-level IoU, comparing own vs. external pairs.
  4. Aggregated labels with majority vote on binarized overlap; characterized the 15-class label space.
  5. Profiled metadata and 47-D audio features — correlation structure, t-SNE embedding, dataset biases.
Disagreement patterns

Agreement on verified recordings ranged from 24 % to 93 %, worst on polyphonic multi-class clips.

Systematic omission

Annotators missed over half of the events in some clips, biasing classifiers toward precision over recall.

Acoustic confusion

bell_ringing / phone_ringing and door_open_close / wardrobe_drawer_open_close repeatedly swapped due to similar mechanical resonances.

Boundary disagreement

Merge vs. split near ~1 s pauses produced different region counts even when class labels matched.

Transient bias

Brief events (keychain, light_switch) missed disproportionately compared to sustained sounds (vacuum, running water).

Agreement drops monotonically with complexity — below 40 % for polyphonic clips, near 90 % for single-source recordings.

Case study · lowest agreement

File 002871 — two reviewers, polyphonic domestic audio. Agreement: 24.06%.

| Class | Reviewer 1 | Reviewer 2 |
| --- | --- | --- |
| door_open_close | 0 | 2 |
| footsteps | 1 | 2 |
| keyboard_typing | 1 | 1 |
| keychain | 0 | 1 |
| Total regions | 2 | 6 |

Overlapping sound classes mask each other acoustically — majority vote with only two annotators drops any event marked by exactly one reviewer.

Annotator agreement (IoU)
Overall mean: 0.640 · Own pairs: 0.705 (n=4,169) · External pairs: 0.685 (n=1,564)
IoU — intersection over union

$$\mathrm{IoU}=\frac{|A \cap B|}{|A \cup B|}$$

Segment-level agreement compares binary masks per class between annotator pairs; the same intersection-over-union idea applies in 1-D to two overlapping time intervals.
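The segment-level IoU can be sketched in a few lines of Python. This is a minimal illustration; the function names are ours, not from the project pipeline:

```python
def interval_iou(a, b):
    """IoU of two 1-D time intervals (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    """Segment-level IoU of two annotators' binary masks for one class."""
    inter = sum(x & y for x, y in zip(m1, m2))
    union = sum(x | y for x, y in zip(m1, m2))
    return inter / union if union > 0 else 0.0

print(interval_iou((1.0, 4.0), (3.0, 6.0)))   # 1 s overlap / 5 s union = 0.2
print(mask_iou([1, 1, 1, 0], [0, 1, 1, 1]))   # 2 shared / 4 covered = 0.5
```

Pairwise per-class IoUs like these, averaged over all shared recordings, give the 0.640 overall mean reported above.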

Label aggregation

$$y_{t,c}=\mathbf{1}\!\left[\frac{1}{A}\sum_{a=1}^{A}\mathbf{1}\big[\mathrm{ann}_{t,c,a}\ge 0.5\big]\ge 0.5\right]$$

Each annotator's vote for a segment/class pair is binarized, then the majority decides: with votes of 1, 1, and 0, the mean vote is 0.667 ≥ 0.5, so the aggregated label is 1.

~17% of files have a single annotator — those labels pass through directly after binarization, without majority arbitration.
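The aggregation rule above, including the single-annotator pass-through, fits in one small function (a sketch; names and defaults are ours):

```python
def aggregate_label(votes, bin_thresh=0.5, vote_thresh=0.5):
    """Majority vote on binarized overlaps for one (segment, class) pair.

    votes: per-annotator overlap values ann_{t,c,a} in [0, 1].
    With a single annotator, the binarized vote passes through unchanged.
    """
    binary = [1 if v >= bin_thresh else 0 for v in votes]
    return 1 if sum(binary) / len(binary) >= vote_thresh else 0

print(aggregate_label([0.8, 0.6, 0.1]))  # mean vote 0.667 -> label 1
print(aggregate_label([0.7]))            # single annotator -> label 1
```

Note the failure mode flagged in the case study: with exactly two annotators, an event marked by only one of them yields a mean vote of 0.5, and the ≥ 0.5 threshold decides whether it survives.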

Label space

Footsteps dominates (~15.3 %); light_switch is rarest (~0.6 %) — a 24:1 ratio. Multi-label output requires per-class sigmoid heads, not softmax.
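Why sigmoid heads rather than softmax: independent per-class sigmoids let several events be active at once, which softmax's sum-to-one constraint forbids. A minimal sketch (the logit values are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Independent per-class heads: probabilities need not sum to 1,
# so overlapping events (e.g. footsteps + keyboard_typing) can co-fire.
logits = {"footsteps": 2.0, "keyboard_typing": 1.5, "light_switch": -3.0}
probs = {c: sigmoid(z) for c, z in logits.items()}
active = sorted(c for c, p in probs.items() if p >= 0.5)
print(active)  # ['footsteps', 'keyboard_typing']
```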

Recording metadata
3,656 recordings · Median duration 22.5 s · 369 devices

Kitchen-heavy skew (~26 %) mechanically over-represents kitchen-associated classes.

Feature scales

Power ranges to 11,140 while flatness and ZCR stay below 1. Per-feature normalization (z-scoring) is essential before any distance-based classifier.
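A per-feature z-scoring pass in pure Python (the actual pipeline presumably uses numpy or sklearn; this just shows the operation):

```python
import math

def zscore_columns(X):
    """Standardize each feature column to zero mean, unit variance.

    The `or 1.0` guards constant columns against division by zero.
    """
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in X]

# A power-like column and a flatness-like column end up on the same scale:
X = [[11140.0, 0.21], [5230.0, 0.48], [112.0, 0.90]]
Z = zscore_columns(X)
```

After this step, Euclidean distances no longer let the power feature drown out everything bounded in [0, 1].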

Feature correlations

[Correlation heatmap over 14 feature groups: MFCC, MFCCd, MFCCd2, MelSpect, Energy, Power, ZCR, Flux, Flatness, Centroid, Bandwidth, Contrast, RollLow, RollHigh]

Energy–flux r = 0.93; MFCC–log-mel r = 0.82. ZCR, centroid, bandwidth, and rolloff form a spectral-shape cluster (r > 0.6). Delta/delta-delta MFCCs are nearly independent — a temporal-dynamics subspace.
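The pairwise values in the heatmap are plain Pearson correlations computed over segments; a minimal sketch:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two feature columns of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_r([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]))  # close to +1
```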

Conclusions

Dataset biases

  • Hardware: iPhone-heavy long tail — mic and noise-floor shift on unseen devices.
  • Environment: kitchens ~26 % — kitchen-associated classes over-represented.
  • Annotator: own-recording pairs +0.020 IoU — collector familiarity leaks into labels.
  • Class frequency: 24:1 max : min — reweight or resample for balanced training.

Recommendations

  • Class-weighted BCE + focal loss for rare transients (light_switch, bell_ringing).
  • Filter segments with class-level IoU ≥ 0.6 to trim label noise.
  • Group-stratified splits by collector_id (+ device) to prevent leakage.
  • Per-class sigmoid heads (multi-label); targeted augmentations on confused pairs.
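The class-weighted focal BCE from the first recommendation, written out per sigmoid output (a sketch; in practice `alpha` would come from inverse class frequency, e.g. up to ~24 for light_switch, and a framework loss would be used):

```python
import math

def focal_bce(p, y, alpha=1.0, gamma=2.0):
    """Class-weighted focal binary cross-entropy for one sigmoid output.

    p: predicted probability; y: target in {0, 1};
    alpha: per-class weight (larger for rare transients);
    gamma: focusing exponent that down-weights easy examples.
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)   # clip for numerical stability
    pt = p if y == 1 else 1.0 - p       # probability of the true class
    return -alpha * (1.0 - pt) ** gamma * math.log(pt)

easy = focal_bce(0.95, 1)               # confident hit: tiny loss
hard = focal_bce(0.10, 1, alpha=24.0)   # rare class badly missed: large loss
```

With gamma = 0 and alpha = 1 this reduces to plain BCE; the (1 − pt)^γ factor is what keeps abundant, easy footsteps segments from dominating the gradient.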
Broader impact & tooling

Applications

  • Assistive listening for hearing-impaired users
  • Elderly care monitoring — fall detection, medication reminders
  • Energy-efficient smart home automation via acoustic context

Privacy risks & mitigations

  • Domestic audio is intimate — prefer on-device processing so raw audio never leaves the home
  • Informed consent protocols for all deployed systems
  • Restrict outputs to predefined event categories, not open-ended audio analysis
  • Publish aggregated features, not source waveforms

AI disclosure: Claude Opus 4.6 used for analysis code, LaTeX editing, and this deck. All quantitative claims verified against raw data. Source code
