MLPC 2026 · Task 3 · mlpc-2026-task3

Data Exploration Report

Sound event detection for smart homes — verifying annotations, quantifying agreement, characterizing the label and feature space, and identifying biases for the modeling phase.

3,656 recordings · 168,239 segments · 15 classes · 47 features · 369 devices

Team A-C · Julian Schmidt · Paul Breburda · GitHub

What we did

1. Verified annotations in Label Studio using spectrogram playback and side-by-side annotator comparison.
2. Applied boundary rules (±1 s tolerance) to accept, fix, or reject each labeled region.
3. Measured inter-annotator agreement via segment-level IoU, comparing own vs. external pairs.
4. Aggregated labels with majority vote on binarized overlap; characterized the 15-class label space.
5. Profiled metadata and 47-D audio features: correlation structure, t-SNE embedding, dataset biases.

Disagreement patterns

Agreement on verified recordings ranged from 24 % to 93 %, worst on polyphonic multi-class clips.

Systematic omission

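Annotators missed over half of the events in some clips. Training on such labels treats the missed events as negatives, which pushes classifiers toward under-prediction (higher precision at the cost of recall).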

Acoustic confusion

bell_ringing / phone_ringing and door_open_close / wardrobe_drawer_open_close repeatedly swapped due to similar mechanical resonances.

Boundary disagreement

Merge vs. split near ~1 s pauses produced different region counts even when class labels matched.
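The merge-versus-split effect can be illustrated with a minimal sketch. It assumes regions are `(start, end)` tuples in seconds for a single class and bridges pauses up to a hypothetical `max_gap` of 1.0 s; neither the data layout nor the function name comes from the annotation tooling.

```python
def merge_close_regions(regions, max_gap=1.0):
    """Merge same-class regions whose silent gap is at most max_gap seconds.

    regions: iterable of (start, end) tuples in seconds (assumed layout).
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Pause is short enough: extend the previous region.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# One annotator splits at a 0.8 s pause, the other draws a single region.
split_style = [(2.0, 4.0), (4.8, 7.0)]
merge_style = [(2.0, 7.0)]

print(len(split_style), len(merge_style))                 # region counts differ
print(merge_close_regions(split_style) == merge_style)    # identical after merging
```

Normalizing both annotators' regions this way before counting would separate genuine disagreement from a pure segmentation-style difference.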

Transient bias

Brief events (keychain, light_switch) missed disproportionately compared to sustained sounds (vacuum, running water).

Agreement drops monotonically with complexity — below 40 % for polyphonic clips, near 90 % for single-source recordings.

Case study · lowest agreement

File 002871 — two reviewers, polyphonic domestic audio. Agreement: 24.06%.

Class	Reviewer 1 regions	Reviewer 2 regions
door_open_close	0	2
footsteps	1	2
keyboard_typing	1	1
keychain	0	1
Total regions	2	6

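Overlapping sound classes mask each other acoustically. With only two annotators, an event marked by exactly one reviewer lands exactly on a mean vote of 0.5, so whether it survives aggregation depends entirely on how the tie at the threshold is resolved.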

Annotator agreement (IoU)

Overall mean: 0.640 · Own pairs: 0.705 (n=4,169) · External: 0.685 (n=1,564)

IoU: intersection over union

\mathrm{IoU}=\frac{|A \cap B|}{|A \cup B|}

Interactive demo: a 1D toy model computes IoU for two intervals (example state: IoU = 0.136). Segment-level agreement applies the same intersection-over-union idea to paired per-class binary masks.
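As a minimal sketch of the formula, the two functions below compute IoU for a pair of 1D intervals and for a pair of per-segment binary masks. The function names and the convention of returning 0.0 when both inputs are empty are assumptions made for illustration, not details taken from the analysis code.

```python
def interval_iou(a, b):
    """IoU of two intervals a=(start, end) and b=(start, end), in seconds."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def mask_iou(mask_a, mask_b):
    """IoU of two equal-length binary masks, one entry per segment."""
    intersection = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
    union = sum(1 for x, y in zip(mask_a, mask_b) if x or y)
    # Assumed convention: two empty masks score 0.0 rather than 1.0.
    return intersection / union if union > 0 else 0.0


print(interval_iou((0.0, 4.0), (2.0, 6.0)))   # 2 s overlap over a 6 s union
print(mask_iou([1, 1, 0, 0], [0, 1, 1, 0]))   # 1 shared segment of 3 active
```

The empty-mask convention matters for rare classes: scoring mutual silence as perfect agreement would inflate the mean, so the choice should be stated alongside any reported IoU.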

Label aggregation

y_{t,c}=\mathbf{1}\left[\frac{1}{A}\sum_{a=1}^{A}\mathbf{1}[\mathrm{ann}_{t,c,a}\ge 0.5]\ge 0.5\right]

Interactive demo: toggle each annotator's binarized vote for one segment/class pair and watch the aggregated label flip at the vote threshold. Example state: mean vote 0.667.

~17% of files have a single annotator — those labels pass through directly after binarization, without majority arbitration.
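The aggregation rule can be sketched for a single segment/class pair as follows. This assumes each annotator contributes an overlap fraction in [0, 1]; the function name and the example values are hypothetical.

```python
def aggregate_label(annotations, bin_threshold=0.5, vote_threshold=0.5):
    """Binarize each annotator's overlap, then threshold the mean vote.

    annotations: overlap fractions in [0, 1], one per annotator (assumed input).
    Returns (aggregated_label, mean_vote).
    """
    votes = [1 if a >= bin_threshold else 0 for a in annotations]
    mean_vote = sum(votes) / len(votes)
    return int(mean_vote >= vote_threshold), mean_vote


print(aggregate_label([0.9, 0.7, 0.1]))   # two of three annotators vote yes
print(aggregate_label([0.8]))             # single annotator passes through
print(aggregate_label([0.8, 0.0]))        # one-of-two split sits on the threshold
```

With the ≥ comparison as written in the formula, a one-of-two split (mean vote 0.5) resolves to a positive label; a strict > would resolve it to negative, so the comparison operator is worth fixing explicitly.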

Label space

Footsteps dominates (~15.3 %); light_switch is rarest (~0.6 %) — a 24:1 ratio. Multi-label output requires per-class sigmoid heads, not softmax.
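A small sketch shows why softmax is a poor fit when several classes are active in the same segment. The logits are hypothetical values for three classes, two of which are assumed to be present at once.

```python
import math


def sigmoid(z):
    """Independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Probabilities that compete for a total mass of 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: footsteps, keyboard_typing, light_switch.
logits = [3.0, 3.0, -2.0]

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

active_sigmoid = [p >= 0.5 for p in sig]
active_softmax = [p >= 0.5 for p in soft]

print(active_sigmoid)   # both co-occurring classes can be detected
print(active_softmax)   # the two classes split the mass, so neither reaches 0.5
```

Because softmax probabilities sum to 1, two equally confident classes each land just under 0.5, whereas independent sigmoids score each class on its own.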

Recording metadata

3,656 recordings · Median duration 22.5 s · 369 devices

Kitchen-heavy skew (~26 %) mechanically over-represents kitchen-associated classes.

Feature scales

Power ranges up to 11,140, while flatness stays in [0, 1] and ZCR stays below 1. Per-feature normalization (z-scoring, available as a toggle in the interactive chart) is essential before any distance-based classifier.
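The effect of scale on distances can be sketched with two features. The rows below are hypothetical (power, flatness) values chosen only to span the reported ranges, and `zscore_columns` is an illustrative helper rather than the preprocessing actually used.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (power, flatness) rows.
rows = [
    [11140.0, 0.02],
    [350.0, 0.40],
    [340.0, 0.95],
]
scaled = zscore_columns(rows)

# Rows 1 and 2 differ mostly in flatness, yet raw distance is driven by power.
print(euclidean(rows[1], rows[2]))
print(euclidean(scaled[1], scaled[2]))
```

In a modeling pipeline the means and standard deviations would normally be estimated on the training split only and then reused for validation and test data.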

Feature correlations

Energy–flux r = 0.93; MFCC–log-mel r = 0.82. ZCR, centroid, bandwidth, and rolloff form a spectral-shape cluster (r > 0.6). Delta/delta-delta MFCCs are nearly independent — a temporal-dynamics subspace.
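A correlation screen of this kind can be sketched with a plain Pearson coefficient over feature columns. The toy values and the feature names in the dictionary are made up for illustration; only the 0.6 threshold echoes the text above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlated_pairs(features, threshold=0.6):
    """List feature pairs whose absolute correlation exceeds the threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Toy per-segment values (hypothetical).
features = {
    "energy": [1.0, 2.0, 3.0, 4.0, 5.0],
    "flux": [1.1, 1.9, 3.2, 3.9, 5.1],
    "delta_mfcc": [1.0, -1.0, 0.0, -1.0, 1.0],
}
pairs = correlated_pairs(features)
print(pairs)   # only the energy/flux pair passes the threshold
```

Pairs flagged this way are candidates for pruning or joint projection, since near-duplicate features add dimensionality without adding much information.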

Conclusions

Dataset biases

Hardware: iPhone-heavy long tail — mic and noise-floor shift on unseen devices.
Environment: kitchens ~26 % — kitchen-associated classes over-represented.
Annotator: own-recording pairs +0.020 IoU — collector familiarity leaks into labels.
Class frequency: 24:1 max : min — reweight or resample for balanced training.
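The reweighting option can be sketched as a class-weighted binary cross-entropy with an optional focal term. The prevalences are the approximate class shares reported for footsteps and light_switch; using the negative-to-positive ratio as the positive weight is one common choice and is an assumption here, as are the function names and the value of `gamma`.

```python
import math


def pos_weight(prevalence):
    """Negative-to-positive ratio, used as the weight on positive examples."""
    return (1.0 - prevalence) / prevalence


def weighted_focal_bce(p, y, w_pos=1.0, gamma=0.0, eps=1e-7):
    """Per-example loss for predicted probability p and binary target y.

    gamma=0 and w_pos=1 reduce this to plain binary cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    if y == 1:
        return -w_pos * (1.0 - p) ** gamma * math.log(p)
    return -(p ** gamma) * math.log(1.0 - p)


# Approximate class shares from the label-space slide.
prevalence = {"footsteps": 0.153, "light_switch": 0.006}
weights = {name: pos_weight(f) for name, f in prevalence.items()}

# A missed positive (p = 0.1) costs far more for the rare class.
missed_rare = weighted_focal_bce(0.1, 1, weights["light_switch"], gamma=2.0)
missed_common = weighted_focal_bce(0.1, 1, weights["footsteps"], gamma=2.0)
# A confidently correct positive is down-weighted by the focal term.
easy = weighted_focal_bce(0.95, 1, weights["light_switch"], gamma=2.0)

print(weights)
print(missed_rare, missed_common, easy)
```

Weights this large can destabilize training, so in practice they are often clipped or combined with resampling rather than applied at full strength.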

Recommendations

Class-weighted BCE + focal loss for rare transients (light_switch, bell_ringing).
Keep only segments whose class-level IoU is ≥ 0.6 to trim label noise.
Group-stratified splits by collector_id (+ device) to prevent leakage.
Per-class sigmoid heads (multi-label); targeted augmentations on confused pairs.
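The leakage concern behind the split recommendation can be sketched with a plain group-wise split. This version only guarantees that no collector appears on both sides; it does not balance class frequencies, and the record layout, function name, and parameters are hypothetical.

```python
import random
from collections import defaultdict


def group_split(recordings, group_key="collector_id", test_fraction=0.2, seed=0):
    """Split recordings so that every group lands entirely in train or test."""
    groups = defaultdict(list)
    for rec in recordings:
        groups[rec[group_key]].append(rec)

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)  # seeded for reproducibility

    n_test = max(1, round(len(group_ids) * test_fraction))
    test = [r for g in group_ids[:n_test] for r in groups[g]]
    train = [r for g in group_ids[n_test:] for r in groups[g]]
    return train, test


# Hypothetical metadata: 20 recordings from 5 collectors.
recordings = [{"file": f"{i:06d}", "collector_id": i % 5} for i in range(20)]
train, test = group_split(recordings)

train_collectors = {r["collector_id"] for r in train}
test_collectors = {r["collector_id"] for r in test}
print(train_collectors & test_collectors)   # empty set: no collector leaks across
```

Adding label stratification on top of grouping needs extra balancing logic; scikit-learn's `StratifiedGroupKFold` is one existing option for that.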

Broader impact & tooling

Applications

+ Assistive listening for hearing-impaired users
+ Elderly care monitoring: fall detection, medication reminders
+ Energy-efficient smart home automation via acoustic context

Privacy risks & mitigations

! Domestic audio is intimate: prefer on-device processing so raw audio never leaves the home
! Informed consent protocols for all deployed systems
! Restrict outputs to predefined event categories, not open-ended audio analysis
! Publish aggregated features, not source waveforms

AI disclosure: Claude Opus 4.6 used for analysis code, LaTeX editing, and this deck. All quantitative claims verified against raw data. Source code
