Rong GONG's music/speech processing blog: Optimizing DTW-based audio-to-MIDI alignment and matching, Colin Raffel paper

Saturday, November 25, 2017

Optimizing DTW-based audio-to-MIDI alignment and matching, Colin Raffel paper

This paper introduces a method for optimizing the parameters of a DTW-based alignment system on a synthetic MIDI dataset. Raffel minimizes the mean absolute alignment error with Bayesian optimization, and selects the confidence score by exhaustive search.

Some interesting points in the paper:
(1) The best alignment systems don't use beat-synchronous features.

(2) He introduced two penalties: the first penalizes non-diagonal moves, and the second ensures that the entire subsequence is used in subsequence alignment. The best systems use median values for both penalties.

(3) The synthetic MIDI data is created by changing the tempo, cropping a MIDI segment, deleting the vocal track, changing the instrument timbre, and changing the velocity. All of this is done with pretty_midi.

(4) He evaluated the matching confidence score by computing the Kendall rank correlation between the score and the absolute alignment error. In other words, the alignment error serves as the ground truth for the confidence score.

(5) All of the systems achieved the highest correlation when including the penalties in the score calculation, normalizing by the path length, and normalizing by the mean distance across the aligned portions.
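
To make points (2) and (5) more concrete, here is a minimal sketch of DTW with an additive penalty on non-diagonal moves, followed by a normalized confidence score. It is not the paper's implementation: it aligns full sequences rather than subsequences, the function names `dtw_with_penalty` and `confidence_score` are made up for illustration, and "mean distance across the aligned portions" is assumed here to mean the mean of the distance matrix over the rectangle spanned by the path.

```python
import math


def dtw_with_penalty(dist, penalty):
    """Full-sequence DTW over a precomputed distance matrix (list of lists).

    `penalty` is added to the accumulated cost of every non-diagonal move.
    Returns (path, total_cost); total_cost includes the penalties.
    """
    n, m = len(dist), len(dist[0])
    cost = [[math.inf] * m for _ in range(n)]
    step = [[None] * m for _ in range(n)]  # back-pointers for the path
    cost[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # diagonal move, no penalty
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:  # vertical move, penalized
                candidates.append((cost[i - 1][j] + penalty, (i - 1, j)))
            if j > 0:  # horizontal move, penalized
                candidates.append((cost[i][j - 1] + penalty, (i, j - 1)))
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = dist[i][j] + best
            step[i][j] = prev
    # Backtrack from the end of both sequences to the start.
    path = [(n - 1, m - 1)]
    while step[path[-1][0]][path[-1][1]] is not None:
        path.append(step[path[-1][0]][path[-1][1]])
    path.reverse()
    return path, cost[n - 1][m - 1]


def confidence_score(dist, path, total_cost):
    """Normalize the penalized cost by path length and by the mean distance
    over the rectangle spanned by the path (assumed interpretation).
    Assumes that mean distance is non-zero.
    """
    rows = [i for i, _ in path]
    cols = [j for _, j in path]
    region = [
        dist[i][j]
        for i in range(min(rows), max(rows) + 1)
        for j in range(min(cols), max(cols) + 1)
    ]
    mean_dist = sum(region) / len(region)
    return total_cost / len(path) / mean_dist


# Toy distance matrix, purely illustrative.
toy = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
toy_path, toy_cost = dtw_with_penalty(toy, penalty=0.5)
toy_score = confidence_score(toy, toy_path, toy_cost)
```

With this convention a lower score suggests a better match, since the score is a normalized cost.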
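
The evaluation in point (4) can be sketched as a rank correlation between per-file confidence scores and alignment errors. The snippet below is a plain-Python Kendall tau-a, where tied pairs count as neither concordant nor discordant. Which tau variant the paper uses is not stated here, so treat the choice as an assumption. The score and error values are made up.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Positive product: the pair is ordered the same way in x and y.
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical confidence scores and absolute alignment errors (seconds).
scores = [0.31, 0.45, 0.52, 0.80]
errors = [0.02, 0.05, 0.04, 0.90]
tau = kendall_tau(scores, errors)
```

A tau close to 1 would mean the score ranks the alignments in nearly the same order as the true error does.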
