Rong GONG's music/speech processing blog: January 2018

Sunday, January 14, 2018

Sheet music and audio multimodal learning

https://arxiv.org/abs/1612.05050

Toward score following in sheet music: uses classification to find the note head position in the sheet music. Given an audio spectrogram patch, the model classifies which location bucket of the sheet image the patch corresponds to.
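To make the "location bucket" idea concrete for myself, here is a minimal sketch of how a continuous note head x-position could be turned into a class label and decoded back. The function names, the sheet width and the number of buckets are my own assumptions for illustration, not values taken from the paper.

```python
def position_to_bucket(x, sheet_width, n_buckets):
    """Quantize a horizontal position (in pixels) into a bucket index (class label)."""
    if not 0 <= x < sheet_width:
        raise ValueError("x must lie inside the sheet image")
    return int(x * n_buckets / sheet_width)


def bucket_to_position(bucket, sheet_width, n_buckets):
    """Decode a bucket index back to the pixel position of the bucket centre."""
    bucket_width = sheet_width / n_buckets
    return (bucket + 0.5) * bucket_width


def predict_bucket(scores):
    """Pick the bucket with the highest classifier score (argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical numbers: a 1000-pixel-wide sheet snippet split into 40 buckets.
target = position_to_bucket(250, sheet_width=1000, n_buckets=40)
centre = bucket_to_position(target, sheet_width=1000, n_buckets=40)
```

The classifier then only has to output one score per bucket for each spectrogram patch, and the predicted position is accurate up to the bucket width.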

https://arxiv.org/abs/1707.09887

Learning audio-sheet music correspondences for score identification and offline alignment: trained with a pairwise ranking objective. Open question: how does this differ from the contrastive loss used in siamese networks?
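A first attempt at answering my own question, as a sketch of the two losses in their common textbook forms (a max-margin hinge ranking loss on cosine similarity, and the contrastive loss on Euclidean distance). These are assumptions on my side; the paper's exact formulation, margin values and similarity function may differ.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_ranking_loss(anchor, positive, negatives, margin=0.7):
    """Hinge ranking loss: the matching pair must score higher than every
    mismatching pair by at least `margin`. Only relative order matters."""
    s_pos = cosine_similarity(anchor, positive)
    return sum(
        max(0.0, margin - s_pos + cosine_similarity(anchor, neg))
        for neg in negatives
    )


def contrastive_loss(a, b, is_match, margin=1.0):
    """Contrastive loss on a single labelled pair: matching pairs are pulled
    together, mismatching pairs are pushed apart until they are `margin` away."""
    d = euclidean_distance(a, b)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

The way I read it: the ranking loss compares a matching pair against mismatching pairs for the same anchor and is zero as soon as the order is right by the margin, whereas the contrastive loss looks at each labelled pair on its own and constrains absolute distances.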

Wednesday, January 3, 2018

If I were to write this paper... Drum transcription CRNN

https://ismir2017.smcnus.org/wp-content/uploads/2017/10/123_Paper.pdf

(1) I would specify the dropout rate used for the BGRU layers; otherwise we cannot rule out that the performance difference between the CBGRU and the other models is related to overfitting.

(2) I would report the number of parameters of each model. A model with more parameters generally has more capacity, so the better performance of CBGRU-b over CNN-b could be attributed to its larger parameter count rather than to the recurrent layers themselves.

(3) CNN-b seems to perform really well. I would keep the Conv layers of CNN-b fixed and replace its Dense layers with GRU layers, to see whether the GRU really outperforms.
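For point (2), the parameter counts can be worked out by hand from the layer shapes. Below is a small sketch using the standard formulas for Dense, Conv2D and GRU layers (the GRU formula assumes one bias vector per gate; some implementations, such as cuDNN-style GRUs, use two). The layer sizes in the example are hypothetical and not taken from the paper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h, kernel_w, channels_in, channels_out):
    """One kernel per (input channel, output channel) pair, plus one bias per filter."""
    return kernel_h * kernel_w * channels_in * channels_out + channels_out


def gru_params(n_in, n_hidden, bidirectional=False):
    """Three gates (update, reset, candidate), each with input weights,
    recurrent weights and one bias vector."""
    one_direction = 3 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)
    return 2 * one_direction if bidirectional else one_direction


# Hypothetical comparison: the same 64-dimensional input fed either to a
# Dense layer or to a bidirectional GRU with 32 units per direction.
dense_total = dense_params(64, 64)
bgru_total = gru_params(64, 32, bidirectional=True)
```

With both totals in hand, the Dense and GRU variants in point (3) could be sized so that their parameter counts are roughly equal before comparing their scores.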
