Tuesday, February 20, 2018

Retrieve the final hidden states of variable-length sequences in TensorFlow

Assume you have an input batch containing variable-length sequences. The input tensor has shape:

inputs: [batch_size, max_time, dim_feature]

and that you have stored the true length of each sequence in a vector, say sequence_length. You can then get the final states with:

_, state = tf.nn.dynamic_rnn(some_RNN_cell, inputs, sequence_length=sequence_length, dtype=tf.float32)

If the cell is an LSTM cell, the returned state is an LSTMStateTuple, from which you can read both the hidden and cell states:

state.h: final hidden state, shape [batch_size, hidden_size]
state.c: final cell state, shape [batch_size, hidden_size]
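
Putting it together, here is a minimal runnable sketch (assuming TensorFlow 1.x and an LSTM cell; the sizes and placeholder names are my own):

import numpy as np
import tensorflow as tf

max_time, dim_feature, hidden_size = 10, 8, 16

# Zero-padded batch of variable-length sequences.
inputs = tf.placeholder(tf.float32, [None, max_time, dim_feature])
sequence_length = tf.placeholder(tf.int32, [None])

cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
# With sequence_length set, dynamic_rnn stops stepping each sequence at
# its true length, so `state` holds the last *valid* state per sequence,
# not the state after max_time padded steps.
_, state = tf.nn.dynamic_rnn(cell, inputs,
                             sequence_length=sequence_length,
                             dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(4, max_time, dim_feature).astype(np.float32)
    h, c = sess.run([state.h, state.c],
                    feed_dict={inputs: batch,
                               sequence_length: [10, 7, 3, 5]})
    print(h.shape, c.shape)  # (4, 16) (4, 16)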

Credit goes to these two sources:
https://danijar.com/variable-sequence-lengths-in-tensorflow/
https://github.com/shane-settle/neural-acoustic-word-embeddings/blob/4cc3878e6715860bcce202aea7c5a6b7284292a1/code/lstm.py#L25


Sunday, January 14, 2018

Sheet music and audio multimodal learning

https://arxiv.org/abs/1612.05050

Towards Score Following in Sheet Music: casts score following as classification. Given an audio spectrogram patch, the network classifies which note-head location bucket in the sheet image it corresponds to.
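
A rough sketch of how I picture that setup (TF 1.x; the patch shapes, layer sizes, and bucket count are my assumptions, not the paper's):

import tensorflow as tf

n_buckets = 40  # discretized horizontal note-head positions (assumed)

sheet = tf.placeholder(tf.float32, [None, 40, 100, 1])  # sheet image snippet
spec = tf.placeholder(tf.float32, [None, 92, 42, 1])    # spectrogram patch
labels = tf.placeholder(tf.int32, [None])               # true location bucket

def encode(x):
    # Small conv stack, one per modality (same structure, separate weights).
    x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, 2, 2)
    x = tf.layers.conv2d(x, 64, 3, activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, 2, 2)
    return tf.layers.flatten(x)

# Encode each modality, fuse, and classify the location bucket.
joint = tf.concat([encode(sheet), encode(spec)], axis=1)
hidden = tf.layers.dense(joint, 256, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, n_buckets)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)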

https://arxiv.org/abs/1707.09887

Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment: uses a pairwise ranking objective with a siamese network. What is the difference between a pairwise ranking objective and a contrastive loss?
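
My own attempt at an answer in code (the distance choice and margins are assumptions; both losses assume siamese-style embedding networks):

import tensorflow as tf

margin = 1.0

def contrastive_loss(emb_a, emb_b, is_match):
    # Operates on *pairs* with a binary match label: matching pairs are
    # pulled together in absolute distance, non-matching pairs pushed
    # apart beyond the margin.
    d = tf.norm(emb_a - emb_b, axis=1)
    return tf.reduce_mean(
        is_match * tf.square(d)
        + (1.0 - is_match) * tf.square(tf.maximum(0.0, margin - d)))

def pairwise_ranking_loss(anchor, pos, neg):
    # Operates on (anchor, positive, negative) triplets: only the
    # *relative* ordering matters, i.e. the positive must be closer
    # to the anchor than the negative, by at least the margin.
    d_pos = tf.norm(anchor - pos, axis=1)
    d_neg = tf.norm(anchor - neg, axis=1)
    return tf.reduce_mean(tf.maximum(0.0, margin + d_pos - d_neg))

So, as far as I can tell: the contrastive loss imposes absolute distance targets on individual pairs, while the ranking objective only constrains relative distances, which seems a more natural fit for retrieval and identification tasks.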

Wednesday, January 3, 2018

If I were to write this paper... Drum transcription CRNN

https://ismir2017.smcnus.org/wp-content/uploads/2017/10/123_Paper.pdf

(1) I would specify the dropout rate used for the BGRU layers; otherwise the better performance of the CBGRU could be attributed to differences in overfitting rather than to the architecture itself.

(2) I would report the number of parameters in each model. A model with more parameters has more capacity, so the better performance of CBGRU-b over CNN-b could simply be attributed to its larger parameter count (see the counting snippet after this list).

(3) CNN-b already seems to perform really well. I would keep the conv layers of the CNN-b model fixed and swap only the dense layers for GRU layers, to see whether the GRU really is what outperforms (a sketch follows below).
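
For (2), counting parameters is cheap to do and report (TF 1.x; assumes the model's variables live in the default graph):

import numpy as np
import tensorflow as tf

def count_trainable_params():
    # Sum the element counts of every trainable variable in the graph.
    return int(sum(np.prod(v.get_shape().as_list())
                   for v in tf.trainable_variables()))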
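
For (3), a sketch of the ablation I have in mind (tf.keras, with made-up layer sizes; the real experiment would reuse the paper's exact CNN-b front end):

import tensorflow as tf

def conv_front(x):
    # Stand-in for the fixed CNN-b convolutional front end.
    x = tf.keras.layers.Conv2D(32, 3, activation='relu')(x)
    x = tf.keras.layers.MaxPooling2D(2)(x)
    return x

inputs = tf.keras.layers.Input(shape=(100, 84, 1))  # (time, freq, 1), assumed
feats = conv_front(inputs)  # -> (time', freq', channels)

# Variant A: the original dense head.
dense_head = tf.keras.layers.Dense(256, activation='relu')(
    tf.keras.layers.Flatten()(feats))

# Variant B: identical front end, dense layers replaced by a
# bidirectional GRU running over the (downsampled) time axis.
time_steps = int(feats.shape[1])
seq = tf.keras.layers.Reshape((time_steps, -1))(feats)
gru_head = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64))(seq)

# Same output layer for both variants, e.g. 3 drum instruments.
out_a = tf.keras.layers.Dense(3, activation='sigmoid')(dense_head)
out_b = tf.keras.layers.Dense(3, activation='sigmoid')(gru_head)
model_a = tf.keras.Model(inputs, out_a)
model_b = tf.keras.Model(inputs, out_b)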