
Wednesday, December 27, 2017

Capacity and trainability of different RNNs

In the paper "Capacity and Trainability in Recurrent Neural Networks": https://arxiv.org/pdf/1611.09913.pdf

The authors claim that all common RNN architectures have similar per-parameter capacity. The vanilla RNN is very hard to train. If the task is hard to learn, one should choose a gated architecture: the GRU is the most learnable for shallow networks, while the +RNN (Intersection RNN) performs best for deep networks. The LSTM is extremely reliable but does not perform the best. If the training environment is uncertain, the authors suggest using the GRU or the +RNN.
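For a per-parameter capacity comparison to make sense, architectures have to be compared at matched parameter counts rather than matched hidden sizes. Below is a minimal PyTorch sketch of that matching; the hidden sizes 256/129/106 are my own choices so that the three models land near 100k parameters, not values from the paper.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Hidden sizes picked so each architecture has roughly the same number of
# parameters; the gated cells need smaller hidden states because they have
# 3 (GRU) or 4 (LSTM) weight blocks per layer instead of 1.
for name, rnn in [("vanilla RNN", nn.RNN(128, 256)),
                  ("GRU",         nn.GRU(128, 129)),
                  ("LSTM",        nn.LSTM(128, 106))]:
    print(f"{name:12s} {n_params(rnn):,} parameters")
# vanilla RNN   98,816 parameters
# GRU          100,233 parameters
# LSTM         100,064 parameters
```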

Another paper, "On the State of the Art of Evaluation in Neural Language Models" (https://arxiv.org/pdf/1707.05589.pdf), found that the standard LSTM performs best among three architectures (LSTM, Recurrent Highway Networks, and Neural Architecture Search). The models are trained with a modified Adam optimizer, and the hyperparameters, including learning rate, input embedding ratio, input dropout, output dropout, and weight decay, are tuned by batched GP bandits.
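Their batched GP-bandit tuner is not publicly available, so as a rough single-worker stand-in, here is a sketch of tuning the same five hyperparameters with scikit-optimize's gp_minimize. The train_and_eval function is a hypothetical placeholder that would train the language model and return validation perplexity; here it just returns a fake score so the loop runs.

```python
from skopt import gp_minimize
from skopt.space import Real
from skopt.utils import use_named_args

space = [
    Real(1e-4, 1e-2, prior="log-uniform", name="learning_rate"),
    Real(0.25, 1.0,  name="input_embedding_ratio"),
    Real(0.0, 0.9,   name="input_dropout"),
    Real(0.0, 0.9,   name="output_dropout"),
    Real(1e-7, 1e-3, prior="log-uniform", name="weight_decay"),
]

def train_and_eval(learning_rate, input_embedding_ratio, input_dropout,
                   output_dropout, weight_decay):
    # Hypothetical placeholder: a real implementation would train the LM with
    # these settings and return validation perplexity. This fake score only
    # lets the tuning loop run end to end.
    return (abs(learning_rate - 3e-3) * 1e3 + input_dropout + output_dropout
            + (1.0 - input_embedding_ratio) + weight_decay * 1e3)

@use_named_args(space)
def objective(**hparams):
    return train_and_eval(**hparams)

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best hyperparameters:", result.x)
print("best (fake) perplexity:", result.fun)
```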

It is also shown in the Penn Treebank experiments that, for the recurrent state, variational dropout helps, while recurrent dropout shows no advantage.
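A simplified sketch of the distinction, assuming a plain PyTorch RNNCell: variational dropout samples one mask per sequence and reuses it at every timestep on the recurrent state, whereas the per-step variant resamples the mask at each step. (The actual recurrent-dropout method compared in the paper applies the mask to the candidate update, so this is only an illustration of the mask-sharing idea.)

```python
import torch

def run_rnn_with_state_dropout(cell, inputs, h0, p=0.5, variational=True):
    # inputs: (batch, time, features); h0: (batch, hidden)
    h = h0
    outputs = []
    if variational:
        # Variational dropout: one Bernoulli mask sampled per sequence and
        # reused at every timestep on the recurrent state.
        mask = (torch.rand_like(h) > p).float() / (1.0 - p)
    for x_t in inputs.unbind(dim=1):
        if not variational:
            # Per-step dropout: a fresh mask at every timestep.
            mask = (torch.rand_like(h) > p).float() / (1.0 - p)
        h = cell(x_t, h * mask)
        outputs.append(h)
    return torch.stack(outputs, dim=1), h

cell = torch.nn.RNNCell(16, 32)
x, h0 = torch.randn(4, 10, 16), torch.zeros(4, 32)
out, h_last = run_rnn_with_state_dropout(cell, x, h0, p=0.3, variational=True)
```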

Sunday, December 24, 2017

Deep learning practice and trends, some key points

I went through the first part of the tutorial (practice). Below are some key points from Oriol's talk:

CNN:

(1) Slide 7, "Deep learning: zooming in", is amazing! It lists the building blocks of deep learning models and sorts them into categories: non-linearities, optimizers, connectivity patterns, losses, and hyper-parameters.

(2) Slide 21, which shows the convolution animation, is great; it makes the convolution mechanism very intuitive.

(3) Slide 27, building very deep ConvNets: stacking layers with small 3x3 filters gives the same large receptive field as a single larger filter, with fewer parameters (see the sketch after this list).

(4) Slide 35, U-Net: a bottleneck encoder-decoder with skip connections for image segmentation (a toy version follows the parameter-count sketch below).
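A quick check of the parameter claim in (3): two stacked 3x3 convolutions see the same 5x5 region of the input as a single 5x5 convolution, but with fewer weights. The channel count of 64 is an arbitrary choice for the illustration.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

c = 64  # channel count, chosen arbitrarily for illustration
stacked_3x3 = nn.Sequential(nn.Conv2d(c, c, 3), nn.ReLU(), nn.Conv2d(c, c, 3))
single_5x5 = nn.Conv2d(c, c, 5)

# Both cover a 5x5 receptive field, but the stacked version is cheaper:
print(n_params(stacked_3x3))  # 2 * (3*3*64*64 + 64) =  73,856
print(n_params(single_5x5))   #      5*5*64*64 + 64  = 102,464
```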
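And for (4), a toy single-level sketch of the U-Net idea: a real U-Net has several resolution levels and more convolutions per level, but the skip connection that concatenates encoder features onto the upsampled decoder path is the part worth seeing in code.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A minimal encoder-decoder with one skip connection, only to illustrate
    the U-Net idea from slide 35; not the architecture from the slides."""
    def __init__(self, in_ch=1, mid_ch=16, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # decoder sees upsampled bottleneck features concatenated with encoder features
        self.dec = nn.Sequential(nn.Conv2d(2 * mid_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(mid_ch, out_ch, 1)    # per-pixel class logits

    def forward(self, x):
        skip = self.enc(x)                          # high-resolution features
        h = self.bottleneck(self.down(skip))        # low-resolution bottleneck
        h = torch.cat([self.up(h), skip], dim=1)    # skip connection
        return self.head(self.dec(h))

logits = TinyUNet()(torch.randn(1, 1, 64, 64))      # -> (1, 2, 64, 64)
```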

Seq2seq:

(1) Attention! (a bare-bones sketch appears after this list)

(2) Slide 62: tricks!
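Since the notes above only flag attention without spelling it out, here is a bare-bones scaled dot-product attention step, which may differ from the exact variant in the talk: a decoder query is scored against every encoder state, and the values are averaged with the resulting weights.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    # query: (batch, 1, d); keys, values: (batch, time, d)
    scores = query @ keys.transpose(-2, -1) / keys.shape[-1] ** 0.5  # (batch, 1, time)
    weights = F.softmax(scores, dim=-1)      # alignment over encoder positions
    return weights @ values, weights         # context vector and alignment

# toy shapes: one decoder step attending over 10 encoder states of size 32
context, weights = dot_product_attention(torch.randn(2, 1, 32),
                                         torch.randn(2, 10, 32),
                                         torch.randn(2, 10, 32))
```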

Video:

Slides: https://docs.google.com/presentation/d/e/2PACX-1vQMZsWfjjLLz_wi8iaMxHKawuTkdqeA3Gw00wy5dBHLhAkuLEvhB7k-4LcO5RQEVFzZXfS6ByABaRr4/pub?slide=id.g2a19ddb012_0_654