1. J. Foote Novelty curve
The first segmentation method is proposed by J. Foote in his article "Automatic audio segmentation using a measure of audio novelty". He use a diagonal matrix which he called it "checkerboard" kernel to compute the Novelty of MFCC selfsimilarity matrix. The implementation of this method is really easy and the function Mirnovelty in Mirtoolbox can do this task by several code lines.
However, the result of this method is far from perfect. We can hardly do the next step processing to segment each section of the song from the novelty curve.
Fig. 1 Novelty curve of a pop song, red vertical lines are the ground truth of segmentation parts 
2. Goto's RefraiD
The fact of this poor result of novelty curve made me realized that segmenting precisely each section is quite a hard task, and considering of the current segmentation method can reach just 0.7 Fmeasure correctness, I decide to use a method which might deal with an easier task but can achieve a more robust result.
The method I found is called "RefraiD" which introduced by Masataka Goto in his article "A Chorus Section Detection Method for
Musical Audio Signals and Its Application
to a Music Listening Station". This method extract repetitive sections by detecting the 'stripe' in the Chroma selfsimilarity matrix.
3. General process of RefraiD
The method is no need to repeat here, because Goto explained it precisely in his article. I want to point out here some modifications of this method in my implementation to improve the detection performance. Below is the general process of this algorithm:
1. STFT of audio signal
2. calculate chromagram by chroma filter
3. lag time selfsimilarity matrix (SSM) of chromagram
4. remove noise to highlight repetitive line segments in SSM, denoised SMM
5. extract line segments from denoised SSM
6. integrate repeated sections (line segments) into groups
7. redetect missing line segments and integrate them into groups
8. remove inappropriate peaks (missing line segments)
9. unfold line segments in each group
10. integrate line segments intergroups
11. postprocessing line segments innergroup
12. adjust selfsimilarity probability of line segments
2. calculate chromagram by chroma filter
3. lag time selfsimilarity matrix (SSM) of chromagram
4. remove noise to highlight repetitive line segments in SSM, denoised SMM
5. extract line segments from denoised SSM
6. integrate repeated sections (line segments) into groups
7. redetect missing line segments and integrate them into groups
8. remove inappropriate peaks (missing line segments)
9. unfold line segments in each group
10. integrate line segments intergroups
11. postprocessing line segments innergroup
12. adjust selfsimilarity probability of line segments
13. select chorus sections
The 2nd step “chromagram calculation” is done by a chroma filter, this is different from Goto’s method. In step 5 and
7, “dynamic threshold” of equation.8 in original article is replaced by the “True Envelope” threshold (step 5) and “low pass
filter envelope” (step 7). Some line segments integration rules have been added in step 10. A post processing procedure has
been added in step 11. In step 12, the chorus measure is not only weighted by average value of line segment length, but also by the length of each line.
4. Some details
4. Some details
Steps 5 and 7 Equation.7 and 10 has found not working to detect peaks of $R_{all}(t,l)$ and $R_{[T_s,T_e]}(l)$. The reason is these
formulas low pass signal and cause diminution of peak amplitude and deviation of peak location. Besides, the dynamic
threshold of equation.8 is not efficient for detecting all the peaks.
Fig. 2 True Envelope of Rall(t,l), and peak picking 
Instead, “True envelop” is proved experimentally as an efficient threshold for the peak picking. Considering the calculation
complexity of “True envelope” method, too much use might slow the algorithm. So in step 7, a low pass filter envelope is
used instead of “True envelope”.
The peaks detected above this threshold (Fig. 2) are those have important amplitudes, which means there might be long repetitive line segments in their corresponding sections. So these important amplitude peaks need to be extracted.
The peaks detected above this threshold (Fig. 2) are those have important amplitudes, which means there might be long repetitive line segments in their corresponding sections. So these important amplitude peaks need to be extracted.
Fig. 3 Rules of integrating line segments intergroups 
Step 10 The rules of integrating line segments intergroups has been defined (Fig. 3):

line segment is overlapping with a generative line segment of another group. Generative line segment is which generate
other line segments inside a group.

a group section includes another group section

two group sections are close (< threshold), but not overlapped. This rule can be only implemented once, and there is
a length limit for these two concatenated line segments.
Step 11 The postprocessing are:

remove line segments which are too short : shorter than 1/3 of original group section

integrate inner line segments which are included by other segments in groups.
Step 13 The equation 16 in article has been changed to:
Because in our implementation, the line segments of the same group may have different lengths, whereas, in Goto’s article,
all the line segments of the same group are equal length.
It is important to mention that the group which has the most great chorus measure might not be the chorus. Because this algorithm detect not only the chorus but in fact the repetitive sections in audio signal. Any repetitive section which has great selfsimilarity probability and which are long enough could have the most great score. If the most great score group has been proved not be the chorus section, we can try the second great score group.
5. Code Matlab
I put the code matlab at github https://github.com/ronggong/lyricsaudio, feel to use and change. Don't forget to leave a comment if you got nice ideas. :D
It is important to mention that the group which has the most great chorus measure might not be the chorus. Because this algorithm detect not only the chorus but in fact the repetitive sections in audio signal. Any repetitive section which has great selfsimilarity probability and which are long enough could have the most great score. If the most great score group has been proved not be the chorus section, we can try the second great score group.
5. Code Matlab
I put the code matlab at github https://github.com/ronggong/lyricsaudio, feel to use and change. Don't forget to leave a comment if you got nice ideas. :D
I can't find Finvlp function in Ftrueenv.m. It occurs an error. Is there any way i can get the code for the function Finvlp?
ReplyDeleteDid you have found the Finvlp function code?
Delete