Timeline
Week of: 8/31/13, 9/7/13, 9/14/13, 9/21/13, 9/28/13, 10/5/13, 10/12/13, 10/19/13, 10/26/13, 11/2/13, 11/9/13, 11/16/13, 11/23/13, 11/30/13, 12/07/13
Progress: We decided to work on a musical project because of our mutual interest in playing and listening to music and our desire to apply engineering knowledge to music. By coincidence, a visiting professor from Brazil named Flávio Ávila, who specializes in audio processing, was at Wash U for another month. After a discussion with him, we came away with some interesting background papers to read, as well as the idea of working with non-negative matrix factorization (see details below).
During online research, we found some code from Dan Ellis, an associate professor at Columbia University. Using several functions of his own creation, he wrote a program that creates a spectrogram from a sound file and then uses an inverse-spectrogram function to convert the spectrogram back into a sound file. The inverse-spectrogram function (called ispecgram) was adapted for our project. We will use ispecgram as a sanity check, so we can see whether the output signal matches the original. Background reading was also performed (see Bibliography).

_________________________________________

The mission of our project was still a bit vague, so we discussed further what its goals would be and codified the general approach (see About). We figured out what the inputs of the specgram function are and how to get the appropriate signal back (inputs, in order: signal, number of Fourier transform points, sampling frequency, window vector, percent overlap). While the graphs still weren't printing with the correct scale on the axes, the shape of the graphs output by the spectrogram function seemed correct. In the song "Weather Bird" by Louis Armstrong and Earl Hines, the amplitudes across frequencies drop off sharply at 50 seconds, where Earl Hines takes a piano solo, and the loudness of the song does drop significantly around that point. We took this as early evidence that this method of "specgramming" and "ispecgramming" had at least some functionality for our purposes.

_________________________________________

This week we learned how to reset the time axis on the specgram plot. The time axis was expected to run from 1 to 140 seconds for "Weather Bird," but the scale went past 10^6 "seconds". We needed to divide the time scale by 44,100 (the sampling frequency) to get the axis onto the correct scale.
This required us to rescale the axes after the spectrogram was calculated. Although the time axis has been fixed, the frequencies are still normalized, so it is difficult to determine exactly which frequencies have high amplitudes.

_________________________________________

We spent countless hours trying to rescale the frequency axis, to no avail; we hope to get past this stumbling block soon. We also experimented with nnmf (non-negative matrix factorization). At first nnmf wasn't working, since nnmf only accepts a real, non-negative matrix and our spectrogram matrix was complex. After taking the absolute value of the spectrogram and ispecgramming, we found that the signal was garbled beyond recognition. Our main goal became finding a way to keep the accuracy of the spectrogram while still being able to take the nnmf in order to separate the sound sources.

_________________________________________

We confirmed that spectrogram.m is the function replacing specgram in Matlab going forward, and that it has more flexibility because of its larger number of inputs and superior documentation. Switching from specgram to spectrogram with the same inputs produced terrible results, and we quickly discovered that we needed to know more about the inputs to both functions (previously we had been relying on several of the default settings). Specgram's call looks like this: B = specgram(A,nfft,fs,window,noverlap), whereas spectrogram's looks like this: S = spectrogram(x,window,noverlap,nfft,fs). Spectrogram's documentation also says that to "obtain the same results for the removed specgram function, specify a 'Hann' window of length 256." After experimenting with different inputs and constructing a Hanning window, we settled on ad hoc inputs that produced a response we liked: the wavread data (d), hanning(2^8+1000), length(hanning(2^8+1000))/2, and 2^8+1000.
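The specgram/ispecgram round trip described above can be sketched in Python with SciPy's STFT pair, a stand-in for the Matlab functions the notebook uses; the test tone and all parameter values here are our own choices, not the notebook's:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44100                       # CD-quality sampling rate, as in the notebook
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz test tone

# Analysis: Hann window, 256-point segments, 50% overlap
f, frames, S = stft(x, fs=fs, nperseg=256, noverlap=128)

# Synthesis with the SAME parameters should give the original signal back
_, x_rec = istft(S, fs=fs, nperseg=256, noverlap=128)

err = np.max(np.abs(x - x_rec[:len(x)]))
print(f"max reconstruction error: {err:.2e}")
```

With matching analysis and synthesis parameters the reconstruction error is at machine precision, which is exactly the sanity check ispecgram is meant to provide.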
There were still inputs that needed documenting. To gain more facility with the spectrogram function and determine the inputs that gave the best-sounding result, we decided to continue fine-tuning parameters and keeping notes.

_________________________________________

At this point, we realized we needed to know a bit more about what a spectrogram really is and what it does. It is defined as a representation of amplitudes at different frequencies and times, but we needed to know how that is reflected in the matrix. We started making whatever observations we could about the incoming signal and the output matrix:
What is d (the original signal output by the wavread function), and how do we glean information from it? What is Sry, exactly? Why is it complex? How do we make it real so that we can run nnmf to separate the sources?

_________________________________________

After additional research, we figured out more about the data a spectrogram contains. A spectrogram is a collection of values giving the Fourier weights of a sequence of short time windows. As a simple example, a spectrogram matrix of A = [1 2 1 2 0 0 0 0; 0 0 1 0 2 3 3 3] would suggest that at the beginning of a song the notes are at the low end of the frequency spectrum, while the second part of the song features high frequencies. The number of columns (8 here) indicates how many time windows we have taken to reconstruct the sound wave, while the number of rows (2 here) indicates the number of frequencies involved in each Fourier analysis.
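The toy matrix above can be read mechanically: rows index frequency, columns index time windows. A small Python sketch (the matrix is taken from the text; NumPy stands in for the Matlab work):

```python
import numpy as np

# The toy spectrogram from the text: 2 frequency rows x 8 time columns
A = np.array([[1, 2, 1, 2, 0, 0, 0, 0],   # row 0: a low frequency
              [0, 0, 1, 0, 2, 3, 3, 3]])  # row 1: a high frequency

# For each time column, which frequency row carries the most energy?
dominant = np.argmax(A, axis=0)
print(dominant)  # -> [0 0 0 0 1 1 1 1]: low notes early, high notes late
```

Reading down a column gives the "notes sounding at that instant"; reading across a row gives the history of one frequency over the whole song.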
NMF research: Continuing the quest to move from specgram to spectrogram, we noticed that leaving the inputs unchanged introduced distortion, a strange graph, and a sped-up song. When we tried a window of length 256, as spectrogram's documentation suggested, the song was sped up even more. Furthermore, the plotted spectrogram showed frequency on the x-axis and time on the y-axis, the reverse of specgram. Transposing the spectrogram neither put time back on the x-axis nor let the inverse specgram regenerate a true sound (it made things much worse). After several more hours of making sure all of the inputs matched, and ensuring that calls to both functions used all five inputs so that no differing defaults crept in, spectrogram seemed to be working. We still wanted to fix the plot of the spectrogram, in the hope that cleaning it up would help analysis of the spectrogram matrix.

Detour: By changing the length of the Hanning window in the input to the spectrogram function while leaving the ispecgram inputs unaltered, we discovered how to slow down a song without changing its pitch. This is an exciting finding; it wasn't until 1978 that engineers discovered how to do this. With a simple Matlab function, we can slow down or speed up a song without changing its frequencies. This works because the ispecgram function assumes a Hanning window of a certain length (call it L). When the analysis window is made longer, ispecgram, still assuming a window of length L, plays the song back slower; the opposite is also true. (I THINK A PICTURE HERE WOULD HELP) This discovery is directly applicable to our final application. If we can split a song into multiple tracks, we can also slow that song down (or speed it up). This would allow a musician to learn a song more quickly, since she can slow it down to process it more accurately.
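The window-mismatch detour above can be imitated with SciPy's STFT pair by synthesizing with a different hop (overlap) than was used for analysis. This is only a crude analogy to the Hanning-window-length trick, and a real phase vocoder would also correct the phases; the signal and all parameters here are our own choices:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t)  # two seconds of a 440 Hz tone

# Analysis hop = 256 - 128 = 128 samples
f, frames, S = stft(x, fs=fs, nperseg=256, noverlap=128)

# Matching synthesis hop reproduces the original duration...
_, y_same = istft(S, fs=fs, nperseg=256, noverlap=128)
# ...but a smaller synthesis hop (64 samples) packs the frames closer
# together, shortening the signal -- i.e. speeding the song up -- while
# each frame's frequency content (the pitch) is untouched.
_, y_fast = istft(S, fs=fs, nperseg=256, noverlap=192)

print(len(y_same), len(y_fast))
```

Mismatching the hop in the other direction (frames placed farther apart) lengthens the signal instead, which is the slow-down case described in the detour.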
If a musician wants to play a melody faster, she has that option as well.

_________________________________________

Reminding ourselves that frequencies sit in the rows and times in the columns of the spectrogram, i.e., different rows mean different frequencies, we figured out a few more details about speeding up and slowing down sound files without distorting the frequencies. We figured out how to choose the play speed (in 0.25 increments from 0.25x to 2x, and in 0.5 increments from 2x to 3x). Moving back to non-negative matrix factorization, we wanted to try running nnmf on each column, i.e., on each time step. We planned to pass the spectrogram into Matlab's nnmf function with 2 as "k"; k has to be 2 when there are two instruments, as in "Weather Bird." We also wanted to try taking the nnmf of the whole matrix, possibly transposed so that it ran in the same direction as when applied to the time columns.

_________________________________________

This week we took a quick detour from our spectrogram/nnmf work to lay the foundation for a future line of investigation. We went to the recording studio on campus and recorded 13 notes (from the G below middle C to the G above middle C) on each of six instruments: acoustic guitar, electric guitar, harmonica, saxophone, ukulele, and clarinet. These are intended as a library of sounds for a future statistical approach to our problem. Back in the lab, we took the absolute value of the spectrogram and turned it back into a .wav to try to isolate the part of the process that wasn't working. The song was definitely audible, but it was surrounded by heavy distortion. It was still good enough to hear the trumpet notes, but the distortion drowned out the quieter piano. This meant the garbling was caused by the absolute value, not by the nnmf.
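One common workaround for the absolute-value problem just described, which the notebook does not spell out, is to run the factorization on the magnitude spectrogram and then reattach the original phase before inverting. A minimal Python sketch with a hand-rolled multiplicative-update NMF (SciPy/NumPy stand-ins for the Matlab functions; the two-tone signal and all parameters are our own):

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)

# Toy "two instrument" signal: a 440 Hz tone followed by a 1000 Hz tone
fs = 8000
t = np.arange(2 * fs) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t),
                      np.sin(2 * np.pi * 1000 * t))

f, frames, S = stft(x, fs=fs, nperseg=256, noverlap=128)
V = np.abs(S)                    # nnmf needs a real, non-negative matrix

# Plain multiplicative-update NMF with k = 2 sources
k = 2
W = rng.random((V.shape[0], k)) + 1e-3   # frequency profiles
H = rng.random((k, V.shape[1])) + 1e-3   # time activations
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Reattach the ORIGINAL phase instead of throwing it away with abs()
S_hat = (W @ H) * np.exp(1j * np.angle(S))
_, x_rec = istft(S_hat, fs=fs, nperseg=256, noverlap=128)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative NMF error: {rel_err:.3f}")
```

Keeping a copy of the complex spectrogram's phase sidesteps the garbling that taking abs() alone produces, while still feeding nnmf the real, non-negative matrix it requires.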
_________________________________________

After the disappointment of last week, we hypothesized about taking a 15-instrument deconstruction (i.e., k = 15 in the nnmf) and multiplying the factors back together to see if that produced a truer sound file. Another thought was to take 15 individual ispecgrams, hoping that each would contain only piano or only guitar. Running nnmf with k = 12, multiplying the two factor matrices, and turning the product back into a .wav created the same level of distortion as running ispecgram with no nnmf involved. Thus we concluded that nnmf is working well but the absolute value is hurting the signal, so we needed a way of handling the imaginary part that leaves more information in the signal. One idea was to take the wave file, use spectrogram.m, pass the result into a new function called aspecgram.m that turns complex numbers into products of sines and cosines, then run nnmf, pass the output into "iaspecgram.m" (which reverses "aspecgram.m"), and finally invert the spectrogram into a .wav again. Some mathematical calculation led to this breakdown of the signal in terms of sines and cosines: (a+bi)*e^(i*theta) = a*cos(theta) + a*i*sin(theta) + b*i*cos(theta) - b*sin(theta). Our other line of attack was to eliminate the absolute value another way: determine the form of each entry of a spectrogram by going through the code for spectrogram line by line.

_________________________________________

Discussions with Ed Richter and Jason Troubaugh left us with several avenues to pursue next. A middle C on any instrument will have a Fourier transform with a peak at the same frequency, but there are probably features of that graph that differ between instruments. Once that difference is found, we can find the corresponding difference in the spectrogram matrix. Then we may be able to use a numerical cutoff to cut out sounds from different instruments.
While this will only capture the fundamental frequencies of the instruments, it's a good start on the road to separating them. The question du jour: if we can identify a note first, can we then tell which instrument played it? We graphed a single row/column of the spectrogram for a simple case: a 440 Hz tone mixed with a 250 Hz tone. It's an uncomplicated example, but the approach of finding the fundamental frequency and working from there seems promising.

_________________________________________

We made great strides in our project this week. For one, we have figured out a way to "filter" particular sound waves. First, we find the main peak of a signal using a peak-finding function in Matlab. Then we keep the other peaks by the criterion "if a peak is more than 10% of the biggest peak, consider it a secondary peak." Finally, we recreate the sound file after removing all rows of the spectrogram that do not correspond to any of the peaks found, leaving us with the pith of the signal. After trying this on simple signals, we found that this rudimentary approach is useful.

_________________________________________

Now we want to extract the important signals automatically. To do this, we look through the signal of interest, keep the peak rows and their surrounding rows, and set the other rows to zero. This gives us the pith of the signal, as described above. We also want to be able to play the important components of the signal one at a time, so we separated the peaks into separate spectrograms; we can then separate the signal by choosing which spectrogram to ispecgram. Although the signals we are separating are pure sine waves and their harmonics, we think this approach might be applicable to other domains.

_________________________________________

Huzzah! We can now separate a complex chord into its individual components!
We decomposed an E7#9 guitar chord into its five notes: E, G#, B, D, and G. This is very promising; see the demo. Separating a guitar note and a piano note, we noticed that we lose all of the aspects that make the guitar note sound like a guitar. The piano still sounds like a piano because it was a purer tone to begin with. The next step is to take a song of maybe five seconds and run this algorithm on each time step. This can't be automatic yet, as we don't have a way to automatically pick the correct peaks. We could also, in theory, play a full song and count how many times a certain note is played.
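The peak-based filtering described above (keep any peak at least 10% as tall as the main one, zero out all other rows) can be sketched as follows, with scipy.signal.find_peaks standing in for the Matlab peak function; the two-tone test signal and all parameters are our own choices:

```python
import numpy as np
from scipy.signal import stft, find_peaks

# The simple two-tone test from the notebook: 440 Hz plus a quieter 250 Hz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)

f, frames, S = stft(x, fs=fs, nperseg=1024, noverlap=512)
profile = np.abs(S).mean(axis=1)      # average magnitude per frequency row

# Keep any peak at least 10% as tall as the biggest one
peaks, _ = find_peaks(profile, height=0.1 * profile.max())
print(np.round(f[peaks]))             # should sit near 250 and 440 Hz

# Zero every spectrogram row not at (or adjacent to) a kept peak
mask = np.zeros(len(profile), dtype=bool)
for p in peaks:
    mask[max(p - 1, 0):p + 2] = True
S_filtered = S * mask[:, None]        # the "pith" of the signal
```

Inverting S_filtered (or one single-peak copy of it at a time) is the separation step: each kept peak and its neighboring rows become one playable component.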