audio - Compare sound between source and microphone in JavaScript
I'm working with audio, but I'm a newbie in this area. I want to match sound from the microphone against a source audio (just one sound), like the Coke ads with Shazam. Example video (0.45 minute). However, I want to make it on a website with JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
- Acquire audio from the user's microphone (easy)
- Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Acquire audio
This one is a definite no biggy. You can use the Web Audio API for this. You can google around for tutorials on how to use it. This link provides some fundamental knowledge that you may want to understand when using it.
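As a rough illustration of this step, here is a minimal sketch. The names `appendSamples` and `startMicrophoneCapture`, the 2048-sample snapshot size, and the 50 ms polling interval are my own assumptions, not anything prescribed by the API. Polling an `AnalyserNode` gives snapshots that may overlap or leave gaps rather than a gap-free stream, which is usually good enough for experimenting. Note that browsers only expose the microphone on secure pages and after the user grants permission.

```javascript
// Keep only the most recent `maxLength` samples (a rolling buffer that the
// sliding-window analysis can read from). Hypothetical helper.
function appendSamples(buffer, chunk, maxLength) {
  const combined = new Float32Array(buffer.length + chunk.length);
  combined.set(buffer, 0);
  combined.set(chunk, buffer.length);
  return combined.slice(Math.max(0, combined.length - maxLength));
}

// Browser-only: ask for the microphone and pass raw samples to `onSamples`.
// `fftSize` and `intervalMs` are assumed values; tune them for your case.
async function startMicrophoneCapture(onSamples, fftSize = 2048, intervalMs = 50) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = fftSize;
  source.connect(analyser);

  const chunk = new Float32Array(analyser.fftSize);
  const timer = setInterval(() => {
    // Copies the latest `fftSize` time-domain samples into `chunk`.
    analyser.getFloatTimeDomainData(chunk);
    onSamples(chunk.slice(), audioContext.sampleRate);
  }, intervalMs);

  // Return a function that stops polling and releases the microphone.
  return () => {
    clearInterval(timer);
    stream.getTracks().forEach((track) => track.stop());
    audioContext.close();
  };
}
```

You would call `startMicrophoneCapture` from a click handler (so the browser lets the audio context run) and feed each chunk into `appendSamples` to keep the last few seconds around.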
Compare samples to audio source file
Clearly, this piece is going to be the algorithmic challenge in a project like this. There are various ways to approach this part, and there is not enough time to describe them all here, but one feasible technique (which happens to be what Shazam uses, and which is described in greater detail here) is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
- Look at small sections of a sample, no more than a few seconds long, one at a time (note that this is done using a sliding window, not discrete partitioning).
- Calculate the Fourier transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
- Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes.
- If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just maintain them in a list. Since your keys are going to be an array of numerical values, I propose that another possible data structure to query your dataset would be a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, if you can keep the amount of critical points consistent. For now though, just keep it simple and use a list.
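The steps above can be sketched roughly like this. It is one possible reading of the steps, not Shazam's actual algorithm: the function names, the 1024-sample frame with a 512-sample hop, the Hann window, and the band edges are all assumptions I picked for illustration (a real system would likely use longer sections and tuned bands). The fingerprint here is simply the strongest FFT bin in each of a few frequency bands.

```javascript
// In-place iterative radix-2 FFT; `re` and `im` must have a power-of-two length.
function fft(re, im) {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Butterfly stages.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = start + k;
        const b = a + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Magnitude spectrum of one Hann-windowed frame (bins 0 .. n/2 - 1).
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const re = new Float64Array(n);
  const im = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    const hann = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (n - 1));
    re[i] = frame[i] * hann;
  }
  fft(re, im);
  const mags = new Float64Array(n / 2);
  for (let k = 0; k < n / 2; k++) mags[k] = Math.hypot(re[k], im[k]);
  return mags;
}

// Fingerprint = index of the strongest bin inside each frequency band.
// The band edges are assumed values and expect frames of at least 512 samples.
function fingerprintFrame(frame, bandEdges = [4, 16, 32, 64, 128, 256]) {
  const mags = magnitudeSpectrum(frame);
  const peaks = [];
  for (let b = 0; b + 1 < bandEdges.length; b++) {
    let best = bandEdges[b];
    const end = Math.min(bandEdges[b + 1], mags.length);
    for (let k = bandEdges[b]; k < end; k++) {
      if (mags[k] > mags[best]) best = k;
    }
    peaks.push(best);
  }
  return peaks;
}

// Slide an overlapping window over the samples and fingerprint each position.
function fingerprintSignal(samples, frameSize = 1024, hop = 512) {
  const prints = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    prints.push(fingerprintFrame(samples.slice(start, start + frameSize)));
  }
  return prints;
}
```

Running `fingerprintSignal` once over the decoded source audio gives you the list of fingerprints to keep around; the same function can later be applied to microphone samples.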
Now we have a database of fingerprints primed and ready to use. We need to compare them against our microphone input now.
- Sample our microphone input in small segments with a sliding window, the same way we did for our sources.
- For each segment, calculate its fingerprint, and see if it matches anything close in storage. You can look for a partial match here, and there are lots of tweaks and optimizations you could try.
- This is going to be a noisy and inaccurate signal, so don't expect every segment to get a match. If lots of them are getting a match (you will have to figure out what "lots" means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
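A minimal sketch of that comparison, assuming each fingerprint is an array of peak bin indices of equal length. The names `fingerprintsMatch`, `matchRatio`, and `isProbablyPlaying`, the one-bin tolerance, the "4 peaks must agree" rule, and the 0.5 threshold are all placeholders you would have to tune experimentally. For simplicity it ignores the order in which segments occur, which a more serious matcher would probably take into account.

```javascript
// Partial match: two fingerprints match if enough peak bins agree within a tolerance.
function fingerprintsMatch(a, b, binTolerance = 1, minAgreeing = 4) {
  let agreeing = 0;
  for (let i = 0; i < a.length && i < b.length; i++) {
    if (Math.abs(a[i] - b[i]) <= binTolerance) agreeing++;
  }
  return agreeing >= minAgreeing;
}

// Fraction of microphone fingerprints that match at least one stored fingerprint.
// A plain list scan, as suggested above; a k-d tree could replace `some` later.
function matchRatio(micPrints, sourcePrints) {
  if (micPrints.length === 0) return 0;
  let hits = 0;
  for (const mic of micPrints) {
    if (sourcePrints.some((src) => fingerprintsMatch(mic, src))) hits++;
  }
  return hits / micPrints.length;
}

// Decide that the source is playing if "lots" of segments matched.
// The threshold is an assumed starting point, not a recommended value.
function isProbablyPlaying(micPrints, sourcePrints, threshold = 0.5) {
  return matchRatio(micPrints, sourcePrints) >= threshold;
}
```

In practice you would recompute the ratio over the last few seconds of microphone fingerprints each time new audio arrives.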
Conclusions
This is not going to be a super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also, as a final note, you mention JavaScript several times in your post, and you may notice that I mentioned it zero times until now in my answer, and that's because the language of implementation is not an important factor. This system is complex enough that the hardest pieces of the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y". Just figure out an algorithm for X, and the Y should come naturally.