The subject of captcha images has been well covered, and there are plenty of resources available for creating those warped letters. I created such a control, combining quite a few different ideas, and was quite proud of it. Then, not thirty seconds after I demonstrated it, someone asked whether it had a ‘play audio’ button for blind people. All of a sudden my fancy captcha control was not so fancy.
I tried to suggest using reCaptcha as a solution that had all the bells and whistles, but the customisation of my control trumped the somewhat fixed design of reCaptcha. So the problem of the audio captcha brewed a little in the back of my mind and a month or two later I turned back to it to see if I could find a solution.
A speech synthesis engine was not on the cards, so I figured that because the captcha image was a random collection of letters and numbers, the only way I could generate the appropriate audio was to have audio files of all the letters and numbers. I would then need to join them together, on demand, and play them from the web page.
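As a rough sketch of that idea, the captcha text can be turned into an ordered list of recordings, one per character. The file naming used here (`a.mp3`, `7.mp3` and so on), the `audio` folder, and the `BuildFileList` helper are all assumptions made for illustration, not part of the original control:

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module CaptchaAudioFiles

    ' Hypothetical helper: maps each captcha character to a recording named
    ' after it, e.g. "k" -> "<audioFolder>/k.mp3". The naming scheme is assumed.
    Public Function BuildFileList(ByVal captchaText As String, ByVal audioFolder As String) As List(Of String)
        Dim files As New List(Of String)()
        For Each c As Char In captchaText.ToLowerInvariant()
            ' Only a-z and 0-9 are assumed to have recordings; anything else is skipped
            Dim isLetter As Boolean = (c >= "a"c AndAlso c <= "z"c)
            Dim isDigit As Boolean = (c >= "0"c AndAlso c <= "9"c)
            If isLetter OrElse isDigit Then
                files.Add(Path.Combine(audioFolder, c & ".mp3"))
            End If
        Next
        Return files
    End Function

    Sub Main()
        ' Prints one path per captcha character, in the order they are shown
        For Each f As String In BuildFileList("k7Qx", "audio")
            Console.WriteLine(f)
        Next
    End Sub

End Module
```

Keeping the list in the same order as the characters matters, because the joined audio has to read the captcha back left to right.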
Creating all the letters and numbers is straightforward enough (cue microphone and best-est speaking voice) and playing audio files from a web page has also become pretty easy thanks to the great SoundManager 2 JavaScript plugin.
The trickiest part was definitely joining the MP3 files that SoundManager requires. MP3 audio files are particularly tricky as they can contain ID3v1 or ID3v2 tag information, so just joining them back to back would not create a correct MP3 file. What I needed to do was determine whether a particular file had the tag information in it and strip it out if necessary. This took a lot of Googling and studying of the MP3 specification, but I eventually managed to ensure the files had no ID3v1/ID3v2 tags in them before joining the files together:
```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Public Class MP3Concatenator

    Private Const Id3v2HeaderLength As Integer = 10
    Private Const Id3v1TagLength As Integer = 128

    Public Shared Function Join(ByVal MP3sToJoin As List(Of String)) As MemoryStream
        Dim ms As New MemoryStream()

        'loop around each file, remove the tags and then concatenate the audio
        For Each mp3File As String In MP3sToJoin
            Using fs As New FileStream(mp3File, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
                Dim br As New BinaryReader(fs)
                Dim audioStart As Integer = 0
                Dim audioEnd As Integer = CInt(fs.Length)

                'Check for an ID3v2 tag at the start of the file
                fs.Position = 0
                If Encoding.ASCII.GetString(br.ReadBytes(3)) = "ID3" Then
                    'the four size bytes follow the version (2 bytes) and flags (1 byte)
                    fs.Position = 6
                    'the MSB of each size byte is set to 0 and ignored, so convert the value
                    audioStart = BitsToLow(br.ReadBytes(4))
                    'the stored size excludes the 10 byte header, so add it on
                    audioStart += Id3v2HeaderLength
                End If

                'Check for an ID3v1 tag in the last 128 bytes of the file
                If fs.Length >= Id3v1TagLength Then
                    fs.Seek(-Id3v1TagLength, SeekOrigin.End)
                    If Encoding.ASCII.GetString(br.ReadBytes(3)) = "TAG" Then
                        'there is an ID3v1 tag on the end which needs removing
                        audioEnd -= Id3v1TagLength
                    End If
                End If

                'copy only the audio that sits between the two tags
                If audioEnd > audioStart Then
                    fs.Position = audioStart
                    Dim bytes() As Byte = br.ReadBytes(audioEnd - audioStart)
                    ms.Write(bytes, 0, bytes.Length)
                End If
            End Using
        Next

        ms.Position = 0
        Return ms
    End Function

    'Converts the four size bytes (7 useful bits each, most significant first) to an Integer
    Private Shared Function BitsToLow(ByVal Size() As Byte) As Integer
        Return (CInt(Size(0) And &H7F) << 21) Or
               (CInt(Size(1) And &H7F) << 14) Or
               (CInt(Size(2) And &H7F) << 7) Or
               CInt(Size(3) And &H7F)
    End Function

End Class
```
The fiddliest part of that was reading the size of the ID3v2 tag, as the specification states:
The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored...
This is why I have the BitsToLow function in the code above.
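To make the encoding concrete, here is a small worked example, offered as a sketch and not as the code used in the control. The ID3v2 specification gives a 257 byte tag as the bytes `$00 00 02 01`. The `SyncsafeToInt32` and `NaiveBigEndian` names are made up for this illustration. The first function does the same job as BitsToLow using a loop, and the second shows what happens if the four bytes are read as an ordinary 32-bit number:

```vb
Imports System

Module SyncsafeDemo

    ' Decodes an ID3v2 size: each byte contributes only its low 7 bits,
    ' most significant byte first (4 x 7 = 28 bits in total).
    Public Function SyncsafeToInt32(ByVal size() As Byte) As Integer
        Dim value As Integer = 0
        For Each b As Byte In size
            value = (value << 7) Or (b And &H7F)
        Next
        Return value
    End Function

    ' For comparison only: treats the same bytes as a plain big-endian integer.
    Public Function NaiveBigEndian(ByVal size() As Byte) As Integer
        Return (CInt(size(0)) << 24) Or (CInt(size(1)) << 16) Or
               (CInt(size(2)) << 8) Or CInt(size(3))
    End Function

    Sub Main()
        Dim sizeBytes() As Byte = {&H0, &H0, &H2, &H1}
        Console.WriteLine(SyncsafeToInt32(sizeBytes)) ' prints 257
        Console.WriteLine(NaiveBigEndian(sizeBytes))  ' prints 513, which is wrong
    End Sub

End Module
```

Reading the size naively would put the start of the audio 256 bytes too late in this case, cutting off the beginning of the recording.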
The resulting concatenation is returned as a memory stream because I knew I could output this directly to the Response without writing the concatenated file to disk:
```vb
....
Dim ms As IO.MemoryStream = MP3Concatenator.Join(MP3FileList)
Response.ContentType = "audio/mpeg"
Response.ExpiresAbsolute = Date.MinValue
If ms IsNot Nothing Then
    'write the stream straight to the response rather than to disk
    ms.WriteTo(Response.OutputStream)
    ms.Close()
End If
Response.End()
....
```
By wiring up the CaptchaAudio.aspx page, which returned the concatenated audio, to the SoundManager plugin, I could create a link next to the captcha image that played the letters in the image. Now my captcha control really was fancy.
You could make the project?
Thanks a bunch for this article, it has proven extremely helpful! 🙂 The only part which I struggled with (since I’m a .net newbie) was the actual code to build the generic list for “MP3FileList”. After enough experimentation I got it to work by using logic such as:
Dim MP3FileList As New System.Collections.Generic.List(Of String)()
MP3FileList.Add(Server.MapPath("/audio/file1.mp3"))
MP3FileList.Add(Server.MapPath("/audio/file2.mp3"))
etc
Thanks again!
@theonlykenobi thanks for the comment. Unfortunately I cannot provide the full source for the page without crossing the line with my employer, for whom I wrote it. I have basically provided all the guts of the process here and there is very little else to do. Given the letters shown on the captcha image (presumably held in session), create a list of the letter/number MP3 files you need to join together and pass it into the concatenator. Output the returned memory stream to the response as shown.
OMG! Awesome article, just stumbled across it.
Is this the full source of the audio implementation? If not is there any chance of providing it?
Again awesome….
Thanks for sharing 🙂