While fiddling with CGI to really understand how it works, I hit an annoying problem. The problem is not with CGI but with Elvis, my vi.
I am trying to verify that a file uploaded to the server in POST data is parsed and saved byte-for-byte identically on the server. Since I simply use the following code to capture the data, I need to delete the leading part that precedes the actual binary data:
#!/bin/ash
echo -ne 'Content-type: text/html\n\n'
echo $REQUEST_METHOD
echo '<br>'
echo $CONTENT_LENGTH
echo '<br>'
echo $CONTENT_TYPE
echo '<br>'
head -c $CONTENT_LENGTH > /tmp/x.bin
The leading part is something like:
-----------------------------150317703410861467271174091648
Content-Disposition: form-data; name="myFile"; filename="Infoblatt.eng.pdf"
Content-Type: application/pdf
Well, how do I do that? I simply fire up my vi, switch to hex editing mode with ":disp hex", and start deleting the leading part by pressing `x'. I save the file and compare the stripped file with the original PDF file using `vbindiff'. There I see discrepancies: wherever the original file has `0x0A' (\n), the stripped file has `0x0D 0x0A' (\r\n). I think that either CGI is crazy enough to fiddle with binary data or `head -c' doesn't work as it should. After looking at a non-free CGI file upload handler, I am sure that CGI doesn't mangle binary data. So, I implement my own version of `head'.
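A quick way to cross-check this kind of discrepancy without going through any editor is to count the suspicious bytes directly. The following is only a minimal sketch: `count_byte' is a made-up helper name, and it assumes an `od' that understands `-An -v -tx1'.

```shell
#!/bin/sh
# Hypothetical helper: count how often one byte value occurs in a file.
# $1 = file, $2 = byte as two lowercase hex digits (0d for \r, 0a for \n).
count_byte() {
    # od prints every byte as a two-digit hex token; put one token per line
    # and count the lines that match the requested value exactly.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | grep -c -x "$2"
}
```

Running it with `0d' on both the stripped file and the original PDF shows whether extra carriage returns have crept into the stripped copy.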
Looking at x.bin using Elvis, I saw that between `Content-Type: application/pdf' and the binary data there were two `0x0A' bytes. So, my parser tried to catch a double `0x0A' before reading the binary data. That failed: my parser got confused and output the wrong data. The implementation of the parser was downright simple, so I was very frustrated, wondering what was wrong with it. So, I fired up GDB.
There in GDB I saw that it was actually not a double `0x0A' but a double `0x0D 0x0A'! So, it is Elvis that mangles my data. Before ranting on my blog, I decided to google for "elvis editing hex \r \n", "elvis editing hex transform \r to \n" and finally "elvis writing binary". One page I found says something about non-binary files, which reminded me of the non-binary file mode in Microsoft Windows. So, I finally googled for "elvis binary mode" and found the answer.
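Since the part headers end with a double `0x0D 0x0A', the start of the binary data can also be found without hand editing. What follows is a rough sketch, not the parser described above: `strip_part_header' is a made-up name, and it assumes that the captured file holds a single part whose header lines contain no NUL bytes. It leaves the closing boundary at the end of the file untouched.

```shell
#!/bin/sh
# Hypothetical helper: print everything after the first line that consists
# of a lone CR (i.e. the CR LF CR LF that ends the part headers).
# The body runs in a subshell, so LC_ALL and the variables stay local.
strip_part_header() (
    LC_ALL=C            # make ${#line} count bytes, not characters
    file=$1
    cr=$(printf '\r')
    offset=0
    found=0
    # Only header lines go through read; the payload is never read here.
    while IFS= read -r line; do
        # Bytes consumed: the line itself plus the LF that read removed.
        offset=$((offset + ${#line} + 1))
        if [ "$line" = "$cr" ]; then
            found=1
            break
        fi
    done < "$file"
    # No blank line found: report failure instead of guessing.
    [ "$found" -eq 1 ] || exit 1
    # tail -c +N starts at byte N (1-based), so this skips "offset" bytes.
    tail -c "+$((offset + 1))" "$file"
)
```

It would be called as `strip_part_header /tmp/x.bin > stripped.bin', where `stripped.bin' is just an example name; the result still ends with the closing boundary line.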
In short, Elvis tries to auto-detect whether or not a file is binary. If Elvis thinks that a file is non-binary, it will display `\r\n' as `\n' even in ":disp hex" mode and will transform `\n' to `\r\n' upon writing the file. My x.bin was a mix of text data and binary data, and that fooled Elvis into thinking that the file was a non-binary one (that's why I don't like Plug 'n Play, since the saying goes that it is actually Plug 'n Pray). To be safe, always fire up Elvis with `-b', as in `vi -b', when editing a file that you want to see as it is.