The convention I volunteer at has for many years used barcode scanners to read driver's licenses and speed up entry of member data (name and address) on at-con registration forms. We're using a new system this year and looking to get new scanners. The old scanners we had were a higher-end model that decoded the data in the PDF417 barcode and let you configure which fields, and in what order, the scanner would return, so you could basically put your cursor at the top of a form, scan the barcode, and the form would get filled in just as if it were being typed from a keyboard.
Looking to save some money, I found some much lower-cost scanners (in the $50 range) on Amazon.
Unboxing the scanner, it's definitely no frills. In the box you get the scanner and a one-sheet Quick Reference Guide with Chinese as the first language.
Plugging it into my Mac, it showed up as a keyboard just fine. So I fired up TextEdit and did a scan. What I got back was a whole bunch of gibberish, but I could at least make out some information, like my name and address, in there. There was also a lot of beeping as the scanner attempted to type some unknown characters into TextEdit. So the scanner did work, but the results were a little more raw than what I was expecting, having only used higher-end scanners.
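For the curious, the text packed into a license's PDF417 barcode looks roughly like this. This is a made-up example that just follows the AAMVA layout (control characters shown escaped); a real license carries many more fields:

```
@\n\x1e\rANSI 636000080002DL00410278ZV03190008DLDAQ0123456789\nDCSDOE\nDACJANE\nDAG123 MAIN ST\nDAIANYTOWN\nDAJMA\nDAK02139\r
```

The mystery beeping presumably comes from those control characters (\n, \x1e, \r), which don't all map cleanly to keystrokes.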
The result of what I came up with is posted to GitHub here: https://github.com/ketaro/LicenseParser
After the first segment terminator, the stack should contain all the data in the PDF417 header. This is then parsed out and validated. The header must start with "ANSI ". The header also carries version information and the number of segments in the barcode (the barcode's contents are called "the file" in the spec), which tells us when we've reached the end of the data.
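Here's a minimal sketch of that header parsing in TypeScript. The field offsets follow the published AAMVA header layout (version 2 and later); the function name, types, and return shape are my own illustration, not necessarily the library's actual API:

```typescript
interface AamvaHeader {
  iin: string;           // 6-digit Issuer Identification Number
  aamvaVersion: number;  // spec version the license was encoded with
  segments: number;      // number of segments ("subfiles") in the file
}

// `header` is the chunk starting with "ANSI ", i.e. what's sitting on
// the stack once the leading compliance-indicator bytes are consumed.
function parseHeader(header: string): AamvaHeader {
  if (!header.startsWith("ANSI ")) {
    throw new Error("Not an AAMVA file: bad header");
  }
  return {
    iin: header.substring(5, 11),
    aamvaVersion: parseInt(header.substring(11, 13), 10),
    // Positions 13-15 hold the jurisdiction version; the segment
    // count comes right after it.
    segments: parseInt(header.substring(15, 17), 10),
  };
}
```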
After the header comes the data. Each data element is prefixed with a three-letter code (an Element ID) that indicates what the field contains. In the library, there's a lookup hash called license_fields that maps the Element IDs I was interested in to the field names I wanted in my normalized results. There are loads more IDs out there than what I'm capturing in the example library. There's also a fair amount of overlap, where multiple codes can represent the same data, and that's reflected in the lookup hash. A sketch of the lookup is below.
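The three-letter codes in this sketch are real AAMVA Element IDs, but the normalized field names and the helper function are illustrative; the library's actual license_fields map covers more:

```typescript
// Maps AAMVA Element IDs to normalized field names.
const license_fields: Record<string, string> = {
  DCS: "last_name",   // customer family name
  DAB: "last_name",   // older spec versions: same data, different code
  DAC: "first_name",
  DAG: "address",     // street address
  DAI: "city",
  DAJ: "state",
  DAK: "zip",         // postal code
  DBB: "dob",         // date of birth
};

// Each data line is a three-letter Element ID followed by its value,
// e.g. "DCSDOE" -> last_name = "DOE". IDs we don't care about are
// simply skipped.
function parseElement(line: string, result: Record<string, string>): void {
  const id = line.substring(0, 3);
  const field = license_fields[id];
  if (field !== undefined) {
    result[field] = line.substring(3).trim();
  }
}
```

Note how the DCS/DAB overlap is handled: both codes point at the same normalized field, so whichever one a given license uses, the data lands in the same place.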