One of the nice things about chip cards (ICCs) is that the data that comes out of them is virtually always supplied in a standard format, called BER-TLV. In plain English: Basic Encoding Rules, Tag-Length-Value (a quaint but informative article about it can be found here).
The BER-TLV format is one of the ASN.1 (Abstract Syntax Notation One) encodings defined by ITU-T X.690, which is a very old set of standards dating to the primordial predawn of the Internet.
Chip cards use the TLV scheme to encode card data. At its simplest, the Tag-Length-Value scheme just means that if you have a tag called (say) "5A" and its value is 8 octets represented by (for example) successive hex values "41 11 12 34 56 78 9A BC," then the TLV encoding will look like 5A084111123456789ABC, where 5A is the tag, 08 is the length, and 4111123456789ABC is the value.
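To make the simplest case concrete, here is a minimal sketch in JavaScript. It assumes a one-byte tag and a one-byte length, which is all the 5A example needs; the function name parseSimpleTlv is made up for illustration.

```javascript
// Minimal sketch: split a hex string into tag, length, and value,
// assuming a one-byte tag and a one-byte length (no extensions).
function parseSimpleTlv(hex) {
  const tag = hex.slice(0, 2);                   // first byte: the tag
  const length = parseInt(hex.slice(2, 4), 16);  // second byte: length of the value, in bytes
  const value = hex.slice(4, 4 + length * 2);    // two hex characters per byte
  return { tag, length, value };
}

const simple = parseSimpleTlv('5A084111123456789ABC');
console.log(simple);
// { tag: '5A', length: 8, value: '4111123456789ABC' }
```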
EMVCo (the card-brand consortium behind the whole chip-card thing) defines a bunch of standard tags for chip-card transactions. For example, 5A always encodes the PAN (primary account number, or card number), 9F02 encodes the Authorized Amount of a transaction, 5F2D encodes Language Preference, and so on. The complete list of EMVCo-defined tags (and their meanings) can be found at https://www.eftlab.co.uk/index.php/site-map/knowledge-base/145-emv-nfc-tags.
Given that TLVs encode their own length, it should be a snap to parse TLV data, right?
Well, yes. Mostly. Kind of.
If every tag had a simple one-byte identifier (like 5A), it really would be super-duper-easy to parse a TLV stream. But the TLV scheme wouldn't be very useful if identifiers could only ever take on one of just 256 possible values.
To make tag identifiers extensible, Basic Encoding Rules allow for the possibility of multi-byte tags. The rules say that if the bottom 5 bits of the first tag byte are all set (that is, the byte ANDed with 0x1F equals 0x1F), then more tag-identifier bytes follow. In subsequent bytes, the top bit is set if more bytes follow, whereas the top bit is zero in the final byte. So for example, 5F24 is a legal 2-byte tag identifier, DFEF01 is a legal 3-byte tag, and so on.
EMVCo (which incorporates BER-TLV by reference in Book 3, Annex B, of the EMV specifications) also allows for the concept of "wrapper" tags, to enable hierarchical parent-child relationships (or nesting) among TLVs. Under these rules, if the sixth bit (mask 0x20) of a tag's first byte is set, the tag is said to be "constructed" (I prefer the term compound). Thus, a 3-byte tag FFEE01 could be used to wrap (fictional) TLVs of C10188 and C2025544 as follows: FFEE0107C10188C2025544. The parent tag, FFEE01, has 7 bytes of data, consisting of a 3-byte TLV and a 4-byte TLV. Groups of tags can be nested to any desired depth using this scheme.
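Testing for a wrapper tag is a one-line bit check on the first tag byte. Here is a small sketch; isConstructed is a name invented for this example, and the tag is assumed to arrive as a hex string.

```javascript
// Sketch: a tag is "constructed" when bit 6 (mask 0x20) of its first byte is set.
// isConstructed is a hypothetical helper name.
function isConstructed(tagHex) {
  const firstByte = parseInt(tagHex.slice(0, 2), 16);
  return (firstByte & 0x20) !== 0;
}

console.log(isConstructed('FFEE01')); // true  (0xFF has the 0x20 bit set)
console.log(isConstructed('5A'));     // false (0x5A = 0101 1010)
console.log(isConstructed('9F02'));   // false (0x9F = 1001 1111)
```

A parser that wants to descend into nested TLVs would run this check on each tag and, when it returns true, parse the Value as another run of TLVs.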
Note carefully that the Length field of a TLV can also be multi-byte. Here, the extensibility rule (taken from EMV Book 3, Annex B2) is:
A length byte with the top bit set will mean you have to treat the bottom 7 bits as the "length of the Length." In other words, a Length byte of 0x82 means that there are two bytes of Length info (in the two bytes that follow). In the (fictional) TLV represented by 5F0F8103AABBCC, the tag is 5F0F, the length of the Length is one byte, the actual Length is 3 (meaning 3 bytes of Value), and the Value is AABBCC.
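Here is a sketch of the Length rule in JavaScript. The helper name readLength and its return shape are assumptions made for this example. It does not handle a bare 0x80 byte (BER's indefinite-length form), and it does no bounds checking.

```javascript
// Sketch of the Length rule (readLength is a hypothetical helper).
// Returns the decoded length in bytes, plus how many hex characters the Length field occupied.
function readLength(hex, offset = 0) {
  const first = parseInt(hex.slice(offset, offset + 2), 16);
  if ((first & 0x80) === 0) {
    return { length: first, chars: 2 };  // top bit clear: the byte itself is the length
  }
  const count = first & 0x7f;            // top bit set: bottom 7 bits = number of length bytes
  const length = parseInt(hex.slice(offset + 2, offset + 2 + count * 2), 16);
  return { length, chars: 2 + count * 2 };
}

// The tag 5F0F occupies the first 4 hex characters, so the Length starts at offset 4.
console.log(readLength('5F0F8103AABBCC', 4)); // { length: 3, chars: 4 }
console.log(readLength('820100'));            // { length: 256, chars: 6 }
```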
Clear as mud, right?
The tactic we use here is brain-dead simple:
First, make available a big dictionary of tag identifiers, containing all known EMVCo (industry standard) tags, plus all known ID TECH proprietary tags. We call this dictionary _KnownTags, and you can test an identifier like '5A' for existence by seeing if _KnownTags[ '5A' ] returns true.
Our parsing algorithm is super simple:
Read two nibbles at a time into a tag variable, and test whether the tag exists in the dictionary. All tags in the dictionary will be one, two, or three bytes long, so if we read 6 nibbles without finding a known tag, just advance the reading frame by 2 bytes and continue on like nothing happened (after emitting a console message saying "Expected a tag, found none"). If you want to be fussy and throw an exception here, you can, but my philosophy is that (depending, of course, on the circumstances) a parser should by default be fail-soft (fault tolerant), in case you still want to use the rest of the parsed data.
Once a tag is found, use a worker method, in this case an inner function called readData(), to read past the tag, read the Length, and use the Length to read the Value. (Here, we need to be careful to check the top bit of the presumed Length, to see whether we need to follow the length-of-the-Length extensibility hack rule mentioned earlier.)
Put the Value into a storage object under a lookup key of the tag identifier.
At the end, return the storage object.
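The steps above can be sketched roughly as follows. This is an illustration of the dictionary-driven approach under stated assumptions, not the original parser: KNOWN_TAGS is a tiny stand-in for the full _KnownTags dictionary, parseTagsSketch is a made-up name, and on an unrecognized tag this sketch skips ahead by one byte (two hex characters) before retrying.

```javascript
// Tiny stand-in dictionary (an assumption; the real one lists every known tag).
const KNOWN_TAGS = {
  '5A': true, '5F24': true, '5F2D': true,
  '9F02': true, '9F27': true, 'DFEE25': true
};

// Sketch of a fail-soft, dictionary-driven TLV parser over a hex string.
function parseTagsSketch(hex) {
  const result = {};
  let pos = 0;
  while (pos < hex.length) {
    // Try a 1-, 2-, then 3-byte tag (2, 4, then 6 nibbles).
    let tag = null;
    for (let n = 2; n <= 6; n += 2) {
      const candidate = hex.slice(pos, pos + n).toUpperCase();
      if (candidate.length === n && KNOWN_TAGS[candidate]) {
        tag = candidate;
        break;
      }
    }
    if (tag === null) {
      console.log('Expected a tag, found none');
      pos += 2; // assumption of this sketch: skip one byte and keep going
      continue;
    }
    pos += tag.length;

    // Read the Length, following the length-of-the-Length rule if the top bit is set.
    let length = parseInt(hex.slice(pos, pos + 2), 16);
    pos += 2;
    if (length & 0x80) {
      const count = length & 0x7f;
      length = parseInt(hex.slice(pos, pos + count * 2), 16);
      pos += count * 2;
    }

    // Store the Value (as hex) under the tag identifier.
    result[tag] = hex.slice(pos, pos + length * 2);
    pos += length * 2;
  }
  return result;
}

console.log(parseTagsSketch('5A084111123456789ABC9F02060000000010005F2D02656E'));
// { '5A': '4111123456789ABC', '9F02': '000000001000', '5F2D': '656E' }
```

Note that this sketch stores a wrapper tag's Value as one opaque hex string; it does not descend into nested TLVs.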
So let's try a real-world example. Suppose you've got an ID TECH Augusta chip-card reader, and you're using it in keyboard mode to capture Quick Chip data. The data that streams out of the device when you dip a card might look like:
This is a big block of TLV data that begins with an ID TECH proprietary tag of DFEE25. (You can learn more about what ID TECH's tags mean by downloading the ID TECH TLV Tag Reference Guide from https://atlassian.idtechproducts.com/confluence/display/KB/Downloads+-+Home.) Most of the tags in this block, however, are industry-standard EMVCo tags. If we assign the block, as a string, to a JS variable called tagblock, and then load the above parser and run it with parseTags( tagblock ), we'll get back an object with tags and values, like this:
Some of these tags are empty. Some (like 9F27) contain a Value of 00. Some are encrypted. But basically, you have all the tags you need, right here, to run an EMV transaction.