QR Code Demystified – Part 3

Tuesday, June 7th, 2011

Now we’ll cover how the data is encoded. There are several steps involved. First, the encoding method is chosen. Next, the raw data is converted to binary based on that method, the error correction algorithm is applied, and the data is placed in the symbol. Finally, a mask is selected and applied.

For now I’ll cover the encoding methods, and conversion to binary, and save the rest for later.

Before I go any further, I want to point out that you’ll need to be comfortable with the binary number system. Because explanations of it are easy to find, and many programmers are already familiar with it, I’m going to assume this understanding. If you need an explanation or a refresher, try Wikipedia or a Google search. Further, I’m going to refer to adding bits to your data. Think of the data as a stream of bits that will later be placed in the correct order on the symbol. I’ll explain exactly how that’s done later, but for now, just imagine that you’re creating a chain of 1s and 0s. I might add spaces between some of the bits to make them more readable, but the spaces are not part of the binary data.

The encoding methods are Numeric, Alphanumeric, Binary, and Kanji. Numeric supports only the digits 0-9, but can store 3 of them in only 10 bits. Alphanumeric supports letters A-Z (upper-case only), digits 0-9, and the special characters $%*+-./: and space. It’s good for encoding URLs and simple text, and it takes 11 bits to store 2 alphanumeric characters. Binary data is stored 8 bits per character, and supports the 256 characters in the extended ASCII table. Kanji takes 13 bits for a single character. I won’t go into detail on Kanji, because I’m guessing that very few people reading this tutorial will need to encode Japanese. So if you do, you’ll have to settle for Romaji or find the details elsewhere.

In order to encode data with one of these methods, we first indicate which method we’re using, and how much data we’re storing. We indicate the method with four bits: Numeric is 0001, Alphanumeric is 0010, Binary is 0100, and Kanji is 1000. The length follows as a character count, and the encoding method and symbol version determine how many bits we use for it. Details are in Table 1 below. As an example, if we’re encoding 5 binary characters to a Version 1 symbol, the binary data we start with will be 0100 00000101. (0100 indicates binary, and 00000101 is the 8-bit representation of 5, the data length.) It is also possible to mix different methods in one symbol, by appending a new method/size indicator after the previous data, followed by the new data. (In other words: Method1, Size1, Data1, Method2, Size2, Data2.)

Table 1 – Bits used to indicate data length

Method        Version 1-9  Version 10-26  Version 27-40
Binary        8 bits       16 bits        16 bits
Alphanumeric  9 bits       11 bits        13 bits
Numeric       10 bits      12 bits        14 bits
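
As a rough illustration, here is a minimal Python sketch of building the method and length bits from Table 1. The names `MODE_INDICATOR`, `LENGTH_BITS`, `length_field_width`, and `header_bits` are made up for this example and don’t come from any library. Kanji is left out because Table 1 doesn’t list it.

```python
# Method indicators and length-field widths, copied from the text and Table 1.
# All names here are hypothetical helpers for illustration only.
MODE_INDICATOR = {"numeric": "0001", "alphanumeric": "0010", "binary": "0100"}

# Bits used for the length field in Versions 1-9, 10-26, and 27-40.
LENGTH_BITS = {
    "binary": (8, 16, 16),
    "alphanumeric": (9, 11, 13),
    "numeric": (10, 12, 14),
}


def length_field_width(mode: str, version: int) -> int:
    """Return how many bits the length field uses for this method and version."""
    if not 1 <= version <= 40:
        raise ValueError("version must be between 1 and 40")
    if version <= 9:
        column = 0
    elif version <= 26:
        column = 1
    else:
        column = 2
    return LENGTH_BITS[mode][column]


def header_bits(mode: str, count: int, version: int) -> str:
    """Return the 4-bit method indicator followed by the character count."""
    width = length_field_width(mode, version)
    if not 0 <= count < (1 << width):
        raise ValueError("character count does not fit in the length field")
    return MODE_INDICATOR[mode] + format(count, f"0{width}b")


# The example from the text: 5 binary characters in a Version 1 symbol.
print(header_bits("binary", 5, 1))  # 010000000101
```

To mix methods in one symbol, you would concatenate a header and its data for each segment in turn.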

Once we’ve indicated what we’re storing, it’s time to actually add the data. Binary is the easiest: simply take the 8-bit representation of each character and add it to your data.

For numeric data, you take sets of three digits and encode each set directly as its 10-bit binary representation, so you encode “123456” as one hundred twenty-three followed by four hundred fifty-six (0001111011 0111001000). If you have 1 digit left at the end of your data, encode it in four bits, and if you have 2 digits left, encode them in seven bits.

Alphanumeric data is a little trickier. You have to convert each character to its numerical value (see Table 2). Then you take pairs of characters, multiply the first numerical value by 45, and add the second numerical value. Then convert the result to 11-bit binary. If you end up with one character left over at the end of your data, encode its value in 6 bits. For example, in ABC the pair AB becomes (10 × 45) + 11 = 461, and the leftover C is 12. 461 in 11-bit binary is 00111001101 and 12 in 6-bit binary is 001100, so the result is 00111001101001100.
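
The numeric and binary rules above can be sketched in a few lines of Python. This is only an illustration under the rules as described here; `encode_numeric` and `encode_binary` are hypothetical names, and the sketch produces just the data bits, without the method and length indicators.

```python
def encode_numeric(digits: str) -> str:
    """Encode a string of digits: 3 digits -> 10 bits, 2 -> 7 bits, 1 -> 4 bits."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("numeric mode only supports the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}
    chunks = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]  # the last group may hold only 1 or 2 digits
        chunks.append(format(int(group), f"0{widths[len(group)]}b"))
    return "".join(chunks)


def encode_binary(data: bytes) -> str:
    """Encode each byte as its 8-bit representation."""
    return "".join(format(byte, "08b") for byte in data)


print(encode_numeric("123456"))  # 00011110110111001000
print(encode_binary(b"A"))       # 01000001
```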

Table 2 – Character values in alphanumeric mode

Char     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E
Value    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Char     F    G    H    I    J    K    L    M    N    O    P    Q    R    S    T
Value   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29
Char     U    V    W    X    Y    Z (sp)    $    %    *    +    -    .    /    :
Value   30   31   32   33   34   35   36   37   38   39   40   41   42   43   44
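
Here is a minimal Python sketch of the alphanumeric encoding, assuming the character values from Table 2 (with “-” at value 41, between “+” and “.”). The names `ALPHANUMERIC_CHARS` and `encode_alphanumeric` are hypothetical, and the sketch produces only the data bits, not the method and length indicators.

```python
# The position of each character in this string is its value from Table 2.
ALPHANUMERIC_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_alphanumeric(text: str) -> str:
    """Encode pairs of characters as 11 bits, and a leftover character as 6 bits."""
    # str.index raises ValueError for any character outside the table.
    values = [ALPHANUMERIC_CHARS.index(ch) for ch in text]
    chunks = []
    for i in range(0, len(values) - 1, 2):
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))
    if len(values) % 2:  # one character left over at the end
        chunks.append(format(values[-1], "06b"))
    return "".join(chunks)


# The example from the text: AB -> 461, then C -> 12.
print(encode_alphanumeric("ABC"))  # 00111001101001100
```

Lower-case letters are not in the table, so text has to be upper-cased before it can be encoded this way.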

After you’ve encoded all your data, if there’s any space left over, you need to add some padding. First, add 0000 as a terminator (if fewer than four bits of space remain, add only as many 0s as fit). Then, if the number of bits in your data isn’t divisible by 8, add 0s until it is. Then alternately add “11101100” and “00010001” until you’ve reached the limit for your version and error correction mode (Table 3). If you’ve reached your limit just from your actual data, none of this is necessary.

Table 3 – Maximum bits for data

Version      1      2      3      4      5      6      7      8      9     10
L          152    272    440    640    864   1088   1248   1552   1856   2192
M          128    224    352    512    688    864    992   1232   1456   1728
Q          104    176    272    384    496    608    704    880   1056   1232
H           72    128    208    288    368    480    528    688    800    976
Version     11     12     13     14     15     16     17     18     19     20
L         2592   2960   3424   3688   4184   4712   5176   5768   6360   6888
M         2032   2320   2672   2920   3320   3624   4056   4504   5016   5352
Q         1440   1648   1952   2088   2360   2600   2936   3176   3560   3880
H         1120   1264   1440   1576   1784   2024   2264   2504   2728   3080
Version     21     22     23     24     25     26     27     28     29     30
L         7456   8048   8752   9392  10208  10960  11744  12248  13048  13880
M         5712   6256   6880   7312   8000   8496   9024   9544  10136  10984
Q         4096   4544   4912   5312   5744   6032   6464   6968   7288   7880
H         3248   3536   3712   4112   4304   4768   5024   5288   5608   5960
Version     31     32     33     34     35     36     37     38     39     40
L        14744  15640  16568  17528  18448  19472  20528  21616  22496  23648
M        11640  12328  13048  13800  14496  15312  15936  16816  17728  18672
Q         8264   8920   9368   9848  10288  10832  11408  12016  12656  13328
H         6344   6760   7208   7688   7888   8432   8768   9136   9776  10208
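
The padding steps can be sketched in Python as follows. The names `MAX_DATA_BITS_V1`, `PAD_BYTES`, and `pad_bits` are hypothetical, and only the Version 1 column of Table 3 is included to keep the example short. One assumption to flag: when fewer than four bits of space remain, the sketch shortens the 0000 terminator to fit, which is my understanding of how the QR specification handles that case.

```python
# Maximum data bits for Version 1 only, copied from Table 3.
MAX_DATA_BITS_V1 = {"L": 152, "M": 128, "Q": 104, "H": 72}

PAD_BYTES = ("11101100", "00010001")


def pad_bits(bits: str, capacity: int) -> str:
    """Pad a string of 1s and 0s up to the given capacity in bits."""
    if len(bits) > capacity:
        raise ValueError("data does not fit in this version and error correction mode")
    # Terminator: 0000, shortened if fewer than four bits remain (assumption).
    bits += "0" * min(4, capacity - len(bits))
    # Add 0s until the length is divisible by 8.
    bits += "0" * (-len(bits) % 8)
    # Alternate the two pad bytes until the limit is reached.
    i = 0
    while len(bits) < capacity:
        bits += PAD_BYTES[i % 2]
        i += 1
    return bits


# 5 binary characters in a Version 1 symbol: 4 + 8 + 40 = 52 bits before padding.
# The all-zero data bits here are placeholders, not real character data.
data = "0100" + "00000101" + "0" * 40
padded = pad_bits(data, MAX_DATA_BITS_V1["H"])
print(len(padded))   # 72
print(padded[-16:])  # 1110110000010001
```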

The error correction is quite complicated, so we’ll cover that next time. For now, I’ll say that QR Codes use the Reed-Solomon error correction algorithm. You can read about it at Wikipedia. After that, I’ll cover the placement order of the modules and the masking system. If you haven’t already, you’ll definitely want to check out the previous parts of this tutorial – Part 1 and Part 2.
