Now we’ll cover how the data is encoded. This involves several steps. First, an encoding method is chosen; then the raw data is converted to binary according to that method; then the error correction algorithm is applied; and then the data is placed in the symbol. Finally, a mask is selected and applied.
For now, I’ll cover the encoding methods and the conversion to binary, and save the rest for later.
Before I go any further, I want to point out that you’ll need to be comfortable working with the binary number system. Explanations of it are easy to find, and many programmers already know it, so I’m going to assume that understanding. If you need an explanation or a refresher, try Wikipedia or a Google search. Also, I’m going to refer to adding bits to your data. Think of this as a stream of bits that will later be placed in the correct order on the symbol. I’ll explain exactly how that’s done later, but for now, just imagine that you’re building a chain of 1s and 0s. I might add spaces between some of the bits to make them more readable, but the spaces are not part of the binary data.
The encoding methods are Numeric, Alphanumeric, Binary, and Kanji. Numeric supports only the digits 0-9, but can store 3 of them in just 10 bits. Alphanumeric supports the letters A-Z (upper-case only), the digits 0-9, and the special characters $%*+-./: and space. It’s good for simple upper-case text, including URLs that use only these characters. It takes 11 bits to store 2 alphanumeric characters. Binary data is stored at 8 bits per character, so it supports 256 different characters (the extended ASCII table). Kanji takes 13 bits for a single character. I won’t go into detail on Kanji, because I’m guessing that very few people reading this tutorial will need to encode Japanese. If you do, you’ll have to settle for Romaji or find the details elsewhere.
To encode data with one of these methods, we first indicate which method we’re using and how much data we’re storing. We indicate the method with four bits: Numeric is 0001, Alphanumeric is 0010, Binary is 0100, and Kanji is 1000. The encoding method determines how many bits we use to indicate the data length; details are in Table 1 below. For example, if we’re encoding 5 binary characters in a Version 1 symbol, the binary data starts with 0100 00000101 (0100 indicates binary, and 00000101 is the 8-bit representation of 5, the data length). It is also possible to mix methods in one symbol by appending a new method/length indicator after the previous data, followed by the new data. (In other words, Method1, Size1, Data1, Method2, Size2, Data2.)
Table 1 – Bits used to indicate data length
Mode | Version 1-9 | Version 10-26 | Version 27-40 |
Binary | 8 bits | 16 bits | 16 bits |
Alphanumeric | 9 bits | 11 bits | 13 bits |
Numeric | 10 bits | 12 bits | 14 bits |
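As a minimal sketch of the method/length header described above, the Python below looks up the length-field width from Table 1 and prepends the 4-bit method indicator. The names `segment_header` and `count_field_width` are hypothetical, and Kanji is left out because Table 1 doesn’t list it.

```python
# 4-bit method indicators from the text (Kanji, 1000, is left out because
# Table 1 doesn't give its length-field widths).
MODE_INDICATORS = {
    "numeric": "0001",
    "alphanumeric": "0010",
    "binary": "0100",
}

# Length-field widths from Table 1: versions 1-9, 10-26, 27-40.
COUNT_BITS = {
    "binary": (8, 16, 16),
    "alphanumeric": (9, 11, 13),
    "numeric": (10, 12, 14),
}


def count_field_width(mode: str, version: int) -> int:
    """Return how many bits hold the data length for this mode and version."""
    if not 1 <= version <= 40:
        raise ValueError("version must be between 1 and 40")
    if version <= 9:
        column = 0
    elif version <= 26:
        column = 1
    else:
        column = 2
    return COUNT_BITS[mode][column]


def segment_header(mode: str, char_count: int, version: int) -> str:
    """Method indicator followed by the data length, as a string of 0s and 1s."""
    width = count_field_width(mode, version)
    if char_count >= 1 << width:
        raise ValueError("too many characters for the length field")
    return MODE_INDICATORS[mode] + format(char_count, f"0{width}b")


print(segment_header("binary", 5, 1))  # 010000000101
```

The output matches the 0100 00000101 example for 5 binary characters in Version 1. When mixing methods, each segment gets its own header in front of its own data.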
Once we’ve indicated what we’re storing, it’s time to actually add the data. Binary is the easiest: simply take the 8-bit representation of each character and add it to your data.
For numeric data, you take the digits in groups of three. Each group is read as a decimal number and encoded directly to its 10-bit binary representation, so “123456” is encoded as one-hundred twenty-three followed by four-hundred fifty-six (0001111011 0111001000). If you have 1 digit left at the end of your data, encode it in four bits; if you have 2 digits left, encode them in seven bits.
Alphanumeric data is a little trickier. You have to convert each character to its numerical value (see Table 2). Then you take the characters in pairs, multiply the first value by 45, and add the second value. Then convert the result for each pair to 11-bit binary. If you end up with one character left over at the end of your data, encode its value in 6 bits. For example, in ABC, A = 10 and B = 11, so the first pair is 10 × 45 + 11 = 461, and C = 12 is left over. 461 in 11-bit binary is 00111001101 and 12 in 6-bit binary is 001100, so the result is 00111001101001100.
Table 2 – Character values in alphanumeric mode
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
U | V | W | X | Y | Z | (sp) | $ | % | * | + | - | . | / | : |
30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 |
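Table 2 lists the characters in value order, so a sketch can store them as one string and use each character’s position as its value. The function name `encode_alphanumeric` is hypothetical.

```python
# Characters from Table 2 in order; a character's value is its index here.
ALPHANUMERIC_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_alphanumeric(text: str) -> str:
    """Encode pairs as 45 * first + second in 11 bits; a lone final character uses 6 bits."""
    values = []
    for ch in text:
        index = ALPHANUMERIC_CHARS.find(ch)
        if index < 0:
            raise ValueError(f"{ch!r} is not valid in alphanumeric mode")
        values.append(index)
    bits = []
    # Full pairs only; the range stops before a trailing odd character.
    for i in range(0, len(values) - 1, 2):
        bits.append(format(values[i] * 45 + values[i + 1], "011b"))
    if len(values) % 2 == 1:
        bits.append(format(values[-1], "06b"))
    return "".join(bits)


print(encode_alphanumeric("ABC"))  # 00111001101001100
```

This reproduces the ABC example: 461 in 11 bits followed by 12 in 6 bits.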
After you’ve encoded all your data, if there’s any space left over, you need to add some padding. First, add the terminator 0000 (if fewer than four bits of space remain, add only as many 0s as fit). Then, if the number of bits in your data isn’t divisible by 8, add 0s until it is. Then alternately add the pad bytes 11101100 and 00010001 until you’ve reached the limit for your version and error correction level (Table 3). If your actual data alone already reaches the limit, none of this is necessary.
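Here is a minimal sketch of the padding steps, assuming the capacity in bits has already been looked up in Table 3 (every entry there is a multiple of 8). The function name `pad_data` is hypothetical, and the example data is the alphanumeric “ABC” segment built from the worked examples in this section.

```python
def pad_data(bits: str, capacity: int) -> str:
    """Add the terminator, zero-fill to a byte boundary, then alternate pad bytes."""
    if len(bits) > capacity:
        raise ValueError("data does not fit in this version and error correction level")
    # Terminator: up to four 0 bits, fewer if the capacity is nearly reached.
    bits += "0" * min(4, capacity - len(bits))
    # Zero-fill until the length is a multiple of 8.
    bits += "0" * (-len(bits) % 8)
    # Alternate the two pad bytes until the capacity is filled.
    pad_bytes = ("11101100", "00010001")
    i = 0
    while len(bits) < capacity:
        bits += pad_bytes[i % 2]
        i += 1
    return bits


# "ABC" in alphanumeric mode: method 0010, 9-bit length 3, then the data bits.
data = "0010" + "000000011" + "00111001101001100"
padded = pad_data(data, 128)  # 128 bits = Version 1, level M in Table 3
print(len(padded))  # 128
```

The 30 data bits get four terminator zeros, six more zeros to reach 40 bits, and then eleven alternating pad bytes.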
Table 3 – Maximum bits for data
Version | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
L | 152 | 272 | 440 | 640 | 864 | 1088 | 1248 | 1552 | 1856 | 2192 |
M | 128 | 224 | 352 | 512 | 688 | 864 | 992 | 1232 | 1456 | 1728 |
Q | 104 | 176 | 272 | 384 | 496 | 608 | 704 | 880 | 1056 | 1232 |
H | 72 | 128 | 208 | 288 | 368 | 480 | 528 | 688 | 800 | 976 |
Version | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
L | 2592 | 2960 | 3424 | 3688 | 4184 | 4712 | 5176 | 5768 | 6360 | 6888 |
M | 2032 | 2320 | 2672 | 2920 | 3320 | 3624 | 4056 | 4504 | 5016 | 5352 |
Q | 1440 | 1648 | 1952 | 2088 | 2360 | 2600 | 2936 | 3176 | 3560 | 3880 |
H | 1120 | 1264 | 1440 | 1576 | 1784 | 2024 | 2264 | 2504 | 2728 | 3080 |
Version | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
L | 7456 | 8048 | 8752 | 9392 | 10208 | 10960 | 11744 | 12248 | 13048 | 13880 |
M | 5712 | 6256 | 6880 | 7312 | 8000 | 8496 | 9024 | 9544 | 10136 | 10984 |
Q | 4096 | 4544 | 4912 | 5312 | 5744 | 6032 | 6464 | 6968 | 7288 | 7880 |
H | 3248 | 3536 | 3712 | 4112 | 4304 | 4768 | 5024 | 5288 | 5608 | 5960 |
Version | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
L | 14744 | 15640 | 16568 | 17528 | 18448 | 19472 | 20528 | 21616 | 22496 | 23648 |
M | 11640 | 12328 | 13048 | 13800 | 14496 | 15312 | 15936 | 16816 | 17728 | 18672 |
Q | 8264 | 8920 | 9368 | 9848 | 10288 | 10832 | 11408 | 12016 | 12656 | 13328 |
H | 6344 | 6760 | 7208 | 7688 | 7888 | 8432 | 8768 | 9136 | 9776 | 10208 |
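Table 3 also tells you which version you need. As a sketch, assuming a single segment and only versions 1-10, the Python below adds up the header and data bits for each version and returns the first one whose capacity is large enough. The names `segment_bits` and `smallest_version` are hypothetical.

```python
from typing import Optional

# Versions 1-10 from Table 3, in bits.
CAPACITY_BITS = {
    "L": [152, 272, 440, 640, 864, 1088, 1248, 1552, 1856, 2192],
    "M": [128, 224, 352, 512, 688, 864, 992, 1232, 1456, 1728],
    "Q": [104, 176, 272, 384, 496, 608, 704, 880, 1056, 1232],
    "H": [72, 128, 208, 288, 368, 480, 528, 688, 800, 976],
}

# Length-field widths from Table 1 for versions 1-9 and for version 10.
COUNT_BITS = {"numeric": (10, 12), "alphanumeric": (9, 11), "binary": (8, 16)}


def segment_bits(mode: str, n: int, version: int) -> int:
    """Bits for one segment: 4-bit method + length field + encoded data."""
    count = COUNT_BITS[mode][0 if version <= 9 else 1]
    if mode == "numeric":
        data = 10 * (n // 3) + (0, 4, 7)[n % 3]
    elif mode == "alphanumeric":
        data = 11 * (n // 2) + 6 * (n % 2)
    else:  # binary
        data = 8 * n
    return 4 + count + data


def smallest_version(mode: str, n: int, level: str) -> Optional[int]:
    """First version (1-10) whose capacity fits the segment, or None.
    Within versions 1-10 the capacity runs out before the length field can overflow."""
    for version, capacity in enumerate(CAPACITY_BITS[level], start=1):
        if segment_bits(mode, n, version) <= capacity:
            return version
    return None


print(smallest_version("numeric", 41, "L"))  # 1
print(smallest_version("numeric", 42, "L"))  # 2
```

For example, 41 digits need 14 + 137 = 151 bits and fit Version 1 at level L (152 bits), while 42 digits need 154 bits and move up to Version 2.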
Error correction is quite complicated, so we’ll cover it next time. For now, I’ll just say that QR Codes use the Reed-Solomon error correction algorithm; you can read about it on Wikipedia. After that, I’ll cover the placement order of the modules and the masking system. If you haven’t already, you’ll definitely want to check out the previous parts of this tutorial: Part 1 and Part 2.