UTF-8 Barcode Unicode Character Encoding

UTF-8 is a variable-length encoding for Unicode characters, including those used in Chinese, Japanese (including Kanji), Arabic, Russian, and Thai text. Any character in the Unicode standard can be encoded in UTF-8. The first 128 characters (US-ASCII) use only one byte each and do not require conversion. Characters above U+007F require two or more bytes. To encode these characters in 2D barcodes such as PDF417, Data Matrix, and QR Code, the data must first be converted to a string of bytes in little-endian mode without the byte order mark (BOM), and this conversion must take place before the bytes are encoded into the barcode. In addition, the decoder must be able to properly decode the data. When the data can be represented with ASCII characters alone, encoding ASCII instead of UTF-8 is recommended.
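As a rough illustration of the conversion step described above, the following Python sketch turns text into UTF-8 bytes with no BOM and reports how many bytes each character needs. The helper names `to_utf8_bytes` and `is_ascii_only` are hypothetical and are not part of any IDAutomation product; the sketch relies only on Python's standard `utf-8` codec.

```python
def to_utf8_bytes(text: str) -> bytes:
    # Python's "utf-8" codec never writes a byte order mark.
    # (The separate "utf-8-sig" codec would prepend EF BB BF, which is not wanted here.)
    return text.encode("utf-8")


def is_ascii_only(text: str) -> bool:
    # True when every character is in the US-ASCII range U+0000 to U+007F,
    # in which case no conversion is needed before barcode encoding.
    return all(ord(ch) <= 0x7F for ch in text)


if __name__ == "__main__":
    for ch in ["A", "é", "Ж", "中"]:
        data = to_utf8_bytes(ch)
        print(f"U+{ord(ch):04X}: {len(data)} byte(s), hex {data.hex(' ')}")
```

A caller could check `is_ascii_only(data)` first and skip the conversion entirely when it returns true, which matches the recommendation to prefer plain ASCII where possible.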

IDAutomation offers a built-in UTF-8 to-byte conversion method for encoding Unicode characters above U+007F in 2D barcodes such as PDF417, Data Matrix, and QR Code. Any Unicode character in the range U+0000 to U+FFFF (0-65535) can be encoded using this method.

This built-in conversion method is available for Data Matrix and QR Code (refer to the bottom of the page for PDF417) in the 2021 or later versions of the following products:

UTF-8 is also supported in the following 2D Font Packages:

IDAutomation currently offers these products by request for all Developer Licenses and above with an active Level 2 Support and Upgrade Subscription. IDAutomation can also provide source code by request with any developer license purchase, so that this conversion can be performed outside of the barcode generation component. This built-in method converts the text string into a sequence of bytes (using 1 byte for the range [0-127], 2 bytes for the range [128-2047], and 3 bytes for the range [2048-65535]) and arranges the byte sequence into a new string in little-endian mode without BOM. This is the format most scanners and decoders use.
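The byte ranges above follow the standard UTF-8 rules, which can be sketched in Python as shown below. This is an independent illustration, not IDAutomation's source code: the function names `utf8_bytes_bmp` and `bytes_to_byte_string` are hypothetical, and representing the result as a string with one character per byte (via Latin-1) is an assumption about what "a new string" of bytes means here.

```python
def utf8_bytes_bmp(text: str) -> bytes:
    """Encode code points 0-65535 as UTF-8 using the 1/2/3-byte ranges."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:            # 0-127: one byte, unchanged ASCII
            out.append(cp)
        elif cp <= 0x7FF:         # 128-2047: two bytes
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0xFFFF:        # 2048-65535: three bytes
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            # Outside the 0-65535 range covered by the method described above.
            raise ValueError(f"U+{cp:X} is outside the range 0-65535")
    return bytes(out)


def bytes_to_byte_string(data: bytes) -> str:
    # Assumption: the "byte string" holds one character per byte, so each
    # byte value 0-255 maps to the character with the same code point.
    return data.decode("latin-1")


if __name__ == "__main__":
    sample = "Aé中"
    encoded = utf8_bytes_bmp(sample)
    print(encoded.hex(" "))                      # raw UTF-8 bytes, no BOM
    print(len(bytes_to_byte_string(encoded)))    # one character per byte
```

For valid text in this range the result matches Python's built-in `str.encode("utf-8")`, which is the simpler choice in practice; the manual version only makes the range boundaries explicit.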

Reading and Decoding UTF-8 in 2D Barcodes

Most USB barcode scanners cannot properly decode barcodes that include UTF-8 or Unicode. The following barcode decoder apps have been tested and are known to properly decode UTF-8:

Recommended Product:

UTF-8 Encode and Decode Example:

QR Code Symbol with UTF-8 Encoding.

Decode using the IDAutomation Barcode Decoder Verifier App.

Other UTF-8 Decoding Products:

  • Cognex Barcode Scanner App & SDK (iOS | Android)
  • BeeTag on iOS by Connvision Ltd. (Does not scan large codes)
  • GDPicture.NET (Latest version only)
  • iOS camera app (for QR Code only)
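To show why the decoder must be UTF-8 aware, the following Python sketch models a hypothetical reader that returns the barcode payload as one character per byte (approximated here with Latin-1). The function names `encode_for_barcode` and `decode_scanned` are illustrative only and do not correspond to any scanner or app API.

```python
def encode_for_barcode(text: str) -> str:
    # Convert text to UTF-8 bytes, then expose them as one character per byte.
    return text.encode("utf-8").decode("latin-1")


def decode_scanned(raw: str) -> str:
    # Reverse the step above: recover the bytes, then interpret them as UTF-8.
    return raw.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "Привет"
    raw = encode_for_barcode(original)
    print(repr(raw))            # what a decoder that is not UTF-8 aware would show
    print(decode_scanned(raw))  # what a UTF-8 aware decoder recovers
```

A reader that skips the second step displays the raw byte characters instead of the original text, which is the typical symptom when a scanner does not support UTF-8.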

PDF417 UTF-8 Support

Encoding UTF-8 in PDF417 is less efficient than in Data Matrix or QR Code and is therefore not recommended. However, this functionality is included in some products. The built-in method of encoding UTF-8 in PDF417 is available in the latest version of the following products: