Unicode is a universal character encoding standard. It assigns every character a unique number, called a code point, and defines how individual characters are represented in text files, web pages, and other types of documents.
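As a quick illustration, Python's built-in `ord` function returns the code point Unicode assigns to a character, and the standard `unicodedata` module looks up that character's official Unicode name. This is a minimal sketch; `describe` is a hypothetical helper written for this example, and the sample characters are arbitrary.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: return a character's code point (U+XXXX notation) and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin capital A, Latin small e with acute accent, and a Chinese character
for ch in ["A", "\u00e9", "\u4e2d"]:
    print(describe(ch))
```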
Unlike ASCII, which was designed to represent only basic English characters, Unicode is designed to support the characters of all the world's languages. The standard ASCII character set contains 128 characters, while Unicode has room for 1,114,112 code points (U+0000 through U+10FFFF), or just over one million characters. While ASCII uses one byte to represent each character, Unicode encodings may use up to four bytes per character.
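The difference in range can be checked directly. The sketch below assumes Python 3.3 or later, where `sys.maxunicode` equals the highest code point, U+10FFFF. `is_ascii` is a hypothetical helper written for this example (Python 3.7+ also provides the built-in `str.isascii`).

```python
import sys

ASCII_SIZE = 128                # ASCII code points 0 through 127
UNICODE_MAX = 0x10FFFF          # highest Unicode code point
UNICODE_SIZE = UNICODE_MAX + 1  # 1,114,112 code points in total

def is_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character falls in the 128-character ASCII set."""
    return all(ord(ch) < ASCII_SIZE for ch in text)

print(UNICODE_SIZE)                               # 1114112
print(sys.maxunicode == UNICODE_MAX)              # True on Python 3.3+
print(is_ascii("hello"), is_ascii("h\u00e9llo"))  # True False

# ASCII needs one byte per character; a UTF-8 character never needs more than four.
print(len("hello".encode("ascii")))               # 5
print(len(chr(UNICODE_MAX).encode("utf-8")))      # 4
```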
There are several Unicode encodings, but UTF-8 and UTF-16 are the most common. UTF-8 has become the standard character encoding on the Web and is also the default encoding in many software applications. Although UTF-8 supports up to four bytes per character, using four bytes for frequently used characters would be inefficient. Therefore, UTF-8 uses a variable number of bytes: one byte for the basic English (ASCII) characters; two bytes for accented Latin letters and for Greek, Cyrillic, Hebrew, and Arabic characters; and three bytes for most other commonly used characters, including most Chinese, Japanese, and Korean characters. The remaining Unicode characters, such as emoji and many historic scripts, are represented with four bytes.
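To make these byte counts concrete, here is a minimal hand-written UTF-8 encoder for a single code point, checked against Python's built-in `str.encode`. `utf8_encode` is a hypothetical name used for illustration, and the range boundaries (U+0080, U+0800, U+10000) come from the UTF-8 definition in RFC 3629 rather than from the text above. The sketch also rejects the surrogate range U+D800 to U+DFFF, which UTF-8 is not allowed to encode.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch, not a full codec)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:       # 1 byte:  0xxxxxxx (ASCII)
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# English A, accented Latin e, Hebrew shin, Arabic ain, a Chinese character, an emoji
samples = ["A", "\u00e9", "\u05e9", "\u0639", "\u4e2d", "\U0001F600"]
for ch in samples:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex()}")
# Byte counts printed, in order: 1, 2, 2, 2, 3, 4
```

The bit patterns in the comments show the design choice: the first byte announces how long the sequence is, and every continuation byte starts with the bits `10`, so a decoder can tell where each character begins even in the middle of a stream.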