ASCII and Unicode
and other character encodings


1. ASCII

On both Windows/DOS and Unix systems, the 128 most commonly used characters are each represented by a 7-bit number known as the character’s ASCII code. They are traditionally stored as bytes (8 bits), i.e. the 7-bit ASCII code plus a leading zero bit. The characters include letters, digits, punctuation marks, and nonprintable control characters such as backspace, tab, and carriage return. Following are links to more information, including charts listing the 128 characters and the numeric values of their ASCII codes.
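As a small illustration of the codes described above, the following Java sketch prints the ASCII code of a few characters, both as a number and as the 8 bits of the byte that stores it. The class and method names (AsciiDemo, isAscii, toByteBits) are made up for this example.

```java
// Illustrative sketch only; AsciiDemo and its methods are hypothetical names.
final class AsciiDemo {

    /** Returns true if the character's code fits in 7 bits (0 to 127). */
    static boolean isAscii(char c) {
        return c < 128;
    }

    /** Formats a code as 8 binary digits, so the leading zero bit is visible. */
    static String toByteBits(int code) {
        String bits = Integer.toBinaryString(code & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        char[] samples = {'A', 'a', '0', '\t', '\r'};
        for (char c : samples) {
            int code = c; // widening a char to int yields its numeric code
            System.out.println(code + " = " + toByteBits(code));
        }
    }
}
```

For the letter A the program prints 65 = 01000001: the 7-bit code 1000001 preceded by a zero bit.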


2. Unicode

Java uses Unicode, in which characters were originally represented by 16 bits (2 bytes). Sixteen bits allow a total of 65,536 different values, enough to make Unicode a truly international character set. Unicode has since grown beyond 16 bits, and Java represents each character above 65,535 as a pair of 16-bit char values. The first 128 Unicode characters are the same as the ASCII characters; in the 16-bit representation, each has an extra leading zero byte in front of it. Following are links to information about Unicode.
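The sizes mentioned above can be checked directly in Java. This is a minimal sketch using only standard library calls; the class name UnicodeDemo and its helper methods are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; UnicodeDemo and its methods are hypothetical names.
final class UnicodeDemo {

    /** Number of 16-bit char values Java needs to hold one Unicode code point. */
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    /** Bytes of a string in big-endian 16-bit form (high byte first). */
    static byte[] utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // bits in one char: 16
        System.out.println((int) Character.MAX_VALUE); // largest char value: 65535
        System.out.println((int) 'A');                 // 65, same as the ASCII code

        byte[] bytes = utf16Bytes("A");
        System.out.println(bytes[0] + " " + bytes[1]); // zero byte, then 65

        // A code point above 65,535 occupies two char values.
        System.out.println(unitsFor(0x1F600));
    }
}
```

A char can take the values 0 to 65,535, which is where the figure of 65,536 distinct values comes from.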


3. IBM PC Extended character set and IBM PC Scan codes

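The IBM PC Extended characters are represented by binary codes corresponding to the numbers 128 to 255, whereas the standard ASCII characters are 0 to 127. The IBM Extended character codes are not the same as the Unicode representations of those same characters; for example, é has code 130 in the original IBM PC character set but 233 in Unicode. The IBM PC keyboard scan codes identify keys rather than characters. For keys with no ASCII equivalent, such as the function and arrow keys, DOS programs receive a two-byte combination, of which the first byte is null (zero) and the second identifies the key. These too are different from the Unicode character codes.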
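To make the mismatch concrete, the sketch below hard-codes the first five extended codes (128 to 132) and the Unicode code points of the same characters. It assumes the character set in question is IBM code page 437, the original IBM PC set. The table is a five-entry excerpt, not a full mapping, and the names Cp437Demo and unicodeFor are invented for the example.

```java
// Illustrative sketch only; assumes IBM code page 437. Names are hypothetical.
final class Cp437Demo {

    // Extended codes 128 to 132 of code page 437 ...
    static final int[] CP437_CODES = {0x80, 0x81, 0x82, 0x83, 0x84};
    // ... and the Unicode characters they stand for: Ç ü é â ä
    static final char[] UNICODE_CHARS = {'\u00C7', '\u00FC', '\u00E9', '\u00E2', '\u00E4'};

    /** Returns the Unicode code for a CP437 code in the excerpt, or -1 if absent. */
    static int unicodeFor(int cp437Code) {
        for (int i = 0; i < CP437_CODES.length; i++) {
            if (CP437_CODES[i] == cp437Code) {
                return UNICODE_CHARS[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int code : CP437_CODES) {
            System.out.println("IBM " + code + " -> Unicode " + unicodeFor(code));
        }
    }
}
```

None of the five pairs has equal numbers on both sides, so a byte read as an IBM Extended character cannot simply be widened to a Unicode char.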


4. ISO Latin and ANSI

The ISO Latin-1 character set (ISO 8859-1) includes the ASCII characters, with codes 0 to 127, plus 128 more with codes 128 to 255. ISO Latin-1 has become the first 256 characters of the Unicode character set. Yet another character set is the ANSI set used by Windows (code page 1252 in Western-language versions), which matches ISO Latin-1 except for codes 128 to 159: ISO Latin-1 reserves those for control characters, while ANSI assigns most of them to printable characters such as curly quotation marks and the euro sign. Those characters exist in Unicode too, but with codes higher than 255, so their ANSI and Unicode codes differ.
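The correspondence between ISO Latin-1 and the first 256 Unicode characters can be verified in Java by decoding each possible byte value and comparing the result with the byte itself. This is a minimal sketch; the class name Latin1Demo and its methods are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; Latin1Demo and its methods are hypothetical names.
final class Latin1Demo {

    /** Decodes a single byte value (0 to 255) as ISO 8859-1 and returns its Unicode code. */
    static int unicodeOfLatin1Byte(int value) {
        byte[] one = {(byte) value};
        return new String(one, StandardCharsets.ISO_8859_1).charAt(0);
    }

    /** Returns true if every byte value decodes to the Unicode code of the same number. */
    static boolean allCodesMatch() {
        for (int value = 0; value < 256; value++) {
            if (unicodeOfLatin1Byte(value) != value) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unicodeOfLatin1Byte(0xE9)); // 233, the code of é in both sets
        System.out.println(allCodesMatch());
    }
}
```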


5. EBCDIC

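EBCDIC, used on IBM mainframes, is another 8-bit representation of the more common characters such as letters, digits, etc., with code assignments completely different from ASCII. For example, the letter A is 193 (hex C1) in EBCDIC but 65 (hex 41) in ASCII.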
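One visible difference from ASCII is that the EBCDIC letters do not occupy one contiguous run of codes: A to I, J to R, and S to Z sit in three separate blocks. The sketch below computes the EBCDIC code of an uppercase letter from those three blocks. The class and method names are invented for the example, and only uppercase letters are handled.

```java
// Illustrative sketch only; EbcdicDemo and ebcdicUpper are hypothetical names.
final class EbcdicDemo {

    /** Returns the EBCDIC code of an uppercase letter A to Z. */
    static int ebcdicUpper(char c) {
        if (c >= 'A' && c <= 'I') {
            return 0xC1 + (c - 'A'); // A..I occupy hex C1..C9
        }
        if (c >= 'J' && c <= 'R') {
            return 0xD1 + (c - 'J'); // J..R occupy hex D1..D9
        }
        if (c >= 'S' && c <= 'Z') {
            return 0xE2 + (c - 'S'); // S..Z occupy hex E2..E9
        }
        throw new IllegalArgumentException("not an uppercase letter: " + c);
    }

    public static void main(String[] args) {
        for (char c = 'A'; c <= 'Z'; c++) {
            int ascii = c; // a Java char in this range holds the ASCII code
            System.out.println(c + ": ASCII " + ascii + ", EBCDIC " + ebcdicUpper(c));
        }
    }
}
```

Because of the gaps between the blocks, code that steps through the alphabet by adding 1 to a character code works in ASCII but not in EBCDIC.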


