7 Bit and 8 Bit Codesets

ISO 8859-1 contains graphic characters used in at least 44 countries.

ISO 8859-1 uses only the codes 32-126 (which are identical with US-ASCII) and 160-255. The positions 0-31 and 127-159 are reserved for control characters and normally used in the same way in which they are used with ASCII.
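The code ranges above can be written down as a small check. This is only a minimal Python sketch; the function names are chosen here for illustration and do not come from the standard.

```python
def is_graphic(code: int) -> bool:
    """True if code is a graphic character position in ISO 8859-1."""
    return 32 <= code <= 126 or 160 <= code <= 255


def is_control(code: int) -> bool:
    """True if code lies in a range reserved for control characters."""
    return 0 <= code <= 31 or 127 <= code <= 159


if __name__ == "__main__":
    # Every 8-bit code falls into exactly one of the two groups.
    for code in (31, 32, 126, 127, 159, 160, 255):
        print(code, "graphic" if is_graphic(code) else "control")
```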

Only two characters in Latin 1 have a special meaning for programs that allow paragraph reformatting: NBSP and SHY.

NBSP - Character NBSP (no-break space), number 160 (0xa0 = ' '+0x80), looks like a normal space and should be used where a line break must be prevented at that space when the text is formatted.

SHY - Character SHY (soft hyphen) at position 173 (0xad = '-'+0x80) looks similar to or exactly like the normal hyphen ('-') and should be used when a line break has been established within a word.
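To make the arithmetic and the NBSP rule concrete, here is a hedged Python sketch. The `wrap` function is a hypothetical greedy line filler written for this example; it breaks lines only at the ordinary space (32), so an NBSP never ends up at a line break. Handling of SHY (offering a break inside a word) is deliberately left out.

```python
NBSP = chr(ord(' ') + 0x80)  # code 160 (0xa0)
SHY = chr(ord('-') + 0x80)   # code 173 (0xad)


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling that breaks only at ordinary spaces."""
    lines, current = [], ""
    # Split on ' ' explicitly: str.split() without arguments would
    # also split at NBSP, which is exactly what must not happen.
    for word in text.split(' '):
        candidate = word if not current else current + ' ' + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap("see page" + NBSP + "42 now", 10):
        print(repr(line))
```

With a width of 10, "page" and "42" stay together on one line because the character between them is NBSP, not a space.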

Latin 1 characters are becoming increasingly important, especially in data communication.

7-bit terminal - a terminal on which Latin 1 characters between 160 and 255 are displayed only as the corresponding US-ASCII characters with the highest bit cleared, so that the user sees, e.g., a ')' instead of the copyright symbol.

7-bit terminals will be in use at least for the next ten years.

string table lookup - one approach to Latin 1 to ASCII conversion is to use a lookup table containing strings that represent common replacements for the Latin 1 characters that cannot be displayed correctly by the hardware.
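A string table lookup might be sketched in Python as follows. The table entries and the '?' fallback are illustrative assumptions, not an agreed set of replacements; a real table would cover all of 160-255.

```python
# Hypothetical replacement strings for a handful of Latin 1 characters.
STRING_TABLE = {
    '\xa0': ' ',    # NBSP  -> ordinary space
    '\xa9': '(c)',  # copyright sign
    '\xae': '(R)',  # registered sign
    '\xbd': '1/2',  # one half
    '\xc4': 'Ae',   # A with diaeresis
    '\xe4': 'ae',   # a with diaeresis
    '\xdf': 'ss',   # sharp s
}


def to_ascii_strings(text: str, fallback: str = '?') -> str:
    """Replace each non-ASCII character by its table string."""
    return ''.join(
        ch if ord(ch) < 128 else STRING_TABLE.get(ch, fallback)
        for ch in text
    )


if __name__ == "__main__":
    print(to_ascii_strings('Stra\xdfe \xa9 1994'))
```

Note that the result can be longer than the input, since one character may become several.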

character table lookup - an approach for Latin 1 to ASCII conversion in which each character is mapped to a single character or to a '?' character.

flattening - same as character table lookup.
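Character table lookup (flattening) could look like the sketch below. The table is a small illustrative sample chosen for this example; any character without an entry becomes '?'. Because every character maps to exactly one character, the length of the text is preserved.

```python
# Hypothetical one-to-one replacements, keyed by Latin 1 code.
CHAR_TABLE = {
    0xa0: ' ',  # NBSP
    0xad: '-',  # SHY
    0xc4: 'A',  # A with diaeresis
    0xe4: 'a',  # a with diaeresis
    0xe9: 'e',  # e with acute
    0xf1: 'n',  # n with tilde
}


def flatten(text: str) -> str:
    """Map each non-ASCII character to one ASCII character or '?'."""
    return ''.join(
        ch if ord(ch) < 128 else CHAR_TABLE.get(ord(ch), '?')
        for ch in text
    )


if __name__ == "__main__":
    print(flatten('caf\xe9 \xa9'))
```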

anding - the process by which the 8th (highest) bit of a character is cleared, resulting in a 7-bit character.
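Anding amounts to a bitwise AND with 0x7f. The following Python sketch (the function name is made up for this example, and the input is assumed to contain only codes 0-255) reproduces what a 7-bit terminal shows.

```python
def clear_high_bit(text: str) -> str:
    """Clear bit 7 of every character, as a 7-bit device would."""
    return ''.join(chr(ord(ch) & 0x7f) for ch in text)


if __name__ == "__main__":
    # The copyright sign is 0xa9; 0xa9 & 0x7f = 0x29, which is ')'.
    print(clear_high_bit('\xa9'))
    # e with acute is 0xe9; 0xe9 & 0x7f = 0x69, which is 'i'.
    print(clear_high_bit('caf\xe9'))
```

The second call shows why anding is a poor conversion: the result is a valid ASCII character, but usually not a sensible replacement.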

The IBM code page 850, which is supported by MS-DOS and OS/2, contains ALL Latin 1 characters, but at other positions.

One problem with string table lookup is that text layout may be destroyed by multi-character substitutions, especially in tables.
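Because only the positions differ, a conversion between the two code sets is a pure remapping. A minimal Python sketch, relying on the `latin-1` and `cp850` codecs that ship with Python (the function name is an assumption of this example):

```python
def latin1_to_cp850(data: bytes) -> bytes:
    """Re-encode Latin 1 bytes at their code page 850 positions.

    Bytes 128-159 (control positions in Latin 1) have no
    counterpart in the cp850 codec and raise UnicodeEncodeError.
    """
    return data.decode('latin-1').encode('cp850')


if __name__ == "__main__":
    # e with acute: 0xe9 in Latin 1, 0x82 in code page 850.
    print(latin1_to_cp850(b'\xe9').hex())
```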

One technique uses the BACKSPACE control character (in the table represented by '@') in order to get new characters by overstriking ASCII characters. This gives very poor results for the capital letters on many printers and is useless on most video terminals, but might be interesting for languages in which often only lowercase letters carry accents (e.g. French). The quality of the results depends very much on the type of printer used.
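The overstriking technique can be sketched in Python as follows. The table entries are illustrative guesses at plausible overstrike pairs, not the entries of the original table; as in that table, '@' stands for BACKSPACE and is replaced by the real control character on output.

```python
BACKSPACE = '\b'

# Hypothetical entries: base letter, '@' (= BACKSPACE), accent mark.
OVERSTRIKE_TABLE = {
    '\xe9': "e@'",  # e with acute
    '\xe8': "e@`",  # e with grave
    '\xe7': "c@,",  # c with cedilla
    '\xf1': "n@~",  # n with tilde
}


def overstrike(text: str) -> str:
    """Emit base letter, BACKSPACE, accent for known characters."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII, including a literal '@', is kept
        else:
            entry = OVERSTRIKE_TABLE.get(ch, '?')
            out.append(entry.replace('@', BACKSPACE))
    return ''.join(out)


if __name__ == "__main__":
    print(repr(overstrike('caf\xe9')))
```

On a printer that honours BACKSPACE, the accent is printed on top of the base letter; on most video terminals the second character simply replaces the first.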

