Character Classification

Character classification is used in regular expression processing.

character classification - the assignment of a character to one or more character classes.

character class - one of the following: upper, lower, digit, space, graph, print, punct, cntrl, xdigit, and alpha. Not to be confused with equivalence class.

POSIX.2 also defines the class blank. All characters in the blank class are automatically included in the space class. If characters are not explicitly assigned to blank, then <space> and <tab> will belong to blank. POSIX.2 provides no isblank() function, although C99 later added one.

POSIX.2 specifies whether a character belonging to one class may also belong to other classes. See 2.5.2.1 LC_CTYPE.

The concept of character class finds application in regular expressions and other string manipulation applications.
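The use of character classes in regular expressions can be illustrated with a POSIX bracket expression such as [[:digit:]]. The following is a minimal sketch, assuming a POSIX system that provides <regex.h>; the helper name matches_pattern() is hypothetical and not part of any standard.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile. */
int matches_pattern(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, matches_pattern("^[[:digit:]]+$", "2024") reports a match, while the same pattern applied to "20x4" does not. Which characters a class name such as [[:alpha:]] accepts depends on the LC_CTYPE category of the locale in effect.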

character classification function - any of the is*() functions or the isw*() functions.

is*() function - a collection of character classification functions which take as an argument the int representation of an 8-bit codepoint. One of isalpha(), isupper(), islower(), isdigit(), isxdigit(), isalnum(), ispunct(), isprint(), isgraph(), isspace(), iscntrl(). Except for isalnum(), there is a one-to-one correspondence between the is*() functions and character classes.
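A short sketch of how the is*() functions are typically called from C. The helper name count_in_class() is hypothetical. Each byte is converted through unsigned char before the call, because ISO C requires the argument to be representable as an unsigned char (or to equal EOF).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters in `s` that are accepted
 * by one of the is*() functions, passed in as `classify`. */
size_t count_in_class(const char *s, int (*classify)(int))
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Convert through unsigned char so that bytes above 127 are
         * never passed as negative values. */
        if (classify((unsigned char)*s))
            n++;
    }
    return n;
}
```

In the default "C" locale, count_in_class("Room 101!", isdigit) yields 3 and count_in_class("Room 101!", isalpha) yields 4.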

isw*() function - a collection of character classification functions which take as an argument the wchar_t representation of any character of any codeset. One of iswalpha(), iswupper(), iswlower(), iswdigit(), iswxdigit(), iswalnum(), iswpunct(), iswprint(), iswgraph(), iswspace(), iswcntrl().
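The wide-character functions are used in the same way, with wchar_t strings and the <wctype.h> header. This is a sketch only; wcount_in_class() is a hypothetical name. Results for characters outside ASCII depend on the locale selected with setlocale(), which this sketch does not call.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the wide characters in `s` that are
 * accepted by one of the isw*() functions, passed in as `classify`. */
size_t wcount_in_class(const wchar_t *s, int (*classify)(wint_t))
{
    size_t n = 0;

    for (; *s != L'\0'; s++) {
        if (classify((wint_t)*s))
            n++;
    }
    return n;
}
```

In the default "C" locale, wcount_in_class(L"Room 101!", iswdigit) yields 3.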

isalpha() - returns true if character is upper, lower, or alpha.

isupper() - returns true if character is upper.

islower() - returns true if character is lower.

isdigit() - returns true if character is digit.

isxdigit() - returns true if character is xdigit.

isalnum() - returns true if character is upper, lower, digit, or alpha.

ispunct() - returns true if character is punct.

isprint() - returns true if character is print; that is, upper, lower, alpha, digit, punct, graph, or the space character.

isgraph() - returns true if character is upper, lower, alpha, digit, graph, or punct.

isspace() - returns true if character is space.

iscntrl() - returns true if character is cntrl.
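Some relationships among these functions can be checked mechanically. The sketch below uses a hypothetical helper, classes_consistent(), and assumes the ISO C rules that isalnum() is true exactly when isalpha() or isdigit() is true, and that isprint() accepts everything isgraph() accepts plus the space character.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if the is*() results for `c` agree
 * with the relationships stated in the comments below, else 0.
 * `c` must be representable as an unsigned char. */
int classes_consistent(int c)
{
    int alpha = isalpha(c) != 0;
    int digit = isdigit(c) != 0;
    int graph = isgraph(c) != 0;

    /* isalnum() has no class of its own: it is alpha or digit. */
    if ((isalnum(c) != 0) != (alpha || digit))
        return 0;
    /* Every upper or lower character is also alpha. */
    if ((isupper(c) || islower(c)) && !alpha)
        return 0;
    /* Every digit is also a hexadecimal digit. */
    if (digit && !isxdigit(c))
        return 0;
    /* print is graph plus the space character. */
    if ((isprint(c) != 0) != (graph || c == ' '))
        return 0;
    return 1;
}
```

Calling classes_consistent() for every code from 0 to 127 is a quick sanity check of a C library's classification tables.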

character classification in 7-bit ASCII - Each character in 7-bit ASCII is assigned to one or more character classes in the following way (escaped numbers are in decimal): \0 - \8 control; \9 control, space, and blank; \10 - \13 control and space; \14 - \31 control; \32 space and blank; \33 - \47 punctuation; '0' - '9' numeric and hexadecimal; \58 - \64 punctuation; 'A' - 'F' uppercase and hexadecimal; 'G' - 'Z' uppercase; \91 - \96 punctuation; 'a' - 'f' lowercase and hexadecimal; 'g' - 'z' lowercase; \123 - \126 punctuation; \127 control.
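The ASCII assignments can be inspected from C by collecting the results of the is*() functions into a bit mask. This is an illustrative sketch: the CC_* constants and class_mask() are hypothetical names, the "C" locale is assumed, and the blank class is left out because none of the is*() functions listed here tests for it.

```c
#include <ctype.h>

/* Hypothetical bit flags, one per character class tested below. */
enum {
    CC_UPPER  = 1 << 0,
    CC_LOWER  = 1 << 1,
    CC_ALPHA  = 1 << 2,
    CC_DIGIT  = 1 << 3,
    CC_XDIGIT = 1 << 4,
    CC_SPACE  = 1 << 5,
    CC_PUNCT  = 1 << 6,
    CC_CNTRL  = 1 << 7,
    CC_GRAPH  = 1 << 8,
    CC_PRINT  = 1 << 9
};

/* Returns the set of classes that code `c` belongs to.
 * `c` must be representable as an unsigned char. */
unsigned class_mask(int c)
{
    unsigned m = 0;

    if (isupper(c))  m |= CC_UPPER;
    if (islower(c))  m |= CC_LOWER;
    if (isalpha(c))  m |= CC_ALPHA;
    if (isdigit(c))  m |= CC_DIGIT;
    if (isxdigit(c)) m |= CC_XDIGIT;
    if (isspace(c))  m |= CC_SPACE;
    if (ispunct(c))  m |= CC_PUNCT;
    if (iscntrl(c))  m |= CC_CNTRL;
    if (isgraph(c))  m |= CC_GRAPH;
    if (isprint(c))  m |= CC_PRINT;
    return m;
}
```

In the "C" locale, class_mask('\t') is CC_CNTRL | CC_SPACE, and class_mask('A') is CC_UPPER | CC_ALPHA | CC_XDIGIT | CC_GRAPH | CC_PRINT.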

