Previous: Unicode, Up: Characters [Contents][Index]
MIT/GNU Scheme’s character-set abstraction is used to represent groups of characters, such as the letters or digits. A character set may contain any “bitless” character. Alternatively, a character set can be treated as a set of code points.
Returns #t if object is a character set, otherwise it
returns #f.
Returns #t if char is in char-set, otherwise it
returns #f.
Returns #t if code-point is in char-set, otherwise
it returns #f.
Returns a procedure of one argument that returns #t if its
argument is a character in char-set, otherwise it returns
#f.
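As a brief illustration of the three membership tests above, the following sketch assumes they are bound to the names char-set?, char-in-set?, and char-set-predicate, as in recent MIT/GNU Scheme releases. Those names and the variable vowels are assumptions made for the example, not something stated in this section.

```scheme
;; Assumed names: char-set?, char-in-set?, char-set-predicate.
;; `vowels' is a hypothetical variable used only for this example.
(define vowels (char-set "aeiou"))

(char-set? vowels)            ; => #t
(char-set? "aeiou")           ; => #f, a string is not a character set

;; Argument order assumed to be: character first, then the set.
(char-in-set? #\e vowels)     ; => #t
(char-in-set? #\z vowels)     ; => #f

;; A one-argument procedure that can be passed to other procedures.
(define vowel? (char-set-predicate vowels))
(vowel? #\o)                  ; => #t
(vowel? #\q)                  ; => #f
```

The predicate form is convenient when a procedure expects a one-argument test rather than a character set.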
Calls predicate once on each Unicode code point, and returns a character set containing exactly the code points for which predicate returns a true value.
The next procedures represent a character set as a code-point
list, which is a list of code-point range elements.  A
code-point range is either a Unicode code point, or a pair
(start . end) that specifies a contiguous range of
code points.  Both start and end must be exact nonnegative
integers less than or equal to #x110000, and start must
be less than or equal to end.  The range specifies all of the
code points greater than or equal to start and strictly less
than end.
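The half-open convention is easy to misread, so here is a minimal sketch of how a code-point list can be interpreted. It is written in plain Scheme and does not use the character-set procedures; the helpers code-point-in-range? and code-point-in-list? are hypothetical names introduced only for this illustration.

```scheme
;; Hypothetical helpers, not part of MIT/GNU Scheme.

;; A range is either a single code point or a pair (start . end)
;; that includes start and excludes end.
(define (code-point-in-range? cp range)
  (if (pair? range)
      (and (<= (car range) cp)
           (< cp (cdr range)))
      (= cp range)))

;; A code-point list matches if any of its ranges matches.
(define (code-point-in-list? cp ranges)
  (cond ((null? ranges) #f)
        ((code-point-in-range? cp (car ranges)) #t)
        (else (code-point-in-list? cp (cdr ranges)))))

;; ASCII digits occupy code points 48 through 57, so end is 58.
(define ascii-digits '((48 . 58) 95))   ; digits plus underscore (95)

(code-point-in-list? 48 ascii-digits)   ; => #t, start is included
(code-point-in-list? 57 ascii-digits)   ; => #t
(code-point-in-list? 58 ascii-digits)   ; => #f, end is excluded
(code-point-in-list? 95 ascii-digits)   ; => #t, single code point
```

Because end is exclusive, a range that reaches the last Unicode code point, #x10FFFF, uses #x110000 as its end.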
Returns a new character set consisting of the characters specified by
elements.  The procedure char-set takes these elements as
multiple arguments, while char-set* takes them as a single
list-valued argument; in all other respects these procedures are
identical.
An element can take several forms, each of which specifies one or more characters to include in the resulting character set: a (bitless) character includes itself; a string includes all of the characters it contains; a character set includes its members; or a code-point range includes the corresponding characters.
In addition, an element may be a symbol from the following table, which represents the characters as shown:
| Name | Unicode character specification |
|---|---|
| alphabetic | Alphabetic = True |
| alphanumeric | Alphabetic = True \| Numeric_Type = Decimal |
| cased | Cased = True |
| lower-case | Lowercase = True |
| numeric | Numeric_Type = Decimal |
| unicode | General_Category != (Cs \| Cn) |
| upper-case | Uppercase = True |
| whitespace | White_Space = True |
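The element forms described above can be mixed freely in one call. The sketch below builds the same set with char-set and char-set*, then combines a symbol from the table with a literal character. The variable names are hypothetical, and the membership test char-in-set? is an assumed name that this section does not spell out.

```scheme
;; Multiple arguments: two characters, a string, and a half-open range.
(define identifier-chars
  (char-set #\_ #\- "abc" '(48 . 58)))    ; 48 through 57 are the digits 0-9

;; The same elements passed as a single list.
(define identifier-chars*
  (char-set* (list #\_ #\- "abc" '(48 . 58))))

;; A symbol from the table combined with a literal character.
(define word-chars (char-set 'alphanumeric #\_))

;; char-in-set? is assumed to take the character first, then the set.
(char-in-set? #\7 identifier-chars)     ; => #t
(char-in-set? #\: identifier-chars)     ; => #f, code point 58 is excluded
(char-in-set? #\b identifier-chars*)    ; => #t
(char-in-set? #\_ word-chars)           ; => #t
```

Passing a list to char-set* is the natural choice when the elements are computed at run time rather than written out literally.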
Returns a code-point list specifying the contents of char-set. The returned list consists of numerically sorted, disjoint, and non-abutting code-point ranges.
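A code-point list obtained this way can be fed back to char-set* to rebuild an equivalent set. The sketch below assumes the conversion procedure is named char-set->code-points and that membership is tested with char-in-set?; both names are assumptions, and sample, ranges, and rebuilt are hypothetical variables.

```scheme
;; Assumed names: char-set->code-points, char-in-set?.
(define sample (char-set "xcab"))      ; given in no particular order

;; The result is sorted and merged: a, b, c (97 through 99) abut and
;; are reported together, while x (120) stands apart. The exact
;; printed shape of each element is not shown here.
(define ranges (char-set->code-points sample))

;; Round trip: the list is itself a valid argument to char-set*.
(define rebuilt (char-set* ranges))

(char-in-set? #\c rebuilt)             ; => #t
(char-in-set? #\x rebuilt)             ; => #t
(char-in-set? #\d rebuilt)             ; => #f
```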
Returns #t if char-set-1 and char-set-2 contain
exactly the same characters, otherwise it returns #f.
Returns a character set that’s the inverse of char-set. That is, the returned character set contains exactly those characters that aren’t in char-set.
These procedures compute the respective set union, set intersection, and set difference of their arguments.
These procedures correspond to char-set-union and
char-set-intersection but take a single argument that’s a list
of character sets rather than multiple character-set arguments.
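The set operations above can be sketched as follows. The names char-set-union and char-set-intersection come from the text; char-set-difference, char-set-invert, char-set-union*, and char-in-set? are assumed names for the remaining procedures, and all variables are hypothetical.

```scheme
(define vowels (char-set "aeiou"))
(define early  (char-set "abcde"))

(define either      (char-set-union vowels early))         ; a b c d e i o u
(define both        (char-set-intersection vowels early))  ; a e
(define late-vowels (char-set-difference vowels early))    ; i o u
(define non-vowels  (char-set-invert vowels))              ; everything else

;; List-taking variant (assumed name char-set-union*).
(define either* (char-set-union* (list vowels early)))

(char-in-set? #\d either)        ; => #t
(char-in-set? #\e both)          ; => #t
(char-in-set? #\i both)          ; => #f
(char-in-set? #\i late-vowels)   ; => #t
(char-in-set? #\a late-vowels)   ; => #f
(char-in-set? #\z non-vowels)    ; => #t
(char-in-set? #\u either*)       ; => #t
```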
These constants are the character sets corresponding to
char-alphabetic?, char-numeric?,
char-whitespace?, char-upper-case?,
char-lower-case?, and char-alphanumeric? respectively.
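As an illustration, the sketch below assumes the constants follow the char-set:NAME naming pattern (for example char-set:alphabetic and char-set:whitespace) and that char-in-set? is the membership test. Neither name is given in this section, so treat both as assumptions.

```scheme
;; Each constant is expected to agree with the matching predicate.
(char-in-set? #\a char-set:alphabetic)       ; => #t, like (char-alphabetic? #\a)
(char-in-set? #\7 char-set:alphabetic)       ; => #f
(char-in-set? #\7 char-set:numeric)          ; => #t, like (char-numeric? #\7)
(char-in-set? #\space char-set:whitespace)   ; => #t
(char-in-set? #\A char-set:upper-case)       ; => #t
(char-in-set? #\A char-set:lower-case)       ; => #f
(char-in-set? #\7 char-set:alphanumeric)     ; => #t
```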
Returns #t if char-set contains only 8-bit code points
(i.e., ISO 8859-1 characters), otherwise it returns
#f.