UCD: Derived Character Properties

This document describes a number of data files in the Unicode Character Database. These are the derived data files: they contain information that can be derived entirely from other data files, but is presented in a different format for ease of use.

The files themselves are informative, although they may contain normative properties. For more information, see UnicodeCharacterDatabase.html.

Derived Core Properties

The following are important derived properties of Unicode characters, and are contained in DerivedCoreProperties.txt.

Property Name	N/I	Definition and Generation
Math	I	Characters with the Math property. For more information, see Chapter 4, Character Properties. Generated from: Sm + Other_Math
Alphabetic	I	Characters with the Alphabetic property. For more information, see Chapter 4, Character Properties. Generated from: Lu+Ll+Lt+Lm+Lo + Other_Alphabetic
Lowercase	I	Characters with the Lowercase property. For more information, see Chapter 4, Character Properties and UAX #21: Case Mappings. Generated from: Ll + Other_Lowercase
Uppercase	I	Characters with the Uppercase property. For more information, see Chapter 4, Character Properties and UAX #21: Case Mappings. Generated from: Lu + Other_Uppercase
ID_Start	I	Characters that can start an identifier. Generated from: Lu+Ll+Lt+Lm+Lo+Nl
ID_Continue	I	Characters that can continue an identifier. See Cf Note. Generated from: ID_Start + Mn+Mc+Nd+Pc
XID_Start	I	Same as ID_Start, except for modifications to allow closure under normalization forms NFKC and NFKD. Generated from: ID_Start; see Closure Note
XID_Continue	I	Same as ID_Continue, except for modifications to allow closure under normalization forms NFKC and NFKD. Generated from: ID_Continue; see Closure Note and Cf Note.
Default_Ignorable_Code_Point	N	For programmatic determination of default-ignorable code points. New characters that should be ignored in processing (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle the default behavior of such characters when not otherwise supported. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2). Generated from: Other_Default_Ignorable_Code_Point + Cf + Cc + Cs - White_Space
Grapheme_Base		For programmatic determination of grapheme cluster boundaries. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2). Generated from: [0..10FFFF] - Cc - Cf - Cs - Co - Cn - Zl - Zp - Grapheme_Extend - Grapheme_Link - CGJ (CGJ = Combining Grapheme Joiner)
Grapheme_Extend		For programmatic determination of grapheme cluster boundaries. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2). Generated from: Me + Mn + Mc + Other_Grapheme_Extend - Grapheme_Link - CGJ
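
The "Generated from" formulas above are set expressions over General_Category values and contributory Other_* properties. As a rough sketch only, the General_Category part of ID_Start and ID_Continue can be approximated with Python's standard unicodedata module. The function names below are hypothetical. The module reflects whatever Unicode version ships with the interpreter rather than 3.2, and it does not expose the Other_* properties, so results may differ from DerivedCoreProperties.txt.

```python
import unicodedata

# General_Category values taken from the "Generated from" column above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    """Approximate ID_Start as Lu+Ll+Lt+Lm+Lo+Nl (hypothetical helper)."""
    return unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    """Approximate ID_Continue as ID_Start + Mn+Mc+Nd+Pc (hypothetical helper)."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


def is_identifier(text: str) -> bool:
    """True if text is one start character followed by continue characters."""
    return (
        bool(text)
        and is_id_start(text[0])
        and all(is_id_continue(c) for c in text[1:])
    )
```

For authoritative answers, a program would read the ranges from DerivedCoreProperties.txt itself instead of recomputing them this way.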

Derived Extracted Properties

The following files contain other properties of the UCD that are simply separated out and listed in range format. These files are provided purely as a reformatting of existing data, with certain exceptions listed below. They are all contained in a subdirectory called extracted.
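
As an illustration of the range format, the sketch below parses lines of the shape "first..last ; value # comment" (or a single code point in place of the range). The function name and the sample lines are hypothetical and are not copied from a specific release. The field layout is an assumption based on the usual UCD conventions: semicolon-separated fields, with "#" starting a comment.

```python
from typing import Iterable, Iterator, Tuple


def parse_range_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, value) for each data line; skip comments and blanks."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue  # not a data line in the assumed layout
        code_range, value = fields[0], fields[1]
        if ".." in code_range:
            first, last = code_range.split("..", 1)
        else:
            first = last = code_range  # single code point
        yield int(first, 16), int(last, 16), value


# Illustrative sample lines (assumed layout, not taken from a real file).
SAMPLE = [
    "# comment line",
    "0041..005A    ; Lu # [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00B5          ; Ll #      MICRO SIGN",
    "",
]

for first, last, value in parse_range_lines(SAMPLE):
    print(f"{first:04X}..{last:04X} {value}")
```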

Derived Normalization Properties

The properties in DerivedNormalizationProperties.txt are useful in dealing with normalization forms. In the following table, NF* refers to one of NFD, NFC, NFKC, or NFKD.
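
For readers unfamiliar with the four forms, the following sketch applies each of them to a string using Python's standard unicodedata.normalize. The helper names are hypothetical, and the results reflect the Unicode version bundled with the interpreter, not necessarily 3.2.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKC", "NFKD")


def normalize_all(text: str) -> dict:
    """Return the text converted to each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def is_normalized(form: str, text: str) -> bool:
    """True if converting to the given form leaves the text unchanged."""
    return unicodedata.normalize(form, text) == text


# U+00E9 decomposes canonically; U+FB01 (fi ligature) decomposes only
# under the compatibility forms NFKC and NFKD.
e_acute = normalize_all("\u00e9")
ligature = normalize_all("\ufb01")
```

Here e_acute["NFD"] is "e" followed by U+0301, while ligature["NFC"] remains U+FB01 and ligature["NFKC"] becomes the two letters "fi".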

Revision	3.2.0
Authors	Mark Davis
Date	2002-03-22
This Version	http://www.unicode.org/Public/3.2-Update/DerivedProperties-3.2.0.html
Previous Version	http://www.unicode.org/Public/3.1-Update/DerivedProperties-3.1.0.html
Latest Version	http://www.unicode.org/Public/UNIDATA/DerivedProperties.html
