This section of the specification describes the basic data types that may appear as an element's content or an attribute's value.
For introductory information about reading the HTML DTD, please consult the SGML tutorial.
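Each attribute definition includes information about the case-sensitivity of its values. The case information is presented with the following keys:
CS: The value is case-sensitive (i.e., user agents interpret "a" and "A" differently).
CI: The value is case-insensitive (i.e., user agents interpret "a" and "A" as the same).
CN: The value is not subject to case changes, e.g., because it is a number or a character from the document character set.
CA: The element or attribute definition itself gives case information.
CT: Consult the type definition for details about case-sensitivity.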
If an attribute value is a list, the keys apply to every value in the list, unless otherwise indicated.
The document type definition specifies the syntax of HTML element content and attribute values using SGML tokens (e.g., PCDATA, CDATA, NAME, ID). See [ISO8879] for their full definitions. The following is a summary of key information:
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
For some HTML 4.01 attributes with CDATA attribute values, the specification imposes further constraints on the set of legal values for the attribute that cannot be expressed by the DTD.
Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (end-tag open delimiter) is treated as the end of the element's content. In valid documents, this would be the end tag for the element.
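The end-of-content rule for SCRIPT and STYLE can be sketched in a few lines. This is a minimal illustration, not a parser: `raw_text_content` is a hypothetical helper, and it assumes the caller already knows the index just past the element's start tag. It shows that the first "</" ends the content even when it appears inside a script string.

```python
def raw_text_content(markup: str, start: int) -> str:
    """Return the raw text of a SCRIPT or STYLE element.

    `start` is the index just past the element's start tag. Content ends at
    the first "</" (end-tag open delimiter), whatever follows it; markup and
    entities inside the content are not interpreted.
    """
    end = markup.find("</", start)
    if end == -1:
        # Assumption: with no "</" at all, treat the rest of the input as
        # content (this error case is not covered by the text above).
        return markup[start:]
    return markup[start:end]


doc = '<script type="text/javascript">document.write("<b>hi</b>")</script>'
start = doc.index(">") + 1  # just past the start tag
print(raw_text_content(doc, start))  # document.write("<b>hi
```

The content is cut off at the "</" of "</b>", not at "</script>". A common workaround in scripts is to write the sequence as "<\/" so that no literal "</" appears in the element's content.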
A number of attributes (%Text; in the DTD) take text that is meant to be "human readable". For introductory information about attributes, please consult the tutorial discussion of attributes.
This specification uses the term URI as defined in [URI] (see also [RFC1630]).
Note that URIs include URLs (as defined in [RFC1738] and [RFC1808]).
Relative URIs are resolved to full URIs using a base URI. [RFC1808], section 3, defines the normative algorithm for this process. For more information about base URIs, please consult the section on base URIs in the chapter on links.
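To illustrate base-URI resolution, the sketch below uses Python's standard `urllib.parse.urljoin`. Note the assumption: `urljoin` follows RFC 3986, which obsoletes [RFC1808]; the two agree on common cases like these but may differ on edge cases, so treat this as an illustration rather than the normative algorithm. The base URI is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the current document.
base = "http://www.example.com/docs/guide/index.html"

# A relative path is merged with the base's directory, then "." and ".."
# segments are removed.
print(urljoin(base, "../images/logo.png"))  # http://www.example.com/docs/images/logo.png

# A fragment-only reference keeps the base document and adds the fragment.
print(urljoin(base, "#intro"))  # http://www.example.com/docs/guide/index.html#intro

# An absolute path replaces the whole path of the base.
print(urljoin(base, "/about.html"))  # http://www.example.com/about.html
```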
URIs are represented in the DTD by the parameter entity %URI;.
URIs in general are case-sensitive. There may be URIs, or parts of URIs, where case doesn't matter (e.g., machine names), but identifying these may not be easy. Users should always assume that URIs are case-sensitive, to be on the safe side.
Please consult the appendix for information about non-ASCII characters in URI attribute values.
The attribute value type "color" (%Color;) refers to color definitions as specified in [SRGB]. A color value may either be a hexadecimal number (prefixed by a hash mark) or one of the following sixteen color names. The color names are case-insensitive.
Black = "#000000" | Green = "#008000"
Silver = "#C0C0C0" | Lime = "#00FF00"
Gray = "#808080" | Olive = "#808000"
White = "#FFFFFF" | Yellow = "#FFFF00"
Maroon = "#800000" | Navy = "#000080"
Red = "#FF0000" | Blue = "#0000FF"
Purple = "#800080" | Teal = "#008080"
Fuchsia = "#FF00FF" | Aqua = "#00FFFF"
Thus, the color values "#800080" and "Purple" both refer to the color purple.
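A %Color; value can be normalized by looking up the sixteen case-insensitive names and otherwise accepting a hash-prefixed hexadecimal value. This is a sketch under one stated assumption: it accepts only the six-digit "#RRGGBB" form used in the table of color names. `parse_color` is a hypothetical helper.

```python
import re

# The sixteen color names, keyed in lowercase because names are case-insensitive.
COLOR_NAMES = {
    "black": "#000000", "green": "#008000",
    "silver": "#C0C0C0", "lime": "#00FF00",
    "gray": "#808080", "olive": "#808000",
    "white": "#FFFFFF", "yellow": "#FFFF00",
    "maroon": "#800000", "navy": "#000080",
    "red": "#FF0000", "blue": "#0000FF",
    "purple": "#800080", "teal": "#008080",
    "fuchsia": "#FF00FF", "aqua": "#00FFFF",
}

# Assumption: only the six-digit "#RRGGBB" form is accepted.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_color(value: str) -> tuple[int, int, int]:
    """Return (red, green, blue) for a %Color; value, or raise ValueError."""
    hex_value = COLOR_NAMES.get(value.lower(), value)
    if not HEX_COLOR.fullmatch(hex_value):
        raise ValueError(f"not a valid color: {value!r}")
    return tuple(int(hex_value[i:i + 2], 16) for i in (1, 3, 5))


print(parse_color("Purple"), parse_color("#800080"))  # (128, 0, 128) (128, 0, 128)
```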
Although colors can add significant amounts of information to documents and make them more readable, please consider the following guidelines when including color in your documents:
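HTML specifies three types of length values for attributes:
Pixels: The value (%Pixels; in the DTD) is an integer that represents the number of pixels of the canvas (screen, paper). Thus, the value "50" means fifty pixels. For normative information about the definition of a pixel, please consult [CSS1].
Length: The value (%Length; in the DTD) may be either a pixel value or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
MultiLength: The value (%MultiLength; in the DTD) may be a %Length; or a relative length. A relative length has the form "i*", where "i" is an integer; the value "*" is equivalent to "1*". When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide the remaining space among relative lengths in proportion to the integer preceding each "*".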
Length values are case-neutral.
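The MultiLength allotment rule can be checked with a small worked example. In HTML 4.01 a MultiLength is a pixel count ("50"), a percentage ("50%"), or a relative length "i*" (with "*" meaning "1*"); pixel and percentage lengths are allotted first and the remaining space is shared among relative lengths in proportion to their factors. The helper `allot` is hypothetical, and clamping the remainder at zero when fixed lengths exceed the available space is an assumption, not something stated here.

```python
def allot(lengths: list[str], available: int) -> list[float]:
    """Divide `available` pixels among MultiLength values."""
    sizes = [0.0] * len(lengths)
    factors: dict[int, int] = {}  # index of each relative length -> its factor

    for i, value in enumerate(lengths):
        value = value.strip()
        if value.endswith("*"):
            digits = value[:-1]
            factors[i] = int(digits) if digits else 1  # "*" is equivalent to "1*"
        elif value.endswith("%"):
            sizes[i] = available * int(value[:-1]) / 100  # share of available space
        else:
            sizes[i] = float(int(value))  # plain pixels

    # Relative lengths share whatever the pixel and percentage lengths left
    # over (assumption: never less than zero).
    remaining = max(available - sum(sizes), 0)
    total = sum(factors.values())
    if total:
        for i, factor in factors.items():
            sizes[i] = remaining * factor / total
    return sizes


print(allot(["1*", "2*", "3*"], 60))  # [10.0, 20.0, 30.0]
```

With 60 pixels left and relative lengths 1*, 2*, and 3*, the shares come out as 10, 20, and 30 pixels.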
Note. A "media type" (defined in [RFC2045] and [RFC2046]) specifies the nature of a linked resource. This specification employs the term "content type" rather than "media type" in accordance with current usage. Furthermore, in this specification, "media type" may refer to the media where a user agent renders a document.
This type is represented in the DTD by %ContentType;.
Content types are case-insensitive.
Examples of content types include "text/html", "image/png", "image/gif", "video/mpeg", "audio/basic", "text/tcl", "text/javascript", and "text/vbscript". For the current list of registered MIME types, please consult [MIMETYPES].
Note. The content type "text/css", while not currently registered with IANA, should be used when the linked resource is a [CSS1] style sheet.
The value of attributes whose type is a language code (%LanguageCode in the DTD) refers to a language code as specified by [RFC1766], section 2. For information on specifying language codes in HTML, please consult the section on language codes. White space is not allowed within the language code.
Language codes are case-insensitive.
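A language code can be validated against the [RFC1766] grammar, where a tag is a primary tag followed by zero or more hyphen-separated subtags, each of 1 to 8 ASCII letters. This sketch (hypothetical helpers `is_language_code` and `same_language`) checks that syntax only; it does not check whether the primary tag is a registered language.

```python
import re

# RFC 1766: Language-Tag = Primary-tag *( "-" Subtag ),
# with Primary-tag and Subtag each 1 to 8 ASCII letters.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_code(value: str) -> bool:
    # fullmatch rejects any white space inside or around the code.
    return LANGUAGE_TAG.fullmatch(value) is not None


def same_language(a: str, b: str) -> bool:
    # Language codes are case-insensitive (and ASCII-only).
    return a.lower() == b.lower()


print(is_language_code("en-US"), is_language_code("en US"))  # True False
```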
The "charset" attributes (%Charset in the DTD) refer to a character encoding as described in the section on character encodings. Values must be strings (e.g., "euc-jp") from the IANA registry (see [CHARSETS] for a complete list).
Names of character encodings are case-insensitive.
User agents must follow the steps set out in the section on specifying character encodings in order to determine the character encoding of an external resource.
Certain attributes call for a single character from the document character set. These attributes take the %Character type in the DTD.
Single characters may be specified with character references (e.g., "&amp;").
[ISO8601] allows many options and variations in the representation of dates and times. The current specification uses one of the formats described in the profile [DATETIME] for its definition of legal date/time strings (%Datetime in the DTD).
The format is:
YYYY-MM-DDThh:mm:ssTZD
where:
YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
TZD = time zone designator
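The time zone designator is one of:
Z indicates UTC (Coordinated Universal Time). The "Z" must be uppercase.
+hh:mm indicates that the time is a local time that is hh hours and mm minutes ahead of UTC.
-hh:mm indicates that the time is a local time that is hh hours and mm minutes behind UTC.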
Exactly the components shown here must be present, with exactly this punctuation. Note that the "T" appears literally in the string (it must be uppercase), to indicate the beginning of the time element, as specified in [ISO8601].
If a generating application does not know the time to the second, it may use the value "00" for the seconds (and minutes and hours if necessary).
Note. [DATETIME] does not address the issue of leap seconds.
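A strict reader for %Datetime values can combine a regular expression for the exact layout YYYY-MM-DDThh:mm:ssTZD (TZD being "Z", "+hh:mm", or "-hh:mm") with Python's `datetime` for range checks. This is a sketch; `parse_datetime` is a hypothetical helper. Because `datetime` rejects a seconds value of 60, leap seconds are refused, which is one possible choice given that [DATETIME] leaves them unaddressed.

```python
import re
from datetime import datetime, timedelta, timezone

# Exact layout: YYYY-MM-DDThh:mm:ssTZD, uppercase "T" and "Z" only.
DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(Z|[+-][0-9]{2}:[0-9]{2})"
)


def parse_datetime(value: str) -> datetime:
    """Parse a %Datetime value into a timezone-aware datetime."""
    m = DATETIME.fullmatch(value)
    if m is None:
        raise ValueError(f"not a valid datetime: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    tzd = m.group(7)
    if tzd == "Z":
        tz = timezone.utc
    else:
        sign = 1 if tzd[0] == "+" else -1
        offset = timedelta(hours=int(tzd[1:3]), minutes=int(tzd[4:6]))
        tz = timezone(sign * offset)
    # datetime() rejects out-of-range fields such as month 13 or hour 24.
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


a = parse_datetime("1994-11-05T13:15:30Z")
b = parse_datetime("1994-11-05T08:15:30-05:00")
print(a == b)  # True: both denote the same instant
```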
Authors may use the following recognized link types, listed here with their conventional interpretations. In the DTD, %LinkTypes refers to a space-separated list of link types. White space characters are not permitted within link types.
These link types are case-insensitive, i.e., "Alternate" has the same meaning as "alternate".
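Because %LinkTypes is a white-space-separated list compared case-insensitively, an attribute value such as a rel value can be normalized by splitting and lowercasing. In this sketch, `parse_link_types` is a hypothetical helper, and dropping duplicates is an assumption of the sketch. Python's `str.split()` with no argument splits on all Unicode white space, slightly broader than HTML's definition of white space.

```python
def parse_link_types(value: str) -> list[str]:
    """Split a %LinkTypes value into lowercased link types.

    Assumption: duplicates are dropped, keeping the first occurrence.
    """
    types: list[str] = []
    for token in value.split():
        token = token.lower()  # link types are case-insensitive
        if token not in types:
            types.append(token)
    return types


print(parse_link_types("Alternate  StyleSheet"))  # ['alternate', 'stylesheet']
```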
User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.
Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile attribute of the HEAD element for more details.
For further discussions about link types, please consult the section on links in HTML documents.
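The following is a list of recognized media descriptors (%MediaDesc in the DTD).
screen: Intended for non-paged computer screens.
tty: Intended for media using a fixed-pitch character grid, such as teletypes, terminals, or devices with limited display capabilities.
tv: Intended for television-type devices (low resolution, color, limited scrollability).
projection: Intended for projectors.
handheld: Intended for handheld devices (small screen, monochrome, bitmapped graphics, limited bandwidth).
print: Intended for paged, opaque material and for documents viewed on screen in print preview mode.
braille: Intended for braille tactile feedback devices.
aural: Intended for speech synthesizers.
all: Suitable for all devices.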
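Future versions of HTML may introduce new values and may allow parameterized values. To facilitate the introduction of these extensions, conforming user agents must be able to parse the media attribute value as follows:
1. The value is a comma-separated list of entries. For example,
media="screen, 3d-glasses, print and resolution > 90dpi"
is mapped to:
"screen" "3d-glasses" "print and resolution > 90dpi"
2. Each entry is truncated just before the first character that isn't a US ASCII letter [a-zA-Z] (ISO 10646 hex 41-5a, 61-7a), digit [0-9] (hex 30-39), or hyphen-minus (hex 2d). In the example, this gives:
"screen" "3d-glasses" "print"
3. A case-sensitive match is then made with the set of media types defined above. User agents may ignore entries that don't match. In the example we would be left with screen and print.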
Note. Style sheets may include media-dependent variations within them (e.g., the CSS @media construct). In such cases it may be appropriate to use "media=all".
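The media attribute parsing rules for HTML 4.01 (split on commas; truncate each entry just before the first character that is not a US ASCII letter, digit, or hyphen; then match case-sensitively against screen, tty, tv, projection, handheld, print, braille, aural, and all) fit in a short function. `parse_media` is a hypothetical helper, and stripping white space around each entry before truncation is an assumption that matches the "screen" "3d-glasses" example.

```python
import re

# Media descriptors recognized by HTML 4.01.
KNOWN_MEDIA = {"screen", "tty", "tv", "projection", "handheld",
               "print", "braille", "aural", "all"}

# Leading run of US ASCII letters, digits, and hyphens (may be empty).
ENTRY_PREFIX = re.compile(r"[A-Za-z0-9-]*")


def parse_media(value: str) -> list[str]:
    """Return the recognized media descriptors in a media attribute value."""
    result = []
    for entry in value.split(","):                  # comma-separated entries
        entry = entry.strip()                       # assumption: ignore surrounding white space
        prefix = ENTRY_PREFIX.match(entry).group()  # truncate at first other character
        if prefix in KNOWN_MEDIA:                   # case-sensitive match
            result.append(prefix)
    return result


print(parse_media("screen, 3d-glasses, print and resolution > 90dpi"))  # ['screen', 'print']
```

Unrecognized entries such as "3d-glasses" are simply ignored, which lets older user agents cope with descriptors introduced later.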
Script data (%Script; in the DTD) can be the content of the SCRIPT element and the value of intrinsic event attributes. User agents must not evaluate script data as HTML markup but instead must pass it on as data to a script engine.
The case-sensitivity of script data depends on the scripting language.
Please note that script data that is element content may not contain character references, but script data that is the value of an attribute may contain them. The appendix provides further information about specifying non-HTML data.
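The difference between the two contexts shows up when script data in an attribute (for example, an intrinsic event attribute such as onclick) uses character references for "<" and "&". This sketch resolves them with Python's standard `html.unescape` before the text would be handed to a script engine; the variable names and script text are illustrative only, and `html.unescape` follows HTML5 reference rules, which are somewhat more permissive than HTML 4.01. SCRIPT element content, by contrast, would be passed on unchanged.

```python
import html

# Attribute value as written in the markup: "<", "&", and '"' appear as
# character references.
raw = "if (a &lt; b &amp;&amp; b &lt; c) alert(&quot;ordered&quot;)"

# Resolve the character references to get the script text itself.
script = html.unescape(raw)
print(script)  # if (a < b && b < c) alert("ordered")
```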
Style sheet data (%StyleSheet; in the DTD) can be the content of the STYLE element and the value of the style attribute. User agents must not evaluate style data as HTML markup.
The case-sensitivity of style data depends on the style sheet language.
Please note that style sheet data that is element content may not contain character references, but style sheet data that is the value of an attribute may contain them. The appendix provides further information about specifying non-HTML data.
Except for the reserved names listed below, frame target names (%FrameTarget; in the DTD) must begin with an alphabetic character (a-zA-Z). User agents should ignore all other target names.
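The following target names are reserved and have special meanings.
_blank: The user agent should load the designated document in a new, unnamed window.
_self: The user agent should load the document in the same frame as the element that refers to this target.
_parent: The user agent should load the document into the immediate FRAMESET parent of the current frame. This value is equivalent to _self if the current frame has no parent.
_top: The user agent should load the document into the full, original window (thus canceling all other frames). This value is equivalent to _self if the current frame has no parent.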
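A user agent's check on %FrameTarget; values can be sketched as: accept the reserved names _blank, _self, _parent, and _top, and otherwise require the name to begin with an ASCII letter, ignoring everything else. `is_usable_target` is a hypothetical helper, and it matches the reserved names exactly as written; how other case variants such as "_Blank" should be treated is an assumption this sketch does not address.

```python
import re

# Reserved target names (matched exactly as written; an assumption).
RESERVED_TARGETS = {"_blank", "_self", "_parent", "_top"}


def is_usable_target(name: str) -> bool:
    """Return True if a user agent should honor this frame target name."""
    if name in RESERVED_TARGETS:
        return True
    # All other names must begin with an alphabetic character (a-zA-Z).
    return re.match(r"[A-Za-z]", name) is not None


print(is_usable_target("main"), is_usable_target("_foo"))  # True False
```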