[W3C] HTML40-970708
HTML 4.0 Specification
W3C Working Draft 8-July-1997
This is: http://www.w3.org/TR/WD-html40-970708/
Abstract
This specification defines the HyperText Markup Language (HTML), version
4.0, the publishing language of the World Wide Web. In addition to the text,
multimedia, and hyperlink features of the previous versions of HTML, HTML
4.0 supports more multimedia options, scripting languages, style sheets,
better printing facilities, and documents that are more accessible to users
with disabilities. HTML 4.0 also takes great strides towards the
internationalization of documents, with the goal of making the Web truly
World Wide.
Status of this document
This is a W3C Working Draft for review by W3C members and other interested
parties. It is a draft document and may be updated, replaced or obsoleted by
other documents at any time. It is inappropriate to use W3C Working Drafts
as reference material or to cite them as other than "work in progress". This
is work in progress and does not imply endorsement by, or the consensus of,
either W3C or members of the HTML working group.
This document has been produced as part of the W3C HTML Activity, and is
intended as a draft of a proposed recommendation for HTML.
The latest version of this document can be retrieved from the list of W3C
technical reports, and is available as a gzip'ed
tar file, a zip file, and a PostScript file (about 200 pages).
We also plan to provide translations in other languages, although the
English version provides the normative specification.
HTML 4.0 replaces HTML 3.2, specified in http://www.w3.org/TR/REC-html32.
Editors
* Dave Raggett
* Arnaud Le Hors
* Ian Jacobs
Comments
Please send detailed comments on this document to www-html-editor@w3.org. We
cannot guarantee a personal response, but we will try when it is appropriate.
Public discussion on HTML features takes place on www-html@w3.org.
Table of Contents
1. About the HTML 4.0 Specification
2. Introduction to HTML 4.0
1. Design principles of HTML 4.0
2. Designing documents with HTML 4.0
3. A brief SGML tutorial
3. Definitions and Conventions
4. HTML and URLs - Locating resources on the Web
5. HTML Document Character Set - Character sets, character encodings, and
entities
6. Basic HTML data types - Character data, colors, and lengths
7. Structure of HTML documents - Detailed Table of Contents
1. Global structure - The HEAD and BODY of a document
2. Language information and text direction - International
considerations for text
3. Text - Paragraphs, Lines, and Phrases
4. Lists - Unordered, Ordered, and Definition Lists
5. Tables
6. Links - Hypertext and Media-Independent Links
7. Inclusions - Objects, Images, and Applets in HTML documents
8. Presentation of HTML documents - Detailed Table of Contents
1. Style Sheets - Controlling the presentation of an HTML document
2. Alignment, font styles, and horizontal rules
3. Frames - Multi-view presentation of documents
9. Interactive HTML documents - Detailed Table of Contents
1. Forms - User-input Forms: Text Fields, Buttons, Menus, and more
2. Scripts - Animated Documents and Smart Forms
10. SGML reference information for HTML - Formal definition of HTML and
validation
1. SGML Declaration
2. Document Type Definition
3. Named character entities
11. References
12. Indexes
1. Index of Elements
2. Index of Attributes
13. Appendixes
1. Changes between HTML 3.2 and HTML 4.0
2. Performance, Implementation, and Design Notes
3. HTML and Organizations (W3C, IETF, ISO)
About the HTML 4.0 Specification
Contents
1. How to read the specification
2. How the specification is organized
3. Acknowledgments
This document has been written with two types of readers in mind: HTML
authors and HTML implementors. We hope the specification will provide
authors with the tools they need to write efficient, attractive, and
accessible documents, without overexposing them to HTML's implementation
details. Implementors, however, should find all they need to build user
agents that interpret HTML correctly.
The specification has been written with two modes of presentation in mind:
electronic and printed. Although the two presentations will no doubt be
similar, readers will find some differences. For example, links will not
work in the printed version (obviously), and page numbers will not appear in
the electronic version. In case of a discrepancy, the electronic version is
considered the authoritative version of the document.
How to read the specification
The specification may be approached in several ways:
* Read from beginning to end. The specification begins with a general
presentation of HTML and becomes more and more technical and specific
towards the end. This is reflected in the specification's main table of
contents, which presents topical information, and the indexes, which
present lower level information in alphabetical order.
* Quick access to information. In order to get information about syntax
and semantics as quickly as possible, the electronic version of the
specification includes the following features:
1. Every reference to an element or attribute is linked to its
definition in the specification.
2. Every page will include links to the indexes, so you will never be
more than two links away from finding the definition of an element
or attribute.
3. The front pages of the three sections of the language reference
manual extend the initial table of contents with more detail about
each section.
How the specification is organized
This specification includes the following sections:
Section 2: Introduction to HTML 4.0.
The introduction gives an overview of what can be done with HTML 4.0.
It also provides some design tips for developing good HTML habits.
Sections 3 - 11: HTML 4.0 reference manual.
The bulk of the reference manual consists of the HTML language
reference, which defines all elements and attributes of the language.
This document has been organized by topic rather than by the grammar of
HTML. Topics are grouped into three categories: structure,
presentation, and interactivity. Although it is not easy to divide HTML
constructs perfectly into these three categories, the model reflects
the designers' experience that separating a document's structure from
its presentation produces more effective and maintainable documents.
The language reference consists of the following information:
o Conventions used by the editors of this specification.
o How HTML fits into the World Wide Web and an introduction to
related Web languages and protocols such as URLs.
o What characters may appear in an HTML document.
o Basic data types of an HTML document.
o Elements that pertain to the structure of an HTML document,
including text, lists, tables, links, and included objects,
images, and applets.
o Elements that pertain to the presentation of an HTML document,
including style sheets, fonts, colors, rules, and other visual
presentation, and frames for multi-windowed presentations.
o Elements that pertain to interactivity with an HTML document,
including forms for user input and scripts for active documents.
o The SGML definition of HTML, including the SGML declaration of
HTML, the HTML DTD, and the list of character entities.
o References.
Section 12: Quick reference indexes.
Two indexes give readers rapid access to the definition of all elements
and attributes. The indexes also summarize some key characteristics of
each element and attribute.
Section 13: Appendixes.
The appendixes contain information about changes from HTML 3.2,
performance and implementation notes, and how W3C and other
organizations interact with respect to HTML.
Acknowledgments
Thanks to everyone who has helped to author the working drafts that went
into the HTML 4.0 specification, and all those who have sent suggestions and
corrections. A particular thanks to T.V. Raman for his work on improving the
accessibility of HTML forms for people with disabilities.
The authors of this specification, the members of the W3C HTML Working
Group, deserve much applause for their diligent review of this document,
their constructive comments, and their hard work: John D. Burger, Steve
Byrne, Martin J. Dürst, Daniel Glazman, Scott Isaacs, Murray Maloney, Steven
Pemberton, Jared Sorensen, Powell Smith, Robert Stevahn, Ed Tecot, Jeffrey
Veen, Mike Wexler, Misha Wolf, and Lauren Wood.
Thank you Dan Connolly for thoughtful input and guidance as chairman of the
HTML working group. Thank you Sally Khudairi for your indispensable work on
the press release.
Of particular help from the Inria at Sophia-Antipolis were Janet Bertot,
Bert Bos, Stephane Boyera, Daniel Dardailler, Yves Lafon, Håkon Lie, Chris
Lilley, and Colas Nahaboo.
Lastly, thanks to Tim Berners-Lee without whom none of this would have been
possible.
Introduction to HTML 4.0
Contents
This is being written ...
Design principles of HTML 4.0
As you read the specification, you may find it enlightening to keep in mind
the following principles that guided the design of HTML 4.0.
* Interoperability
While most people agree that HTML documents should work well across
different browsers and platforms, achieving interoperability implies
higher costs to content providers since they must develop different
versions of documents. If the effort is not made, however, there is
much greater risk that the Web will devolve into a proprietary world of
incompatible formats, ultimately reducing the Web's commercial
potential for all participants.
Each version of HTML attempts to reach greater consensus among industry
players so that the investment made by content providers will not be
wasted and that their documents will not become unreadable in a short
period of time.
HTML has been developed with the vision that all manner of devices
should be able to use information on the Web: PCs with graphics
displays of varying resolution and color depths, cellular telephones,
hand-held devices, devices for speech output and input, computers
with high or low bandwidth, and so on.
* Internationalization
This version of HTML has been designed with the help of experts in the
field of internationalization, so that documents may be written in
every language and be transported easily around the world. This has
been accomplished by incorporating [RFC2070], which deals with the
internationalization of HTML.
One important step has been the adoption of the ISO/IEC 10646 standard
(see [ISO10646]) as the document character set for HTML. This is the
world's most inclusive standard dealing with issues of the
representation of international characters, text direction,
punctuation, and other world language issues.
HTML now offers greater support for diverse human languages within a
document. This allows for more effective indexing of documents for
search engines, higher-quality typography, better text-to-speech
conversion, correct hyphenation, etc.
* Accessibility
As the Web community grows and its members diversify in their abilities
and skills, it is crucial that the underlying technologies be
appropriate to their specific needs. HTML has been designed to make Web
pages more accessible to those with physical limitations. HTML 4.0
developments in the area of accessibility include:
o Encouraging the use of style sheets (rather than tables) to
achieve layout effect.
o Making it easier to provide alternate (textual and aural)
descriptions of images for non-visual browsers.
o Providing active labels for form fields.
o Providing labeled hierarchical groupings for form fields.
o Providing the ability to associate a longer text description with
an HTML element.
Authors who design pages with accessibility issues in mind will not
only receive the blessings of the accessibility community, but will
benefit in other ways as well: well-designed HTML documents that
distinguish structure and presentation will adapt more easily to new
technologies.
* Tables
The new table model in HTML is based on [RFC1942]. Authors now have
greater control over structure and layout (e.g., column groups). The
ability of designers to recommend column widths allows user agents to
display table data incrementally (as it arrives) rather than waiting
for the entire table before rendering.
* Compound documents
HTML now offers a standard mechanism for embedding generic media
objects and applications in HTML documents. The OBJECT element
(together with its more specific ancestor elements IMG and APPLET)
provides a mechanism for including images, video, sound, mathematics,
specialized applications, and other objects in a document. It also
allows authors to specify a hierarchy of alternate renderings for user
agents that don't support a specific rendering.
* Style sheets
Style sheets simplify HTML markup and largely relieve HTML of the
responsibilities of presentation. They give both authors and users
control over the presentation of documents --- font information,
alignment, colors, etc.
Stylistic information can be:
o Attached to a specific element to affect, say, the color or font of
its content.
o Placed in the document header as a series of styles comprising a
style sheet.
o Linked to an HTML document from an external style sheet.
The mechanism for associating a style sheet with a document is
independent of the style sheet language.
* Scripting
Through scripts, authors may create "smart forms" that react as users
fill them out. Scripting allows designers to create dynamic Web pages,
and to use HTML as a means to build networked applications. The
mechanisms provided to associate HTML with scripts are independent of
particular scripting languages.
* Printing
HTML features allow user agents to print a collection of documents in
an intelligent manner based on descriptions of the relationships among
documents acting as parts of a larger work.
* Ease of use
This version of HTML has been designed to remain easy to learn and
adequate for many common publishing needs. The language offers more
complex constructs (e.g., forms, scripting) for more sophisticated
tasks, but even these mechanisms will become easier to use as powerful
HTML authoring tools flourish.
Beware - at the time of writing, some HTML authoring tools rely
extensively on tables for formatting, which may easily cause
accessibility problems.
Designing documents with HTML 4.0
General principles for good HTML design and implementation include:
* Separate structure and presentation
HTML has its roots in SGML, which has always been a language for the
specification of structural markup. As HTML matures, more and more of
its presentational elements and attributes are being replaced by other
mechanisms, in particular style sheets. Experience has shown that
separating the structure of a document from its presentational aspects
reduces the cost of serving a wide range of platforms, media, etc., and
facilitates document revisions.
* Consider universal accessibility to the Web
To make the Web more accessible to everyone, notably those with
disabilities, authors should consider how their documents may be
rendered on a variety of platforms: speech-based browsers,
braille-readers, etc. We do not recommend that designers limit their
creativity, only that they consider alternate renderings in their
design. HTML offers a number of mechanisms to this end (e.g., the alt
attribute, the accesskey attribute, etc.).
Furthermore, authors should keep in mind that their documents may be
reaching a far-off audience with different computer configurations. In
order for documents to be interpreted correctly, designers should
include in their documents information about the language and direction
of the text, how the document is encoded, and other issues related to
internationalization.
* Help user agents with incremental rendering
By carefully designing their tables and making use of new table
features in HTML 4.0, designers can help user agents render documents
more quickly.
A brief SGML tutorial
Contents
1. About SGML
2. HTML syntax
1. Entities
2. Elements
3. Attributes
4. HTML comments
3. How to read the HTML DTD
1. Block level and Inline elements
2. DTD Comments
3. Entity Definitions
4. Element definitions
5. Attribute definitions
This section of the document presents introductory information about SGML
and its relationship to HTML. It discusses:
* HTML syntax: How to write elements, attributes, and comments.
* The HTML DTD: How to read the HTML DTD.
About SGML
The Standard Generalized Markup Language (SGML, defined in [ISO8879]), is a
language for defining markup languages. HTML is one such "application" of
SGML.
An SGML application consists of several parts:
1. The SGML declaration. The SGML declaration specifies which characters
and delimiters may appear in the application.
2. The document type definition (DTD). The DTD defines the syntax of
markup constructs. The DTD may include additional definitions such as
numeric and named character entities.
3. A specification that describes the semantics to be ascribed to the
markup. This specification also imposes syntax restrictions that cannot
be expressed within the DTD.
4. Document instances containing data (contents) and markup. Each instance
contains a reference to the DTD to be used to interpret it.
The SGML declaration for HTML 4.0 and the DTD for HTML 4.0 are included in
this reference manual, along with the entity sets referenced by the DTD.
HTML syntax
In this section, we discuss the syntax of HTML elements, attributes, and
comments.
Entities
Character entities are numeric or symbolic names for characters that may be
included in an HTML document. They are useful when your authoring tools make
it difficult or impossible to enter a character you may not enter often. You
will see character entities throughout this document; they begin with a "&"
sign and end with a semi-colon (;).
We discuss HTML character entities in detail later in the section on the
HTML document character set.
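
As an illustration of how character entities resolve to characters, here is a minimal sketch using the html module from Python's standard library. It is not part of this specification, and html.unescape follows later HTML parsing rules rather than the HTML 4.0 DTD, so treat it only as a convenient way to see the "&" ... ";" syntax in action. The sample strings are chosen for illustration.

```python
import html

# A named entity: &eacute; stands for the "acute e" character.
named = html.unescape("Herg&eacute;'s adventures of Tintin")

# Numeric entities: decimal 34 is the double quote, decimal 39 the single quote.
numeric = html.unescape("&#34;quoted&#34; and &#39;quoted&#39;")

print(named)
print(numeric)
```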
Elements
An SGML application defines elements that represent structures or desired
behavior. An element typically consists of three parts: a start tag,
content, and an end tag.
An element's start tag is written <element-name>, where element-name is the
name of the element. An element's end tag is written with a slash before the
element name: </element-name>. For example,
   <PRE>The content of the PRE element is preformatted text.</PRE>
The SGML definition of HTML specifies that some HTML elements are not
required to have end tags. The definition of each element in the reference
manual indicates whether it requires an end tag.
Some HTML elements have no content. For example, the line break element BR
has no content; its only role is to terminate a line of text. Such "empty"
elements never have end tags. The definition of each element in the
reference manual indicates whether it is empty (has no content) or, if it
can have content, what is considered legal content.
Element names are always case-insensitive.
Elements are not tags. Some people refer incorrectly to elements as tags
(e.g., "the P tag"). Remember that the element is one thing, and the tag (be
it start or end tag) is another. For instance, the HEAD element is always
present, even though both start and end HEAD tags may be missing in the
markup.
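
The distinction between elements and tags, and the case-insensitivity of element names, can be observed with a small sketch built on Python's standard html.parser module. This is an illustration under stated assumptions, not a conforming user agent: HTMLParser reports only the tags that literally appear in the markup and normalizes their names to lower case. The class and function names (TagLogger, tag_events) are invented for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records each start and end tag found in the markup as (kind, name)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of tag events that appear literally in the markup."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Mixed-case tags name the same elements; P has no end tag here and the
# empty element BR never has one.
events = tag_events("<PRE>preformatted</pre><p>first<BR>second")
print(events)
```

Note that no tag event is reported for the end of the P element: the element exists in the document even though its end tag is absent from the markup.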
Attributes
Elements may have associated properties, called attributes, to which authors
assign values. Attribute/value pairs appear before the final ">" of an
element's start tag. Any number of (legal) attribute value pairs, separated
by spaces, may appear in an element's start tag. They may appear in any
order.
In this example, the align attribute is set for the H1 element:
   <H1 align="center">
   This is a centered heading thanks to the align attribute
   </H1>
By default, SGML requires you to delimit all attribute values using either
double quotation marks (") or single quotation marks ('). Single quote marks
can be included within the attribute value when the value is delimited by
double quote marks, and vice versa. You may also use numeric character
entities to represent double quotes (&#34;) and single quotes (&#39;). For
double quotes you can also use the named character entity &quot;.
In certain cases, it is possible in HTML to specify the value of an
attribute without any quotation marks. The attribute value may only contain
letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), and periods
(ASCII decimal 46). We suggest using quotation marks even when it is
possible to eliminate them.
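
The quoting rules above can be sketched as a small check. This is a hedged illustration only: the function names (may_omit_quotes, format_attribute) are invented for this example, and the character set is taken directly from the paragraph above (letters, digits, hyphens, and periods).

```python
import re

# Characters allowed in an unquoted attribute value, per the rule above.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9.-]+")


def may_omit_quotes(value):
    """Return True if the value could legally be written without quotes."""
    return _UNQUOTED_OK.fullmatch(value) is not None


def format_attribute(name, value):
    """Write name=value, always quoting as the text recommends.

    Double quotes are preferred; single quotes are used when the value
    itself contains a double quote; if it contains both, double quotes
    inside the value are written as the numeric entity &#34;.
    """
    if '"' not in value:
        return name + '="' + value + '"'
    if "'" not in value:
        return name + "='" + value + "'"
    return name + '="' + value.replace('"', "&#34;") + '"'


print(may_omit_quotes("center"))
print(may_omit_quotes("10%"))
print(format_attribute("align", "center"))
```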
Attribute names are always case-insensitive.
Attribute values are generally case-insensitive. The definition of each
attribute in the reference manual indicates whether its value is
case-insensitive.
Note: HTML documents may compress better if you use lower case letters for
element and attribute names. The reason is that the compression algorithms
do a better job for more frequently repeated patterns, and lower case
letters are more frequent than upper case ones.
HTML comments
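HTML comments have the following syntax:
   <!-- this is a comment -->
   <!-- and so is this one,
        which occupies more than one line -->
Comments must not be rendered by user agents as part of a document.
Similarly, user agents must not render SGML processing instructions, which
are delimited by "<?" and ">".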
How to read the HTML DTD
This specification presents pertinent fragments of the DTD each time an
element or attribute is defined. Though cryptic and dissuasive at first, the
DTD fragment gives concise information about an element and its attributes.
We have chosen to include the DTD fragments in the specification rather than
seek a more approachable, but longer and less precise means of describing an
element. While almost all of the definitions include enough English text to
make them comprehensible, for those who require definitive information, we
complete this specification with a brief tutorial on reading the HTML DTD.
Block level and Inline elements
Certain HTML elements are said to be "block level" while others are "inline"
(also known as "text level"). The distinction is founded on several notions:
Content model
Generally, block level elements may contain inline elements and other
block level elements. Inline elements may generally contain
only data and other inline elements. Inherent in this structural
distinction is the idea that block elements create "larger" structures
than inline elements.
Formatting
By default, block level elements are formatted differently from inline
elements. Block level elements generally begin on new lines; inline
elements generally do not. Block level elements end an unterminated paragraph
element. This enables you to omit end-tags for paragraphs in many
cases.
Directionality
For technical reasons involving the [UNICODE] bidirectional text
algorithm, block level and inline elements differ in how they inherit
directionality information. For details, see the section on inheritance
of text direction.
Style sheets provide the means to specify the rendering of arbitrary
elements, including whether an element is rendered as block or inline. In
some cases, such as an inline style for list elements, this may be
appropriate, but generally speaking, authors are discouraged from overriding
the conventional interpretation of HTML elements in this way.
The alteration of the traditional presentation idioms for block level and
inline elements also has an impact on the bidirectional text algorithm. See
the section on the effect of style sheets on bidirectionality for more
information.
DTD Comments
In DTDs, comments may spread over one or more lines. In the DTD, comments
are delimited by a pair of "--" marks, e.g.
   <!ELEMENT PARAM - O EMPTY       -- named property value -->
Here, the comment "named property value" explains the use of the PARAM
element. DTD comments for HTML have no normative value.
Entity Definitions
The HTML DTD begins with a series of entity definitions. An entity
definition (not to be confused with an SGML entity) defines a kind of macro
that may be expanded elsewhere in the DTD. When the macro is referred to by
name in the DTD, it is expanded into a string.
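An entity definition begins with the keyword "<!ENTITY %", followed by the
name of the entity, the quoted string the entity expands to, and a closing
">". The following example defines the string that the %font entity will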
expand to.
The string the entity expands to may contain other entity names. These names
are expanded recursively. In the following example, the %inline entity is
defined to include the %font, %phrase, %special and %formctrl entities.
You will encounter two DTD entities frequently in the HTML DTD: %inline and
%block. They are used when the content model includes inline and block level
elements respectively.
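
The recursive expansion of DTD entities can be sketched as follows. This is an illustration only: the entity table below is hypothetical and deliberately small (it is not copied from the HTML DTD), and the function name expand is invented for this example.

```python
import re

# Hypothetical entity definitions, for illustration only.
ENTITIES = {
    "font": "TT | I | B",
    "phrase": "EM | STRONG",
    "inline": "#PCDATA | %font | %phrase",
}

# A reference is "%" followed by a name, optionally closed by ";".
_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);?")


def expand(text, entities, _active=()):
    """Replace every %name reference in text, expanding nested references."""

    def replace(match):
        name = match.group(1)
        if name in _active:
            raise ValueError("entity %" + name + " refers to itself")
        if name not in entities:
            raise KeyError(name)
        return expand(entities[name], entities, _active + (name,))

    return _REF.sub(replace, text)


expanded = expand("(%inline)*", ENTITIES)
print(expanded)
```

The _active tuple tracks the entities currently being expanded, so a definition that refers back to itself is reported rather than looping forever.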
Element definitions
The bulk of the HTML DTD consists of the definitions of elements and their
attributes. The keyword "<!ELEMENT" begins an element definition and the ">"
character ends it. Between these are specified:
1. The element's name.
2. Whether the element's end tag is optional. Two hyphens that appear
after the element name mean that the start and end tags are mandatory.
One hyphen followed by the letter "O" (not zero) indicates that the end
tag can be omitted. A pair of letter "O"s indicate that both the start
and end tags can be omitted.
3. The element's content, if any. The allowed content for an element is
called its content model. Elements with no content are called empty
elements. Empty elements are defined with the keyword "EMPTY".
In this example:
   <!ELEMENT UL - - (LI)+>
* The element being defined is UL.
* The two hyphens indicate that both the start tag and the end tag for
this element are required.
* The content model for this element is defined to be "at least one LI
element". We describe content models in detail below.
This example illustrates the definition of an empty element:
   <!ELEMENT IMG - O EMPTY>
* The element being defined is IMG.
* The hyphen and the following "O" indicate that the end tag can be
omitted, but together with the content model "EMPTY", this is
strengthened to the rule that the end tag must be omitted.
* The "EMPTY" keyword means the element must not have content.
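
The three parts of an element definition described above (name, tag omission indicators, and content model) can be pulled apart with a short sketch. It is a simplification for illustration, not an SGML parser: it handles one definition at a time, strips "--" comments, and does not expand DTD entities. The names ElementDecl and parse_element_decl are invented for this example.

```python
import re
from typing import NamedTuple


class ElementDecl(NamedTuple):
    name: str
    start_tag_optional: bool
    end_tag_optional: bool
    content_model: str


_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-O])\s+(?P<end>[-O])\s+"
    r"(?P<model>.*?)\s*>\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_element_decl(text):
    """Split a single <!ELEMENT ...> definition into its parts."""
    # Remove DTD comments, which are delimited by a pair of "--" marks.
    without_comments = re.sub(r"--.*?--", "", text, flags=re.DOTALL).strip()
    match = _DECL.match(without_comments)
    if match is None:
        raise ValueError("not an element definition: " + text)
    return ElementDecl(
        name=match.group("name").upper(),
        # A hyphen means the tag is required; the letter "O" means it
        # may be omitted.
        start_tag_optional=match.group("start").upper() == "O",
        end_tag_optional=match.group("end").upper() == "O",
        content_model=match.group("model"),
    )


ul = parse_element_decl("<!ELEMENT UL - - (LI)+>")
img = parse_element_decl("<!ELEMENT IMG - O EMPTY>")
print(ul)
print(img)
```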
Content model definitions
The content model describes what may be contained by an element. Content
definitions may include:
* The names of allowed or forbidden elements (e.g., the UL element
includes instances of the LI element).
* DTD entities (e.g., the LABEL element includes instances of the %inline
entity).
* Document text (indicated by the SGML construct "#PCDATA"). Text may
contain numeric and named character entities. Recall that these begin
with & and end with a semicolon (e.g., "Herg&eacute;'s adventures
of Tintin" includes the named entity for the "acute e" character).
The content model uses the following syntax to define what markup is allowed
for the content of the element:
( ... )
Specifies a group.
A | B
Either A or B occurs, but not both.
A , B
A must occur before B.
A & B
A and B must both occur once, but may do so in any order.
A?
A can occur zero or one times.
A*
A can occur zero or more times.
A+
A can occur one or more times.
Here are some examples from the HTML DTD:
   <!ELEMENT SELECT - - (OPTION+)>
The SELECT element must contain one or more OPTION elements.
   <!ELEMENT DL - - (DT|DD)+>
The DL element must contain one or more DT or DD elements in any order.
   <!ELEMENT OPTION - O (#PCDATA)*>
The OPTION element may only contain text and entities, such as &amp;.
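
The content model syntax reads much like a regular expression over the sequence of an element's children. The sketch below makes that analogy concrete by translating a content model into a Python regular expression. It is an assumption-laden illustration, not a validator: DTD entities must already be expanded, the "&" connector and exclusions are not supported, and the function names are invented for this example.

```python
import re

_TOKEN = re.compile(r"#?[A-Za-z][A-Za-z0-9.-]*|[(),|&?*+]")


def content_model_to_regex(model):
    """Translate a content model into a regex over 'NAME NAME ... ' strings."""
    tokens = _TOKEN.findall(model)
    if "".join(tokens) != "".join(model.split()):
        raise ValueError("unsupported syntax in content model: " + model)
    parts = []
    for token in tokens:
        if token == "(":
            parts.append("(?:")          # a group
        elif token == ",":
            continue                     # "A , B": sequence is concatenation
        elif token == "&":
            raise NotImplementedError("the & connector is not handled here")
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)          # same meaning as in a regex
        else:
            # An element name or #PCDATA; the trailing space keeps names
            # such as DT and DTX from being confused.
            parts.append("(?:" + re.escape(token.upper()) + " )")
    return re.compile("".join(parts))


def children_match(model, children):
    """Return True if the list of child names satisfies the content model."""
    sequence = "".join(child.upper() + " " for child in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(OPTION+)", ["OPTION", "OPTION"]))
print(children_match("(DT|DD)+", ["DT", "DD", "DD"]))
print(children_match("(OPTION+)", []))
```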
A few HTML elements use an additional SGML feature to exclude certain
elements from the content model. Excluded elements are preceded by a hyphen.
Explicit exclusions override inclusions.
   <!ELEMENT A - - (%inline)* -(A)>
In this example, the -(A) signifies that the element A cannot be included in
another A element (i.e., anchors may not be nested).
Note that the A element is part of the DTD entity %inline, but is excluded
explicitly because of -(A).
Similarly, the following element definition for FORM prohibits nested forms:
Attribute definitions
The keyword "<!ATTLIST" begins the definition of attributes that an element
may take. It is followed by the name of the element in question and a list
of attribute definitions. An attribute definition is a triplet that defines:
* The name of an attribute.
* The type of the attribute's value or an explicit set of possible
values. Values defined explicitly by the DTD are case-insensitive.
* Whether the default value of the attribute is implicit (keyword
"#IMPLIED"), in which case the default value must be supplied by the
user agent (in some cases via inheritance from parent elements); always
required (keyword "#REQUIRED"); or fixed to the given value (keyword
"#FIXED"). Some attributes explicitly specify a default value for the
attribute.
In this example, the name attribute is defined for the MAP element. The
attribute is optional for this element.
   <!ATTLIST MAP
     name    CDATA   #IMPLIED
     >
The type of values permitted for the attribute is given as CDATA, an SGML
data type. CDATA is text that may include character entities.
For more information about "CDATA", "NAME", "ID", and other data types,
please consult the section on HTML data types.
The following examples illustrate possible attribute definitions:
rowspan NUMBER 1 -- number of rows spanned by cell --
http-equiv NAME #IMPLIED -- HTTP response header name --
id ID #IMPLIED -- document-wide unique id --
valign (top|middle|bottom|baseline) #IMPLIED
The rowspan attribute requires values of type NUMBER. The default value is
given explicitly as "1". The optional http-equiv attribute requires values
of type NAME. The optional id attribute requires values of type ID. The
optional valign attribute is constrained to take values from the set {top,
middle, bottom, baseline}.
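
The triplet structure of an attribute definition (name, type or enumerated values, default) can be sketched as a small parser over single definition lines like those shown above. This is an illustration only: it does not expand DTD entities, and the names AttrDef and parse_attr_def are invented for this example.

```python
import re
from typing import NamedTuple, Tuple


class AttrDef(NamedTuple):
    name: str
    value_type: str           # e.g. "CDATA" or "NUMBER"; "" for enumerations
    allowed: Tuple[str, ...]  # enumerated values, lower-cased; () otherwise
    default: str              # "#IMPLIED", "#REQUIRED", or an explicit default


_LINE = re.compile(r"(\S+)\s+(\([^)]*\)|\S+)\s+(.+)")


def parse_attr_def(line):
    """Split one attribute definition line into its three parts."""
    # Drop the trailing DTD comment, delimited by a pair of "--" marks.
    line = re.sub(r"--.*?--", "", line).strip()
    match = _LINE.fullmatch(line)
    if match is None:
        raise ValueError("not an attribute definition: " + line)
    name, type_, default = match.groups()
    if type_.startswith("("):
        # Values defined explicitly by the DTD are case-insensitive,
        # so they are normalized to lower case here.
        allowed = tuple(v.strip().lower() for v in type_[1:-1].split("|"))
        return AttrDef(name, "", allowed, default)
    return AttrDef(name, type_, (), default)


rowspan = parse_attr_def("rowspan NUMBER 1 -- number of rows spanned by cell --")
valign = parse_attr_def("valign (top|middle|bottom|baseline) #IMPLIED")
print(rowspan)
print(valign)
```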
DTD entities in attribute definitions
Attribute definitions may also include DTD entities.
In this example, we see that the attribute definition list for the LINK
element begins with the %attrs entity.
The %attrs entity expands to:
The %attrs entity has been defined for convenience since these seven
attributes are defined for most HTML elements.
Similarly, the DTD defines the %URL entity as expanding into the string
CDATA.
As this example illustrates, the entity %URL provides readers of the DTD
with more information as to the type of data expected for an attribute.
Similar entities have been defined for %color, %Content-Type, %Length,
%Pixels, etc.
Boolean attributes
Some attributes play the role of boolean variables (e.g., selected). Their
appearance in the start tag of an element implies that the value of the
attribute is "true". Their absence implies a value of "false".
Boolean attributes may legally take a single value: the name of the
attribute itself (e.g., selected="selected").
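
The presence-means-true rule for boolean attributes can be demonstrated with Python's standard html.parser module. This is a sketch for illustration, not a statement about how user agents are implemented: HTMLParser reports a minimized attribute with the value None and lower-cases attribute names. The names AttrCollector and boolean_attr are invented for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, attributes) for every start tag in the markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def boolean_attr(markup, attr):
    """Return True if the first start tag in markup carries the attribute.

    Only the attribute's appearance matters, not its value.
    """
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    _tag, attrs = collector.start_tags[0]
    return attr.lower() in attrs


print(boolean_attr("<OPTION selected>", "selected"))
print(boolean_attr('<OPTION selected="selected">', "selected"))
print(boolean_attr("<OPTION>", "selected"))
```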
This example defines the selected attribute to be a boolean attribute.
selected (selected) #IMPLIED -- reduced interitem spacing --
The attribute is set to "true" by appearing in the element's start tag: