TERMINAL GRAPHICS FOR UNICODE

Frank da Cruz
The Kermit Project
Columbia University
New York City
http://www.columbia.edu/kermit/

D R A F T  # 1

Wed Sep 30 21:15:31 1998

THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT. IT IS DESIGNED TO BE VIEWED AS-IS IN A FIXED-PITCH FONT. ITS WIDEST LINE IS 79 COLUMNS. IT CONTAINS NO TABS. IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP A CLEAN COPY AT:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt

PLEASE SEND COMMENTS AND SUGGESTIONS TO THE AUTHOR AT:

  fdc@columbia.edu

ABSTRACT

A selection of terminal graphics characters is proposed for Unicode [24] and ISO 10646 [19] to allow Unicode-based terminal emulation software to (a) display glyphs that are found on popular types of terminals but currently are not available in Unicode, and (b) interoperate with other Unicode applications.

CONTENTS

   1. Introduction
   2. Scope
   3. Organization
   4. Graphic Representation of Control Characters
   5. Hex Bytes
   6. Math Symbols
   7. Line, Box, and Block Characters
   8. Miscellaneous Single-Cell Glyphs
   9. Unfinished Business
  10. Summary of Proposed Additional Characters
  11. References

  Tables:
   4.1. IBM PC Code Page 437 C0 Graphics
   4.2. C0 Control Pictures
   4.3. C1 Control Pictures
   4.4. EBCDIC Control Pictures
   4.5. 3270 Control Pictures
   4.6. Additional Control-Like Pictures
   6.1. Supplementary Math Symbols
   7.1. Additional Line, Box, and Block Characters
   8.1. Miscellaneous Single-Cell Terminal Glyphs
  10.1. Census of New Characters

  Figures:
   4.1. Control Picture Display
   5.1. Hex Byte Pictures
   7.1. "Framus" Glyphs

1. INTRODUCTION

Terminal-host communication was the dominant form of interaction between human and computer from about 1974 (when CRTs became affordable) to about 1994 (when the Web and Windows took over the mass market).
Terminal-host communication is still widespread, especially in large organizations, and is expected to remain so for decades to come, playing an important part in organizations like universities, hospitals, and government agencies, as well as corporations, with central computing facilities, for use in applications ranging from software development and system/network administration, to email and text-based Web access, to data entry and inquiry, to transaction processing.

A terminal, for purposes of this document, is a device for entry and display of text in a fixed-pitch font on a screen (or on paper) in which characters are displayed in rows and columns of fixed size "cells". Terminals generally display the characters of ASCII [1] or EBCDIC [13], and sometimes also accented or non-Roman letters (or ideograms), and often also "graphic" (non-alphabetic, non-digit, non-punctuation) characters for purposes of line- and box-drawing, mathematics, or other special effects.

In recent years, physical terminals have largely disappeared from the scene, their functions subsumed into PCs running terminal-emulation software alongside other applications.

Unicode has effectively met the need for encoding the earth's writing systems, but it is not well suited to terminal emulation since it lacks some of the required graphics characters. Without a standard encoding for the missing glyphs, each maker of terminal emulation software must create or contract for custom fonts with private encodings. Such fonts are not compatible with other (otherwise compatible) fonts on the same platform (e.g. when copying and pasting between applications), nor with each other. Furthermore, should Unicode printers become standard equipment on PCs, terminal graphics characters will not print correctly on them.

This document proposes a modest repertoire of terminal graphics characters to be added to Unicode and ISO 10646, with specific encoding to be decided by the UTC or other appropriate body, that all makers of fonts, code pages, and printers can refer to in designing their products, and upon which all makers of terminal emulation software can base their screen displays.

For best results, this project should be a cooperative effort among those who care about both terminal emulation (and emulation of particular terminals) and the Universal Character Set. Unfortunately, in many cases the actual owners or creators of the original terminal character sets in question are no longer available for consultation.

2. SCOPE

This document represents a survey of the following terminals:

  Digital Equipment Corporation VT100 through VT520 [3-9]
  Heath / Zenith 19 [10]
  Hewlett Packard HP-2621 and HP-2648 [11,12]
  IBM 3164 and 3270 [15,16]
  Siemens Nixdorf 97801 [21]
  Televideo 922 and 965 [22,23]
  Wyse 60 and 370 [25,26]

as well as:

  IBM PC code page 437 [14]

which is the basis for numerous PC-oriented so-called ANSI emulations.

Even within this fairly narrow scope, the task of settling on a set of character-cell terminal graphics for Unicode is complicated by the well-known problems that affect other preexisting character sets to varying degrees:

  1. Lack of official names for the characters.
  2. Lack of definitive, high-quality pictures of the glyphs.
  3. Lack of descriptions of the purpose and intended use of the glyphs.
  4. Lack of a current registration authority or owner.
  5. Questions of unification of glyphs from different terminal makers.
  6. End-user demand for specific characters or sets.

The issue of unification is complicated by the fact that many of the terminal graphics characters are designed to join at cell boundaries to form "pictures" (such as boxes or forms to be filled out) or large characters (such as big math symbols) spanning multiple rows and/or columns.
The relationship of similar-looking glyphs for different terminals is difficult to determine -- e.g. exactly where does a line touch an edge, and at what angle, and does it make a difference? In linguistic terms, which glyphs may be considered allographs, and which are distinct graphemes?

This proposal does not require any action for well-known terminal presentation forms such as double-high and/or double-wide characters, bold, blinking, inverse, underlining, color, etc, since these are not encoding issues. In particular, no special code points are needed for double-high or double-wide characters, such as those seen on the DEC VT100 family of terminals, nor for compressed characters as seen on Data General and DEC terminals.

This proposal also does not cover true graphics terminals, such as Tektronix vector graphics units, DEC ReGIS or Sixel graphics, etc, since these graphics regimes are not character-cell based.

Note that the graphic characters listed in this proposal rarely, if ever, appear on keyboard key labels. In general, these characters are never typed, not even on real terminals, but are displayed when the terminal is commanded into a special mode; for example, with ISO 2022 [17] character-set designation and invocation escape sequences.

3. ORGANIZATION

This proposal groups terminal graphic characters into four major categories. Some categories are complete by definition (e.g. the 2-nibble hex codes, of which there can be only 256), but others should include space for expansion as new glyphs are discovered or needed. The categories are:

Debugging Tools
  Graphical single-cell representation of C0 and C1 control characters; hexadecimal dumps of terminal traffic, etc.

Math Symbols
  Although most math symbols found on terminals are already in Unicode, certain terminal-based applications rely on the ability to construct large symbols (integral and summation signs, braces, brackets) from smaller character-cell-sized pieces.

Line and Box Drawing
  Used for data entry, transaction processing, forms filling, etc, in markets ranging from car rental and airline reservations, to medical information systems, to online library catalogs. Although Unicode does include a basic set (mainly those at U+2500), some others are missing.

Other Miscellaneous Character-Cell Graphics
  Padlocks, stick-figure people, etc, e.g. to indicate the state of the keyboard and/or host application, as well as mosaic graphics cells, and assorted pictures and dingbats.

This document lists the terminal graphics characters for the terminals in Section 2, suggests unifications, and assigns preliminary, temporary Unicode values from the Private Use area:

  E000-E09F  Control Pictures
  E0A0-E0CF  Math Symbols
  E0D0-E0EF  Line and Box Drawing
  E0F0-E0FF  Miscellaneous single-cell graphic characters
  E100-E1FF  Hex Bytes

For a total of 512 positions, not fully populated. Obviously the final counts, code values, and block allocations, including reserved positions, are likely to change as this proposal evolves.

All new characters proposed in this document should be precomposed, since no terminals (with the exception of certain APL and ALA terminals) are capable of composing characters on the fly from nonspacing diacritics or by overstriking.

4. GRAPHIC REPRESENTATION OF CONTROL CHARACTERS

Several methods are available for "printing" control characters. First, there is the de facto standard collection of dingbats in the 0x00-0x1F range of IBM PC Code Page 437 [14]. As shown in Table 4.1, this is already adequately covered by Unicode (in which "Code" is the Unicode value and "IBM" is the IBM Code page value, both hexadecimal).

Table 4.1: IBM PC Code Page 437 C0 Graphics

Code  IBM  Unicode Name          Code  IBM  Unicode Name
00A0  00   Blank                 25BA  10   Black right-pointing pointer
263A  01   White smiling face    25C4  11   Black left-pointing pointer
263B  02   Black smiling face    2195  12   Up down arrow
2665  03   Black heart suit      203C  13   Double exclamation mark
2666  04   Black diamond suit    00B6  14   Pilcrow sign
2663  05   Black club suit       00A7  15   Section sign
2660  06   Black spade suit      25AC  16   Black rectangle
2022  07   Bullet                21A8  17   Up down arrow with base
25D8  08   Inverse bullet        2191  18   Upwards arrow
25EF  09   Large circle          2193  19   Downwards arrow
25D9  0A   Inverse white circle  2192  1A   Rightwards arrow
2642  0B   Male sign             2190  1B   Leftwards arrow
2640  0C   Female sign           2319  1C   Turned not sign
266A  0D   Eighth note           2194  1D   Left right arrow
266C  0E   Beamed 16th notes     25B2  1E   Black up-pointing triangle
263C  0F   White sun with rays   25BC  1F   Black down-pointing triangle

(Note that "black" and "white" are used in accordance with Unicode terminology, where they denote the presence or absence of (black) ink on the page; however, any colors at all can appear on a terminal screen.)

More useful in a terminal emulator, however, is the ability to display the official abbreviation [1,18], or "name", of the control character in a single cell, as is done by numerous terminals, as well as by data analyzers and line monitors, which themselves also tend to be increasingly implemented in software on PCs.

Some control characters have two-character abbreviations (such as CR, LF, HT, FF), while others are three characters (NUL, SOH, DC1, DLE). Some terminals compress three-letter abbreviations to the two-character forms shown in Table 4.2. All terminals, however, display the abbreviations diagonally in the character cell, as shown in Figure 4.1.

Figure 4.1: Control Picture Display

  +---+  +---+
  |L  |  |D  |  (except the two-character abbreviation appears on the
  |   |  | C |   screen with the characters closer together)
  |  F|  |  1|
  +---+  +---+

Unicode already has a block of Control Pictures at U+2400 through U+2421, but (except for "NL" at U+2424) these go horizontally across the character cell, rather than diagonally, thus making them difficult to distinguish from normal alphanumeric text.

A new, parallel block of C0 control pictures is needed in which the abbreviations are displayed diagonally. These are listed in Table 4.2, in which "Code" is the temporary Unicode value, "Name" is the official (ASCII) abbreviation (and the one used in the Display Controls character set of the VT220 family [5]), and "2X" is the 2-character abbreviation (used in the Display Controls font of Televideo [22,23], HP [11], Perkin Elmer [20], and other terminals).

Table 4.2: C0 Control Pictures

Code  Name  2X      Code  Name  2X
E000  NUL   NU      E010  DLE   DL
E001  SOH   SH      E011  DC1   D1
E002  STX   SX      E012  DC2   D2
E003  ETX   EX      E013  DC3   D3
E004  EOT   ET      E014  DC4   D4
E005  ENQ   EQ      E015  NAK   NK
E006  ACK   AK      E016  SYN   SY
E007  BEL   BL      E017  ETB   EB
E008  BS    BS      E018  CAN   CN
E009  HT    HT      E019  EM    EM
E00A  LF    LF      E01A  SUB   SU
E00B  VT    VT      E01B  ESC   EC
E00C  FF    FF      E01C  FS    FS
E00D  CR    CR      E01D  GS    GS
E00E  SO    SO      E01E  RS    RS
E00F  SI    SI      E01F  US    US

There is little to gain by defining separate 2- and 3-character glyphs for control characters that have 3-character names; therefore it is suggested that the full abbreviation (from the Name column) be used, with the characters arranged diagonally within each cell (rather than horizontally as in the U+2400 block), and that the 2X column be ignored.

C1 Control characters are specified in ISO-6429 and used in the VT220 family of terminals [5] and the Wyse 370 [26], where they are represented in the right half of the "display controls" font as shown in Table 4.3 (DEC terminals use the full name, Wyse terminals use the 2X name).
As with C0 controls, the "name" is displayed diagonally within the character cell. Unicode presently includes no C1 control pictures.

Table 4.3: C1 Control Pictures

Code  Name  2X        Code  Name  2X
80    (1)             E030  DCS   DC
81    (1)             E031  PU1   P1
E022  BPH   (2)       E032  PU2   P2
E023  NBH   (2)       E033  STS   SE
E024  IND   IN  (3)   E034  CCH   CC
E025  NEL   NL        E035  MW    MW
E026  SSA   SS        E036  SPA   SP
E027  ESA   ES        E037  EPA   EP
E028  HTS   HS        E038  SOS   (2)
E029  HTJ   HJ        99    (1)
E02A  VTS   VS        E03A  SCI   (2)
E02B  PLD   PD        E03B  CSI   CS
E02C  PLU   PU        E03C  ST    ST
E02D  RI    RI        E03D  OSC   OS
E02E  SS2   S2        E03E  PM    PM
E02F  SS3   S3        E03F  APC   AP

Notes:
  (1) Undefined in ISO-6429, shown on VT220/WY370 terminal by hex value.
  (2) Defined in ISO-6429, but shown on VT220/WY370 terminal by hex value.
  (3) Undefined in ISO-6429, but shown by name on VT220/WY370 terminal.

Note that three of the C1 control pictures are unassigned (the ones marked by "(1)", that would be at U+E020, U+E021, and U+E039 if these were assigned). These positions should be left vacant in case names are assigned to these characters in a future revision of ISO 6429. As with C0 controls, it is presumed acceptable to encode the full abbreviation, without the 2-character alternatives for 3-character forms.

Table 4.4 shows the names of control characters unique to EBCDIC (that is, the ones it does not share with ASCII).

Table 4.4: EBCDIC Control Pictures

Code  Name  Description
E040  PF    Punch Off
E041  PN    Punch On
E042  LC    Lower Case
E043  UC    Upper Case
E044  SMM   Start of Manual Message
E045  TM    Tape Mark
E046  RES   Restore
E047  IL    Idle
E048  CC    Cursor Control
E049  CU1   Customer Use 1
E04A  CU2   Customer Use 2
E04B  CU3   Customer Use 3
E04C  CU4   Customer Use 4
E04D  IFS   Interchange File Separator
E04E  IGS   Interchange Group Separator
E04F  IUS   Interchange Unit Separator
E050  DS    Digit Select
E051  SOS   Start of Significance
E052  BYP   Bypass
E053  SM    Set Mode

Names for IBM 3270 terminal Orders, LU 1 SCS Control Codes, and Format Control Orders, which are not already listed as ASCII or EBCDIC control codes, are shown in Table 4.5, to be used in debugging 3270 data streams.

Table 4.5: 3270 Control Pictures

Code  Name  Description
E060  VCS   Vertical Channel Select
E061  GE    Graphics Escape
E062  ENP   Enable Presentation
E063  IRS   Interchange Record Separator
E064  INP   Inhibit Presentation
E065  SA    Set Attribute
E066  FMT   Format
E067  TRN   Transparent
E068  SF    Start Field
E069  SFE   Start Field Extended
E06A  SBA   Set Buffer Address
E06B  MF    Modify Field
E06C  PT    Program Tab
E06D  RA    Repeat to Address
E06E  EUA   Erase to Unprotected Address
E06F  DUP   Duplicate
E070  FM    Field Mark
E071  EO    Eight Ones

Table 4.6 shows additional characters that may be included in "display controls" mode on various terminals.

Table 4.6: Additional Control-Like Pictures

Code  Name  Remarks
E080  SP    Space (like U+2420 but arranged diagonally)
E081  DEL   Delete (Rubout) (2-character name: DT)
E082  LS1   Locking Shift 1 (ISO name for SO)
E083  LS0   Locking Shift 0 (ISO name for SI)
E084  IS4   ISO Name for FS: Information Separator 4
E085  IS3   ISO Name for GS: Information Separator 3
E086  IS2   ISO Name for RS: Information Separator 2
E087  IS1   ISO Name for US: Information Separator 1
E088  CL    Clear or Cancel Line (used on HP terminals)
E089  BP    From the Data General Word Processing Set
E08A  BE    From the Data General Word Processing Set
E08B  FN    From the Data General Word Processing Set
E08C  FE    From the Data General Word Processing Set
E08D  HF    From the Data General Word Processing Set
E08E        Diagonal crosshatches (1)
E08F        Picture of Bell (used on HP-2621 to show BEL, 0x07)
2422        Blank symbol (substitute blank, b with stroke) (2)
2423        Blank symbol (open box) (2)
2424  NL    DEC Special Graphics 0x68, EBCDIC control New Line (2)

Notes:
  (1) Used for DEL on Televideo, HP. Similar to U+25A9, but without border.
  (2) Already in Unicode.

Summary: 115 new characters required for graphic representation of control characters. Range: U+E000 through U+E09F, 160 positions with 45 vacant for expansion.

5. HEX BYTES

Hexadecimal byte values, 2 hex digits each. Like display controls, but for all 256 8-bit byte values, showing the byte code in hexadecimal, rather than the (context-dependent) name. For hex debugging (in terminal emulators, line monitors, protocol analyzers, etc). Should be arranged diagonally within the character cell as shown in Figure 5.1:

Figure 5.1: Hex Byte Pictures

  +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+
  |0 | |0 | |0 | ... |0 | |1 | |1 | |1 | ... |E | |F | ... |F |
  | 1| | 2| | 3|     | F| | 0| | 1| | 2|     | F| | 0|     | F|
  +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+

One glyph is required for each hex byte code 00 through FF, or 256 glyphs in all. Suggested temporary codes: U+E100 through U+E1FF.
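
The hex byte arrangement can be sketched in a few lines of Python. This is an illustration only, not part of the proposal. It assumes that byte value N maps to temporary code U+E100 + N (the text gives the range but not the order within it), and all function names are hypothetical.

```python
# Temporary Private Use base for hex byte pictures, taken from this section.
HEX_BYTE_BASE = 0xE100


def hex_byte_code_point(byte):
    """Return the temporary code point for a byte value.

    Assumption: a simple linear mapping, byte N -> U+E100 + N.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte value out of range: %r" % (byte,))
    return HEX_BYTE_BASE + byte


def hex_byte_cell(byte):
    """Return a two-row text sketch of the glyph interior.

    The first hex digit sits at upper left and the second at lower
    right, in the diagonal style of Figure 5.1.
    """
    digits = "%02X" % byte
    return [digits[0] + " ", " " + digits[1]]


def dump(data):
    """Translate raw terminal traffic into a string of hex byte pictures."""
    return "".join(chr(hex_byte_code_point(b)) for b in data)
```

For example, dump(b"\x1b[") would give the two private-use characters U+E11B and U+E15B, which a font built along these lines would display as the pictures for 1B and 5B.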

Note that the SNI "IBM" character set contains glyphs for 01 through 1F, which are shown sideways. I see no reason to encode these separately, but others might disagree.

Summary: 256 new characters, U+E100 through U+E1FF.

6. MATH SYMBOLS

Unicode has a generous supply of math symbols, and no doubt more are in the works. And of course it also includes the Latin, Greek, Fraktur, Hebrew, and other letters used in mathematical notation. However, terminal emulators also need special glyphs designed to be joined together in adjacent character cells, vertically or horizontally, to form large math symbols such as integrals, summation signs, braces, or brackets, such as the integral top and bottom that already exist at U+2320 and U+2321. Several other single-cell characters are also missing, including the small radical sign from the DEC Technical character set.

Table 6.1 lists the needed characters, along with suggested temporary codes for them. At least one real terminal reference is shown for each character, in column/row notation, or an IBM Graphic Character Global Identifier (GCGID) [14]. Note: SB stands for Square Bracket.

Table 6.1: Supplementary Math Symbols

Code  Description                             Reference
E0A0  Extensible left brace middle            DEC Tech 02/15
E0A1  Extensible left parenthesis bottom      DEC Tech 02/12, IBM SS210000
E0A2  Extensible left parenthesis top         DEC Tech 02/11, IBM SS200000
E0A3  Extensible left SB bottom               DEC Tech 02/08
E0A4  Extensible left SB top                  DEC Tech 02/07
E0A5  Extensible right brace middle           DEC Tech 03/00
E0A6  Extensible UR or LL brace section       IBM SS240000
E0A7  Extensible LR or UL brace section       IBM SS250000
E0A8  Extensible right parenthesis bottom     DEC Tech 02/14, IBM SS230000
E0A9  Extensible right parenthesis top        DEC Tech 02/13, IBM SS220000
E0AA  Extensible right SB bottom              DEC Tech 02/10
E0AB  Extensible right SB top                 DEC Tech 02/09
E0AC  Summation symbol bottom                 DEC Tech 03/02, DG Math 01/09(1)
E0AD  Summation symbol top                    DEC Tech 03/01, DG Math 01/08(1)
E0AE  Right ceiling corner                    DEC Tech 03/05
E0AF  Right floor corner                      DEC Tech 03/06
E0B0  Radical symbol, small                   DEC Tech 00/01
E0B1  Radical symbol with stroke              DG Math 01/13
E0B2  Superscript Latin small letter i        SNI Math 03/00
E0B3  Latin small letter a with underbar      SNI Math 04/04 (2)
E0B4  Latin capital letter H with bar         SNI Math 04/05 (2)
E0B5  Latin small letter h with bar           SNI Math 04/06 (2)
E0B6  Latin capital letter L with dot         SNI Math 04/07 (2)
E0B7  Latin small letter L with dot           SNI Math 04/08 (2)
E0B8  Latin capital letter O with underbar    SNI Math 04/09 (2)
E0B9  Latin small letter t with bar           SNI Math 04/10 (2)
E0BA  Latin small script letter t with bar    SNI Math 04/12 (2)
E0BB  ???                                     SNI Math 04/11 (3)
E0BC  ???                                     SNI Math 04/11 (3)
E0BD  ???                                     SNI Math 04/11 (3)
E0BE  Superscript almost-equal-to sign        SNI IBM 06/12
E0BF  Superscript capital Greek letter Sigma  SNI IBM 06/13
E0C0  Superscript infinity sign               SNI IBM 07/12
E0C1  Superscript proportional-to sign        SNI IBM 07/13

References:
  DEC Tech = Digital Equipment Corporation Technical Character Set [5]
  SNI Math = Siemens Nixdorf Mathematisch [21]
  DG Math  = Data General Word-Processing, Greek, and Math Character Set [2]
  IBM      = IBM Graphic Character Global Identifier (GCGID) [14]

Notes:
  (1) Also GCGID SS280000 and SS290000.
  (2) I'm not too sure about some of the SNI symbols. I'm only guessing at what the pictures (in the SNI 97801 manual) are supposed to mean; there are no accompanying character names or text.
  (3) These look like permutations of lowercase Latin letter n with hook (small eng), in various sizes, with or without a vertical accent mark on top. It's not clear to me whether these can be unified with any existing Unicode characters.

As far as I can tell, none of the SNI letterforms listed above are in Unicode 2.0.

Summary: 34 new characters, Range E0A0-E0CF, with 14 positions left vacant.

7. LINE, BOX, AND BLOCK CHARACTERS

A particular need addressed by this proposal is the continued ability to support (sometimes mission-critical) terminal-based forms-filling applications that also require entry and display of international characters, as terminals are replaced by PCs. So far, Unicode has provided the international characters, but not necessarily all the needed character-cell based forms-drawing capabilities.

Some terminals have vertical and horizontal lines that are not centered within the character cell, and currently not found in Unicode. Others have black rectangles or other shapes not found in the U+2580 block.

Abbreviations:

  V  = Vertical       H  = Horizontal
  L  = Left           R  = Right
  LL = Lower Left     LR = Lower Right
  UL = Upper Left     UR = Upper Right

Terminology:

Quadrant
  A black rectangle filling one quarter of a cell, with one corner in the center and the opposite corner at a corner of the cell. So "Quadrant UL" is the upper left quadrant; "Quadrant UL and UR" is the top half of the cell (which happens to be coincident with U+2580 and so is not included here).

Line
  Refers to a line that extends all the way to opposite edge(s) of a cell, designed to be joined to (a) line(s) in the adjacent cell(s).

Bar
  Refers to a horizontal line that does not touch any cell edges.

Wedge
  Refers to a character cell with a diagonal line connecting opposite corners, dividing it into two triangles; one black, the other white. Thus an UL Wedge is similar to U+25E9, except it fills the entire character cell.

Framus
  (Pick a better word!) is a shape composed of two triangles with their points meeting at the center of the cell to form an X with bars across the top and bottom, closing the open ends. A black framus has the two triangles filled in; a white one is in outline form. A framus with center bar has a horizontal line through the center of the cell.
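
The Quadrant terminology can be illustrated with a short Python sketch. It is an illustration only, with hypothetical names. It draws a cell as rows of "#" and "." and enumerates the quadrant combinations that are neither empty, nor a full cell (already available as U+2588), nor one of the four half blocks already in the U+2580 block. The default cell size of 8 by 8 is an arbitrary assumption.

```python
from itertools import combinations

QUADRANTS = ("UL", "UR", "LL", "LR")

# Half blocks that already exist in the Unicode U+2580 block.
EXISTING_HALF_BLOCKS = {
    frozenset({"UL", "UR"}): 0x2580,  # upper half block
    frozenset({"LL", "LR"}): 0x2584,  # lower half block
    frozenset({"UL", "LL"}): 0x258C,  # left half block
    frozenset({"UR", "LR"}): 0x2590,  # right half block
}


def render_quadrants(quadrants, width=8, height=8):
    """Draw a cell as text rows: '#' where a named quadrant is black."""
    rows = []
    for y in range(height):
        vert = "U" if y < height // 2 else "L"
        row = ""
        for x in range(width):
            horiz = "L" if x < width // 2 else "R"
            row += "#" if vert + horiz in quadrants else "."
        rows.append(row)
    return rows


def new_quadrant_combinations():
    """List combinations of one to three quadrants not already encoded.

    The empty cell and the full cell (all four quadrants) are skipped
    by construction; the four half blocks are filtered out.
    """
    result = []
    for n in range(1, 4):
        for combo in combinations(QUADRANTS, n):
            if frozenset(combo) not in EXISTING_HALF_BLOCKS:
                result.append(combo)
    return result
```

Under these assumptions new_quadrant_combinations() returns ten combinations, the same as the number of Quadrant entries in Table 7.1, and render_quadrants({"UL", "LR"}) draws the checkerboard-like cell described there as "Quadrant UL and LR".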

Figure 7.1: "Framus" Glyphs

   White       Black      With Bar
  *******     *******     *******
   *   *       *****       *   *
    * *         ***         * *
     *           *       *********
    * *         ***         * *
   *   *       *****       *   *
  *******     *******     *******

Table 7.1: Additional Line, Box, and Block Characters

Code  Description                  References
E0D0  L V box line, extensible     H19 07/12 (1)
E0D1  R V box line, extensible     H19 07/13 (1)
E0D2  UL Wedge                     H19 07/02, IBM SF870000
E0D3  UR Wedge                     H19 05/14, IBM SF860000
E0D4  LL Wedge                     IBM SF850000
E0D5  LR Wedge                     IBM SF840000
E0D6  H line - Scan 1              DSG 06/15, H19 07/10, WG3 05/00, TVI 09/00
E0D7  H line - Scan 3              DSG 07/00, Wyse ANSI 01/01, WG3 05/00
      H line - Scan 5              DSG 07/01, Wyse ANSI 02/02 (2)
E0D9  H line - Scan 7              DSG 07/02, Wyse ANSI 01/03, WG3 05/01
E0DA  H line - Scan 9              DSG 07/03, H19 07/11, WG3 05/01, TVI 09/01
E0DB  Quadrant LL                  H19 06/13, WG3 05/05, TVI 09/05
E0DC  Quadrant LR                  H19 06/12, WG3 05/04, TVI 09/04
E0DD  Quadrant UL                  H19 06/14, WG3 05/06, TVI 09/06
E0DE  Quadrant UL and LL and LR    WG3 05/11, TVI 09/11
E0DF  Quadrant UL and LR           H19 06/10 (3)
E0E0  Quadrant UL and UR and LL    WG3 05/12, TVI 09/12
E0E1  Quadrant UL and UR and LR    WG3 05/13, TVI 09/13
E0E2  Quadrant UR                  H19 111, WG3 83, TVI 09/03
E0E3  Quadrant UR and LL           (for completeness)
E0E4  Quadrant UR and LL and LR    WG3 05/14, TVI 09/14
E0E5  Full black diamond           TVI 09/02 (4)
E0E6  Black framus                 DGM 06/08
E0E7  Black framus + H center bar  DGM 06/09
E0E8  White framus                 DGM 06/10
E0E9  White framus + H center bar  DGM 06/11
E0EA  R & L arrow to V center bar  DGM 03/13
E0EB  Up arrow to H center line    DGL 02/12
E0EC  R arrow to V center line     DGL 02/13
E0ED  L arrow to V center line     DGL 02/14
E0EE  Down arrow to H center line  DGL 02/12
E0EF  Box drawing double dash H    DGL 03/12 (5)

References:
  DGM       = Data General Word-Processing, Greek, and Math Character Set [2]
  DGL       = Data General Line Drawing Character Set [2]
  DSG       = The DEC Special Graphics Character Set [5]
  H19       = The Heath/Zenith 19 Graphics Character Set [10]
  WG3       = The Wyse Graphics 3 Character Set [25]
  TVI       = The Televideo 965 Multinational Character Set [23]
  IBM       = Graphic Character Global Identifier (GCGID) [14]
  Wyse ANSI = Wyse 60 "Standard ANSI", "UK ANSI", and "ANSI Graphics" [25]

Notes:
  (1) The vertical box lines are near, but not touching, the left and right edges of the cell, respectively, and are two pixels thick on the H19 screen. Similar to IBM GCGID SF640000 and SF650000, respectively.
  (2) The center horizontal scan line is already in Unicode at U+2500.
  (3) Only on Zenith models, not original Heathkits.
  (4) Full black diamond, with points touching center of each cell wall.
  (5) Similar to U+2504 but double rather than triple.

Also note that Quadrants UL+UR, UR+LR, LL+LR, UL+LL (half blocks) are already encoded at block U+2580.

Summary: 31 New glyphs, Range E0D0 to E0EF, one vacancy.

8. MISCELLANEOUS SINGLE-CELL GLYPHS

Table 8.1: Miscellaneous Single-Cell Terminal Glyphs

Code  Description                   Reference
E0F0  Reverse Question Mark         DEC VTxxx, Wyse, Televideo (1)
E0F1  Box with X inside             DG Math 06/07, GCGID SP500000
E0F2  Human stick figure with hat   SNI Facet 04/14
E0F3  Clock (with hands at 3:00)    SNI Klammern 05/01
E0F4  Overscore asterisk            IBM 3270
E0F5  Overscore semicolon           IBM 3270
E0F6  Padlock (keyboard locked)     IBM 3270

Notes:
  (1) The reverse question mark is essential in VT terminal emulation, where it indicates that an invalid code was received, or a parity or other error was detected. It also stands for SUB and/or RS in Wyse display controls mode, and is the glyph for 0xFF in the Televideo Multinational Character Set [23]. And it is also a glyph in the DG Special Graphics Character Set [2].

Summary: 7 New glyphs, Range E0F0 to E0FF, 9 vacant.

9. UNFINISHED BUSINESS

The selection of characters presented in this draft is far from comprehensive. Hundreds of other terminals from the past 30+ years are likely to have glyphs or entire character sets covered neither here nor in Unicode, and these might or might not be important in some application somewhere.
Readers are invited, therefore, to propose any needed additions, bearing in mind that Unicode code space is not unlimited.

No attempt was made to account for the many Viewdata, Videotex, Minitel, NAPLPS, or other mosaic graphics character sets. These should be tackled, if appropriate, by someone who knows something about them.

Several character sets found in the references consulted are ignored here, fully or in part, due to lack of motivation (nobody has ever asked us to support them). Obviously these, and any other missing sets, can be considered if there is a demand.

Siemens Nixdorf Facet
  A set of 95 mosaic graphics, but not resembling any of the ISO Videotex mosaic sets; difficult to describe.

Siemens Nixdorf Klammern
  A set of 95 assorted blobs, bracket and brace pieces, clocks, arrows, hourglasses, and Greek letters, some of which are unique; others can be unified with existing Unicode characters or characters in this proposal.

Hewlett Packard Line Drawing
  Mostly coincident with Unicode box-drawing set at U+2500, but with a handful of unique characters, such as single-to-triple box intersections, single-to-double intersections with wide spacing, etc. These should be mappable to existing U+25xx glyphs without causing riots in the streets.

Hewlett Packard Big Character Pieces
  Thick line segments for drawing large characters, used on the HP-2648.

And no doubt many more...

10. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS

If all the proposed new characters are added to the UCS, this will enable terminal emulators to fully handle at least the following terminal character sets, which were not previously covered in full:

  ASCII/ISO Display Controls for DEC, Hewlett Packard, Televideo, and others.
  EBCDIC Display Controls for the IBM 3270
  Hexadecimal debugging
  DEC Technical
  DEC Special Graphics
  Data General Word-Processing, Greek, and Math (1)
  Data General Line Drawing
  Heath/Zenith 19 Graphics
  Hewlett Packard 2621 and HPTERM
  Siemens Nixdorf's "IBM" set (plus parts of its Klammern and Facet sets)
  Televideo Multinational
  Wyse Graphics 3 (Graphics 1 and 2 were already covered)
  Wyse "Standard ANSI", "UK ANSI", and "ANSI Graphics"

  (1) Except the DG logo character, which is presumed off limits.

Terminals supporting these character sets are numerous indeed. An incomplete list includes: DEC VT100, VT102, VT220/240, VT320/330/340, VT420, VT520/525; Data General 210, 215, 217, 413, and 463; the Heath / Zenith 19; and numerous Televideo and Wyse models.

Table 10.1 lists the new characters proposed in this document.

Table 10.1: Census of New Characters

Code  Glyph  Description
E000  NUL    Diagonal Control Picture Null
E001  SOH    Diagonal Control Picture Start of Heading
E002  STX    Diagonal Control Picture Start of Text
E003  ETX    Diagonal Control Picture End of Text
E004  EOT    Diagonal Control Picture End of Transmission
E005  ENQ    Diagonal Control Picture Enquiry
E006  ACK    Diagonal Control Picture Acknowledge
E007  BEL    Diagonal Control Picture Bell
E008  BS     Diagonal Control Picture Backspace
E009  HT     Diagonal Control Picture Horizontal Tab
E00A  LF     Diagonal Control Picture Line Feed
E00B  VT     Diagonal Control Picture Vertical Tab
E00C  FF     Diagonal Control Picture Form Feed
E00D  CR     Diagonal Control Picture Carriage Return
E00E  SO     Diagonal Control Picture Shift Out
E00F  SI     Diagonal Control Picture Shift In
E010  DLE    Diagonal Control Picture Data Link Escape
E011  DC1    Diagonal Control Picture Device Control 1
E012  DC2    Diagonal Control Picture Device Control 2
E013  DC3    Diagonal Control Picture Device Control 3
E014  DC4    Diagonal Control Picture Device Control 4
E015  NAK    Diagonal Control Picture Negative Acknowledge
E016  SYN    Diagonal Control Picture Synchronous Idle
E017  ETB    Diagonal Control Picture End of Transmission Block
E018  CAN    Diagonal Control Picture Cancel
E019  EM     Diagonal Control Picture End of Medium
E01A  SUB    Diagonal Control Picture Substitute
E01B  ESC    Diagonal Control Picture Escape
E01C  FS     Diagonal Control Picture File Separator
E01D  GS     Diagonal Control Picture Group Separator
E01E  RS     Diagonal Control Picture Record Separator
E01F  US     Diagonal Control Picture Unit Separator
E020         (vacant)
E021         (vacant)
E022  BPH    Diagonal Control Picture Break Permitted Here
E023  NBH    Diagonal Control Picture No Break Here
E024  IND    Diagonal Control Picture Index
E025  NEL    Diagonal Control Picture Next Line
E026  SSA    Diagonal Control Picture Start Selected Area
E027  ESA    Diagonal Control Picture End Selected Area
E028  HTS    Diagonal Control Picture Character Tabulation Set
E029  HTJ    Diagonal Control Picture Character Tabulation with Justification
E02A  VTS    Diagonal Control Picture Line Tabulation Set
E02B  PLD    Diagonal Control Picture Partial Line Forward
E02C  PLU    Diagonal Control Picture Partial Line Backward
E02D  RI     Diagonal Control Picture Reverse Line Feed
E02E  SS2    Diagonal Control Picture Single Shift 2
E02F  SS3    Diagonal Control Picture Single Shift 3
E030  DCS    Diagonal Control Picture Device Control String
E031  PU1    Diagonal Control Picture Private Use 1
E032  PU2    Diagonal Control Picture Private Use 2
E033  STS    Diagonal Control Picture Set Transmit State
E034  CCH    Diagonal Control Picture Cancel Character
E035  MW     Diagonal Control Picture Message Waiting
E036  SPA    Diagonal Control Picture Start Protected (Guarded) Area
E037  EPA    Diagonal Control Picture End Protected (Guarded) Area
E038  SOS    Diagonal Control Picture Start of String
E039         (vacant)
E03A  SCI    Diagonal Control Picture Single Character Introducer
E03B  CSI    Diagonal Control Picture Control Sequence Introducer
E03C  ST     Diagonal Control Picture String Terminator
E03D  OSC    Diagonal Control Picture Operating System Command
E03E  PM     Diagonal Control Picture Privacy Message
E03F  APC    Diagonal Control Picture Application Program Command
E040  PF     Diagonal Control Picture Punch Off
E041  PN     Diagonal Control Picture Punch On
E042  LC     Diagonal Control Picture Lower Case
E043  UC     Diagonal Control Picture Upper Case
E044  SMM    Diagonal Control Picture Start of Manual Message
E045  TM     Diagonal Control Picture Tape Mark
E046  RES    Diagonal Control Picture Restore
E047  IL     Diagonal Control Picture Idle
E048  CC     Diagonal Control Picture Cursor Control
E049  CU1    Diagonal Control Picture Customer Use 1
E04A  CU2    Diagonal Control Picture Customer Use 2
E04B  CU3    Diagonal Control Picture Customer Use 3
E04C  CU4    Diagonal Control Picture Customer Use 4
E04D  IFS    Diagonal Control Picture Interchange File Separator
E04E  IGS    Diagonal Control Picture Interchange Group Separator
E04F  IUS    Diagonal Control Picture Interchange Unit Separator
E050  DS     Diagonal Control Picture Digit Select
E051  SOS    Diagonal Control Picture Start of Significance
E052  BYP    Diagonal Control Picture Bypass
E053  SM     Diagonal Control Picture Set Mode
E054         (vacant through E05F)
E060  VCS    Vertical Channel Select
E061  GE     Graphics Escape
E062  ENP    Enable Presentation
E063  IRS    Interchange Record Separator
E064  INP    Inhibit Presentation
E065  SA     Set Attribute
E066  FMT    Format
E067  TRN    Transparent
E068  SF     Start Field
E069  SFE    Start Field Extended
E06A  SBA    Set Buffer Address
E06B  MF     Modify Field
E06C  PT     Program Tab
E06D  RA     Repeat to Address
E06E  EUA    Erase to Unprotected Address
E06F  DUP    Duplicate
E070  FM     Field Mark
E071  EO     Eight Ones
E072         (vacant through E07F)
E080  SP     Diagonal Control Picture Space
E081  DEL    Diagonal Control Picture Delete
E082  LS1    Diagonal Control Picture Locking Shift 1
E083  LS0    Diagonal Control Picture Locking Shift 0
E084  IS4    Diagonal Control Picture Information Separator 4
E085  IS3    Diagonal Control Picture Information Separator 3
E086  IS2    Diagonal Control Picture Information Separator 2
E087  IS1    Diagonal Control Picture Information Separator 1
E088  CL     Diagonal Control Picture Cancel Line
E089  BP     Diagonal Control Picture DG Word Processing BP
E08A
BE Diagonal Control Picture DG Word Processing BE E08B FN Diagonal Control Picture DG Word Processing FN E08C FE Diagonal Control Picture DG Word Processing FE E08D HF Diagonal Control Picture DG Word Processing HF E08E Diagonal crosshatches E08F Picture of bell E090 (vacant through E09F) E0A0 Extensible left brace middle E0A1 Extensible left parenthesis bottom E0A2 Extensible left parenthesis top E0A3 Extensible left SB bottom E0A4 Extensible left SB top E0A5 Extensible right brace middle E0A6 Extensible UR or LL brace section E0A7 Extensible LR or UL brace section E0A8 Extensible right parenthesis bottom E0A9 Extensible right parenthesis top E0AA Extensible right SB bottom E0AB Extensible right SB top E0AC Summation symbol bottom E0AD Summation symbol top E0AE Right ceiling corner E0AF Right floor corner E0B0 Radical symbol, small E0B1 Radical symbol with stroke E0B2 Superscript Latin small letter i E0B3 Latin small letter a with underbar E0B4 Latin capital letter H with bar E0B5 Latin small letter h with bar E0B6 Latin capital letter L with dot E0B7 Latin small letter L with dot E0B8 Latin capital letter O with underbar E0B9 Latin small letter t with bar E0BA Latin small script letter t with bar E0BB Eng-like letter E0BC Eng-like letter, fatter E0BD Eng-like letter with vertical stroke E0BE Superscript almost-equal-to sign E0BF Superscript capital Greek letterSigma E0C0 Superscript infinity sign E0C1 Superscript proportional-to sign E0C2 (vacant through E0CF) E0D0 L V box line, extensible E0D1 R V box line, extensible E0D2 UL Wedge E0D3 UR Wedge E0D4 LL Wedge E0D5 LR Wedge E0D6 H line - Scan 1 E0D7 H line - Scan 3 E0D8 (vacant) E0D9 H line - Scan 7 E0DA H line - Scan 9 E0DB Quadrant LL E0DC Quadrant LR E0DD Quadrant UL E0DE Quadrant UL and LL and LR E0DF Quadrant UL and LR E0E0 Quadrant UL and UR and LL E0E1 Quadrant UL and UR and LR E0E2 Quadrant UR E0E3 Quadrant UR and LL E0E4 Quadrant UR and LL and LR E0E5 Full black diamond E0E6 Black framus E0E7 Black 
framus + H center bar E0E8 White framus E0E9 White framus + H center bar E0EA R & L arrow to V center bar E0EB Up arrow to H center line E0EC R arrow to V center line E0ED L arrow to V center line E0EE Down arrow to H center line E0EF Box drawing double dash H E0F0 Reverse Question Mark E0F1 Box with X inside E0F2 Human stick figure with hat E0F3 Clock at 3:00 E0F4 Overscore asterisk E0F5 Overscore semicolon E0F6 Padlock E0F7 (vacant through E0FF) E100 (through E1FF): Hex Bytes Requested range: E000 through E1FF = 512 positions, 42 vacant. 11. REFERENCES [1] American National Standards Institute, ANSI X3.4-1986, Code for Information Interchange (ASCII), 1986. [2] Data General, Programming the Display Terminal: Models D217, D413, and D463, Westboro, MA, 1991. [3] Digital Equipment Corporation, VT100 User Guide, EK-VT100-UG-002, Maynard, MA, 1979. [4] Digital Equipment Corporation, VT100 Video Terminal User Guide, EK-VT102-UG-003, Maynard, MA, 1982. [5] Digital Equipment Corporation, VT220 Owner's Manual, EK-VT220-UG-003, Maynard, MA, 1984. [6] Digital Equipment Corporation, VT220 Series Programmer Reference Manual, EK-VT240-RM-002, Maynard, MA, 1984. [7] Digital Equipment Corporation, VT330/VT340 Programmer Reference Manual, Volume 1: Text Programming, ED-VT3XX-TP-002, Maynard, MA, 1988. [8] Digital Equipment Corporation, Installing and Using the VT420 Video Terminal EK-VT420-UG.002, Maynard, MA, 1988. [9] Digital Equipment Corporation, VT520/VT525 Video Terminal Programmer Inforamtion, EK-VT520-RM.A01, Maynard, MA, 1994. [10] Heathkit Manual for the Video Terminal Model H19, The Heath Company, Benton Harbor, MI, 1979. [11] Hewlett Packard 2621A/P Interactive Terminal Owner's Manual, 1978. [12] Hewlett Packard 2648A Graphics Terminal Reference Manual, 1977. [13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie, NY, 1970. [14] IBM National Language Design Guide, Volume 2: National Language Support Reference Manual, 4th Edition, North York, ON, 1994. 
[15] IBM 3270 Information Display System, Data Stream Programmer's
     Reference, GA23-0059-06, 1991.

[16] IBM 3164 ASCII Color Display Station Description, GA18-2317-1, 1986.

[17] ISO International Standard 2022, Information processing -- ISO 7-bit
     and 8-bit coded character sets -- Code extension techniques, Third
     Edition, Geneva, 1986.

[18] ISO/IEC International Standard 6429, Information technology --
     Control functions for coded character sets, Third Edition, Geneva,
     1992.

[19] ISO/IEC 10646-1, International Standard 10646, Information
     Processing -- Multiple-Octet Coded Character Set, 1993-now.

[20] Perkin Elmer Model 1100 User's Manual, Randolph, NJ, 1978.

[21] Siemens Nixdorf, Bildschirmeinheit 97801-5xx Schnittstellen,
     Benutzerhandbuch, München, 1991.

[22] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale,
     CA, 1984.

[23] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale,
     CA, 1988.

[24] The Unicode Standard, Version 2.0, Addison-Wesley Developers Press,
     1996.

[25] Wyse WY-60 Programmer's Guide, Wyse Technology, San Jose, CA, 1987.

[26] Wyse WY-370 Programmer's Guide, Wyse Technology, San Jose, CA, 1990.

(End)