The Unicode Standard, Version 3.0
by Unicode Consortium

About Book

Book Description
The Unicode Standard, Version 3.0 is THE authoritative source of information on the Unicode character-encoding standard, which makes it possible to create global software and share data across languages, nations, and locales worldwide. Encompassing all of the world's widely-used scripts and character sets, Unicode represents the foundation for international software; it is already supported by Java™, Windows NT/2000, NetWare, QuickDraw GX, and many other environments and applications. This authorized guide documents all essential elements of Unicode 3.0, including its basic principles, code charts, implementation techniques, and rules for conformance. It contains up-to-the-minute coverage of the latest scripts included in Unicode 3.0, as well as more than a decade's implementation experience from the world's leading experts in multilingual applications.

Book Info
Presents the authoritative source of information on the Unicode character encoding standard, the international character code for information processing including all major scripts of the world.

From the Back Cover
Unicode
The authoritative, technical guide to the creation of software for worldwide use. Detailed specifications for Unicode:
Expanded implementation guidelines by experts in global software design:
Comprehensive charts, references, glossary, and indexes:
CD-ROM The comprehensive Unicode Character Database for:
International, national, and vendor character mappings for:
Unicode Technical Reports that extend the standard for:
About the Author
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote the use of the Unicode Standard. The membership of the Consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The Unicode Consortium actively cooperates with many of the leading standards development organizations, including ISO/IEC JTC1, W3C, IETF, and ECMA.

Excerpt. © Reprinted by permission. All rights reserved.
This book, The Unicode Standard, Version 3.0, is the authoritative source of information on the Unicode character encoding standard, the international character code for information processing that includes all major scripts of the world and is the foundation for development of software for worldwide use. As well as encoding characters used for written communication in a simple and consistent manner, the Unicode Standard defines character properties and algorithms for use in implementations. Version 3.0 expands on material from Versions 2.0 and 2.1 and supersedes all other previous versions. The previous versions of the Unicode Standard are:
0.1 About the Unicode Standard
This book defines Version 3.0 of the Unicode Standard. The general principles and architecture of the Unicode Standard, requirements for conformance, and guidelines for implementers precede the actual coding information. Useful ancillary information is given in the appendices. The accompanying CD-ROM contains tables of use to implementers and all technical reports published to date.

Concepts, Architecture, Conformance, and Guidelines
The first five chapters of Version 3.0 introduce the Unicode Standard and provide the information an engineer needs to produce a conforming implementation. Basic text processing, working with combining marks, encoding forms, and doing bidirectional text layout are all described. A special chapter on implementation guidelines answers many common questions that arise when implementing Unicode.
Chapters 6 through 13 contain the character block descriptions that give basic information about each script or collection and may discuss specific characters or pertinent layout information.
The next two chapters document the Unicode Standard's character code assignments, their names and important descriptive information, and Han indices that aid in locating specific ideographs encoded in Unicode.
The appendices contain detailed background information on important topics: character encoding systems, submission of proposals, and the history of Unicode and its relationship to ISO/IEC 10646.
The Unicode Character Database and Technical Reports
The Unicode Character Database is the name for a collection of files that contain character code values, character names, and character property data. It is described more fully in the file UnicodeCharacterDatabase.html. Version 3.0.0 of the database is provided on the accompanying CD-ROM. Updates and revisions will be made available online. See http://www.unicode.org/unicode/standard/versions/ for information on the latest available version. The following Unicode Technical Reports are formally part of this standard:
On the CD-ROM
The CD-ROM contains the Unicode Character Database, which gives character codes, character names, character properties, and decompositions for decomposable or compatibility characters. In addition to the Unicode Character Database and Unicode Technical Reports that are part of this standard, the CD-ROM also contains additional technical reports (covering topics such as compression, collation, and transformation formats), as well as property-based mapping tables (for example, tables for case) and transcoding tables for international, national, and industry character sets (including the Han cross-reference table). For the complete contents of the CD-ROM, see its READ ME file. Please consult the Unicode Consortium's online resources (see Section 0.3, Resources) to obtain the most up-to-date versions of the materials on the CD-ROM.
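To give a feel for the kind of data the Unicode Character Database holds, here is a minimal sketch using Python's standard unicodedata module. It is an illustration only: the module is built from a later version of the database than the 3.0.0 files on the CD-ROM, and the helper name describe is hypothetical.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few database fields for one character.

    `describe` is a hypothetical helper for illustration; it only wraps
    lookups provided by Python's standard `unicodedata` module.
    """
    return {
        "code": f"U+{ord(ch):04X}",                 # character code
        "name": unicodedata.name(ch, "<unnamed>"),  # formal character name
        "category": unicodedata.category(ch),       # general category abbreviation
        "decomposition": unicodedata.decomposition(ch),  # empty string if none
    }


if __name__ == "__main__":
    # A Greek letter, a decomposable character, and a compatibility character.
    for ch in ("\u03bc", "\u00e0", "\u00b5"):
        print(describe(ch))
```

The decomposition field is empty for characters that do not decompose, and carries a tag such as <compat> for compatibility characters.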
0.2 Notational Conventions
Throughout this book, certain typographic conventions are used. In running text, an individual Unicode value is expressed as U+nnnn, where nnnn is a four-digit number in hexadecimal notation, using the digits 0-9 and the letters A-F (for 10 through 15, respectively).
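The U+nnnn convention is easy to reproduce in code. The following Python sketch converts between a character and its U+nnnn form; the function names to_u_notation and from_u_notation are hypothetical and are not part of the standard or of any library.

```python
def to_u_notation(ch: str) -> str:
    """Format one character as U+nnnn, zero-padded to four uppercase hex digits."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Return the character denoted by a string in U+nnnn form."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not in U+nnnn form: {text!r}")
    # The digits after the "U+" prefix are read as a hexadecimal number.
    return chr(int(text[2:], 16))


if __name__ == "__main__":
    print(to_u_notation("a"))
    print(from_u_notation("U+03BC"))
```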
A range of Unicode values is expressed as U+xxxx→U+yyyy, or U+xxxx--U+yyyy, or xxxx..yyyy, where xxxx and yyyy are the first and last Unicode values in the range, and the arrow, long dash, or two dots indicate a contiguous range inclusive of the endpoints.
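As a sketch of how the range notation might be read by a program, the Python function below turns a range written with an arrow, a dash, or two dots into an inclusive range of code values. The name parse_range is hypothetical, and the sketch assumes the long dash is written as two ASCII hyphens, as it appears in this text.

```python
import re

# Assumption: the separator is an arrow, two ASCII hyphens (standing in for
# the long dash), or two dots; the "U+" prefix is optional on each endpoint.
_RANGE = re.compile(
    r"^(?:U\+)?([0-9A-Fa-f]{4})\s*(?:→|--|\.\.)\s*(?:U\+)?([0-9A-Fa-f]{4})$"
)


def parse_range(text: str) -> range:
    """Parse a range such as 'U+0041--U+005A' into a range of code values.

    The notation includes both endpoints, so the Python range stops at last + 1.
    """
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized range: {text!r}")
    first, last = int(match.group(1), 16), int(match.group(2), 16)
    if first > last:
        raise ValueError("first value must not exceed last value")
    return range(first, last + 1)


if __name__ == "__main__":
    letters = parse_range("U+0041--U+005A")
    print(len(letters), chr(letters[0]), chr(letters[-1]))
```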
In running text, a formal Unicode name is shown in small capitals (for example, GREEK SMALL LETTER MU), and alternative names (aliases) appear in italics (for example, umlaut). Italics are also used to refer to a text element that is not explicitly encoded (for example, pasekh alef) or to set off a foreign word (for example, the Welsh word ynghyd). Phonemic transcriptions are shown between slashes, as in Khmer /khnyom/.

The symbols used in the character names list are described at the beginning of Chapter 14, Code Charts.

In the text of this book, the word "Unicode" when used alone as a noun refers to the Unicode Standard.

In this book, unambiguous dates of the current common era, such as 1999, are unlabeled. In cases of ambiguity, CE is used. Dates before the common era are labeled with BCE.

Extended BNF
The Unicode Standard and technical reports use an extended BNF format for describing syntax. As different conventions are used for BNF, Table 0-1, Extended BNF, lists the notation used here.
A sequence of characters is sometimes listed in text with angle brackets, such as <a, grave> or <U+0061, U+0300>.

Table 0-1. Extended BNF
Character Classes. A character class is constructed from one or two base sets. It is either a single base set, the negation of a base set, or the (set) difference between two base sets. The base sets themselves are bounded by brackets, and contain lists of characters, ranges of characters, general categories, or negations of general categories. The syntax follows:

    charClass := baseSet
               | '¬' baseSet
               | baseSet '-' baseSet
    baseSet   := '[' item (','? item)* ']'
    item      := char
               | char '-' char
               | '{' '¬'? category '}'
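The meaning of these three forms can be modeled with ordinary set operations. The Python sketch below is an illustration under stated assumptions, not a parser for the notation: it evaluates classes over a small universe (U+0000 through U+024F) so that negation stays finite, it uses the two-letter category abbreviations of Python's unicodedata module in place of the spelled-out category names, and all function names are hypothetical.

```python
import unicodedata

# Assumption: a small universe keeps negation finite and the example fast.
UNIVERSE = frozenset(chr(cp) for cp in range(0x0000, 0x0250))


def char_range(first: str, last: str) -> set:
    """An item of the form char '-' char: all characters from first to last."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


def category(abbrev: str, negated: bool = False) -> set:
    """An item naming a general category, optionally negated.

    `abbrev` is a unicodedata abbreviation such as "Lu"; a one-letter
    prefix such as "L" matches every subcategory that starts with it.
    """
    members = {ch for ch in UNIVERSE
               if unicodedata.category(ch).startswith(abbrev)}
    return set(UNIVERSE) - members if negated else members


def base_set(*items) -> set:
    """A bracketed base set: the union of its items (single chars or sets)."""
    result = set()
    for item in items:
        result |= {item} if isinstance(item, str) else set(item)
    return result & UNIVERSE


def negate(base: set) -> set:
    """The negation of a base set, relative to UNIVERSE."""
    return set(UNIVERSE) - base


def difference(left: set, right: set) -> set:
    """The set difference between two base sets."""
    return left - right


if __name__ == "__main__":
    lower = base_set(char_range("a", "z"))                     # [a-z]
    not_lower = negate(lower)                                  # ¬[a-z]
    no_vowels = difference(lower, base_set(*"aeiou"))          # [a-z]-[a,e,i,o,u]
    print(len(lower), "A" in not_lower, len(no_vowels))
```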
General categories, such as {Uppercase Letter} for uppercase letters, are defined in Chapter 4, Character Properties. Main categories such as {Mark} are the equivalent of a list of multiple subcategories: {Non-Spacing Mark}{Spacing Combining Mark}{Enclosing Mark}. Examples are found in Table 0-2, Character Class Examples.

Table 0-2. Character Class Examples
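The way a main category expands into its subcategories can be sketched in Python as follows. The lookup tables and the function name matches are hypothetical; the two-letter abbreviations (Mn, Mc, Me) are the ones reported by Python's unicodedata module, whose data comes from a later version of the Unicode Character Database than 3.0.

```python
import unicodedata

# Assumption: spelled-out subcategory names mapped to unicodedata abbreviations.
ABBREVIATION = {
    "Non-Spacing Mark": "Mn",
    "Spacing Combining Mark": "Mc",
    "Enclosing Mark": "Me",
}

# A main category stands for the list of its subcategories.
MAIN_CATEGORY = {
    "Mark": ("Non-Spacing Mark", "Spacing Combining Mark", "Enclosing Mark"),
}


def matches(ch: str, name: str) -> bool:
    """Return True if `ch` belongs to the named main category or subcategory."""
    subcategories = MAIN_CATEGORY.get(name, (name,))
    wanted = {ABBREVIATION[sub] for sub in subcategories}
    return unicodedata.category(ch) in wanted


if __name__ == "__main__":
    # COMBINING GRAVE ACCENT, DEVANAGARI SIGN VISARGA, COMBINING ENCLOSING CIRCLE
    for ch in ("\u0300", "\u0903", "\u20dd"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), matches(ch, "Mark"))
```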
Operators

Table 0-3. Operators
0.3 Resources
Unicode Web Site
Unicode Anonymous FTP Site
Unicode Public Mailing List