Sunday 5 April 2020

CHARACTER SET


A character that appears on your computer screen, whether it's a number, a letter, or a symbol, is the graphical interpretation of a number. For a computer to know what characters to display, it refers to a database that associates a single character with each number. This database is called a character set.

Not all computers agree about which number corresponds to which character, so two users with incompatible character sets will have difficulty sharing information. For plain text messages written in English this rarely happens, but for more complex documents, such as those using extended characters and especially non-European characters, it is a major concern.

When you write a program, you express C++ source files as text lines containing characters from the source character set. When a program executes in the target environment, it uses characters from the target character set. These character sets are related, but need not have the same encoding or all the same members.

Every character set contains a distinct code value for each character in the basic C++ character set. A character set can also contain additional characters with other code values. For example:
  • The character constant 'x' becomes the value of the code for the character corresponding to x in the target character set.
  • The string literal "xyz" becomes a sequence of character constants stored in successive bytes of memory, followed by a byte containing the value zero: {'x', 'y', 'z', '\0'}
A string literal is one way to specify a null-terminated string, an array of zero or more bytes followed by a byte containing the value zero.
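
The layout described above can be checked directly. The following is a minimal sketch; the helper names xyz_storage_size and xyz_layout_matches are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes the literal "xyz" occupies,
// counting the terminating zero byte.
constexpr std::size_t xyz_storage_size() {
    return sizeof("xyz");
}

// Hypothetical helper: true if "xyz" is stored as {'x', 'y', 'z', '\0'}.
inline bool xyz_layout_matches() {
    const char s[] = "xyz";
    const char expected[] = {'x', 'y', 'z', '\0'};
    return sizeof(s) == sizeof(expected)
        && std::memcmp(s, expected, sizeof(s)) == 0;
}
```

Note that std::strlen("xyz") reports 3, because it counts the bytes before the terminating zero, while sizeof("xyz") reports 4, because the array includes it.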

Visible graphic characters in the basic C++ character set:

Form         Members
letter       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
             a b c d e f g h i j k l m n o p q r s t u v w x y z
digit        0 1 2 3 4 5 6 7 8 9
underscore   _
punctuation  ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ { | } ~

Additional characters in the basic C++ character set (the space and the control characters):

Character    Meaning
space        leave blank space
BEL          signal an alert (BELl)
BS           go back one position (BackSpace)
FF           go to top of page (Form Feed)
NL           go to start of next line (NewLine)
CR           go to start of this line (Carriage Return)
HT           go to next Horizontal Tab stop
VT           go to next Vertical Tab stop
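
In source code, each of these control characters is written with an escape sequence. The sketch below pairs the abbreviations from the table with the standard C++ escapes; the ControlChar struct, the kControlChars array, and the all_distinct_and_nonzero function are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pairing of a table abbreviation with its escape sequence.
struct ControlChar {
    const char* name;
    char value;
};

constexpr ControlChar kControlChars[] = {
    {"BEL", '\a'},  // alert
    {"BS",  '\b'},  // backspace
    {"FF",  '\f'},  // form feed
    {"NL",  '\n'},  // newline
    {"CR",  '\r'},  // carriage return
    {"HT",  '\t'},  // horizontal tab
    {"VT",  '\v'},  // vertical tab
};

constexpr std::size_t kControlCharCount =
    sizeof(kControlChars) / sizeof(kControlChars[0]);

// True if every listed character has its own code value and none of them
// is zero (zero is reserved for the null character).
inline bool all_distinct_and_nonzero() {
    for (std::size_t i = 0; i < kControlCharCount; ++i) {
        if (kControlChars[i].value == '\0') return false;
        for (std::size_t j = i + 1; j < kControlCharCount; ++j) {
            if (kControlChars[i].value == kControlChars[j].value) return false;
        }
    }
    return true;
}
```

The actual numeric codes depend on the target character set, so the sketch compares the characters only with each other and never with hard-coded numbers.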


The code value zero is reserved for the null character, which is always in the target character set. Code values for the basic C++ character set are positive when stored in an object of type char. Code values for the digits are contiguous, with increasing value. For example, '0' + 5 equals '5'. Code values for the letters are not necessarily contiguous: in EBCDIC, for example, 'i' and 'j' are not adjacent.
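
The digit guarantee makes simple arithmetic conversions portable, while the same trick on letters is not. Here is a minimal sketch; digit_to_char, char_to_digit, and letter_index are illustrative names, not standard functions.

```cpp
#include <cassert>

// Relies on the guarantee that '0'..'9' have contiguous, increasing codes.
// Assumes 0 <= d <= 9.
constexpr char digit_to_char(int d) {
    return static_cast<char>('0' + d);
}

// Inverse conversion; assumes c is one of '0'..'9'.
constexpr int char_to_digit(char c) {
    return c - '0';
}

// Letters carry no such guarantee, so c - 'a' is not portable.
// A lookup in an explicit alphabet works in any character set.
// Returns the zero-based position of a lowercase letter, or -1 if not found.
inline int letter_index(char c) {
    const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; alphabet[i] != '\0'; ++i) {
        if (alphabet[i] == c) return i;
    }
    return -1;
}
```

For instance, digit_to_char(5) yields '5' in every conforming target character set, and letter_index('j') yields 9 whether or not the codes for 'i' and 'j' are adjacent.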
