

CS111
Notes
Spring 2009
  1. Number Systems
  2. Orders of Magnitude
  3. Integers
  4. Floating Point
  5. ASCII

The ASCII Character Set

We have now seen how numbers are represented in the computer. We use two's complement for integers and floating point for reals.

Text must be stored as bits too. A text document is broken into individual characters. There is a finite number of characters to represent, so the approach is to list them all and assign each a binary string. To store a particular letter, we store its bit string.
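The idea of assigning each character a bit string can be tried out directly. The following is a minimal Python sketch, not part of the original notes: the function names `to_bit_strings` and `from_bit_strings` are made up for illustration, and the 8-bit width is an assumption.

```python
def to_bit_strings(text, width=8):
    """Return one fixed-width binary string per character of `text`."""
    # ord() gives the numeric code of a character; format() writes it in binary.
    return [format(ord(ch), f"0{width}b") for ch in text]


def from_bit_strings(bits):
    """Rebuild the text from a list of binary strings."""
    # int(b, 2) reads a binary string as a number; chr() maps it to a character.
    return "".join(chr(int(b, 2)) for b in bits)


encoded = to_bit_strings("Hi!")
print(encoded)                    # ['01001000', '01101001', '00100001']
print(from_bit_strings(encoded))  # Hi!
```

Storing the word "Hi!" therefore means storing three bit strings, one per character.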

What characters do we worry about? Upper and lower case letters, punctuation, numeric digits, white space, special symbols. What symbols would we need if we were not using English?

A character set is a list of characters and the codes used to represent them. Several character sets have been used over the years. IBM created one for its mainframe computers: the Extended Binary Coded Decimal Interchange Code (EBCDIC), an 8-bit code.

Next came the American Standard Code for Information Interchange (ASCII), used by PCs. It began as a 7-bit code, with the extra bit of each byte used for a variety of purposes, including checking that the code was transmitted correctly. But 7 bits allow only 128 characters. This is fine if everyone communicates in English, but a few other languages exist. The 7-bit code was extended to an 8-bit code, which allows 256 different characters. Now other languages can be included: Spanish, German, French and most other European languages.

The third system is the Unicode Character Set, which was originally designed to use 16 bits per character. Now we can express over 65,000 characters. (Later versions of Unicode extend beyond 16 bits to make room for still more.) More languages can be represented, including Russian, Thai, Greek, Cherokee, Braille, Chinese/Japanese/Korean, mathematical symbols, and special symbols such as ₤ and ™. Even Klingon has been proposed for inclusion, although the proposal was not accepted.
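The character counts quoted above follow from the rule that n bits give 2^n distinct codes. The Python sketch below checks that arithmetic and shows one way the spare eighth bit could be used to detect a transmission error. Even parity is an assumption chosen for illustration (the notes do not say which checking scheme was used), and the function names are hypothetical.

```python
def code_count(bits):
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


def with_even_parity(code7):
    """Place an even-parity bit above a 7-bit code, giving an 8-bit value."""
    if not 0 <= code7 < 128:
        raise ValueError("expected a 7-bit code (0-127)")
    ones = bin(code7).count("1")
    parity = ones % 2            # 1 only when the count of 1 bits is odd
    return (parity << 7) | code7


def parity_ok(byte):
    """True when the 8-bit value contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


for bits in (7, 8, 16):
    print(bits, code_count(bits))    # 7 128, then 8 256, then 16 65536

sent = with_even_parity(ord("C"))    # 'C' is 67 = 1000011, three 1 bits
print(format(sent, "08b"))           # 11000011
damaged = sent ^ 0b00000100          # one bit flipped in transit
print(parity_ok(sent), parity_ok(damaged))   # True False
```

A single parity bit catches any one flipped bit, but two flipped bits cancel out and go unnoticed.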

Below is the table of printable and nonprintable ASCII characters.

Nonprintable ASCII Characters
Decimal  Hex  Name
0        00   NULL
1        01   SOH
2        02   STX
3        03   ETX
4        04   EOT
5        05   ENQ
6        06   ACK
7        07   BELL
8        08   BKSPC
9        09   HZTAB
10       0A   NEWLN
11       0B   VTAB
12       0C   FF
13       0D   CR
14       0E   SO
15       0F   SI
16       10   DLE
17       11   DC1
18       12   DC2
19       13   DC3
20       14   DC4
21       15   NAK
22       16   SYN
23       17   ETB
24       18   CAN
25       19   EM
26       1A   SUB
27       1B   ESC
28       1C   FS
29       1D   GS
30       1E   RS
31       1F   US
127      7F   DEL

Printable ASCII Characters
Decimal  Hex  Name
32       20   (Space)
33       21   !
34       22   "
35       23   #
36       24   $
37       25   %
38       26   &
39       27   '
40       28   (
41       29   )
42       2A   *
43       2B   +
44       2C   ,
45       2D   -
46       2E   .
47       2F   /
48       30   0
49       31   1
50       32   2
51       33   3
52       34   4
53       35   5
54       36   6
55       37   7
56       38   8
57       39   9
58       3A   :
59       3B   ;
60       3C   <
61       3D   =
62       3E   >
63       3F   ?
64       40   @
65       41   A
66       42   B
67       43   C
68       44   D
69       45   E
70       46   F
71       47   G
72       48   H
73       49   I
74       4A   J
75       4B   K
76       4C   L
77       4D   M
78       4E   N
79       4F   O
80       50   P
81       51   Q
82       52   R
83       53   S
84       54   T
85       55   U
86       56   V
87       57   W
88       58   X
89       59   Y
90       5A   Z
91       5B   [
92       5C   \
93       5D   ]
94       5E   ^
95       5F   _
96       60   `
97       61   a
98       62   b
99       63   c
100      64   d
101      65   e
102      66   f
103      67   g
104      68   h
105      69   i
106      6A   j
107      6B   k
108      6C   l
109      6D   m
110      6E   n
111      6F   o
112      70   p
113      71   q
114      72   r
115      73   s
116      74   t
117      75   u
118      76   v
119      77   w
120      78   x
121      79   y
122      7A   z
123      7B   {
124      7C   |
125      7D   }
126      7E   ~
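The table has some useful regularities: the digits, the upper case letters and the lower case letters each occupy a consecutive run of codes, and each lower case letter sits exactly 32 above its upper case partner. The Python sketch below looks up a few rows and uses those regularities. The helper names are hypothetical, chosen only for this illustration.

```python
def table_row(ch):
    """Format one character as a decimal / hex / name row."""
    code = ord(ch)
    return f"{code:>3}  {code:02X}  {ch}"


def digit_value(ch):
    """Numeric value of a digit character, using the consecutive codes 48-57."""
    return ord(ch) - ord("0")


def to_upper(ch):
    """Upper case version of a lower case ASCII letter; other characters unchanged."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)   # 'a' is 97 and 'A' is 65, a gap of 32
    return ch


for ch in "0Aa~":
    print(table_row(ch))

print(digit_value("7"))   # 7
print(to_upper("q"))      # Q
```

Because the gap of 32 is a single bit (hex 20), changing the case of a letter amounts to flipping one bit of its code.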
 

Computers can layer encodings to virtually any level of complexity. Numbers can be interpreted as characters, which can be interpreted in sets as Web pages, which can be interpreted to appear as multiple fonts and styles. But at the bottommost level, the computer only "knows" voltages, which we interpret as numbers.
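As a small illustration of this layering, the same three stored bytes can be read as a list of small numbers, as ASCII text, or as one large integer. This Python sketch is only an example of the idea. The particular byte values and the big-endian byte order are assumptions picked for the demonstration.

```python
data = bytes([72, 105, 33])        # three stored bytes

as_numbers = list(data)            # read each byte as a small number
as_text = data.decode("ascii")     # read each byte as an ASCII character
as_one_integer = int.from_bytes(data, "big")   # read all 24 bits as one number

print(as_numbers)       # [72, 105, 33]
print(as_text)          # Hi!
print(as_one_integer)   # 4745505
```

Nothing in the bytes themselves says which reading is intended. The program that uses them decides.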


Revised - 8 January 2009