Appendix B: Reference Tables
This appendix contains several tables that will be useful when negotiating HTTP content. Covered in this appendix are:
Media Types
Whenever an entity-body is sent via HTTP, a media type must be sent using the Content-type header. Also, web clients can use the Accept header to define which media types the client can handle.
Character Encoding
In URL-encoded data (as described in Chapter 3, Learning HTTP), any “special” characters such as spaces and punctuation must be encoded with a % escape sequence.
Languages
Entity-bodies can be sent with a Content-language header, to declare what language the entity is written in. Clients can declare which languages they can handle, using the Accept-language header.
Character Sets
Clients can use the Accept-charset header to declare which character sets they are capable of handling.
Media Types
Listed below are media types that are registered with the Internet Assigned Numbers Authority (IANA). According to the HTTP specification, use of nonregistered media types is discouraged.
The IANA media list is available in RFC 1700. A more readable document describing the assigned media types is available at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.
Several methods can be used to identify the media type of a document. The easiest, but least accurate, is to map well-known file extensions to media types. For example, a file whose name ends in “.GIF” would map to “image/gif”. In practice, however, nothing verifies that the file is in fact a GIF file.
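As a rough illustration of extension-based mapping, the following sketch uses Python's standard mimetypes module. The helper name guess_media_type and the fallback value application/octet-stream are assumptions made for this example, not something the tables prescribe. Note that the file contents are never inspected.

```python
import mimetypes

def guess_media_type(filename, default="application/octet-stream"):
    """Guess a media type from the file extension alone (hypothetical helper)."""
    # guess_type returns (type, encoding); only the type is of interest here.
    media_type, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions yield None, so fall back to a generic binary type.
    return media_type or default

print(guess_media_type("logo.GIF"))
print(guess_media_type("notes.txt"))
print(guess_media_type("data.zzzunknown"))
```

A file named logo.GIF that actually contains plain text would still be reported as image/gif, which is exactly the weakness described above.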
A more accurate method would examine the structure or data format of the file and map it to a media type. For some media types, magic numbers make this possible. For example, all GIF files begin with the three uppercase letters GIF, and all JPEG files begin with the bytes 0xFFD8 (hexadecimal notation). This method, however, is more time-consuming.
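A minimal sketch of magic-number checking follows, covering only the two signatures mentioned above. The function names sniff_media_type and sniff_file are hypothetical, and a real implementation would need a much longer signature list.

```python
def sniff_media_type(data):
    """Return a media type based on the leading bytes, or None if unrecognized."""
    if data.startswith(b"GIF"):        # GIF files begin with the letters GIF
        return "image/gif"
    if data.startswith(b"\xff\xd8"):   # JPEG files begin with 0xFFD8
        return "image/jpeg"
    return None

def sniff_file(path):
    """Read only the first few bytes of a file and sniff its media type."""
    with open(path, "rb") as handle:
        return sniff_media_type(handle.read(8))

print(sniff_media_type(b"GIF89a\x01\x00"))
print(sniff_media_type(b"\xff\xd8\xff\xe0"))
print(sniff_media_type(b"plain text"))
```

The extra cost comes from opening and reading each file, whereas the extension approach needs only the filename.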
Under some filesystems, media types may be mapped by examining the file type/creator attribute of the file. While this is easily achieved under MacOS's HFS, other filesystems (DOS, NTFS, BSD) do not have these file attributes.
Type | Subtype |
text | plain |
text | richtext |
text | enriched |
text | tab-separated-values |
text | html |
text | sgml |
multipart | mixed |
multipart | alternative |
multipart | digest |
multipart | parallel |
multipart | appledouble |
multipart | header-set |
multipart | form-data |
multipart | related |
multipart | report |
multipart | voice-message |
message | rfc822 |
message | partial |
message | external-body |
message | news |
message | http |
application | octet-stream |
application | postscript |
application | oda |
application | atomicmail |
application | andrew-inset |
application | slate |
application | wita |
application | dec-dx |
application | dca-rft |
application | activemessage |
application | rtf |
application | applefile |
application | mac-binhex40 |
application | news-message-id |
application | news-transmission |
application | wordperfect5.1 |
application | |
application | zip |
application | macwriteii |
application | msword |
application | remote-printing |
application | mathematica |
application | cybercash |
application | commonground |
application | iges |
application | riscos |
application | eshop |
application | x400-bp |
application | sgml |
application | cals-1840 |
application | vnd.framemaker |
application | vnd.mif |
application | vnd.ms-excel |
application | vnd.ms-powerpoint |
application | vnd.ms-project |
application | vnd.ms-works |
application | vnd.ms-tnef |
application | vnd.svd |
application | vnd.music-niff |
application | vnd.ms-artgalry |
application | vnd.truedoc |
application | vnd.koan |
image | jpeg |
image | gif |
image | ief |
image | g3fax |
image | tiff |
image | cgm |
image | naplps |
image | vnd.dwg |
image | vnd.svf |
image | vnd.dxf |
audio | basic |
audio | 32kadpcm |
video | mpeg |
video | quicktime |
video | vnd.vivo |
Character Encoding
When the client sends data to a CGI program using the Content-type of application/x-www-form-urlencoded, certain special characters are encoded to eliminate ambiguity. Table B-2 shows which characters are transformed and which are not transformed. For more information on URLs, see RFC 1738.
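The encoding can be tried out with Python's standard urllib.parse module, used here only as a convenient stand-in for whatever the client does. The sample strings and field names are made up for illustration. Note that in form data this module writes a space as + rather than as a % escape.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

# Encode a single value: unreserved characters pass through, others are escaped.
encoded = quote_plus("Hello, World & friends!")
print(encoded)                 # Hello%2C+World+%26+friends%21

# Decoding reverses the transformation.
print(unquote_plus(encoded))   # Hello, World & friends!

# Encode a whole form as name=value pairs joined by "&".
form = urlencode({"name": "J. Smith", "q": "a=b&c"})
print(form)                    # name=J.+Smith&q=a%3Db%26c
```

Escaping the = and & inside the value of q is what removes the ambiguity: the receiving program can split on the unescaped separators and still recover the original text.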
Languages
A language tag has the form:
<primary-tag> <-subtag>
where zero or more subtags are allowed. The primary-tag specifies the language, and the subtag specifies parameters to the language, like dialect information, country identification, or script variations. RFC 1766 contains the complete documentation of languages and parameter usage. The key values for the primary-tag and subtag are outlined in Tables B-3 and B-4, respectively.
Examples:
de (German)
en (English)
en-us (English, USA)
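Splitting a tag into its primary tag and subtags can be sketched in a few lines. The names parse_language_tag and describe are hypothetical, the two dictionaries hold only a handful of entries for illustration, and treating the first subtag as a country code is an assumption that will not hold for dialect or script subtags.

```python
# Tiny illustrative excerpts; a real client would load complete tables.
LANGUAGES = {"de": "German", "en": "English", "fr": "French"}
COUNTRIES = {"US": "United States", "CA": "Canada", "DE": "Germany"}

def parse_language_tag(tag):
    """Split a tag such as 'en-us' into (primary_tag, [subtags])."""
    primary, *subtags = tag.strip().split("-")
    # Normalize case so that 'en-us' and 'EN-US' compare equal.
    return primary.lower(), [s.upper() for s in subtags]

def describe(tag):
    """Build a readable description, assuming the first subtag is a country."""
    primary, subtags = parse_language_tag(tag)
    name = LANGUAGES.get(primary, "unknown language")
    if subtags:
        country = COUNTRIES.get(subtags[0], "unknown country")
        return f"{name}, {country}"
    return name

for tag in ("de", "en", "en-us"):
    print(tag, "->", describe(tag))
```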
Table B-3 lists the primary language tags as defined in ISO 639 and RFC 1766.
Primary Tag | Language |
aa | Afar |
ab | Abkhazian |
af | Afrikaans |
am | Amharic |
ar | Arabic |
as | Assamese |
ay | Aymara |
az | Azerbaijani |
ba | Bashkir |
be | Byelorussian |
bg | Bulgarian |
bh | Bihari |
bi | Bislama |
bn | Bengali; Bangla |
bo | Tibetan |
br | Breton |
ca | Catalan |
co | Corsican |
cs | Czech |
cy | Welsh |
da | Danish |
de | German |
dz | Bhutani |
el | Greek |
en | English |
eo | Esperanto |
es | Spanish |
et | Estonian |
eu | Basque |
fa | Persian |
fi | Finnish |
fj | Fiji |
fo | Faeroese |
fr | French |
fy | Frisian |
ga | Irish |
gd | Scots Gaelic |
gl | Galician |
gn | Guarani |
gu | Gujarati |
ha | Hausa |
he | Hebrew |
hi | Hindi |
hr | Croatian |
hu | Hungarian |
hy | Armenian |
ia | Interlingua |
id | Indonesian |
ie | Interlingue |
ik | Inupiak |
is | Icelandic |
it | Italian |
iu | Inuktitut |
iw | Hebrew |
ja | Japanese |
jw | Javanese |
ka | Georgian |
kk | Kazakh |
kl | Greenlandic |
km | Cambodian |
kn | Kannada |
ko | Korean |
ks | Kashmiri |
ku | Kurdish |
ky | Kirghiz |
la | Latin |
ln | Lingala |
lo | Laothian |
lt | Lithuanian |
lv | Latvian, Lettish |
mg | Malagasy |
mi | Maori |
mk | Macedonian |
ml | Malayalam |
mn | Mongolian |
mo | Moldavian |
mr | Marathi |
ms | Malay |
mt | Maltese |
my | Burmese |
na | Nauru |
ne | Nepali |
nl | Dutch |
no | Norwegian |
oc | Occitan |
om | (Afan) Oromo |
or | Oriya |
pa | Punjabi |
pl | Polish |
ps | Pashto, Pushto |
pt | Portuguese |
qu | Quechua |
rm | Rhaeto-Romance |
rn | Kirundi |
ro | Romanian |
ru | Russian |
rw | Kinyarwanda |
sa | Sanskrit |
sd | Sindhi |
sg | Sangho |
sh | Serbo-Croatian |
si | Singhalese |
sk | Slovak |
sl | Slovenian |
sm | Samoan |
sn | Shona |
so | Somali |
sq | Albanian |
sr | Serbian |
ss | Siswati |
st | Sesotho |
su | Sundanese |
sv | Swedish |
sw | Swahili |
ta | Tamil |
te | Telugu |
tg | Tajik |
th | Thai |
ti | Tigrinya |
tk | Turkmen |
tl | Tagalog |
tn | Setswana |
to | Tonga |
tr | Turkish |
ts | Tsonga |
tt | Tatar |
tw | Twi |
ug | Uigur |
uk | Ukrainian |
ur | Urdu |
uz | Uzbek |
vi | Vietnamese |
vo | Volapuk |
wo | Wolof |
xh | Xhosa |
yi | Yiddish |
yo | Yoruba |
za | Zhuang |
zh | Chinese |
zu | Zulu |
Table B-4 lists the country codes defined in ISO 3166, which are used as language subtags.
Subtag | Country |
AD | Andorra |
AE | United Arab Emirates |
AF | Afghanistan |
AG | Antigua and Barbuda |
AI | Anguilla |
AL | Albania |
AM | Armenia |
AN | Netherlands Antilles |
AO | Angola |
AQ | Antarctica |
AR | Argentina |
AS | American Samoa |
AT | Austria |
AU | Australia |
AW | Aruba |
AZ | Azerbaijan |
BA | Bosnia-Herzegovina |
BB | Barbados |
BD | Bangladesh |
BE | Belgium |
BF | Burkina Faso |
BG | Bulgaria |
BH | Bahrain |
BI | Burundi |
BJ | Benin |
BM | Bermuda |
BN | Brunei Darussalam |
BO | Bolivia |
BR | Brazil |
BS | Bahamas |
BT | Bhutan |
BV | Bouvet Island |
BW | Botswana |
BY | Belarus |
BZ | Belize |
CA | Canada |
CC | Cocos (Keeling) Isl. |
CF | Central African Rep. |
CG | Congo |
CH | Switzerland |
CI | Ivory Coast |
CK | Cook Islands |
CL | Chile |
CM | Cameroon |
CN | China |
CO | Colombia |
CR | Costa Rica |
CS | Czechoslovakia |
CU | Cuba |
CV | Cape Verde |
CX | Christmas Island |
CY | Cyprus |
CZ | Czech Republic |
DE | Germany |
DJ | Djibouti |
DK | Denmark |
DM | Dominica |
DO | Dominican Republic |
DZ | Algeria |
EC | Ecuador |
EE | Estonia |
EG | Egypt |
EH | Western Sahara |
ES | Spain |
ET | Ethiopia |
FI | Finland |
FJ | Fiji |
FK | Falkland Isl. (Malvinas) |
FM | Micronesia |
FO | Faroe Islands |
FR | France |
FX | France (European Ter.) |
GA | Gabon |
GB | Great Britain (UK) |
GD | Grenada |
GE | Georgia |
GH | Ghana |
GI | Gibraltar |
GL | Greenland |
GP | Guadeloupe (Fr.) |
GQ | Equatorial Guinea |
GF | Guyana (Fr.) |
GM | Gambia |
GN | Guinea |
GR | Greece |
GT | Guatemala |
GU | Guam (US) |
GW | Guinea Bissau |
GY | Guyana |
HK | Hong Kong |
HM | Heard & McDonald Isl. |
HN | Honduras |
HR | Croatia |
HT | Haiti |
HU | Hungary |
ID | Indonesia |
IE | Ireland |
IL | Israel |
IN | India |
IO | British Indian O. Terr. |
IQ | Iraq |
IR | Iran |
IS | Iceland |
IT | Italy |
JM | Jamaica |
JO | Jordan |
JP | Japan |
KE | Kenya |
KG | Kirgistan |
KH | Cambodia |
KI | Kiribati |
KM | Comoros |
KN | St. Kitts Nevis Anguilla |
KP | Korea (North) |
KR | Korea (South) |
KW | Kuwait |
KY | Cayman Islands |
KZ | Kazakhstan |
LA | Laos |
LB | Lebanon |
LC | Saint Lucia |
LI | Liechtenstein |
LK | Sri Lanka |
LR | Liberia |
LS | Lesotho |
LT | Lithuania |
LU | Luxembourg |
LV | Latvia |
LY | Libya |
MA | Morocco |
MC | Monaco |
MD | Moldavia |
MG | Madagascar |
MH | Marshall Islands |
ML | Mali |
MM | Myanmar |
MN | Mongolia |
MO | Macau |
MP | Northern Mariana Isl. |
MQ | Martinique (Fr.) |
MR | Mauritania |
MS | Montserrat |
MT | Malta |
MU | Mauritius |
MV | Maldives |
MW | Malawi |
MX | Mexico |
MY | Malaysia |
MZ | Mozambique |
NA | Namibia |
NC | New Caledonia (Fr.) |
NE | Niger |
NF | Norfolk Island |
NG | Nigeria |
NI | Nicaragua |
NL | Netherlands |
NO | Norway |
NP | Nepal |
NR | Nauru |
NT | Neutral Zone |
NU | Niue |
NZ | New Zealand |
OM | Oman |
PA | Panama |
PE | Peru |
PF | Polynesia (Fr.) |
PG | Papua New Guinea |
PH | Philippines |
PK | Pakistan |
PL | Poland |
PM | St. Pierre & Miquelon |
PN | Pitcairn |
PT | Portugal |
PR | Puerto Rico (US) |
PW | Palau |
PY | Paraguay |
QA | Qatar |
RE | Reunion (Fr.) |
RO | Romania |
RU | Russian Federation |
RW | Rwanda |
SA | Saudi Arabia |
SB | Solomon Islands |
SC | Seychelles |
SD | Sudan |
SE | Sweden |
SG | Singapore |
SH | St. Helena |
SI | Slovenia |
SJ | Svalbard & Jan Mayen Isl. |
SK | Slovak Republic |
SL | Sierra Leone |
SM | San Marino |
SN | Senegal |
SO | Somalia |
SR | Suriname |
ST | Sao Tome and Principe |
SU | Soviet Union |
SV | El Salvador |
SY | Syria |
SZ | Swaziland |
TC | Turks & Caicos Islands |
TD | Chad |
TF | French Southern Terr. |
TG | Togo |
TH | Thailand |
TJ | Tadjikistan |
TK | Tokelau |
TM | Turkmenistan |
TN | Tunisia |
TO | Tonga |
TP | East Timor |
TR | Turkey |
TT | Trinidad & Tobago |
TV | Tuvalu |
TW | Taiwan |
TZ | Tanzania |
UA | Ukraine |
UG | Uganda |
UK | United Kingdom |
UM | US Minor Outlying Isl. |
US | United States |
UY | Uruguay |
UZ | Uzbekistan |
VA | Vatican City State |
VC | St.Vincent & Grenadines |
VE | Venezuela |
VG | Virgin Islands (British) |
VI | Virgin Islands (US) |
VN | Vietnam |
VU | Vanuatu |
WF | Wallis & Futuna Islands |
WS | Samoa |
YE | Yemen |
YU | Yugoslavia |
ZA | South Africa |
ZM | Zambia |
ZR | Zaire |
ZW | Zimbabwe |
Character Sets
Table B-5 lists the character sets that may be used with the Accept-charset HTTP header. This list does not describe all of the possible character sets of international languages that can appear in the header. For a comprehensive list of character sets, their aliases, and pointers to more descriptive documents, refer to RFC 1700.
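Because a character set can be known under several aliases, a client comparing charset names cannot rely on plain string equality. The sketch below leans on the alias table built into Python's codecs module, which is an assumption of this example and is not the same thing as the registry in RFC 1700. The function name same_charset is hypothetical.

```python
import codecs

def same_charset(a, b):
    """Return True if two charset names resolve to the same Python codec."""
    try:
        # lookup() accepts known aliases and reports one canonical codec name.
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # A name Python does not know about is treated as "no match".
        return False

print(same_charset("ISO-8859-1", "latin1"))
print(same_charset("US-ASCII", "ascii"))
print(same_charset("utf-8", "iso-8859-1"))
```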