Web Client Programming with Perl

Automating Tasks on the Web

By Clinton Wong
1st Edition March 1997




This book is out of print, but it has been made available online through the O'Reilly Open Books Project.


Appendix B
Reference Tables

This appendix contains several tables that will be useful when negotiating HTTP content. Covered in this appendix are:

Media Types
Whenever an entity-body is sent via HTTP, a media type must be sent using the Content-type header. Also, web clients can use the Accept header to define which media types the client can handle.

Character Encoding
In URL-encoded data (as described in Chapter 3, Learning HTTP), any "special" characters such as spaces and punctuation must be encoded with a % escape sequence.

Languages
Entity-bodies can be sent with a Content-language header, to declare what language the entity is written in. Clients can declare which languages they can handle, using the Accept-language header.

Character Sets
Clients can use the Accept-charset header to declare which character sets they are capable of handling.
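As a rough illustration of the negotiation headers named above, the following Python sketch assembles Accept, Accept-language, and Accept-charset request header lines as plain text. The function name build_negotiation_headers and the sample values are assumptions made for this example; they are not defined anywhere in this book.

```python
def build_negotiation_headers(media_types, languages, charsets):
    """Return HTTP request header lines describing what a client can handle."""
    headers = [
        ("Accept", ", ".join(media_types)),
        ("Accept-language", ", ".join(languages)),
        ("Accept-charset", ", ".join(charsets)),
    ]
    # Skip any header that has no values, so no empty fields are sent.
    return [f"{name}: {value}" for name, value in headers if value]


if __name__ == "__main__":
    # Sample values only; a real client would list what it truly supports.
    lines = build_negotiation_headers(
        ["text/html", "image/gif"], ["en-us", "de"], ["ISO-8859-1"]
    )
    for line in lines:
        print(line)
```

A client would send lines like these along with the rest of its request headers.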

Media Types

Listed below are media types that are registered with the Internet Assigned Numbers Authority (IANA). According to the HTTP specification, use of nonregistered media types is discouraged.

The IANA media list is available in RFC 1700. A more readable document describing the assigned media types is available at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.

Several methods are used to identify the media type of a document. The easiest, but least accurate, is to map well-known file extensions to media types. For example, a file whose name ends in ".GIF" would map to "image/gif". In practice, however, nothing verifies that the file is in fact a GIF file.
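The extension-based approach can be sketched in a few lines of Python. The table below is a small hand-picked sample, and the names EXTENSION_TO_MEDIA_TYPE and media_type_from_extension are hypothetical, chosen only for this illustration.

```python
import os

# A tiny, hand-picked extension table (an assumption for this sketch);
# real servers ship with far larger ones.
EXTENSION_TO_MEDIA_TYPE = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".html": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
}


def media_type_from_extension(filename, default="application/octet-stream"):
    """Guess a media type from the file extension alone.

    The file contents are never inspected, so a misnamed file is
    reported with the wrong type.
    """
    _, extension = os.path.splitext(filename)
    # Extensions are compared case-insensitively, so ".GIF" matches ".gif".
    return EXTENSION_TO_MEDIA_TYPE.get(extension.lower(), default)
```

Python's standard mimetypes module performs the same kind of lookup against a much larger table.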

A more accurate method examines the structure or data format of the file and maps it to a media type. For some media types, magic numbers make this possible. For example, all GIF files begin with the three uppercase ASCII letters "GIF", and all JPEG files begin with the two bytes 0xFF 0xD8 (hexadecimal notation). This method, however, is more time-consuming.
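A minimal Python sketch of magic-number checking follows, covering only the two signatures mentioned above. The function names sniff_media_type and sniff_file are assumptions made for this example, and a practical detector would recognize many more formats.

```python
def sniff_media_type(data):
    """Return a media type based on the leading bytes, or None if unknown."""
    # GIF files start with the ASCII letters "GIF".
    if data[:3] == b"GIF":
        return "image/gif"
    # JPEG files start with the two bytes 0xFF 0xD8.
    if data[:2] == b"\xff\xd8":
        return "image/jpeg"
    return None


def sniff_file(path):
    """Read just enough of a file to test the signatures above."""
    with open(path, "rb") as handle:
        return sniff_media_type(handle.read(8))
```

Because the file must be opened and read, this costs more than an extension lookup, which is the trade-off described above.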

Under some filesystems, media types may be mapped by examining the file type/creator attribute of the file. While this is easily achieved under MacOS's HFS, other filesystems (DOS, NTFS, BSD) do not have these file attributes.

Table B-1: Internet Media Types

Type          Subtype

text          plain
text          richtext
text          enriched
text          tab-separated-values
text          html
text          sgml
multipart     mixed
multipart     alternative
multipart     digest
multipart     parallel
multipart     appledouble
multipart     header-set
multipart     form-data
multipart     related
multipart     report
multipart     voice-message
message       rfc822
message       partial
message       external-body
message       news
message       http
application   octet-stream
application   postscript
application   oda
application   atomicmail
application   andrew-inset
application   slate
application   wita
application   dec-dx
application   dca-rft
application   activemessage
application   rtf
application   applefile
application   mac-binhex40
application   news-message-id
application   news-transmission
application   wordperfect5.1
application   pdf
application   zip
application   macwriteii
application   msword
application   remote-printing
application   mathematica
application   cybercash
application   commonground
application   iges
application   riscos
application   eshop
application   x400-bp
application   sgml
application   cals-1840
application   vnd.framemaker
application   vnd.mif
application   vnd.ms-excel
application   vnd.ms-powerpoint
application   vnd.ms-project
application   vnd.ms-works
application   vnd.ms-tnef
application   vnd.svd
application   vnd.music-niff
application   vnd.ms-artgalry
application   vnd.truedoc
application   vnd.koan
image         jpeg
image         gif
image         ief
image         g3fax
image         tiff
image         cgm
image         naplps
image         vnd.dwg
image         vnd.svf
image         vnd.dxf
audio         basic
audio         32kadpcm
video         mpeg
video         quicktime
video         vnd.vivo

Character Encoding

When the client sends data to a CGI program using the Content-type of application/x-www-form-urlencoded, certain special characters are encoded to eliminate ambiguity. Table B-2 shows which characters are transformed and which are not transformed. For more information on URLs, see RFC 1738.

Table B-2: Character Encoding

ASCII   Symbol      CGI representation

< 32    (control)   always encode with %xx where xx is the hexadecimal representation of the character
32      (space)     + or %20
33      !           %21
34      "           %22
35      #           %23
36      $           %24
37      %           %25
38      &           %26
39      '           %27
40      (           %28
41      )           %29
42      *           *
43      +           %2B
44      ,           %2C
45      -           -
46      .           .
47      /           %2F
48      0           0
49      1           1
50      2           2
51      3           3
52      4           4
53      5           5
54      6           6
55      7           7
56      8           8
57      9           9
58      :           %3A
59      ;           %3B
60      <           %3C
61      =           %3D
62      >           %3E
63      ?           %3F
64      @           %40
65      A           A
66      B           B
67      C           C
68      D           D
69      E           E
70      F           F
71      G           G
72      H           H
73      I           I
74      J           J
75      K           K
76      L           L
77      M           M
78      N           N
79      O           O
80      P           P
81      Q           Q
82      R           R
83      S           S
84      T           T
85      U           U
86      V           V
87      W           W
88      X           X
89      Y           Y
90      Z           Z
91      [           %5B
92      \           %5C
93      ]           %5D
94      ^           %5E
95      _           _
96      `           %60
97      a           a
98      b           b
99      c           c
100     d           d
101     e           e
102     f           f
103     g           g
104     h           h
105     i           i
106     j           j
107     k           k
108     l           l
109     m           m
110     n           n
111     o           o
112     p           p
113     q           q
114     r           r
115     s           s
116     t           t
117     u           u
118     v           v
119     w           w
120     x           x
121     y           y
122     z           z
123     {           %7B
124     |           %7C
125     }           %7D
126     ~           %7E
127     (delete)    %7F
> 127   (8-bit)     always encode with %xx where xx is the hexadecimal representation of the character
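The rules in Table B-2 can be sketched as a small Python function. The name form_encode is hypothetical, and the choice of UTF-8 for turning characters above 127 into bytes is an assumption of this sketch, since the table itself only says to encode each such character as %xx.

```python
import string
from urllib.parse import unquote_plus

# Characters that Table B-2 shows as passing through unchanged.
UNENCODED = set(string.ascii_letters + string.digits + "*-._")


def form_encode(text, charset="utf-8"):
    """Encode text following Table B-2 (charset is an assumption)."""
    encoded = []
    for byte in text.encode(charset):
        char = chr(byte)
        if char in UNENCODED:
            encoded.append(char)            # letters, digits, * - . _
        elif char == " ":
            encoded.append("+")             # space becomes a plus sign
        else:
            encoded.append(f"%{byte:02X}")  # everything else becomes %xx
    return "".join(encoded)


if __name__ == "__main__":
    sample = form_encode("50% off!")
    print(sample)
    # The standard library can reverse the encoding.
    print(unquote_plus(sample))
```

The standard urllib.parse.quote_plus function does similar work, but its set of unencoded characters differs slightly from the table (recent Python versions leave the tilde unencoded, for instance), which is why the sketch spells the rules out by hand.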

Languages

A language tag has the form:

<primary-tag>[-<subtag>]...

where zero or more subtags are allowed. The primary-tag specifies the language, and the subtag specifies parameters to the language, like dialect information, country identification, or script variations. RFC 1766 contains the complete documentation of languages and parameter usage. The key values for the primary-tag and subtag are outlined in Tables B-3 and B-4, respectively.

Examples:

de      (German)
en      (English)
en-us   (English, USA)
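The tag structure described above can be sketched in Python as a split on hyphens. The function name parse_language_tag is an assumption for this example. The lowercasing reflects RFC 1766's statement that tags are case-insensitive, and the length check reflects its limit of eight letters per tag part.

```python
def parse_language_tag(tag):
    """Split a language tag into (primary_tag, list_of_subtags)."""
    parts = tag.strip().lower().split("-")
    for part in parts:
        # Each part must be 1 to 8 ASCII letters.
        if not (part.isascii() and part.isalpha() and len(part) <= 8):
            raise ValueError(f"malformed language tag: {tag!r}")
    primary, *subtags = parts
    return primary, subtags


if __name__ == "__main__":
    for example in ("de", "en", "en-us"):
        print(example, parse_language_tag(example))
```

The primary tag can then be looked up in Table B-3 and the subtag, if any, in Table B-4.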

Table B-3 lists the primary language tags as defined in ISO 639 and RFC 1766.

Table B-3: Primary Language Types

Primary Tag   Language

aa            Afar
ab            Abkhazian
af            Afrikaans
am            Amharic
ar            Arabic
as            Assamese
ay            Aymara
az            Azerbaijani
ba            Bashkir
be            Byelorussian
bg            Bulgarian
bh            Bihari
bi            Bislama
bn            Bengali; Bangla
bo            Tibetan
br            Breton
ca            Catalan
co            Corsican
cs            Czech
cy            Welsh
da            Danish
de            German
dz            Bhutani
el            Greek
en            English
eo            Esperanto
es            Spanish
et            Estonian
eu            Basque
fa            Persian
fi            Finnish
fj            Fiji
fo            Faeroese
fr            French
fy            Frisian
ga            Irish
gd            Scots Gaelic
gl            Galician
gn            Guarani
gu            Gujarati
ha            Hausa
he            Hebrew
hi            Hindi
hr            Croatian
hu            Hungarian
hy            Armenian
ia            Interlingua
id            Indonesian
ie            Interlingue
ik            Inupiak
is            Icelandic
it            Italian
iu            Inuktitut
iw            Hebrew
ja            Japanese
jw            Javanese
ka            Georgian
kk            Kazakh
kl            Greenlandic
km            Cambodian
kn            Kannada
ko            Korean
ks            Kashmiri
ku            Kurdish
ky            Kirghiz
la            Latin
ln            Lingala
lo            Laothian
lt            Lithuanian
lv            Latvian, Lettish
mg            Malagasy
mi            Maori
mk            Macedonian
ml            Malayalam
mn            Mongolian
mo            Moldavian
mr            Marathi
ms            Malay
mt            Maltese
my            Burmese
na            Nauru
ne            Nepali
nl            Dutch
no            Norwegian
oc            Occitan
om            (Afan) Oromo
or            Oriya
pa            Punjabi
pl            Polish
ps            Pashto, Pushto
pt            Portuguese
qu            Quechua
rm            Rhaeto-Romance
rn            Kirundi
ro            Romanian
ru            Russian
rw            Kinyarwanda
sa            Sanskrit
sd            Sindhi
sg            Sangho
sh            Serbo-Croatian
si            Singhalese
sk            Slovak
sl            Slovenian
sm            Samoan
sn            Shona
so            Somali
sq            Albanian
sr            Serbian
ss            Siswati
st            Sesotho
su            Sundanese
sv            Swedish
sw            Swahili
ta            Tamil
te            Telugu
tg            Tajik
th            Thai
ti            Tigrinya
tk            Turkmen
tl            Tagalog
tn            Setswana
to            Tonga
tr            Turkish
ts            Tsonga
tt            Tatar
tw            Twi
ug            Uighur
uk            Ukrainian
ur            Urdu
uz            Uzbek
vi            Vietnamese
vo            Volapuk
wo            Wolof
xh            Xhosa
yi            Yiddish
yo            Yoruba
za            Zhuang
zh            Chinese
zu            Zulu

Table B-4 lists the language subtypes as defined in ISO 3166.

Table B-4: Language Subtypes

Subtype   Country

AD        Andorra
AE        United Arab Emirates
AF        Afghanistan
AG        Antigua and Barbuda
AI        Anguilla
AL        Albania
AM        Armenia
AN        Netherlands Antilles
AO        Angola
AQ        Antarctica
AR        Argentina
AS        American Samoa
AT        Austria
AU        Australia
AW        Aruba
AZ        Azerbaijan
BA        Bosnia-Herzegovina
BB        Barbados
BD        Bangladesh
BE        Belgium
BF        Burkina Faso
BG        Bulgaria
BH        Bahrain
BI        Burundi
BJ        Benin
BM        Bermuda
BN        Brunei Darussalam
BO        Bolivia
BR        Brazil
BS        Bahamas
BT        Bhutan
BV        Bouvet Island
BW        Botswana
BY        Belarus
BZ        Belize
CA        Canada
CC        Cocos (Keeling) Isl.
CF        Central African Rep.
CG        Congo
CH        Switzerland
CI        Ivory Coast
CK        Cook Islands
CL        Chile
CM        Cameroon
CN        China
CO        Colombia
CR        Costa Rica
CS        Czechoslovakia
CU        Cuba
CV        Cape Verde
CX        Christmas Island
CY        Cyprus
CZ        Czech Republic
DE        Germany
DJ        Djibouti
DK        Denmark
DM        Dominica
DO        Dominican Republic
DZ        Algeria
EC        Ecuador
EE        Estonia
EG        Egypt
EH        Western Sahara
ES        Spain
ET        Ethiopia
FI        Finland
FJ        Fiji
FK        Falkland Isl. (Malvinas)
FM        Micronesia
FO        Faroe Islands
FR        France
FX        France (European Ter.)
GA        Gabon
GB        Great Britain (UK)
GD        Grenada
GE        Georgia
GH        Ghana
GI        Gibraltar
GL        Greenland
GP        Guadeloupe (Fr.)
GQ        Equatorial Guinea
GF        Guyana (Fr.)
GM        Gambia
GN        Guinea
GR        Greece
GT        Guatemala
GU        Guam (US)
GW        Guinea Bissau
GY        Guyana
HK        Hong Kong
HM        Heard & McDonald Isl.
HN        Honduras
HR        Croatia
HT        Haiti
HU        Hungary
ID        Indonesia
IE        Ireland
IL        Israel
IN        India
IO        British Indian O. Terr.
IQ        Iraq
IR        Iran
IS        Iceland
IT        Italy
JM        Jamaica
JO        Jordan
JP        Japan
KE        Kenya
KG        Kyrgyzstan
KH        Cambodia
KI        Kiribati
KM        Comoros
KN        St. Kitts Nevis Anguilla
KP        Korea (North)
KR        Korea (South)
KW        Kuwait
KY        Cayman Islands
KZ        Kazakhstan
LA        Laos
LB        Lebanon
LC        Saint Lucia
LI        Liechtenstein
LK        Sri Lanka
LR        Liberia
LS        Lesotho
LT        Lithuania
LU        Luxembourg
LV        Latvia
LY        Libya
MA        Morocco
MC        Monaco
MD        Moldavia
MG        Madagascar
MH        Marshall Islands
ML        Mali
MM        Myanmar
MN        Mongolia
MO        Macau
MP        Northern Mariana Isl.
MQ        Martinique (Fr.)
MR        Mauritania
MS        Montserrat
MT        Malta
MU        Mauritius
MV        Maldives
MW        Malawi
MX        Mexico
MY        Malaysia
MZ        Mozambique
NA        Namibia
NC        New Caledonia (Fr.)
NE        Niger
NF        Norfolk Island
NG        Nigeria
NI        Nicaragua
NL        Netherlands
NO        Norway
NP        Nepal
NR        Nauru
NT        Neutral Zone
NU        Niue
NZ        New Zealand
OM        Oman
PA        Panama
PE        Peru
PF        Polynesia (Fr.)
PG        Papua New Guinea
PH        Philippines
PK        Pakistan
PL        Poland
PM        St. Pierre & Miquelon
PN        Pitcairn
PT        Portugal
PR        Puerto Rico (US)
PW        Palau
PY        Paraguay
QA        Qatar
RE        Reunion (Fr.)
RO        Romania
RU        Russian Federation
RW        Rwanda
SA        Saudi Arabia
SB        Solomon Islands
SC        Seychelles
SD        Sudan
SE        Sweden
SG        Singapore
SH        St. Helena
SI        Slovenia
SJ        Svalbard & Jan Mayen Isl.
SK        Slovak Republic
SL        Sierra Leone
SM        San Marino
SN        Senegal
SO        Somalia
SR        Suriname
ST        Sao Tome and Principe
SU        Soviet Union
SV        El Salvador
SY        Syria
SZ        Swaziland
TC        Turks & Caicos Islands
TD        Chad
TF        French Southern Terr.
TG        Togo
TH        Thailand
TJ        Tajikistan
TK        Tokelau
TM        Turkmenistan
TN        Tunisia
TO        Tonga
TP        East Timor
TR        Turkey
TT        Trinidad & Tobago
TV        Tuvalu
TW        Taiwan
TZ        Tanzania
UA        Ukraine
UG        Uganda
UK        United Kingdom
UM        US Minor Outlying Isl.
US        United States
UY        Uruguay
UZ        Uzbekistan
VA        Vatican City State
VC        St. Vincent & Grenadines
VE        Venezuela
VG        Virgin Islands (British)
VI        Virgin Islands (US)
VN        Vietnam
VU        Vanuatu
WF        Wallis & Futuna Islands
WS        Samoa
YE        Yemen
YU        Yugoslavia
ZA        South Africa
ZM        Zambia
ZR        Zaire
ZW        Zimbabwe

Character Sets

Table B-5 lists character sets that may be used with the Accept-charset HTTP header and with the charset parameter of the Content-type header. The list does not cover every character set that can appear in these headers. For a comprehensive list of character sets, their aliases, and pointers to more descriptive documents, refer to RFC 1700.

Table B-5: Character Sets

Character Set       Description                                          Source

US-ASCII            American Standard Code for Information Interchange   RFC 1345
ISO-8859-1          Latin Alphabet No. 1                                 RFC 1345
ISO-8859-2          Latin Alphabet No. 2                                 RFC 1345
ISO-8859-3          Latin Alphabet No. 3                                 RFC 1345
ISO-8859-4          Latin Alphabet No. 4                                 RFC 1345
ISO-8859-5          Latin/Cyrillic Alphabet                              RFC 1345
ISO-8859-6          Latin/Arabic Alphabet                                RFC 1345
ISO-8859-7          Latin/Greek Alphabet                                 RFC 1345
ISO-8859-8          Latin/Hebrew Alphabet                                RFC 1345
ISO-8859-9          Latin Alphabet No. 5                                 RFC 1345
ISO-2022-JP         Japanese                                             RFC 1468
ISO-2022-JP-2       Extension of Japanese in ISO-2022-JP                 RFC 1554
ISO-2022-KR         Korean                                               RFC 1557
UNICODE-1-1         Unicode for MIME                                     RFC 1641
UNICODE-1-1-UTF-7   7-bit UCS Transformation Format                      RFC 1642
UNICODE-1-1-UTF-8   8-bit UCS Transformation Format                      N/A
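A server commonly names the character set of an entity-body in a charset parameter on the Content-type header. The Python sketch below shows one way a client might read that parameter and use it to decode the body. The function name charset_of is hypothetical, and falling back to ISO-8859-1 when no charset is given is an assumption of this sketch.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the lowercased charset named in a Content-type value.

    Falls back to `default` (an assumption of this sketch) when the
    header carries no charset parameter.
    """
    message = Message()
    message["Content-Type"] = content_type
    # get_content_charset() returns None when no charset is present.
    return message.get_content_charset() or default


if __name__ == "__main__":
    header = "text/html; charset=ISO-8859-2"
    body = b"\xb1"  # one byte whose meaning depends on the charset
    print(charset_of(header))
    print(body.decode(charset_of(header)))
```

Python's codec names are case-insensitive, so the lowercased charset can be passed straight to bytes.decode for the ISO-8859 and ISO-2022 entries in Table B-5.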


© 2001, O'Reilly & Associates, Inc.