Appendix F. Charsets

The following table lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet’s PrintWriter uses. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, which is appropriate for most Western European languages. To specify a different charset, include it in the content type passed to the setContentType( ) method before the servlet retrieves its PrintWriter. For example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();  // Writes Shift_JIS Japanese

The charset can also be set implicitly using the setLocale( ) method, for example:

res.setContentType("text/html");
res.setLocale(new Locale("ja", ""));  // Sets charset to Shift_JIS
PrintWriter out = res.getWriter();    // Writes Shift_JIS Japanese

The setLocale( ) method assigns a charset to the response according to the table below. Where a language has more than one suggested charset, the first one listed is chosen.
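The lookup that setLocale( ) is described as performing can be sketched in plain Java. This is only an illustration, not the servlet container's actual implementation: the class LocaleCharsetLookup and its methods are hypothetical names, the map holds only the rows visible in this excerpt plus the Japanese example above, and the fallback to ISO-8859-1 for unlisted languages is an assumption based on the default described earlier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of the Servlet API.
public class LocaleCharsetLookup {
    // Language code -> suggested charsets, in the order the table lists them.
    private static final Map<String, List<String>> SUGGESTED = new HashMap<>();
    static {
        SUGGESTED.put("sq", Arrays.asList("ISO-8859-2"));  // Albanian
        SUGGESTED.put("ar", Arrays.asList("ISO-8859-6"));  // Arabic
        SUGGESTED.put("bg", Arrays.asList("ISO-8859-5"));  // Bulgarian
        SUGGESTED.put("be", Arrays.asList("ISO-8859-5"));  // Byelorussian
        SUGGESTED.put("ja", Arrays.asList("Shift_JIS"));   // Japanese, from the example above
    }

    // Returns the first listed charset for the locale's language.
    // Assumption: unlisted languages fall back to ISO-8859-1.
    public static String charsetFor(Locale locale) {
        List<String> charsets = SUGGESTED.get(locale.getLanguage());
        if (charsets == null || charsets.isEmpty()) {
            return "ISO-8859-1";
        }
        return charsets.get(0);
    }

    // Builds a content type string in the form passed to setContentType( ).
    public static String contentTypeFor(String mimeType, Locale locale) {
        return mimeType + "; charset=" + charsetFor(locale);
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("text/html", Locale.forLanguageTag("ja")));
        System.out.println(contentTypeFor("text/html", Locale.forLanguageTag("bg")));
    }
}
```

Keeping a list per language, rather than a single value, leaves room for rows that suggest more than one charset while still selecting the first.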

Note that not all web browsers support all charsets or have the fonts available to display all characters, although at a minimum all clients support ISO-8859-1. Note also that the UTF-8 charset can represent every Unicode character, so it can be considered a viable alternative for any language.
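The difference between ISO-8859-1 and UTF-8 can be seen without a servlet container by writing text through a PrintWriter bound to each charset and inspecting the bytes produced. The sketch below uses only JDK classes; CharsetDemo and its methods are hypothetical names introduced for illustration, and the in-memory buffer stands in for the response's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses an in-memory buffer instead of a servlet response.
public class CharsetDemo {
    // Writes text through a PrintWriter that encodes with the given charset
    // and returns the bytes that would be sent to the client.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(buffer, charset));
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    // Reports whether every character in the text has a mapping in the charset.
    public static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String french = "caf\u00e9";                // "cafe" with an acute accent on the e
        String japanese = "\u65e5\u672c\u8a9e";     // three kanji characters

        // Latin-1 covers the accented character in a single byte.
        System.out.println(encode(french, StandardCharsets.ISO_8859_1).length);   // 4
        System.out.println(encode(french, StandardCharsets.UTF_8).length);        // 5

        // Latin-1 has no mapping for kanji; UTF-8 does.
        System.out.println(canRepresent(japanese, StandardCharsets.ISO_8859_1));  // false
        System.out.println(canRepresent(japanese, StandardCharsets.UTF_8));       // true
        System.out.println(encode(japanese, StandardCharsets.UTF_8).length);      // 9
    }
}
```

When a character has no mapping, the writer substitutes a replacement byte ('?' for ISO-8859-1) rather than failing, which is why output written with the wrong charset tends to show up as question marks in the browser.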

Language	Language Code	Suggested Charsets
Albanian	sq	ISO-8859-2
Arabic	ar	ISO-8859-6
Bulgarian	bg	ISO-8859-5
Byelorussian	be	ISO-8859-5 ...


Java Servlet Programming, 2nd Edition by Jason Hunter, William Crawford
