ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


O'Reilly Book Excerpts: Java Examples in a Nutshell, 3rd Edition

Internationalization, Part 1

by David Flanagan

Editor's note: Writing software that is truly multilingual is not an easy task. In this excerpt from Chapter 8 of Java Examples in a Nutshell, 3rd Edition, author David Flanagan offers real-world programming examples covering the three steps to internationalization in Java. This week, he covers how to use Unicode character encoding and how to handle local customs. Next week's excerpt will cover the third step: localizing user-visible messages.

Internationalization is the process of making a program flexible enough to run correctly in any locale. The required corollary to internationalization is localization—the process of arranging for a program to run in a specific locale.

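There are several distinct steps to the task of internationalization. Java (1.1 and later) addresses these steps with several different mechanisms. First, a program must be able to read, write, and manipulate localized text. Java uses the Unicode character encoding, and the InputStreamReader and OutputStreamWriter classes convert text between locale-specific encodings and Unicode. Second, a program must follow local customs when formatting dates, times, and numbers and when sorting strings; the classes of the java.text package handle these tasks. Third, a program must display all user-visible messages in the local language.

This chapter discusses all three aspects of internationalization.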

A Word About Locales

A locale represents a geographic, political, or cultural region. In Java, locales are represented by the java.util.Locale class. A locale is frequently defined by a language, which is represented by its standard lowercase two-letter code, such as en (English) or fr (French). Sometimes, however, language alone is not sufficient to uniquely specify a locale, and a country is added to the specification. A country is represented by an uppercase two-letter code. For example, the United States English locale (en_US) is distinct from the British English locale (en_GB), and the French spoken in Canada (fr_CA) is different from the French spoken in France (fr_FR). Occasionally, the scope of a locale is further narrowed with the addition of a system-dependent variant string.

The Locale class maintains a static default locale, which can be set and queried with Locale.setDefault( ) and Locale.getDefault( ). Locale-sensitive methods in Java typically come in two forms. One uses the default locale, and the other uses a Locale object that is explicitly specified as an argument. A program can create and use any number of nondefault Locale objects, although it is more common simply to rely on the default locale, which is inherited from the underlying default locale on the native platform. Locale-sensitive classes in Java often provide a method to query the list of locales that they support.
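
As a minimal sketch of the two forms just described, the following program calls String.toUpperCase( ) once with the default locale and once with an explicitly specified Locale. It is not one of the book's examples: the class name LocaleForms and the helper method upper( ) are made up for illustration. Turkish is used because its uppercase form of the letter "i" differs from the English one, which makes the effect of the locale argument visible.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleForms {
    /** Uppercases text using an explicitly specified locale. */
    static String upper(Locale locale, String text) {
        return text.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // Locales built from a language code and a country code
        Locale canadianFrench = new Locale("fr", "CA");
        Locale turkish = new Locale("tr", "TR");
        System.out.println(canadianFrench);   // prints fr_CA
        System.out.println(canadianFrench.getDisplayName(Locale.ENGLISH));

        // Form 1: no Locale argument, so the default locale is used
        System.out.println("title".toUpperCase());

        // Form 2: the locale is passed explicitly
        System.out.println(upper(Locale.ENGLISH, "title"));
        System.out.println(upper(turkish, "title"));  // dotted capital I (U+0130)

        // Query the default locale and the locales a class supports
        System.out.println("Default locale: " + Locale.getDefault());
        System.out.println("NumberFormat supports "
            + NumberFormat.getAvailableLocales().length + " locales");
    }
}
```

Depending on the console, the Turkish capital letter may not display correctly, for the font and encoding reasons discussed later in this chapter.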

Finally, note that AWT and Swing GUI components (see Chapter 11) have a locale property, so it is possible for different components to use different locales. (Most components, however, are not locale-sensitive; they behave the same in any locale.)

Unicode

Java uses the Unicode character encoding. (Java 1.3 uses Unicode Version 2.1. Support for Unicode 3.0 will be included in Java 1.4 or another future release.) Unicode is a 16-bit character encoding established by the Unicode Consortium, which describes the standard as follows (see http://unicode.org ):

The Unicode Standard defines codes for characters used in the major languages written today. Scripts include the European alphabetic scripts, Middle Eastern right-to-left scripts, and scripts of Asia. The Unicode Standard also includes punctuation marks, diacritics, mathematical symbols, technical symbols, arrows, dingbats, etc. ... In all, the Unicode Standard provides codes for 49,194 characters from the world's alphabets, ideograph sets, and symbol collections.

In the canonical form of Unicode encoding, which is what Java char and String types use, every character occupies two bytes. The Unicode characters \u0020 to \u007E are equivalent to the ASCII and ISO8859-1 (Latin-1) characters 0x20 through 0x7E. The Unicode characters \u00A0 to \u00FF are identical to the ISO8859-1 characters 0xA0 to 0xFF. Thus, there is a trivial mapping between Latin-1 and Unicode characters. A number of other portions of the Unicode encoding are based on preexisting standards, such as ISO8859-5 (Cyrillic) and ISO8859-8 (Hebrew), though the mappings between these standards and Unicode may not be as trivial as the Latin-1 mapping.
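
The trivial Latin-1 mapping can be checked with a few lines of code. The sketch below is an illustration rather than one of the book's examples: the class name Latin1Mapping and its method are hypothetical, and it assumes Java 7 or later for the StandardCharsets class. It decodes each byte in the two ranges named above and confirms that the resulting char has the same numeric value as the byte.

```java
import java.nio.charset.StandardCharsets;

class Latin1Mapping {
    /**
     * Returns true if every Latin-1 byte in the range lo..hi decodes
     * to the Unicode character with the same numeric value.
     */
    static boolean sameValues(int lo, int hi) {
        for (int i = lo; i <= hi; i++) {
            byte[] oneByte = { (byte) i };
            String decoded = new String(oneByte, StandardCharsets.ISO_8859_1);
            if (decoded.length() != 1 || decoded.charAt(0) != (char) i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The two ranges described in the text
        System.out.println("0x20-0x7E: " + sameValues(0x20, 0x7E));
        System.out.println("0xA0-0xFF: " + sameValues(0xA0, 0xFF));
    }
}
```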

Note that Unicode support may be limited on many platforms. One of the difficulties with the use of Unicode is the poor availability of fonts to display all the Unicode characters. Figure 8-1 shows some of the characters that are available in the standard fonts that ship with Sun's Java 1.3 SDK for Linux. (Note that these fonts do not ship with the Java JRE, so even if they are available on your development platform, they may not be available on your target platform.) Note the special box glyph that indicates undefined characters.

Figure 8-1. Some Unicode characters and their encodings

Example 8-1 lists the code used to create the displays of Figure 8-1. Because Unicode characters are integrated so fundamentally into the Java language, this UnicodeDisplay program does not need any sophisticated internationalization techniques to display Unicode glyphs. Thus, you'll find that Example 8-1 is more of a Swing GUI example than an internationalization example. If you haven't read Chapter 11 yet, you may not understand all the code in this example.

Example 8-1. UnicodeDisplay.java

package je3.i18n;
import javax.swing.*;
import java.awt.*;
import java.awt.event.*;

/**
 * This program displays Unicode glyphs using user-specified fonts
 * and font styles.
 **/
public class UnicodeDisplay extends JFrame implements ActionListener {
    int page = 0; 
    UnicodePanel p;
    JScrollBar b;
    String fontfamily = "Serif";
    int fontstyle = Font.PLAIN;

    /** 
     * This constructor creates the frame, menubar, and scrollbar
     * that work along with the UnicodePanel class, defined below
     **/
    public UnicodeDisplay(String name) {
        super(name);
        p = new UnicodePanel( );                // Create the panel
        p.setBase((char)(page * 0x100));       // Initialize it
        getContentPane( ).add(p, "Center");     // Center it
        
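        // Create and set up a scrollbar, and put it on the right.
        // With an extent of 1, the maximum must be 0x100 so that the
        // last page (0xFF) can be reached.
        b = new JScrollBar(JScrollBar.VERTICAL, 0, 1, 0, 0x100);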
        b.setUnitIncrement(1);
        b.setBlockIncrement(0x10);
        b.addAdjustmentListener(new AdjustmentListener( ) {
                public void adjustmentValueChanged(AdjustmentEvent e) {
                    page = e.getValue( );
                    p.setBase((char)(page * 0x100));
                }
            });
        getContentPane( ).add(b, "East");
        
        // Set things up so we respond to window close requests
        this.addWindowListener(new WindowAdapter( ) {
                public void windowClosing(WindowEvent e) { System.exit(0); }
            });

        // Handle Page Up and Page Down and the up and down arrow keys
        this.addKeyListener(new KeyAdapter( ) {
                public void keyPressed(KeyEvent e) {
                    int code = e.getKeyCode( );
                    int oldpage = page;
                    if ((code == KeyEvent.VK_PAGE_UP) ||
                        (code == KeyEvent.VK_UP)) {
                        if (e.isShiftDown( )) page -= 0x10;
                        else page -= 1;
                        if (page < 0) page = 0;
                    }
                    else if ((code == KeyEvent.VK_PAGE_DOWN) ||
                             (code == KeyEvent.VK_DOWN)) {
                        if (e.isShiftDown( )) page += 0x10;
                        else page += 1;
                        if (page > 0xff) page = 0xff;
                    }
                    if (page != oldpage) {     // if anything has changed...
                        p.setBase((char) (page * 0x100)); // update the display
                        b.setValue(page);     // and update scrollbar to match
                    }
                }
            });

        // Set up a menu system to change fonts.  Use a convenience method.
        JMenuBar menubar = new JMenuBar( );
        this.setJMenuBar(menubar);
        menubar.add(makemenu("Font Family", 
                             new String[  ] {"Serif", "SansSerif", "Monospaced"},
                             this));
        menubar.add(makemenu("Font Style", 
                             new String[  ]{
                                 "Plain","Italic","Bold","BoldItalic"
                             }, this));
    }
    
    /** This method handles the items in the menubars */
    public void actionPerformed(ActionEvent e) {
        String cmd = e.getActionCommand( );
        if (cmd.equals("Serif")) fontfamily = "Serif";
        else if (cmd.equals("SansSerif")) fontfamily = "SansSerif";
        else if (cmd.equals("Monospaced")) fontfamily = "Monospaced";
        else if (cmd.equals("Plain")) fontstyle = Font.PLAIN;
        else if (cmd.equals("Italic")) fontstyle = Font.ITALIC;
        else if (cmd.equals("Bold")) fontstyle = Font.BOLD;
        else if (cmd.equals("BoldItalic")) fontstyle = Font.BOLD + Font.ITALIC;
        p.setFont(fontfamily, fontstyle);
    }
    
    /** A convenience method to create a Menu from an array of items */
    private JMenu makemenu(String name, String[  ] itemnames,
                           ActionListener listener)
    {
        JMenu m = new JMenu(name);
        for(int i = 0; i < itemnames.length; i++) {
            JMenuItem item = new JMenuItem(itemnames[i]);
            item.addActionListener(listener);
            item.setActionCommand(itemnames[i]);  // matched in actionPerformed( )
            m.add(item);
        }
        return m;
    }
    
    /** The main( ) program just creates a window, packs it, and shows it */
    public static void main(String[  ] args) {
        UnicodeDisplay f = new UnicodeDisplay("Unicode Displayer");
        f.pack( );
        f.setVisible(true);   // replaces the deprecated show( )
    }
    
    /** 
     * This nested class is the one that displays one "page" of Unicode
     * glyphs at a time.  Each "page" is 256 characters, arranged into 16
     * rows of 16 columns each.
     **/
    public static class UnicodePanel extends JComponent {
        protected char base;  // What character we start the display at
        protected Font font = new Font("serif", Font.PLAIN, 18);
        protected Font headingfont = new Font("monospaced", Font.BOLD, 18);
        static final int lineheight = 25;
        static final int charspacing = 20;
        static final int x0 = 65;
        static final int y0 = 40;
        
        /** Specify where to begin displaying, and redisplay */
        public void setBase(char base) { this.base = base; repaint( ); }
        
        /** Set a new font name or style, and redisplay */
        public void setFont(String family, int style) { 
            this.font = new Font(family, style, 18); 
            repaint( ); 
        }
        
        /**
         * The paintComponent( ) method actually draws the page of glyphs 
         **/
        public void paintComponent(Graphics g) {
            int start = (int)base & 0xFFF0; // Start on a 16-character boundary
            
            // Draw the headings in a special font
            g.setFont(headingfont);
            
            // Draw 0..F on top
            for(int i=0; i < 16; i++) {
                String s = Integer.toString(i, 16);
                g.drawString(s, x0 + i*charspacing, y0-20);
            }
            
            // Draw column down left.
            for(int i = 0; i < 16; i++) {
                int j = start + i*16;
                String s = Integer.toString(j, 16);
                g.drawString(s, 10, y0+i*lineheight);
            }
            
            // Now draw the characters
            g.setFont(font);
            char[  ] c = new char[1];
            for(int i = 0; i < 16; i++) {
                for(int j = 0; j < 16; j++) {
                    c[0] = (char)(start + j*16 + i);
                    g.drawChars(c, 0, 1, x0 + i*charspacing, y0+j*lineheight);
                }
            }
        }
        
        /** Custom components like this one should always have this method */
        public Dimension getPreferredSize( ) {
            return new Dimension(x0 + 16*charspacing, 
                                 y0 + 16*lineheight);
        }
    }
}

Character Encodings

Text representation has traditionally been one of the most difficult problems of internationalization. Java, however, solves this problem quite elegantly and hides the difficult issues. Java uses Unicode internally, so it can represent essentially any character in any commonly used written language. As I noted earlier, the remaining task is to convert Unicode to and from locale-specific encodings. Java includes quite a few internal byte-to-char and char-to-byte converters that handle converting locale-specific character encodings to Unicode and vice versa. Although the converters themselves are not public, they are accessible through the InputStreamReader and OutputStreamWriter classes, which are character streams included in the java.io package.

Any program can automatically handle locale-specific encodings simply by using these character stream classes to do its textual input and output. Note that the FileReader and FileWriter classes use these streams to automatically read and write text files that use the platform's default encoding.
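
Before looking at a complete program, here is a small in-memory sketch of the same idea. It is an illustration only: the class name EncodingRoundTrip and its two helper methods are hypothetical, and wrapping I/O errors in UncheckedIOException assumes Java 8 or later. The encoding names passed in are ones every Java platform is required to support.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class EncodingRoundTrip {
    /** Converts Unicode text to bytes in the named encoding. */
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer w = new OutputStreamWriter(bytes, encoding);
            w.write(text);
            w.close();                       // Flush the converter
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Converts bytes in the named encoding back to Unicode text. */
    static String decode(byte[] data, String encoding) {
        try {
            Reader r = new InputStreamReader(new ByteArrayInputStream(data),
                                             encoding);
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[256];
            int len;
            while ((len = r.read(buffer)) != -1) text.append(buffer, 0, len);
            r.close();
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";           // "cafe" with an acute accent
        System.out.println(encode(text, "ISO-8859-1").length);  // 4 bytes
        System.out.println(encode(text, "UTF-8").length);       // 5 bytes
        // A character the target encoding cannot represent becomes '?'
        byte[] ascii = encode(text, "US-ASCII");
        System.out.println((char) ascii[ascii.length - 1]);     // ?
        System.out.println(decode(encode(text, "UTF-8"), "UTF-8").equals(text));
    }
}
```

The same string occupies a different number of bytes in each encoding, which is why the encoding must be known (or defaulted) whenever text crosses the boundary between bytes and characters.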

Example 8-2 shows a simple program that works with character encodings. It converts a file from one specified encoding to another by converting from the first encoding to Unicode and then from Unicode to the second encoding. Note that most of the program is taken up with the mechanics of parsing argument lists, handling exceptions, and so on. Only a few lines are required to create the InputStreamReader and OutputStreamWriter classes that perform the two halves of the conversion. Also note that exceptions are handled by calling LocalizedError.display( ). This method is not part of the Java API; it is a custom method shown in Example 8-5 at the end of this chapter.

Example 8-2. ConvertEncoding.java

package je3.i18n;
import java.io.*;

/** A program to convert from one character encoding to another */
public class ConvertEncoding {
    public static void main(String[  ] args) {
        String from = null, to = null;
        String infile = null, outfile = null;
        for(int i = 0; i < args.length; i++) { // Parse command-line arguments.
            if (i == args.length-1) usage( );   // All args require another.
            if (args[i].equals("-from")) from = args[++i];
            else if (args[i].equals("-to")) to = args[++i];
            else if (args[i].equals("-in")) infile = args[++i];
            else if (args[i].equals("-out")) outfile = args[++i];
            else usage( );
        }

        try { convert(infile, outfile, from, to); }  // Attempt conversion.
        catch (Exception e) {                        // Handle exceptions.
            LocalizedError.display(e);  // Defined at the end of this chapter.
            System.exit(1);
        }
    }

    public static void usage( ) {
        System.err.println("Usage: java ConvertEncoding <options>\n" +
                           "Options:\n\t-from <encoding>\n\t" + 
                           "-to <encoding>\n\t" +
                           "-in <file>\n\t-out <file>");
        System.exit(1);
    }

    public static void convert(String infile, String outfile,
                               String from, String to)
              throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams.
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;
        
        // Use default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");
        
        // Set up character streams.
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
        
        // Copy characters from input to output.  The InputStreamReader
        // converts from the input encoding to Unicode, and the
        // OutputStreamWriter converts from Unicode to the output encoding.
        // Characters that cannot be represented in the output encoding are
        // output as '?'
        char[  ] buffer = new char[4096];
        int len;
        while((len = r.read(buffer)) != -1)  // Read a block of input.
            w.write(buffer, 0, len);         // And write it out.
        r.close( );                           // Close the input.
        w.close( );                           // Flush and close output.
    }
}

Handling Local Customs

The second problem of internationalization is the task of following local customs and conventions in areas such as date and time formatting. The java.text package defines classes to help with this duty.

The NumberFormat class formats numbers, monetary amounts, and percentages in a locale-dependent way for display to the user. This is necessary because different locales have different conventions for number formatting. For example, in France, a comma is used as a decimal separator instead of a period, as in many English-speaking countries. A NumberFormat object can use the default locale or any locale you specify. NumberFormat has factory methods for obtaining instances that are suitable for different purposes, such as displaying monetary quantities or percentages. In Java 1.4 and later, the java.util.Currency class can be used with a NumberFormat object so that it can correctly print an appropriate currency symbol.
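
The following short sketch exercises the three factory methods mentioned above with explicitly specified locales. It is an illustration, not one of the book's examples; the class name NumberFormatDemo and its helper methods are hypothetical. The exact currency symbol and spacing printed for a given locale can vary between Java releases, so the sketch prints those results without promising a particular form.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

class NumberFormatDemo {
    /** Formats a plain number for the given locale. */
    static String number(Locale locale, double value) {
        return NumberFormat.getInstance(locale).format(value);
    }

    /** Formats a fraction as a percentage for the given locale. */
    static String percent(Locale locale, double value) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    /** Formats a price for the given locale in the given currency. */
    static String price(Locale locale, String currencyCode, double value) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance(currencyCode));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(number(Locale.US, 1234.56));     // 1,234.56
        System.out.println(number(Locale.FRANCE, 0.5));     // 0,5
        System.out.println(number(Locale.FRANCE, 1234.56)); // French grouping
        System.out.println(percent(Locale.US, 0.25));       // 25%
        System.out.println(price(Locale.US, "USD", 11.9));  // $11.90
        // The locale controls the layout; the Currency controls the symbol.
        System.out.println(price(Locale.US, "JPY", 100000));
        System.out.println(price(Locale.FRANCE, "EUR", 23.37));
    }
}
```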

The DateFormat class formats dates and times in a locale-dependent way for display to the user. Different countries have different conventions. Should the month or day be displayed first? Should periods or colons separate fields of the time? What are the names of the months in the language of the locale? A DateFormat object can simply use the default locale, or it can use any locale you specify. The DateFormat class is used in conjunction with the TimeZone and Calendar classes of java.util. The TimeZone object tells the DateFormat what time zone the date should be interpreted in, while the Calendar object specifies how the date itself should be broken down into days, weeks, months, and years. Almost all locales use the standard GregorianCalendar. SimpleDateFormat is a useful subclass of DateFormat: it allows dates to be formatted to or parsed from a date format specified with a simple template string.
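
As a small sketch of these classes working together, the program below parses a date with a SimpleDateFormat template and then formats it with locale-specific DateFormat objects. The class name DateFormatDemo and its helpers are hypothetical, and fixing the time zone to UTC is an assumption made here so that parsing and formatting agree on any machine.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Parses a date string written in the form yyyy-MM-dd. */
    static Date parseIso(String text) {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        parser.setTimeZone(UTC);   // Interpret the date in a fixed time zone
        try {
            return parser.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + text, e);
        }
    }

    /** Formats a date in the LONG style of the given locale. */
    static String longDate(Date date, Locale locale) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, locale);
        format.setTimeZone(UTC);
        return format.format(date);
    }

    public static void main(String[] args) {
        Date purchased = parseIso("2003-01-03");
        // Same instant, different conventions and month names
        System.out.println(longDate(purchased, Locale.US));
        System.out.println(longDate(purchased, Locale.UK));
        System.out.println(longDate(purchased, Locale.FRANCE));
        System.out.println(longDate(purchased, Locale.JAPAN));
    }
}
```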

The Collator class compares strings in a locale-dependent way. This is necessary because different languages alphabetize strings in different ways (and some languages don't even use alphabets). In traditional Spanish, for example, the letters "ch" are treated as a single character that comes between "c" and "d" for the purposes of sorting. When you need to sort strings or search for a string within Unicode text, you should use a Collator object, either one created to work with the default locale or one created for a specified locale.
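
The difference between a Collator and a plain character-code comparison is easy to see with a few words. This sketch is illustrative only; the class name CollatorDemo and its two methods are hypothetical. It sorts the same array twice, once with String's natural ordering and once with a Collator for the French locale.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorDemo {
    /** Sorts a copy of the array by raw Unicode character values. */
    static String[] sortByCharCode(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Sorts a copy of the array following the rules of the locale. */
    static String[] sortFor(Locale locale, String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // \u00E9 is e with an acute accent
        String[] words = { "zebra", "\u00E9tude", "Apple", "apple" };

        // Raw ordering puts all uppercase letters before lowercase ones
        // and the accented word after "zebra".
        System.out.println(Arrays.toString(sortByCharCode(words)));

        // The collator keeps the two "apple" spellings together and
        // files the accented word under "e".
        System.out.println(Arrays.toString(sortFor(Locale.FRANCE, words)));
    }
}
```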

The BreakIterator class allows you to locate character, word, line, and sentence boundaries in a locale-dependent way. This is useful when you need to recognize such boundaries in Unicode text, such as when you are implementing a word-wrapping algorithm.
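
A word-wrapping algorithm would start from something like the following sketch, which uses a word-instance BreakIterator to split text into words. The class name WordBoundaries and its method are hypothetical. Skipping the pieces that do not begin with a letter or digit is a simplification chosen for this illustration, since the iterator also reports punctuation and whitespace as separate pieces.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBoundaries {
    /** Returns the words of the text, using the locale's boundary rules. */
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> result = new ArrayList<String>();
        int start = boundaries.first();
        // Each call to next( ) returns the following boundary position
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String piece = text.substring(start, end);
            // Keep real words; drop spaces and punctuation
            if (Character.isLetterOrDigit(piece.charAt(0))) result.add(piece);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US)); // [Hello, world]
    }
}
```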

Example 8-3 shows a class that uses the NumberFormat and DateFormat classes to display a hypothetical stock portfolio to the user following local conventions. The program obtains several NumberFormat and DateFormat objects and calls their format( ) methods to format different types of numbers, monetary values, and dates. These Format objects all operate using the default locale, which the program can set from its command-line arguments, but they could have been created with an explicitly specified locale. Figure 8-2 shows example output in different locales. The output was produced by running the program in the default locale and with the arguments "en GB" and "ja JP".

Figure 8-2. Stock portfolios formatted for U.S., British, and French locales

Example 8-3. Portfolio.java

package je3.i18n;
import java.text.*;
import java.util.*;
import java.io.*;

/**
 * A partial implementation of a hypothetical stock portfolio class.
 * We use it only to demonstrate number and date internationalization.
 **/
public class Portfolio {
    EquityPosition[  ] positions;        // The positions in the portfolio
    Date lastQuoteTime = new Date( );   // Time for current quotes

    // Create a Portfolio
    public Portfolio(EquityPosition[  ] positions, Date lastQuoteTime) {
        this.positions = positions;
        this.lastQuoteTime = lastQuoteTime;
    }
    
    // A helper class: represents a single stock purchase
    static class EquityPosition {
        String name;             // Name of the stock.
        int shares;              // Number of shares held.
        Date purchased;          // When purchased.
        Currency currency;       // What currency are the prices expressed in?
        double bought;           // Purchase price per share
        double current;          // Current price per share

        // Format objects like this one are useful for parsing strings as well
        // as formatting them.  This is for converting date strings to Dates.
        static DateFormat dateParser = new SimpleDateFormat("yyyy-MM-dd");

        EquityPosition(String n, int s, String date, Currency c,
                       double then, double now) throws ParseException
        {
            // Convert the purchased date string to a Date object.
            // The string must be in the format yyyy-MM-dd
            purchased = dateParser.parse(date);
            // And store the rest of the fields, too.
            name = n; shares = s; currency = c;
            bought = then; current = now;
        }
    }

    // Return a localized HTML-formatted string describing the portfolio
    public String toString( ) {
        StringBuffer b = new StringBuffer( );

        // Obtain NumberFormat and DateFormat objects to format our data.
        NumberFormat number = NumberFormat.getInstance( );
        NumberFormat price = NumberFormat.getCurrencyInstance( );
        NumberFormat percent = NumberFormat.getPercentInstance( );
        DateFormat shortdate = DateFormat.getDateInstance(DateFormat.MEDIUM);
        DateFormat fulldate = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                                             DateFormat.LONG);


        // Print some introductory data.
        b.append("<html><body>");
        b.append("<i>Portfolio value at ").
            append(fulldate.format(lastQuoteTime)).append("</i>");
        b.append("<table border=1>");
        b.append("<tr><th>Symbol<th>Shares<th>Purchased<th>At<th>" +
                 "Quote<th>Change</tr>");
        
        // Display the table using the format( ) methods of the Format objects.
        for(int i = 0; i < positions.length; i++) {
            b.append("<tr><td>");
            b.append(positions[i].name).append("<td>");
            b.append(number.format(positions[i].shares)).append("<td>");
            b.append(shortdate.format(positions[i].purchased)).append("<td>");
            // Set the currency to use when printing the following prices
            price.setCurrency(positions[i].currency);
            b.append(price.format(positions[i].bought)).append("<td>");
            b.append(price.format(positions[i].current)).append("<td>");
            double change =
                (positions[i].current-positions[i].bought)/positions[i].bought;
            b.append(percent.format(change)).append("</tr>");
        }
        b.append("</table></body></html>");
        return b.toString( );
    }
    
    /**
     * This is a test program that demonstrates the class
     **/
    public static void main(String[  ] args) throws ParseException {
        Currency dollars = Currency.getInstance("USD");
        Currency pounds = Currency.getInstance("GBP");
        Currency euros = Currency.getInstance("EUR");
        Currency yen = Currency.getInstance("JPY");

        // This is the portfolio to display.
        EquityPosition[  ] positions = new EquityPosition[  ] {
            new EquityPosition("WWW", 400, "2003-01-03", dollars, 11.90,13.00),
            new EquityPosition("XXX", 1100, "2003-02-02", pounds, 71.09,27.25),
            new EquityPosition("YYY", 6000, "2003-04-17", euros, 23.37,89.12),
            new EquityPosition("ZZZ", 100, "2003-08-10", yen, 100000,121345)
        };

        // Create the portfolio from these positions
        Portfolio portfolio = new Portfolio(positions, new Date( ));

        // Set the default locale using the language code and country code
        // specified on the command line.
        if (args.length == 2) Locale.setDefault(new Locale(args[0], args[1]));

        // Now display the portfolio.
        // We use a Swing dialog box to display it because the console may
        // not be able to display non-ASCII characters like currency symbols
        // for Pounds, Euros, and Yen.
        javax.swing.JOptionPane.showMessageDialog(null, portfolio,
                               Locale.getDefault( ).getDisplayName( ),
                               javax.swing.JOptionPane.INFORMATION_MESSAGE);

        // The modal dialog starts another thread running, so we have to exit
        // explicitly when the user dismisses it.
        System.exit(0);
    }
}

Setting the Locale

Example 8-3 contains code that explicitly sets the locale using the language code and the country code specified on the command line. If these arguments are not specified, it uses the default locale for your system. When experimenting with internationalization, you may want to change the default locale for the entire platform so you can see what happens. How you do this is platform-dependent. On Unix platforms, you typically set the locale by setting the LANG environment variable. For example, to set the locale for Canadian French, using a Unix csh-style shell, use this command:

% setenv LANG fr_CA

Or, to set the locale to English as spoken in Great Britain when using a Unix sh-style shell, use this command:

$ export LANG=en_GB

To set the locale in Windows, use the Regional Settings control on the Windows Control Panel.

David Flanagan is the author of a number of O'Reilly books, including Java in a Nutshell, Java Examples in a Nutshell, Java Foundation Classes in a Nutshell, JavaScript: The Definitive Guide, and JavaScript Pocket Reference.



Copyright © 2009 O'Reilly Media, Inc.