Published on ONJava.com (http://www.onjava.com/)

10 Reasons We Need Java 3.0

by Elliotte Rusty Harold

Over the last few years, refactoring -- the process of gradually improving a code base by renaming methods and classes, extracting common functionality into new methods and classes, and generally cleaning up the mess inherent in most 1.0 systems -- has gained a lot of adherents. Integrated Development Environments (IDEs) like Eclipse and IDEA can now automatically refactor code.

But what if it's not just your code that needs refactoring? What if the language itself has inconsistencies, inefficiencies, and just plain idiocies that need to be corrected? When you get right down to it, the entirety of Java is really just like any other large code base. It has some brilliant parts, some functional parts, and some parts that make just about everyone scratch their heads and ask, "What the hell were they thinking?"

It's now a little more than 11 years since James Gosling began working on Oak, the language that would eventually become Java, and seven years since Sun posted the first public release of Java. The language, class library, and virtual machine collectively known as "Java" are all showing their age. There are many parts of Java that everyone agrees should be fixed but can't be, for reasons of backwards compatibility. Until now, revisions of Java have attempted to maintain "upwards compatibility"; that is, all earlier code should continue to run unchanged in later versions of Java. This has limited the changes that can be made to Java, and prevented Sun from fixing many obvious problems.

This article imagines a "Java 3" that jettisons the baggage of the last decade, and proposes numerous changes to the core language, virtual machine, and class libraries. The focus here is on those changes that many people (including the Java developers at Sun) would really like to make, but can't -- primarily for reasons of backwards compatibility.

I am specifically not focusing on new features that could be added to Java 2 today, useful as they might be. These can be addressed through the Java Community Process. Instead, I want to look at how we could do the same things Java does today, only better. For instance, while I'd love to see a complex number data type as a standard part of the Java language, this could be added to Java 1.5 without breaking existing code. On the other hand, changing the existing char type to use four bytes rather than two would be radically incompatible with most existing code.

Similarly, I am only looking at changes that will leave Java as the same language we know and love today. I want to talk about refactoring the language, not reinventing it. I am not interested in purely syntactic changes, such as eliminating the semicolons at the ends of lines or making indentation significant. These sorts of changes could readily be implemented as byte code compilers for other languages like Python and F. Indeed, such compilers already exist. The changes I want to address are much more fundamental, and often lie across the boundaries between language, library, and virtual machine. With that in mind, let's look at my top 10 list of possible refactorings for Java 3. (See Gosling's "Design Principles" slide for a justification for simplicity and lack of redundancy.)

10. Delete all deprecated methods, fields, classes, and interfaces.

This one's a no-brainer. Java 1.4.0 ships with 22 deprecated classes, 8 deprecated interfaces, 50 deprecated fields, and over 300 deprecated methods and constructors. Some, like List.preferredSize() and Date.parse(), are deprecated because there are now equivalent or better methods to do the same thing. Others, like Thread.stop() and Thread.resume(), are deprecated because they were a bad idea in the first place and could be actively dangerous. Whatever the reason a method has been deprecated, the fact is, we're not supposed to be using it.

Sun's official line is, "It is recommended that programs be modified to eliminate the use of deprecated methods and classes, though there are no current plans to remove such methods and classes entirely from the system." It's time to cut the umbilical cord. Ditch them all now. This can only make Java simpler, cleaner, and safer.

9. Fix incorrect naming conventions.

One of Java's contributions to code readability has been consistent naming conventions, even though they aren't enforced by the compiler. Class names are nouns that begin with capital letters. Fields, variables, and methods begin with lowercase letters. All use camel case. Named constants are written in all caps with underscores separating the words. I can pick up the code of any experienced Java programmer on the planet and expect that their naming conventions will match mine.

When Java 1.0 was being written, however, not all the programmers had internalized Java's naming conventions yet. There are numerous minor but annoying inconsistencies throughout the API. For instance, the color constants are Color.red, Color.blue, Color.green, etc., instead of Color.RED, Color.BLUE, Color.GREEN, etc. Java 1.4 finally added the capitalized versions, but still retains the incorrect lowercase versions, doubling the number of fields in this class. These inconsistencies should be cataloged and corrected.
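To make the duplication concrete, here is a minimal sketch (the class name ColorNames is purely illustrative) showing that the lowercase and uppercase fields in java.awt.Color are two names for the same constant:

```java
import java.awt.Color;

class ColorNames {
    // Color.red (the original spelling) and Color.RED (added in Java 1.4)
    // refer to the very same Color object, so the comparison is by identity.
    static boolean sameConstant() {
        return Color.red == Color.RED;
    }

    public static void main(String[] args) {
        System.out.println("Color.red == Color.RED: " + sameConstant());
    }
}
```

Nothing is gained by having both names; the lowercase one survives only so that old code keeps compiling.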

Another beneficial coding convention Java thrust upon an occasionally resistant world was using full names with no abbreviations. However, some of the most basic Java methods are abbreviated. Why, for instance, do we type System.gc() instead of System.collectGarbage()? It's not as if this method is called so frequently that the time saved typing twelve fewer letters is important. Similarly the InetAddress class should really be named InternetAddress.

Along the way, let's move JDBC into the javax packages. JDBC is important, but it's hardly a core language feature. The only reason it isn't already in javax is that the javax naming convention for standard extensions hadn't been invented when JDBC was first added to the JDK back in Java 1.1. Programmers working with JDBC can still use it. The rest of us can safely ignore it.

8. Eliminate primitive data types.

This will undoubtedly be my most controversial proposal, but bear with me. I am not talking about removing int, float, double, char, and other types completely. I simply want to make them full objects with classes, methods, inheritance, and so forth. This would make Java's type system much cleaner. We'd no longer need to use type-wrapper classes to add primitives to lists and hash tables. We could write methods that operated on all variables and data. All types would be classes and all classes would be types. Every variable, field, and argument would be an instance of Object. Java would finally become a pure object-oriented language.
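As a reminder of what the wrapper classes cost in practice, the following sketch stores ints in a list and sums them again. The class and method names are hypothetical. The list is declared with generics, which postdate this article, but every conversion between int and Integer is written out by hand so the two-type-system overhead stays visible:

```java
import java.util.ArrayList;
import java.util.List;

class WrapperDemo {
    // Wraps each primitive int in an Integer object so it can live in a
    // collection, then unwraps each one again to do arithmetic on it.
    static int sumViaWrappers(int[] values) {
        List<Integer> boxed = new ArrayList<Integer>();
        for (int i = 0; i < values.length; i++) {
            boxed.add(Integer.valueOf(values[i]));   // primitive -> object
        }
        int sum = 0;
        for (int i = 0; i < boxed.size(); i++) {
            sum += boxed.get(i).intValue();          // object -> primitive
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaWrappers(new int[] {1, 2, 3}));
    }
}
```

If int were itself a class, both conversion steps would disappear from the source.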

The reason Java used primitive data types in the first place was speed. The claim was that pure object-oriented languages like Smalltalk were too slow for production code. But after seven years of Moore's law, computers are a lot faster and have a lot more memory than they used to. Even more importantly, compiler technology has advanced to the point where it's really not so hard to replace object-based source code with primitive-based byte code where appropriate. Modern Eiffel, C#, and Smalltalk compilers already do this. In essence, a good compiler should be able to figure out when to use ints and when to use BigIntegers and transparently swap between the two.

The new byte, int, long, double, float, and char classes would still have the literal forms they have today. Just as the statement String s = "Hello" creates a new String object, so too would int i = 23 create a new int object. Similarly, the compiler would recognize all of the customary operators like +, -, and *, and map them to the appropriate methods in the classes. This is no more complicated than the compiler's native understanding of the plus sign for string concatenation today. Most existing arithmetic code would work exactly as it works today. The int/char/double/float/boolean objects would be immutable, so these objects would be thread-safe and could be interned to save memory. The classes would probably be final for reasons of both safety and performance.

I'd also like to consider whether Java's arithmetic rules are correct. The floating point operations are defined by IEEE 754 and, for compatibility with other languages and hardware, it's important to keep that. The integer types offer real room for improvement, however. It is mathematically incorrect for two billion plus two billion to equal -294,967,296, yet it does in Java today.
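The overflow is easy to reproduce. This sketch (class name illustrative) adds two billion to itself once with int and once with java.math.BigInteger, which is the closest thing Java has today to an unbounded integer type:

```java
import java.math.BigInteger;

class OverflowDemo {
    // 32-bit int arithmetic wraps around silently on overflow.
    static int intSum() {
        int twoBillion = 2000000000;
        return twoBillion + twoBillion;
    }

    // BigInteger grows as needed, so the sum is mathematically correct.
    static BigInteger bigSum() {
        BigInteger twoBillion = BigInteger.valueOf(2000000000L);
        return twoBillion.add(twoBillion);
    }

    public static void main(String[] args) {
        System.out.println(intSum());  // prints -294967296
        System.out.println(bigSum());  // prints 4000000000
    }
}
```

Note that the int version neither throws an exception nor produces a warning; the wrong answer simply flows on into the rest of the program.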

There should be at least one integer type that is not bounded in size, and perhaps it should be the default type. If so, it could easily subsume the short, int, and long types. The byte type still seems necessary for I/O, and it could also remain for those rare cases like image filters where bitwise manipulation is really necessary; however, using bitwise operators like << and & on integers confuses implementation with interface and thus violates a fundamental principle of object orientation. The various bitwise constants, such as Font.BOLD and SelectionKey.OP_ACCEPT, used throughout the Java API should be replaced with type-safe enums and/or getter and setter methods.
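The contrast between bitwise constants and a type-safe alternative might look like the sketch below. Font.BOLD and Font.ITALIC are the real java.awt.Font constants; the FontStyle enum is hypothetical and exists only for illustration. It also relies on the enum and EnumSet facilities that arrived in Java 5, after this article was written:

```java
import java.awt.Font;
import java.util.EnumSet;

class StyleFlags {
    // Hypothetical type-safe stand-in for the int constants in java.awt.Font.
    enum FontStyle { BOLD, ITALIC }

    // Today's idiom: styles are plain ints combined with bitwise OR, so the
    // compiler cannot tell a style from any other int.
    static int bitmaskStyle() {
        return Font.BOLD | Font.ITALIC;
    }

    // Enum-based idiom: only FontStyle values can be placed in the set.
    static EnumSet<FontStyle> enumStyle() {
        return EnumSet.of(FontStyle.BOLD, FontStyle.ITALIC);
    }

    public static void main(String[] args) {
        System.out.println(bitmaskStyle());
        System.out.println(enumStyle());
    }
}
```

With the int version, the caller has to know how the flags are packed into bits; with the enum version, that packing is an implementation detail.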

The basic story would be that integers are for arithmetic and bytes are for memory manipulation. Thus, in reverse, we might choose to ban arithmetic operations like addition and subtraction on bytes. Even today, adding two bytes automatically promotes them to ints because the virtual machine doesn't support these operations on any type narrower than an int.
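The promotion of bytes to ints can be observed directly. In this sketch (names illustrative; it leans on autoboxing, a Java 5 feature, to expose the static type of the expression), the sum of two bytes turns out to be an int, and narrowing it back to a byte needs an explicit cast:

```java
class BytePromotion {
    // The expression a + b has type int, so boxing it yields an Integer.
    static String typeOfByteSum() {
        byte a = 10;
        byte b = 20;
        Object sum = a + b;
        return sum.getClass().getName();
    }

    // Storing the result in a byte requires a cast, which silently
    // discards the high-order bits.
    static byte narrowedSum(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        System.out.println(typeOfByteSum());                      // prints java.lang.Integer
        System.out.println(narrowedSum((byte) 100, (byte) 100));  // prints -56
    }
}
```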

There's substantial evidence from other pure OO languages that this scheme can be implemented efficiently. Nonetheless, I anticipate resistance to these ideas from the performance-at-any-cost crowd. Naive implementations will require more memory than existing Java code (which is already not particularly stingy with the megabytes). This is likely to be a special problem in J2ME and smaller environments. J2ME might choose to take a different path than J2SE and J2EE.

J2ME can continue development based on Java 2, with its dichotomy between primitive and object types, its 2+2=-1 arithmetic, and all of the problems that entails. In this environment, the benefits of moving may not outweigh the cost. But Java is no longer a language just for cheap set-top boxes (and really it never was). The needs of the desktop and the server are not the same as the needs of the cell phone and the digital watch. Programmers in each environment need a language tailored for them. One size does not fit all.

7. Extend chars to four bytes.

Whether the char type is primitive or an object, the truth is that Unicode is not a two-byte character set. This was perhaps not so important in the last millennium when Unicode characters outside the basic multilingual plane were just a theoretical possibility. As of version 3.2, however, Unicode has about 30,000 more characters than can be squeezed into two bytes. Four-byte characters include many mathematical and most musical symbols. In the future, they are also likely to encompass fictional scripts like Tolkien's Tengwar and dead scripts like Linear B. Currently, Java tries to work around the problem by using surrogate pairs, but the acrobatics required to properly handle these are truly ugly, and are already causing major problems for systems like XML parsers that need to deal with this ugliness.

Whether Java promotes the char type to an object or not, it needs to adopt a model in which characters are a full four bytes. If Java does go to fully object-oriented types, it could still use UTF-16 or UTF-8 internally for chars and strings to save space. Externally, all characters should be created equal. Using one char to represent most characters but two chars to represent some is too confusing. You shouldn't have to be a Unicode expert just to include a little music or math in your strings.
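A single musical symbol is enough to show the confusion. The sketch below (class name illustrative) uses U+1D11E, the musical G clef, which must be written as a surrogate pair in a Java string. The codePointCount() and codePointAt() methods it calls were added to String in Java 5, after this article was written; without them the pair has to be decoded by hand:

```java
class SurrogateDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF, spelled as a UTF-16 surrogate pair.
    static final String G_CLEF = "\uD834\uDD1E";

    // Number of 16-bit chars in the string.
    static int charCount() {
        return G_CLEF.length();
    }

    // Number of actual Unicode characters (code points) in the string.
    static int codePointCount() {
        return G_CLEF.codePointCount(0, G_CLEF.length());
    }

    // The code point that the two chars encode together.
    static int codePoint() {
        return G_CLEF.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(charCount());                       // prints 2
        System.out.println(codePointCount());                  // prints 1
        System.out.println(Integer.toHexString(codePoint()));  // prints 1d11e
    }
}
```

One character, length two: any code that indexes, truncates, or reverses strings char by char can split the pair in half.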

6. Fix threads.

Java was the first major language to integrate multithreading as a fundamental feature rather than a special purpose add-on library. Thus, it's not surprising that its designers made a few mistakes and missteps in this area. All of these need to be fixed:

These changes are going to be tricky, and they're going to require changes at all three levels -- the API, the language specification, and the virtual machine. But they are important, if Java is to remain efficient and reliable on the multiprocessor systems of tomorrow.

5. Convert file formats to XML.

The Java community is already using XML for latter-day file formats like Servlet config files and Ant build files. XML is clean, easy to edit, easy to parse, and easy to debug. It is rightly the first choice of most programmers when designing new file formats. Of course, XML wasn't invented until a couple of years after Java was released. Thus, Java has a number of non-XML file formats that should be ported to XML. Among others, these include JAR manifest files, properties files, and serialized objects. All of these can and should be replaced with well-formed XML.

Serializing objects with XML is perhaps the most surprising suggestion, since serialized objects are binary data and XML is text; however, most data inside objects are just text and numbers at the lowest level; and all of this is well-supported by XML. The limited true binary data inside Java objects can easily be Base-64 encoded. Perhaps most surprisingly, the resulting format should be both smaller and faster than today's binary serialization. Numerous developers have already invented custom XML formats for object serialization, and pretty much all of them have proved more efficient than Java's binary format. The fact is, contrary to popular belief, binary formats are not necessarily smaller or faster than the corresponding text formats, and serialized Java objects are a particularly poorly-optimized binary format. Sun has already implemented an XML-based serialization format for JavaBeans in Java 1.4 in the java.beans.XMLEncoder and java.beans.XMLDecoder classes. Now it just needs to go a step further to cover all serializable objects.
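For readers who haven't tried the Java 1.4 classes, here is a minimal round-trip sketch using java.beans.XMLEncoder and java.beans.XMLDecoder. The helper class and its method names are hypothetical, and an ArrayList of strings stands in for a real bean. It demonstrates only that the round trip works; it makes no attempt to measure size or speed:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class XmlSerializationDemo {
    // Writes an object graph as an XML document held in memory.
    static byte[] toXml(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(out);
        encoder.writeObject(value);
        encoder.close();  // close() flushes and finishes the XML document
        return out.toByteArray();
    }

    // Reads the first object back out of an XML document.
    static Object fromXml(byte[] xml) {
        XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml));
        try {
            return decoder.readObject();
        } finally {
            decoder.close();
        }
    }

    // Sample data used for the round trip.
    static List<String> sample() {
        List<String> names = new ArrayList<String>();
        names.add("manifest");
        names.add("properties");
        return names;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = toXml(sample());
        System.out.println(new String(xml, "UTF-8"));
        System.out.println(fromXml(xml));
    }
}
```

The printed document is ordinary text that can be inspected and edited by hand, which is not something that can be said of the binary serialization format.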

4. Ditch the AWT.

Two GUI APIs is one too many. Most Java developers have chosen to standardize their work on Swing. I agree with them. It's time to merge the Component and JComponent classes, the Frame and JFrame classes, the Menu and JMenu classes, and so forth. In some cases, the classes would come from Swing (JPasswordField, JTable). In others, they would come from the AWT (Font, Color, etc.). Still others (Frame, JFrame) would be merged, typically pulling in most of the code from Swing but retaining the more obvious AWT name. Overall, this would be a huge simplification for GUI development in Java and noticeably cut down on Java's bulk.

As long as we're at it, it's time to get rid of the legacies of the Java 1.0 event model. There's no reason for every component to have a series of confusing handleEvent(), mouseDown(), keyDown(), action(), and similar methods. If they're still being used behind the scenes as part of the infrastructure, at least make them non-public; but I suspect they can be eliminated completely without too much effort.

3. Rationalize the collections.

Java's current collections API is a hodgepodge of different designs implemented at different times. Some classes are thread-safe (Hashtable, Vector). Some aren't (LinkedList, HashMap). Some collections return null when a missing element is requested. Others throw an exception. Let's settle on some standard idioms and metaphors, and design all the classes to fit them, rather than the other way around. Probably the easiest way to do this would be to eliminate Vector and Hashtable completely. An ArrayList can do anything a Vector can do and a HashMap can replace a Hashtable.
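Two of these inconsistencies fit in a few lines. The sketch below (class and method names are illustrative) shows a map quietly returning null for a missing key while a list throws for a missing index, and then one way to get Vector-style synchronization from an ArrayList using Collections.synchronizedList():

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionIdioms {
    // A map signals "not there" by returning null.
    static boolean mapReturnsNullForMissingKey() {
        Map<String, String> map = new HashMap<String, String>();
        return map.get("missing") == null;
    }

    // A list signals "not there" by throwing an exception.
    static boolean listThrowsForMissingIndex() {
        List<String> list = new ArrayList<String>();
        try {
            list.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // A synchronized wrapper around ArrayList can stand in for Vector
    // where thread safety is actually wanted.
    static List<String> synchronizedReplacementForVector() {
        return Collections.synchronizedList(new ArrayList<String>());
    }

    public static void main(String[] args) {
        System.out.println(mapReturnsNullForMissingKey());
        System.out.println(listThrowsForMissingIndex());
        List<String> names = synchronizedReplacementForVector();
        names.add("ArrayList");
        System.out.println(names);
    }
}
```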

2. Redesign I/O.

The original Java developers were Unix programmers, Windows users, and Mac dilettantes. The I/O APIs they invented were more than a little Unix-centric in both obvious and not-so-obvious ways, and really didn't port very well. For instance, initially they assumed that the file system had a single root. This is true on Unix, but false on Windows and the Mac. Both the new and old I/O APIs still assume that the complete contents of a file can be accessed as a stream (true on Windows and Unix but false on the Mac).

Some of the problems, especially with regard to internationalization, were fixed in Java 1.1, with the introduction of the Reader and Writer classes and their subclasses. Java 1.2 fixed some of the more glaring inadequacies in the file system API. Still more were fixed in Java 1.4 with the new I/O APIs.
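Even with those fixes, some everyday file operations are still left to the programmer. As one illustration, this is roughly what copying a file looks like when written by hand against plain java.io streams. It is a sketch under stated assumptions rather than a recommended utility: the class and method names are hypothetical, and it copies only the file's bytes, ignoring permissions, timestamps, and any platform-specific metadata:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ManualFileCopy {
    // Copies the bytes of source into target and returns the byte count.
    static long copy(File source, File target) throws IOException {
        InputStream in = new FileInputStream(source);
        try {
            OutputStream out = new FileOutputStream(target);
            try {
                byte[] buffer = new byte[8192];
                long total = 0;
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
                return total;
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }

    // Small self-check: writes a temp file, copies it, and compares sizes.
    static boolean selfCheck() {
        try {
            File source = File.createTempFile("copy-src", ".bin");
            File target = File.createTempFile("copy-dst", ".bin");
            source.deleteOnExit();
            target.deleteOnExit();
            byte[] data = "hello, java.io".getBytes("UTF-8");
            OutputStream out = new FileOutputStream(source);
            try {
                out.write(data);
            } finally {
                out.close();
            }
            long copied = copy(source, target);
            return copied == data.length && target.length() == data.length;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("copy ok: " + selfCheck());
    }
}
```

Every project ends up carrying some variant of this loop, each with its own subtly different handling of errors and metadata.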

The job isn't done yet. For instance, even in Java 1.4 there still isn't a reliable means to copy or move a file -- pretty basic operations, I think you'll agree. To date, attempts to design a better file-system API have foundered on the need to be upwards-compatible with the atrocious Java 1.0 I/O classes. The time has come to reconsider everything in java.io. Some of the more urgently needed changes are:

1. Redesign class loading from scratch, this time with human interface factors in mind.

No single topic is as confusing to new users as the class path. I get almost daily e-mail from novice readers asking me to explain the "Exception in thread 'main' java.lang.NoClassDefFoundError: HelloWorld" error messages they keep seeing. I've been writing Java for seven years and I'm still occasionally baffled by class loader issues. (Pop quiz: When is class A that implements interface B not an instance of interface B? When A and B are loaded by two different class loaders. I lost half a day to that one just last week, and after I mentioned my problem on a mailing list, one talented programmer friend told me he lost two weeks to the exact same bug.)

I'll freely admit that I don't know how the class loader should be fixed. It's clearly one of the trickier areas of Java. I do know that the current system is far too difficult. There has to be a better way.

Summing Up

This top-ten list is just a beginning. There are lots of other areas where Java could be improved, if we allow ourselves to throw off the straitjacket of upwards compatibility: replacing integer constants with type-safe enums, removing confusing and unnecessary classes like StringTokenizer and StreamTokenizer, making Cloneable a true mixin interface or perhaps eliminating it completely, renaming the Math.log() method the Math.ln() method, adding support for true design by contract, eliminating checked exceptions (as Bruce Eckel has advocated), limiting objects to a single thread as in Eiffel, and much more.

We can argue about exactly which changes are necessary, and which ones may cause more harm than good. But one thing's for sure: if Java fails to change, if it refuses to correct its well-known problems, there are other languages waiting in the wings written by some very sharp programmers who have learned from Java's mistakes and are eager for the opportunity to replace Java in the same way Java replaced earlier flawed languages. Java must not be forever handicapped by mistakes made seven years ago in Java 1.0. There comes a point where we need to throw off the chains of backwards compatibility and move boldly into the future.

Elliotte Rusty Harold is a noted writer and programmer, both on and off the Internet. His previous books include "Java Network Programming", Third Edition, "XML in a Nutshell", Third Edition, and "Java I/O", all from O'Reilly.

Copyright © 2009 O'Reilly Media, Inc.