Why MySQL grew so fast (news from the 2004 MySQL Users Conference)

Andy Oram
Apr. 19, 2004 11:07 AM

URL: http://www.mysql.com/news-and-events/users-conference/...

If you attend enough computer conferences, you run into every occupation on Earth. At the MySQL Users Conference last week, I sat next to a person at lunch who announced proudly that his job was to destroy data. He works for a firm that specializes in data services for law cases, and at the end of many cases the judge orders the total destruction of data related to the case.

On the other side of me, at that lunch, sat a database administrator whose facility is planning a migration from Oracle to MySQL. A few years ago, people might assume a site would start with MySQL and move up to Oracle as its needs grew. Now there's a quiet trend in the other direction. (I should mention here, though, that MySQL managers downplay the obviously competitive situation and like to say that the different products are for different markets.)

My lunch partner said his firm would save an enormous amount of money on both licenses and support. I was left with the impression that Oracle took a big risk by moving from a perpetual license to a four-year one: they set the timetable for this company's move.

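In this article I'll cover:

  • The business model: why did MySQL grow so fast?

  • SAP adds its muscle

  • The licensing of free software

  • Cluster around and take a close look

  • So, what is Novell's Linux client environment?

  • Snips from the discard bin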

The business model: why did MySQL grow so fast?

MySQL represents one of the most impressive market successes in free and open source software, exceeded perhaps only by Apache. In terms of installed base, MySQL has left the technically impressive rival PostgreSQL in the dust. It has marginalized mSQL, SQLite, and SAP DB (the last of which I'll return to later). It has started to challenge the proprietary database companies on their own turf, as already mentioned. Nobody can say for certain why licensing costs for proprietary databases have plummeted in recent years, but one suspects that it's due to MySQL's competition, as are the large discounts Microsoft has offered certain customers.

Not convinced yet?

  • MySQL AB claims an installed base of five million systems, the largest of any database engine.

  • The mysql.com domain sees almost as much traffic as ibm.com.

  • Six hundred attendees flocked to the recent conference.

  • MySQL AB has recently started, and has been heavily marketing, its own publishing outlet, MySQL Press.

  • MySQL has made the use of a database so commonplace that industry observer Clay Shirky, in his recent article Situated Software, writes:

    You can of course build these kind of features [rapidly developed applications for small, localized groups of users] in other ways, but MySQL makes the job much easier, so much easier in fact that after MySQL, it becomes a different kind of job. There are complicated technical arguments for and against using MySQL vs. other databases, but none of those arguments matter anymore. For whatever reason, MySQL seems to be a core tool for this particular crop of new applications.

So how did MySQL achieve this charmed status?

A textbook case of a disruptive technology

MySQL, first of all, illustrates in almost pure form the sequence of events Clayton M. Christensen documented as a "disruptive technology" in his ground-breaking book The Innovator's Dilemma. Early versions of MySQL lacked the basic features, such as ACID transactions and referential integrity, that experienced users expected from a relational database. In a pattern familiar to anyone who has read Christensen's book, knowledgeable observers dismissed MySQL as a toy.

But MySQL's very simplicity made it so small and fast that it quickly won over small users who wouldn't even understand what they were missing and how to use the fancy features offered by "real" database engines. In particular, MySQL proved ideal for the exploding area of dynamic Web content.

Most indicative of its mantle as a true disruptive technology, MySQL proved that many of the missing high-end features weren't as indispensable as people used to claim. For instance, referential integrity (jeez, who could be opposed to integrity?) wasn't required in a database when it could be achieved in the application code, often more reliably. You could also achieve efficient locking without row-level locks; in fact, supporting row-level locks took so much overhead that the application was almost better without them.
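As a rough illustration of what enforcing integrity in application code can look like, here is a minimal Python sketch. It uses the standard sqlite3 module purely as a stand-in for a MySQL table type without foreign keys, and the table names and the add_order helper are invented for the example rather than taken from any MySQL documentation.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with no foreign-key constraints declared."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.commit()
    return conn


def add_order(conn, customer_id, item):
    """Insert an order only if the referenced customer exists (hypothetical helper)."""
    # The application, not the database, checks that the parent row is there.
    row = conn.execute(
        "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such customer: {customer_id}")
    conn.execute(
        "INSERT INTO orders (customer_id, item) VALUES (?, ?)", (customer_id, item)
    )
    conn.commit()
```

The check and the insert are separate statements, so without transactions another client could delete the customer in between; that is the trade-off the application takes on when it does this job itself.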

Having rewritten the rules for what constituted a useful relational database engine, MySQL AB proceeded to invest resources to implement the very features which they were originally sneered at for lacking. Bit by bit they have added check-off items to their T-shirts. And what's most interesting is how they found the resources to pull off this kind of upgrade cycle.

The importance of dual-licensing

Of course, any agreement under which you release free software (other than the public domain) is a license, but "licensing" usually refers to selling licenses. And MySQL AB has become one of the most successful companies with a completely complementary dual-licensing model: they offer everything under an open license for certain users, but charge money for everything under other circumstances. (These circumstances will be discussed further down under "The licensing of free software.") As we'll see, the parallel existence of GPL licensing and commercial licensing leaves a mark on every aspect of the company.

The CEO of MySQL AB, Marten Mickos, said that more than half of their money comes from license fees. This contrasts with an impression of open source software left by Novell vice president Chris Stone in his keynote (described later). Stone, claiming that Novell had already settled on a maintenance model for revenue, suggested that, because of this, the move to open source will not be as hard for Novell as for other traditional computer companies. The remarks implied that an open source business model has to be a support model, but MySQL AB staff pointed out that support contracts have been shown to be insufficient to fund software development. They may be enough in the future, but they're not yet.

The other side of dual licensing is equally important. In terms of adoption, open licenses do more for a software project than twenty thousand billboards and glossy ads. The GPL allowed MySQL to penetrate millions of sites that would never have otherwise known about it.

But the GPL also created a hotbed of user participation that can be witnessed to this day, as MySQL AB employees repeatedly ask their users for feedback. MySQL AB also benefits directly from contributions; for instance, its most feature-rich storage engine, InnoDB, started as an outside project.

But MySQL would have remained a stepping stone to other databases for many people, were it not for its continual growth and improvement. This rate of improvement is not exceedingly fast (managers stress that they always check for stability, correctness, and performance before releasing enhancements) but it's fast enough to give customers the impression that features are worth waiting for--that what they want will in due time be added to the product.

And there's a symbiosis between technical development and payments for licenses. Each requires the other. If a substantial body of enhancements to MySQL grew up outside the company--even if they were put under the GPL and MySQL AB could incorporate them into its version of MySQL--they would not be part of the value MySQL AB could offer paying customers. There would thus be few paying customers, and MySQL AB could not afford to hire people to keep up development. In order to keep up with customer needs, MySQL AB has managed one of the coolest tricks in open source development: keeping most development in-house. And making users happy about it!

Founder David Axmark told me there's tremendous power in having a product unambiguously associated with a single company. Whereas Linux and Apache belong to everybody and nobody, MySQL is taken seriously by large companies with money to spend because there's a company that owns a trademark on it and markets it like a proprietary product.

So MySQL succeeds at maintaining two faces. To paying customers, it's a traditional, responsible vendor. To programmers and database administrators, it's a flexible, responsive network of independently-minded developers in free-software style.

SAP adds its muscle

Nobody would be sorry to have the backing that comes from such a large and well-established corporation as SAP. But in addition to SAP's prestige and endorsement of MySQL, what is the main contribution of the partnership?

Not MaxDB. This is the new name for SAP DB, and was honored with several sessions at the conference, all poorly attended.

And probably not the money SAP invested in MySQL AB as part of the partnership they announced in May 2003. Certainly this helped to spur the enormous hiring campaign MySQL has been on during the past year. (They announced that they doubled the size of their company to 134 staff.)

The impression given by Kaj Arnö, in his presentation on the SAP partnership, was that the best part lies in the expertise SAP brings to areas where MySQL needs to upgrade. SAP DB contains a number of features that MySQL AB would like to implement, and through the partnership they can do so much more quickly. In particular, MySQL 5.1 is supposed to contain server-side cursors, views, standard error handling, standard security handling, schemas, and constraints.

There are three reasons for incorporating SAP DB features into MySQL:

  1. They are genuinely useful.

  2. They are needed to run SAP.

  3. They are ANSI-compliant.

We have to start with the understanding that complete compliance with the ANSI SQL standard (one always has to ask, "Which standard?") is pretty much impossible. See, for instance, the negative assessments by SQL standards experts Michael M. Gorman and Peter Gulutzan (the latter now a MySQL AB employee). But MySQL AB would like to approach compliance with the core SQL standards. They don't plan complete compliance even with this limited part, because it would require them to sacrifice other crucial selling points: speed, and ease of use and management.

Meanwhile, Arnö laid out a roadmap for merging MySQL with MaxDB, beginning with a proxy that translates the protocol used by a MySQL client (and eventually, the particular syntax of MySQL commands where they differ from MaxDB) into a format a MaxDB server can recognize.

The licensing of free software

As I said earlier, dual-licensing is central to MySQL's business model. So under what circumstances must you license MySQL? There's a "nice guy" answer that's fairly clear, and a formal legal answer that's considerably murkier.

The nice guy answer is (I believe I am quoting Monty Widenius directly here): "If you distribute MySQL for free, you get it for free, but if you charge money for it you give us money." I believe this covers most cases neatly. For instance, I think everybody agrees that a store can run its inventory application on MySQL, or an airline book tickets through its Web page backed up by a MySQL database, without paying for it. These businesses are not making money by distributing MySQL; they're just users. And I'm pretty sure the GPL covers them.

Most situations requiring payment are also clear. If you enhance the MySQL source code in some way and sell it without distributing the source code, you have to pay a license.

But what about the case of the application service provider? This is a common problem in GPL-land that I don't believe has ever been resolved. At a Birds of a Feather session at the MySQL conference one evening--a session well attended by about 25 very interested people--one programmer for a game company laid out a situation where they run their multi-user game on a server backed up by MySQL, and distribute only a client. Do they have to pay a license? After all, they're not distributing MySQL itself.

Zak Greant, a long-time MySQL public figure (listed in the conference brochure as their "community advocate") said the game company should pay. The game could not run without MySQL, and the client was the means of access by paying customers.

Several attendees then tried to extend Greant's reasoning. Why, then, shouldn't users of Web browsers pay license fees for accessing Web pages backed by MySQL? Well, besides the absurdity of trying to enforce such a payment regime, the Web server does not use a proprietary, specialized protocol as the game does.

I found Greant's argument strained, but I appreciate the need for MySQL AB to share in the profits from services that depend on MySQL.

(UPDATE, April 22: Zak wrote to explain the thinking behind asking a game company to pay a license on a system where the client and server use the MySQL protocol. It makes a lot more sense now. If the client and server communicate using the MySQL protocol, the client is no doubt written with the MySQL library that implements the protocol. (Who would reinvent the wheel just to save a few bucks?) Under the GPL--although perhaps not the LGPL--the game client is an extension of MySQL and qualifies for the commercial license.)

The length and heat of this late-night argument shows that open source licensing still has to shake out. But let's remember that proprietary licensing is an even deeper pit.

There are clauses in most software licenses (such as prohibitions of reverse engineering) that are flat-out illegal. Many more clauses are so ambiguous that any guess about their interpretation by the courts would be as good as a coin flip. Many organizations probably pay a lot more in license fees than they'd have to pay if they took the time to examine the licenses with a fine-toothed comb and showed a willingness to go to court. And of course, we're still arguing over what's covered by fair use, what constitutes a trade secret, and whether the DMCA outlaws Web links to illegal code.

So let's see if we can pull ahead of the pack in free software. Let's see whether the field can establish a system that's readily understandable, fair, and conducive to growth. I'll return to this question when covering Brian Behlendorf's keynote.

Cluster around and take a close look

My own close look at the new MySQL Cluster product leaves me puzzled, and several other people I talked to at the conference had the same feeling.

At recent LinuxWorld conferences, I've noticed several companies marketing cluster solutions that support MySQL. MySQL AB has apparently decided these companies had a good idea. At this conference, they announced their own clustering solution and offered several sessions on it.

MySQL Cluster is a separate network of nodes that replicate data through striping. The key for each table row (which is added behind the scenes if the programmer does not specify it) is hashed to determine which nodes store the row.
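The session didn't spell out the actual hash function, so the following Python sketch is only an assumption about the general shape of the scheme: hash the key, take the result modulo the number of node groups, and store the row on every node in that group. The node names, the use of MD5, and the nodes_for_key helper are all hypothetical.

```python
import hashlib

# Hypothetical layout: two groups of two nodes; each row lives on both
# nodes of exactly one group.
GROUPS = [("node1", "node2"), ("node3", "node4")]


def nodes_for_key(primary_key, node_groups):
    """Return the group of nodes that would store the row with this key."""
    # MD5 is only a stand-in for whatever hash the real product uses.
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(node_groups)
    return node_groups[bucket]
```

Calling nodes_for_key(42, GROUPS) always returns the same group for the same key, which is what lets any server find a row without consulting a central directory.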

At the MySQL server, the clusters are supported by a new storage engine (a.k.a. table type) that has many of the features of InnoDB, but apparently not all. Other than specifying the new storage engine, programmers don't need to make any changes to their code, although some types of optimization are different when working with clusters. Developer Mikael Ronström--who has been working on this technology for over 15 years and did an implementation for phone company Ericsson before coming to MySQL AB--claimed that MySQL AB offers five to six nines of availability.
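For readers who don't think in nines, the arithmetic is easy to sketch. This is plain arithmetic rather than anything from Ronström's talk: n nines means an unavailability of 10 to the power of -n, applied here to a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a 365-day year


def downtime_per_year(nines):
    """Seconds of downtime per year allowed by the given number of nines."""
    unavailability = 10 ** -nines  # e.g. five nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability
```

By this reckoning five nines allows a little over five minutes of downtime a year, and six nines about half a minute.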

Now for the catch. All databases handled by the cluster have to be stored in primary memory. One can spread the data across several nodes, but their combined memory is a limit on the size of databases.
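A back-of-the-envelope sketch of that limit follows. The number of copies kept of each row is an assumed parameter, not a figure from the conference, and the function ignores indexes and other overhead, so treat the result as an upper bound.

```python
def max_database_size_gb(nodes, memory_per_node_gb, copies=2):
    """Upper bound on data size when each row is held in memory on `copies` nodes."""
    # Total memory across the cluster, divided by how many times each row is stored.
    return nodes * memory_per_node_gb / copies
```

With 8 nodes of 4 GB each and two copies of every row, that comes to 16 GB of data at most.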

In discussions, it seemed to several of us that any company willing to devote 6, 8, or 12 systems to their database will have more data than fits in a few systems' memory. MySQL Cluster will add disk storage eventually, but it will take some time to come, and when it does it will probably erode some of the vaunted speed advantages of MySQL Cluster. For instance:

  • Updates will no longer be so fast (nearly as fast as reads, currently).

  • Restarting nodes will take longer.

  • Restarting as the main way of recovering from inconsistencies may become less appealing.

Emic Networks gave me a data sheet that compared their product, Emic Application Cluster, to MySQL Cluster. Everybody is very polite about these matters, of course, and says that different products are appropriate for different markets. Essentially, MySQL Cluster offers speed--particularly for updates--whereas Emic offers larger data sizes. Emic is also more robust at handling soft failures, such as a node overwhelmed by a high volume of queries. The key market for MySQL Cluster seems to be telecom (where the technology emerged), whereas Emic has customers in more traditional business areas.

So, what is Novell's Linux client environment?

Not desktop! No--a Linux client. That's the word I heard from keynote speaker Chris Stone, the vice president on whose advice Novell spent 250 million dollars to buy the companies Ximian and SUSE. When I asked how Novell would combine all those assets into something new and synergistically superior, Stone said he couldn't announce anything yet, but promised something he called a "Linux client environment," something "completely new and different" and "much better than simply substituting Linux desktop systems for Microsoft desktop systems" as Munich did.

Stone also said during his keynote (perhaps in answer to the anticipated questions about Ximian being based on GNOME while SUSE features KDE) that people shouldn't ask "KDE or GNOME?" but rather that, "The money lies in giving each customer what it needs." This might be a Linux-based kiosk for call centers, a PDA environment for mobile users, and so on. Specialization is the path to success.

I thought, as I listened to Stone's keynote, how vendors switching to open source tend to go through stages.

  1. First, a tentative recognition of the historic shift to free software.

  2. Then a phase of loudly announcing over and over (in words attributed to Steve Jobs), "We love open source."

  3. A mingling of their traditional proprietary offerings with open source software they licensed from elsewhere.

  4. A serious commitment to adding value in the open source area.

Further stages are likely to emerge, but I haven't seen them yet during the evolution of major vendors.

HP appears to be in the third stage, whereas IBM has reached the fourth. Apple lies between the third and fourth stages, because few people use Darwin on its own (or other software released under a free license by Apple). Sun is the outlier here, having jumped into the fourth stage through its release of OpenOffice.org and JDS, while barely sticking their toes into the second.

Stone's speech reflected the second stage of development, and Novell's offerings the third. They already sell a number of their products on SUSE, and can use them to tie together SUSE with Netware. These products include Novell's directory offering, eDirectory, which offers single signon and other sophisticated identity services, and their clustering filesystem, Novell Storage Services.

While the 250 million dollar expenditure shows the grit in Novell's teeth as it determines to reach the fourth stage, I can't say they've reached it yet. Ximian is still Ximian and SUSE is still SUSE. But Stone is hinting that Novell has a broader vision, and in fact sees the Linux market as broader than most conventional vendors do.

Snips from the discard bin

Brian Behlendorf of Apache likes to see software development as an art as well as a science. In his keynote he decried the approach to development that treats "software engineers as cogs." He also described some of the government efforts around the world to move from proprietary software to open source software, driven by pressure from U.S. companies to get serious about enforcing licenses, and the resulting new laws that countries have to pass to conform to World Trade Organization regulations. "The WTO is the open source software field's best friend," Behlendorf put it.

Apple Computer faces a challenge that precisely mirrors Linux's: having captured hearts and minds as a desktop system, Apple's Macintosh is trying to push its way into heavier applications as a server and a basis for clusters. Dr. Ernest Prabhakar of Apple gave a keynote listing the many levels where Apple uses free software and insisted they try to conform to standards when innovating ("to enhance and open, rather than embrace and extend"). And in classic free software style, Apple includes development tools on every OS X system shipped--and not just standard tools such as gcc, but Apple's finest programming environments--so that every user in theory can be a developer.

Why would Trolltech, the vendors of the cross-platform Qt toolkit, show interest in a conference about a database? While Qt is most famous for building interfaces--particularly as the basis of the KDE desktop--its APIs form an umbrella over a huge range of programming activities. Now these include connecting to relational databases. Thus, Qt takes its place next to Perl DBI, JDBC drivers, and other APIs from the many other languages that interface to MySQL. And I suppose this is a benefit to people who want to build interfaces for many different platforms, because they can settle on a unified programming style and expect such conveniences as having data types from different parts of the application conform.

Most of the API is familiar to any programmer who has made a connection to a database, but Trolltech went a bit farther and offered a C++ class that replaces SQL altogether. This was perhaps going too far. SQL syntax is very flat and very frustrating--a legacy of its origin in the 1970s, when language designers expected end-users to type in their queries manually--but it fits the job it has to do. Trying to specify the same activities in C++ syntax is even more awkward and less streamlined. Trenton Schulz of Trolltech told me that many people expressed the same opinion I had, and that the non-SQL interface might be removed.

The annoying but irreplaceable syntax of SQL continued to show its face in MySQL Query Browser, a new graphical tool for viewing and manipulating data from a MySQL database. This tool is in some ways an IDE for writing SQL, complete with such debugging aids as single-stepping and breakpoints. In other ways, the tool is just a convenient way to look at and change data, or compare two results from different queries.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.