
An Introduction to Haskell, Part 1: Why Haskell

by Adam Turoff

Let me start by being perfectly clear: if you are a professional programmer, then Haskell is in your future.

In 1987, this statement would have been equally true about Smalltalk. Today, 20 years later, object-oriented programming, class hierarchies, model-view-controller patterns, and other ideas from Smalltalk are now commonplace, even though Smalltalk never saw widespread adoption.

Haskell is in a similar situation today. It is an esoteric language, yet stable and reliable enough for serious work. Perhaps you will learn Haskell, or even use Haskell on a real-world project. Even if you don't, ideas from Haskell are slowly finding their way into mainstream languages like Java, C#, C++, Perl, Python, and Ruby. Haskell is poised to have a strong influence over the next 20 years of programming and language design, just as Smalltalk has had an influence during the last 20 years.

What Is Haskell?

Haskell sounds important, but what makes it so special? The brief definition from the Haskell website says:

Haskell is a general purpose, purely functional programming language featuring static typing, higher order functions, polymorphism, typeclasses and monadic effects.

If you are an academic or a language designer, these terms may be meaningful. If you are a professional programmer, terms like static typing and polymorphism may sound familiar, even if this definition isn't particularly elucidating.

Fundamentally, Haskell is a functional programming language, which means it is based on a form of lambda calculus. If this sounds intimidating, then fear not, because lambda calculus is not difficult at all.

Lambda calculus is nothing more than a precise set of rules that describe how to evaluate functions. There are two moving parts (function definition and function evaluation), and two building blocks (functions and values). If you're a professional programmer, you already have all the skills you need to understand lambda calculus. For example, in any C-like language, the rule for evaluating this statement is already wired deep into your consciousness:

    printf("%.03f\n", sqrt(incr(1)));

The rule is to evaluate all the arguments to a function first, then evaluate the function with the resulting values. This process is repeated recursively, until an expression yields a single value. In the example above, the innermost function, incr(1) is evaluated to produce the value 2. Next, sqrt(2) is evaluated to produce 1.414.... Finally, printf is evaluated with the two arguments "%.03f\n" and 1.414... to produce some formatted output.
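The same computation can be written in Haskell for comparison. Note that incr is not a standard function in either language, so this sketch defines it explicitly:

```haskell
import Text.Printf (printf)

-- incr is not a standard function; it is defined here for the example
incr :: Double -> Double
incr n = n + 1

main :: IO ()
main = printf "%.3f\n" (sqrt (incr 1))  -- prints 1.414
```

The evaluation order is the same: incr 1 yields 2, sqrt 2 yields 1.414..., and printf formats the result.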

There are a few more key features to lambda calculus, like the ability to create functions that return functions, and the ability to define functions that take functions as arguments. These features make it possible to define small, modular functions, and glue them together into highly specialized functions with minimal effort.
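As a small sketch of both features together (the names twice and addFour are illustrative, not standard library functions):

```haskell
-- twice takes a function as an argument and returns a new function
-- that applies that function two times
twice :: (a -> a) -> (a -> a)
twice f = f . f

-- addFour is glued together from twice and the pre-existing (+2)
addFour :: Int -> Int
addFour = twice (+2)

main :: IO ()
main = print (addFour 3)  -- prints 7
```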

For example, filter is a general purpose function that accepts a testing function (a predicate) and a list of values, and returns a new list containing all values for which the test passes. The even function is one such test, which determines whether or not a number is divisible by 2. Here is how those two functions are used together in Haskell, to extract the even numbers from a list of values. This example uses ghci, the interactive interpreter provided with the Glorious Glasgow Haskell Compiler (ghc):

    Prelude> filter even [1..10]  -- filter the even numbers out of a list
    [2,4,6,8,10]

If I want to define a function that takes a list of values, and returns only the even values in that list, it would look like this in the interpreter:

    Prelude> let evens = filter even  -- define the function
    Prelude> evens [1..10]            -- filter numbers out of a list
    [2,4,6,8,10]

This example may appear trivial, but it demonstrates a key benefit of working with lambda calculus in a functional programming language--the ability to create specialized functions by merely gluing together a small number of more generalized, pre-existing functions.

Interestingly, this is exactly how shell pipelines work.
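In Haskell, the (.) operator glues functions together much as the shell's | glues commands. For example (firstThreeEvens is an illustrative name, not a library function):

```haskell
-- Roughly analogous to the pipeline: ... | grep even | head -3
firstThreeEvens :: [Int] -> [Int]
firstThreeEvens = take 3 . filter even

main :: IO ()
main = print (firstThreeEvens [1..10])  -- prints [2,4,6]
```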

Writing the function evens in C is much more complex. First, the output list must be allocated, and to do that, the number of output values must be counted. Next, the values must be copied to the new list, one at a time. This simple expression in Haskell becomes this function in C:

    int *evens(int *in, int count) {
        int matches = 0;
        int i, *out, *ptr;

        /* First pass: count the matching values */
        for (i = 0; i < count; i++) {
            if ((in[i] % 2) == 0) {
                matches++;
            }
        }

        /* Allocate the output list, then copy the matches into it */
        out = malloc(sizeof(int) * matches);
        ptr = out;
        for (i = 0; i < count; i++) {
            if ((in[i] % 2) == 0) {
                *ptr = in[i];
                ptr++;
            }
        }
        return out;
    }

This function defines the combined behavior of the Haskell definition of evens. That is, this block of code is equivalent to the expression filter even in Haskell. If there were a C library function that handled the behavior of filter, this code would be simpler. However, this kind of solution is sadly idiomatic in C-like languages.

Writing this function in a more modern language like Java, C++, or C# isn't as odious, because automatic memory management takes care of the first half of this function. Writing the expression 'filter even [1..10]' in dynamic languages like Perl, Python, and Ruby is closer to the Haskell definition, because they define primitive functions that act like filter. Here is what this expression looks like in those languages:

    ## Perl
    grep {$_ % 2 == 0} (1..10)

    ## Python
    filter(lambda n: n % 2 == 0, [1,2,3,4,5,6,7,8,9,10])

    ## Ruby
    [1,2,3,4,5,6,7,8,9,10].delete_if {|i| i % 2 == 1}

Why Lambda Calculus?

Most mainstream programming languages focus on managing a machine, whether it is a real computer or a virtual machine. That is, most languages have an imperative nature, where the programmer must tell the machine what to do in a step-by-step fashion. Understanding a program usually means reading through a program one statement at a time and asking, "what will the computer do next?"

Functional programming languages use lambda calculus to take a different approach. Instead of telling the computer how to solve a problem one statement at a time, they allow a programmer to describe what the solution is, and let the computer figure out how to execute it. For example, it is easier to produce a new list by using a function like filter to select the elements you want to keep (or, in the Ruby example, removing the elements you want to ignore), than it is to write a function in C that must first allocate space for a result, then place each selected value in the result list.

Of course, it's not necessary to use a functional programming language to use these techniques. Because ideas from the functional programming world are appearing in mainstream languages, it is more important than ever to understand these techniques. Tom Christiansen said it best:

A programmer who hasn't been exposed to all four of the imperative, functional, objective, and logical programming styles has one or more conceptual blindspots. It's like knowing how to boil but not fry. Programming is not a skill one develops in five easy lessons.

Many programming languages offer a mixture of styles. Most object oriented languages have an imperative core, where classes, objects and methods provide a thin veneer over a language that is little more than a slightly improved version of C. Many functional programming languages mix functional, imperative, and object-oriented styles together in a manner that makes it difficult to tell them apart.

Haskell, on the other hand, is a purely functional language that restricts itself to the functional style of programming. Learning and using Haskell makes it easy to see the power and benefits of lambda calculus and functional programming.

The Language Divide

Tom's observation describes four cornerstones of computer science: imperative programming, object-oriented programming, functional programming, and logic-oriented programming. These four families can be grouped into two styles: mechanical (imperative and object-oriented) and mathematical (functional and logic-oriented).

"Mechanical" languages focus on expressing a program as a series of steps to execute, one after another. Object-oriented programming is more powerful than simple imperative programming, because it defines an additional series of rules that let the compiler or interpreter do more of the drudge work to compose a program.

"Mathematical" languages start with a theoretical model of how to solve a problem, and a concrete implementation of that model. Four examples of this approach are regular expressions, SQL, make and, of course, Haskell.

Regular expressions use automata theory to describe a mechanism for matching strings. It is much easier to write a regular expression like '^(NYC?|New York( City)?)$' than it is to write code that matches the four strings "NY", "NYC", "New York" and "New York City". Furthermore, a good regular expression library can always use an optimal search, scanning each character only once, while hand-coding an optimal string search each and every time a string must be matched is far more difficult.

SQL uses set theory and relational algebra to search for data in a database. A single statement in SQL can refer to many tables, many constraints, and many calculated expressions. A typical relational database engine uses a query optimizer to determine if any of the necessary data are indexed, and if so, refers to the index to reduce the amount of work needed to return a result. The alternative to using SQL and a relational database is to write pages of code to perform all of the I/O operations to manage data tables (and their indexes) on disk. This might be feasible for very simple SQL statements, but it is certainly more tedious. Complex SQL statements are still somewhat easy to write and understand, while the alternative can easily become pages of hard-to-write, error-prone code that is difficult to understand or optimize.

Make and related tools like ant, nant, and rake use a series of recipes to determine how to build a software project. Internally, make reads these recipes and interprets them using a little bit of graph theory to model the dependencies described by those recipes. When make is presented with a desired goal like make test, make docs, make all or make debug, it traces the dependency graph from that goal back to a set of dependencies that need to be rebuilt, and proceeds to build only what is absolutely necessary. This process is much more maintainable and reliable than a hand-crafted shell script that performs the same work. Additionally, make can be run in a multiprocessing mode, where multiple independent jobs run concurrently to speed up the build process, something that is nearly impossible to achieve with hand-coded build scripts.

Haskell also falls into this category of programming languages, and lambda calculus is its foundation. make, SQL and regular expressions all describe limited purpose languages, fit for solving a specific kind of problem within the context of a larger project. Haskell is a general purpose language, and it brings the same kind of heavy lifting into the design and construction of applications and large systems.

Haskell and the Future of Programming

Software development today is at an inflection point. Until recently, almost all software was written to run on a single machine with a single processor. Today, more and more software must be written to accommodate multiprocessing, whether it involves using multiple CPU cores, multiple nodes on a network, or both. Yet many of the languages and environments used today work best when used on a single machine with a single processor.

It's an open question how we as developers should build software to take advantage of the ever increasing amounts of compute power available. Three of the most interesting ideas are available now, or will soon be available in ghc.

One persistent problem in multiprocessing programs is managing global shared state. Haskell avoids these issues by making all values immutable. In other words, variables in Haskell don't vary. Whenever a function returns a result, it creates a new value, and lets the garbage collector handle memory allocation and destruction. A function's result is sent back to its caller, and as a result, there is no such thing as global state that can modify the behavior of a program. Programs that require stateful behavior do exist in Haskell, but they are strictly quarantined and can be handled separately from the rest of the program.
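A small sketch of immutability in practice (xs and ys are illustrative names):

```haskell
xs, ys :: [Int]
xs = [1, 2, 3]

-- "Prepending" builds a brand new list; xs itself never changes,
-- so ys can safely share xs's storage
ys = 0 : xs

main :: IO ()
main = do
  print ys  -- prints [0,1,2,3]
  print xs  -- xs is untouched: prints [1,2,3]
```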

Haskell programs use a "shared-nothing" style naturally. This is the same architecture that is considered a best practice for writing web applications that need to scale up to use multiple CPUs, or scale across to support replication across multiple networked nodes.

Another problem in multiprocessing is resource contention, where two threads interfere with each other as they try to update the same shared resource. One solution involves using semaphores and locks in a strictly coordinated manner. Unfortunately, dealing with threads and locks is very tricky, and very easy to get wrong. Another approach is to use atomic transactions, where each writer succeeds and writes all of its changes, or fails and writes none of them.

Database engines have used transactions for decades to make multi-user updates sane. Until recently, this familiar mechanism was not available to software developers when writing software outside the scope of a database. This model is available today in ghc, through software transactional memory, or STM. Libraries supporting STM are slowly appearing for other languages as well.
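Here is a minimal sketch of STM in ghc, using the Control.Concurrent.STM module from the standard stm library; the account transfer is a contrived example:

```haskell
import Control.Concurrent.STM

-- Transfer an amount between two accounts atomically: either both
-- updates are committed, or neither is
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  balances <- (,) <$> readTVarIO a <*> readTVarIO b
  print balances  -- prints (70,30)
```

If two threads run transfer concurrently on the same accounts, the STM runtime detects the conflict and retries one transaction, with no explicit locks in the program.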

A third problem deals with using multiple CPUs efficiently on a single machine. Some of these "embarrassingly parallel" problems are easy to write using a set of sequential instructions to execute on a single CPU. Writing code that distributes the work across multiple CPUs on a single machine is considerably more difficult.

There is work in progress to remedy this situation in ghc. The Data Parallel Haskell extension integrates a form of parallel computation into Haskell programs. This extension allows the programmer to specify when arrays can benefit from parallelized operations that are spread across multiple CPUs within a single machine. This allows programmers to write a normal, straightforward Haskell program and identify which parts are "embarrassingly parallel" and should be parallelized transparently. Data Parallel Haskell is a work in progress, and an area of active research by the ghc development team.

Clearly, Haskell is an important language. It deserves your attention.

The next parts in this series will discuss two distinctive aspects of Haskell: functions and monads.


Adam Turoff is a software developer with more than a dozen years of experience in building web backend systems with open source tools. His current project uses a mixture of Perl, Ruby, Rails, Haskell, and other tools.


Copyright © 2009 O'Reilly Media, Inc.