 Published on ONLamp.com (http://www.onlamp.com/)


Learning Python, 2nd Edition

When Pythons Attack
Common Mistakes of Python Programmers

by Mark Lutz, coauthor of Learning Python, 2nd Edition
02/05/2004

In this article, I will chronicle some of the most common mistakes made by both new and veteran Python programmers, to help you avoid them in your own work.

First of all, I should explain that these come straight from first-hand experience. I earn my living as a Python trainer. Over the last seven years, I've had the privilege of teaching over 100 Python classes, to over 1,000 students -- and have watched most of them make the same mistakes. That is, these are things that I've seen real Python beginners do, hundreds of times. In fact, some are so common they are virtually guaranteed to crop up when you are first starting out.

"What's that?" you say. "You can make lots of mistakes in Python, too?" Well, yes. Python may be one of the simplest and most flexible programming languages out there, but it is still a programming language. It still has syntax, datatypes, and the occasional dark corner inhabited by sorcerers named Tim.

The good news is that once you learn Python, many pitfalls are avoided naturally, thanks to the clean design of the language. Python has a minimal set of interactions between its components, which helps reduce bugs. It also has a simple syntax, which means there is less opportunity to make mistakes in the first place. And when you do make a mistake, Python's runtime error detection and reporting helps you recover quickly.

But programming Python still isn't quite an automatic task, and forewarned is forearmed. So without further delay, let's jump into the nitty-gritty. The next three sections group mistakes into pragmatics, coding, and programming at large. If you'd like to read more about common Python mistakes and how to avoid them, all of these and more are described further in the new O'Reilly book, Learning Python, 2nd Edition.

Pragmatic Mistakes

Let's start out with the basics: things that people who are just learning how to program tend to get tripped up on, even before they delve into syntax. If you've already done a bit of programming, most of these may seem very simple; if you've ever tried to teach programming to novices, they probably won't.

Type Python Code at the Interactive Prompt

You can type only Python code, and not system commands, at the >>> interactive prompt. It's not that uncommon to see people enter emacs, ls, or edit commands at this prompt, but they are not Python code. There are ways to run system commands from within Python code (for example, os.system and os.popen), but they are not as direct as simply typing the command itself. If you want to launch a Python file from the interactive prompt, use import file, not the system command python file.py.
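
As a rough illustration of the os.system and os.popen calls mentioned above, the sketch below runs a shell command from Python code. It assumes an echo command is available in the system shell, and it is written so that it should also run under Python 3.

```python
import os

# os.popen runs a shell command and gives back a file-like object
# connected to the command's output. "echo" is assumed to exist in
# the system shell.
pipe = os.popen("echo hello")
output = pipe.read()
pipe.close()

print(output.strip())

# os.system also runs the command, but returns only its exit status;
# the command's output goes straight to the console.
status = os.system("echo hello")
```

Either call takes the same command string you would have typed at the system prompt.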

Print Statements are Required in Files (Only)

Because the interactive interpreter automatically prints the results of expressions, you do not need to type complete print statements interactively. This is a nice feature, but remember that within a code file, you generally must use print statements to see output.

Beware of Automatic Extensions on Windows

If you use the Notepad program to code program files on Windows, be careful to pick the type All Files when it comes time to save your file, and give your file a .py suffix explicitly. Otherwise, Notepad saves your file with a .txt extension, making it difficult to run in some launching schemes. Worse, Word and WordPad add formatting characters by default that are not legal Python syntax. As a rule of thumb, always pick All Files and save as simple text on Windows, or use more programmer-friendly text editors such as IDLE. In IDLE, remember to type .py file extensions manually when saving.

Program-File Icon Click Pitfalls on Windows

On Windows, you can launch a Python program file by clicking on it, but this can be error-prone. First of all, the program's output window disappears as soon as the program finishes; to keep it open, try adding a raw_input() call at the bottom of the file. Also, keep in mind that the output window goes away if there is a program error; to see your error messages, run your program in other ways--from a system command line, by interactive imports, with IDLE menu options, and so on.

Imports Only Work the First Time

You can run a file by importing it at the interactive prompt, but this only works once per session; subsequent imports simply return the already-loaded module. To force Python to reload and rerun a file's code, call the reload(module) function instead. And while you're at it, be sure to use parentheses for reload, but not import.
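
The import-once behavior can be sketched in a script, too. The example below writes a throwaway module to a temporary directory (demo_mod is a made-up name used only for this illustration), changes the file, and shows that only reload picks up the change. One assumption to flag: in Python 3, reload is no longer a built-in and is imported from importlib, so the sketch tries that first.

```python
import os
import sys
import tempfile

try:
    from importlib import reload   # Python 3: reload lives in importlib
except ImportError:
    pass                           # Python 2: reload is a built-in

sys.dont_write_bytecode = True     # keep the demo free of cached bytecode

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
path = os.path.join(workdir, "demo_mod.py")   # hypothetical module name

with open(path, "w") as f:
    f.write("value = 1\n")

import demo_mod                    # first import runs the file's code
first = demo_mod.value

with open(path, "w") as f:
    f.write("value = 2222\n")      # edit the file on disk

import demo_mod                    # second import: already loaded, not rerun
second = demo_mod.value

reload(demo_mod)                   # reload reruns the file's code
third = demo_mod.value

print(first)
print(second)
print(third)
```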

Blank Lines Matter at the Interactive Prompt (Only)

Blank lines and comment lines are always ignored everywhere in module files, but a blank line ends a compound statement when typing code at the interactive prompt. In other words, a blank line tells the interactive prompt that you've finished a compound statement; don't hit the Enter key on a line by itself until you're really done. Conversely, you really do want to type a blank line to terminate the compound statement interactively, before starting a new statement--the interactive prompt runs one statement at a time.

Coding Mistakes

Once you start writing Python code in earnest, the next batch of pitfalls starts becoming more dangerous -- these are basic coding mistakes that span language features, and often snare the unwitting programmer.

Don't Forget the Colons

This is easily the most common beginner's coding mistake: don't forget to type a : at the end of compound statement headers (the first line of an if, while, for, etc.). You probably will at first anyhow, but it will soon become an unconscious habit. Typically, 75 percent of students in classes have been burned by this one by the end of the day.

Initialize Your Variables

In Python, you cannot use a name within an expression until it has been assigned a value. This is on purpose: it helps to prevent common typo mistakes, and avoids the ambiguous question of what an automatic default should be (0, None, "", [], ?). Remember to initialize counters to 0, list accumulators to [], and so on.
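
A small sketch of the point, using made-up names (count, squares): referencing a name before it has been assigned raises an error, while initializing the counter and the accumulator first makes the loop work.

```python
try:
    count = count + 1          # count has never been assigned a value
except NameError as exc:
    message = str(exc)         # Python reports the undefined name

count = 0                      # initialize the counter
squares = []                   # initialize the list accumulator
for n in [1, 2, 3]:
    count += 1
    squares.append(n * n)

print(message)
print(count)
print(squares)
```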

Start in Column 1

Be sure to start top-level, unnested code all the way to the left, in column 1. That includes unnested code typed into module files, as well as unnested code typed at the interactive prompt. Python uses indentation to delimit blocks of nested code, so white space to the left of your code means a nested block. White space is generally ignored everywhere, except for indentation.

Indent Consistently

Avoid mixing tabs and spaces in the indentation of a given single block, unless you know what every system that touches your code may do with tabs. Otherwise, what you see in your editor may not be what Python sees when it counts tabs as a number of spaces. It's safer to use all tabs or all spaces for each block; how many is up to you.

Always Use Parentheses to Call a Function

You must add parentheses after a function name to call it, whether it takes arguments or not. That is, use function(), not function. Python functions are simply objects that have a special operation, a call, that you trigger with the parentheses. Like all objects, they can also be assigned to variables, and used indirectly: x = function; x().

In Python training, this seems to occur most often with files. It's common to see beginners type file.close to close a file, rather than file.close(); because it's legal to reference a function without calling it, the first version without parentheses succeeds silently, but does not close the file!
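
The file case can be checked directly with the closed attribute of file objects. This is only a sketch; the temporary file and the variable names are illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()  # scratch file for the demo
os.close(fd)

f = open(path, "w")
f.write("spam")

f.close                        # references the method; nothing is called
still_open = not f.closed      # the file is still open here

f.close()                      # the parentheses trigger the call
now_closed = f.closed

os.remove(path)
print(still_open)
print(now_closed)
```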

Don't Use Extensions or Paths in Imports

Use directory paths and file extensions in system command lines (e.g., python dir/mod.py), but not in import statements. That is, say import mod, not import mod.py or import dir/mod.py. In practice, this is probably the second most common beginner mistake. Because modules may have other suffixes besides .py (.pyc, for instance), hardcoding a particular suffix is not only illegal syntax, it doesn't make sense.

Platform-specific directory-path syntax comes from your module search path settings, not the import statement. You can use dots in filenames to refer to package subdirectories (e.g., import dir1.dir2.mod), but the leftmost directory still must be found via the module search path, and no other path syntax can appear in imports. The incorrect statement import mod.py is assumed by Python to be a package import--it imports the module mod, and then tries to find a module named py within a directory named mod, and winds up generating a potentially confusing error message.
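
To see the package-import interpretation in action, the sketch below uses the standard library's string module as a stand-in for mod. The exact wording of the error message varies between Python versions, so the example captures it rather than assuming its text.

```python
try:
    import string.py           # read as: module "py" inside package "string"
except ImportError as exc:
    error_text = str(exc)      # message wording differs across versions

import string                  # the correct form: no suffix, no path

print(error_text)
print(string.digits)
```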

Don't Code C in Python

A few reminders for C/C++ programmers new to Python:

Programming Mistakes

Finally, here are some of the problems you may come across when you start working with the larger features of the Python language -- datatypes, functions, modules, classes, and the like. Because of space constraints, this section is abbreviated, especially with respect to advanced programming concepts; for the rest of the story, see the tips and "gotchas" sections of Learning Python, 2nd Edition.

File-Open Calls Do Not Use the Module Search Path

When you use the open() call in Python to access an external file, Python does not use the module search path to locate the target file. It uses an absolute path you give, or assumes the filename is relative to the current working directory. The module search path is consulted only for module imports.
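
A minimal sketch of the difference, assuming two scratch directories and a made-up file name (notes.txt): even when the file's directory is on the module search path, open() with a bare filename looks only in the current working directory.

```python
import os
import sys
import tempfile

data_dir = tempfile.mkdtemp()      # holds the file, and goes on sys.path
other_dir = tempfile.mkdtemp()     # becomes the current working directory

with open(os.path.join(data_dir, "notes.txt"), "w") as f:
    f.write("spam")

sys.path.insert(0, data_dir)
start_dir = os.getcwd()
os.chdir(other_dir)
try:
    try:
        open("notes.txt")          # relative to the cwd, not sys.path
        found_by_name = True
    except IOError:
        found_by_name = False

    # An explicit path to the file works from anywhere
    with open(os.path.join(data_dir, "notes.txt")) as f:
        text = f.read()
finally:
    os.chdir(start_dir)            # undo the demo's side effects
    sys.path.remove(data_dir)

print(found_by_name)
print(text)
```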

Methods Are Specific to Types

You can't use list methods on strings, and vice versa. In general, method calls are type-specific, but built-in functions may work on many types. For instance, the list reverse method only works on lists, but the len function works on any object with a length.
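
For instance, a short sketch with illustrative names:

```python
letters = list("spam")
letters.reverse()                  # list method: reverses in place

try:
    "spam".reverse()               # strings have no reverse method
    string_has_reverse = True
except AttributeError:
    string_has_reverse = False

# The len built-in works on lists, strings, and dictionaries alike
lengths = (len(letters), len("spam"), len({"a": 1}))

print(letters)
print(string_has_reverse)
print(lengths)
```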

Immutable Types Can't Be Changed in Place

Remember that you can't change an immutable object (e.g., tuple, string) in place:

T = (1, 2, 3)
T[2] = 4          # Error

Construct a new object with slicing, concatenation, and so on, and assign it back to the original variable if needed. Because Python automatically reclaims unused memory, this is not as wasteful as it may seem:

T = T[:2] + (4,)  # Okay: T becomes (1, 2, 4)

Use Simple for Loops Instead of while or range

When you need to step over all items in a sequence object from left to right, a simple for loop (e.g., for x in seq:) is simpler to code, and usually quicker to run, than a while- or range-based counter loop. Avoid the temptation to use range in a for unless you really have to; let Python handle the indexing for you. All three of the following loops work, but the first is usually better; in Python, simple is good.

S = "lumberjack"

for c in S: print c                   # simplest

for i in range(len(S)): print S[i]    # too much

i = 0                                 # too much
while i < len(S): print S[i]; i += 1

Don't Expect Results From Functions That Change Objects

In-place change operations such as the list.append() and list.sort() methods modify an object, but do not return the object that was modified (they return None); call them without assigning the result. It's not uncommon for beginners to say something like:

mylist = mylist.append(X)

to try to get the result of an append; instead, this assigns None to mylist, rather than the modified list. A more devious example of this pops up when trying to step through dictionary items in sorted-key fashion:

D = {...}
for k in D.keys().sort(): print D[k]

This almost works -- the keys method builds a keys list, and the sort method orders it -- but since the sort method returns None, the loop fails because it is ultimately a loop over None (a nonsequence). To code this correctly, split the method calls out into statements:

Ks = D.keys()
Ks.sort()
for k in Ks: print D[k]
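
In Python 2.4 and later, which postdate this article, the sorted built-in offers another way around the trap: unlike the sort method, it returns a new sorted list, so it can be used directly in the loop header. The dictionary contents below are made up for illustration.

```python
D = {"b": 2, "a": 1, "c": 3}

ordered = []
for k in sorted(D):                # sorted returns a new list of the keys
    ordered.append((k, D[k]))

print(ordered)
```

This form also sidesteps a Python 3 difference: there, D.keys() returns a view object that has no sort method at all.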

Conversions Only Happen Among Number Types

In Python, an expression like 123 + 3.145 works -- it automatically converts the integer to a floating-point number, and uses floating-point math. On the other hand, the following fails:

S = "42"
I = 1
X = S + I        # A type error

This is also on purpose, because it is ambiguous: should the string be converted to a number (for addition), or the number to a string (for concatenation)? In Python, we say that explicit is better than implicit (that is, EIBTI), so you must convert manually:

X = int(S) + I   # Do addition: 43
X = S + str(I)   # Do concatenation: "421" 

Cyclic Datastructures Can Cause Loops

Although fairly rare in practice, if a collection object contains a reference to itself, it's called a cyclic object. Python prints a [...] whenever it detects a cycle in the object, rather than getting stuck in an infinite loop:

>>> L = ['grail']  # Append reference back to L
>>> L.append(L)    # Generates cycle in object
>>> L
['grail', [...]]

Besides understanding that the three dots represent a cycle in the object, this case is worth knowing about because cyclic structures may cause code of your own to fall into unexpected loops if you don't anticipate them. If needed, keep a list or dictionary of items already visited, and check it to know if you have reached a cycle.
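
One way to apply that advice is sketched below. The count_items function is hypothetical, and it records visited objects in a set of id() values rather than the list or dictionary suggested above; the idea is the same.

```python
def count_items(obj, visited=None):
    """Count non-list items in a nested list, skipping lists already seen."""
    if visited is None:
        visited = set()
    if id(obj) in visited:         # reached a cycle: stop descending
        return 0
    visited.add(id(obj))
    total = 0
    for item in obj:
        if isinstance(item, list):
            total += count_items(item, visited)
        else:
            total += 1
    return total

L = ['grail']
L.append(L)                        # same cyclic list as above
result = count_items(L)            # terminates despite the cycle

print(result)
```

Without the visited check, the recursive call on the nested reference to L would never bottom out.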

Assignment Creates References, Not Copies

This is a core Python concept, which can cause problems when its behavior isn't expected. In the following example, the list object assigned to the name L is referenced both from L and from inside of the list assigned to name M. Changing L in place changes what M references, too, because there are two references to the same object:

>>> L = [1, 2, 3]        # A shared list object
>>> M = ['X', L, 'Y']    # Embed a reference to L
>>> M
['X', [1, 2, 3], 'Y']

>>> L[1] = 0             # Changes M too
>>> M
['X', [1, 0, 3], 'Y']

This effect usually becomes important only in larger programs, and shared references are normally exactly what you want. If they're not, you can avoid sharing objects by copying them explicitly; for lists, you can make a top-level copy by using an empty-limits slice:

>>> L = [1, 2, 3]
>>> M = ['X', L[:], 'Y']   # Embed a copy of L

>>> L[1] = 0               # Change only L, not M
>>> L
[1, 0, 3]
>>> M
['X', [1, 2, 3], 'Y']

Slice limits default to 0 and the length of the sequence being sliced. If both are omitted, the slice extracts every item in the sequence, and so makes a top-level copy (a new, unshared object). For dictionaries, use the dict.copy() method.
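
Because these are top-level copies, nested objects are still shared. The sketch below shows dict.copy() next to the standard library's copy.deepcopy, which also copies nested objects; the dictionary contents are made up for illustration.

```python
import copy

D = {"nums": [1, 2, 3], "name": "spam"}

top = D.copy()                     # top-level copy: nested list is shared
deep = copy.deepcopy(D)            # full copy: nested list is copied too

D["name"] = "eggs"                 # changes D only; both copies keep "spam"
D["nums"][1] = 0                   # in-place change to the nested list

print(top)
print(deep)
```

After the changes, top still has its own "name" entry but sees the modified nested list, while deep is unaffected.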

Local Names Are Detected Statically

Python classifies names assigned in a function as locals by default; they live in the function's scope and exist only while the function is running. Technically, Python detects locals statically, when it compiles the def's code, rather than by noticing assignments as they happen at runtime. This can also lead to confusion if it's not understood. For example, watch what happens if you add an assignment to a variable after a reference:

>>> X = 99
>>> def func():
...     print X      # Does not yet exist
...     X = 88       # Makes X local in entire def
... 
>>> func()           # Error!

You get an undefined name error (specifically an UnboundLocalError, a subclass of NameError), but the reason is subtle. While compiling this code, Python sees the assignment to X and decides that X will be a local name everywhere in the function. But later, when the function is actually run, the assignment hasn't yet happened when the print executes, so Python raises an undefined name error.

Really, the previous example is ambiguous: did you mean to print the global X and then create a local X, or is this a genuine programming error? If you really mean to print global X, you need to declare it in a global statement, or reference it through the enclosing module name.
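
A sketch of the global-statement fix, assuming the code lives at the top level of a module file; broken and fixed are illustrative names. Note that with the global declaration, the assignment changes the module-level X rather than creating a local.

```python
X = 99

def broken():
    value = X                      # X is local here, and not yet assigned
    X = 88                         # this assignment makes X local in the def
    return value

try:
    broken()
    caught = False
except UnboundLocalError:
    caught = True

def fixed():
    global X                       # X now means the module-level name
    before = X                     # reads the global: 99
    X = 88                         # rebinds the global, not a local
    return before

before = fixed()

print(caught)
print(before)
print(X)
```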

Defaults and Mutable Objects

Default argument values are evaluated and saved once, when the def statement is run, not each time the function is called. That's usually what you want, but since defaults retain the same object between calls, you have to be mindful about changing mutable defaults. For instance, the following function uses an empty list as a default value and then changes it in place each time the function is called:

>>> def saver(x=[]):   # Saves away a list object
...     x.append(1)    # and changes it each time
...     print x
...
>>> saver([2])         # Default not used
[2, 1]
>>> saver()            # Default used
[1]
>>> saver()            # Grows on each call!
[1, 1]
>>> saver()
[1, 1, 1]

Some see this behavior as a feature -- because mutable default arguments retain their state between function calls, they can serve some of the same roles as static local function variables in the C language. However, this can seem odd the first time you run into it, and there are simpler ways to retain state between calls in Python (e.g., classes).

To avoid this behavior, make copies of the default at the start of the function body with slices or methods, or move the default value expression into the function body; as long as the value resides in code that runs each time the function is called, you'll get a new object each time:

>>> def saver(x=None):
...     if x is None: x = []   # No arg passed?
...     x.append(1)            # Changes new list
...     print x
...
>>> saver([2])                 # Default not used
[2, 1]
>>> saver()                    # Doesn't grow now
[1]
>>> saver()
[1]
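
The other option mentioned above, copying the default at the start of the function body, might look like the following sketch. It returns the list instead of printing it so that it runs unchanged under Python 2 and 3.

```python
def saver(x=[]):
    x = x[:]                       # work on a copy; the default stays empty
    x.append(1)
    return x

first = saver()                    # default used
second = saver()                   # default used again: no growth
passed = [2]
third = saver(passed)              # the caller's list is copied as well

print(first)
print(second)
print(third)
print(passed)
```

One difference from the None version is worth keeping in mind: because the argument is always copied, a list passed in by the caller is left unchanged, too.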

Other Common Programming Traps

Here's a quick survey of other pitfalls we don't have space to cover in detail:

Mark Lutz is the world leader in Python training, the author of Python's earliest and best-selling texts, and a pioneering figure in the Python community since 1992.


O'Reilly & Associates recently (in December 2003) released Learning Python, 2nd Edition.


Copyright © 2009 O'Reilly Media, Inc.