Editor's note: If you missed recipes from part one of this twopart series of excerpts from Python Cookbook, 2nd Edition, then you missed how to handle international text with Unicode, and how to select elements from an unsorted sequence. For all 200plus recipes, check out the latest edition.
Credit: Sébastien Keim, Paul Moore, Steve Alexander, Raymond Hettinger
You want to define a buffer with a fixed size, so that, when it fills up, adding another element overwrites the first (oldest) one. This kind of data structure is particularly useful for storing log and history information.
This recipe changes the buffer object's class on the fly, from a nonfull buffer class to a full buffer class, when the buffer fills up:
class RingBuffer(object):
""" class that implements a notyetfull buffer """
def _ _init_ _(self, size_max):
self.max = size_max
self.data = [ ]
class _ _Full(object):
""" class that implements a full buffer """
def append(self, x):
""" Append an element overwriting the oldest one. """
self.data[self.cur] = x
self.cur = (self.cur+1) % self.max
def tolist(self):
""" return list of elements in correct order. """
return self.data[self.cur:] + self.data[:self.cur]
def append(self, x):
""" append an element at the end of the buffer. """
self.data.append(x)
if len(self.data) == self.max:
self.cur = 0
# Permanently change self's class from nonfull to full
self._ _class_ _ = _ _Full
def tolist(self):
""" Return a list of elements from the oldest to the newest. """
return self.data
# sample usage
if _ _name_ _ == '_ _main_ _':
x = RingBuffer(5)
x.append(1); x.append(2); x.append(3); x.append(4)
print x._ _class_ _, x.tolist( )
x.append(5)
print x._ _class_ _, x.tolist( )
x.append(6)
print x.data, x.tolist( )
x.append(7); x.append(8); x.append(9); x.append(10)
print x.data, x.tolist( )
Related Reading Python Cookbook 
A ring buffer is a buffer with a fixed size. When it fills up, adding another element overwrites the oldest one that was still being kept. It's particularly useful for the storage of log and history information. Python has no direct support for this kind of structure, but it's easy to construct one. The implementation in this recipe is optimized for element insertion.
The notable design choice in the implementation is that, since these
objects undergo a nonreversible state transition at some point in
their lifetimes—from nonfull buffer to full buffer (and
behavior changes at that point)—I modeled that by changing
self._ _class_ _
. This works just as well for
classic classes as for newstyle ones, as long as the old and new
classes of the object have the same slots (e.g., it works fine for
two newstyle classes that have no slots at all, such as
RingBuffer and _ _Full in this
recipe). Note that, differently from other languages, the fact that
class _ _Full is implemented inside class
RingBuffer does not imply any special relationship
between these classes; that's a good thing, too,
because no such relationship is necessary.
Changing the class of an instance may be strange in many languages, but it is an excellent Pythonic alternative to other ways of representing occasional, massive, irreversible, and discrete changes of state that vastly affect behavior, as in this recipe. Fortunately, Python supports it for all kinds of classes.
Ring buffers (i.e., bounded queues, and
other names) are quite a useful idea, but the inefficiency of testing
whether the ring is full, and if so, doing something different, is a
nuisance. The nuisance is particularly undesirable in a language like
Python, where there's no difficulty—other than
the massive memory cost involved—in allowing the list to grow
without bounds. So, ring buffers end up being underused in spite of
their potential. The idea of assigning to _ _class_
_
to switch behaviors when the ring gets full is the key to
this recipe's efficiency: such class switching is a
oneoff operation, so it doesn't make the
steadystate cases any less efficient.
Alternatively, we might switch just two methods, rather than the whole class, of a ring buffer instance that becomes full:
class RingBuffer(object):
def _ _init_ _(self,size_max):
self.max = size_max
self.data = [ ]
def _full_append(self, x):
self.data[self.cur] = x
self.cur = (self.cur+1) % self.max
def _full_get(self):
return self.data[self.cur:]+self.data[:self.cur]
def append(self, x):
self.data.append(x)
if len(self.data) == self.max:
self.cur = 0
# Permanently change self's methods from nonfull to full
self.append = self._full_appendself.tolist = self._full_get
def tolist(self):
return self.data
This methodswitching approach is essentially equivalent to the classswitching one in the recipe's solution, albeit through rather different mechanisms. The best approach is probably to use class switching when all methods must be switched in bulk and method switching only when you need finer granularity of behavior change. Class switching is the only approach that works if you need to switch any special methods in a newstyle class, since intrinsic lookup of special methods during various operations happens on the class, not on the instance (classic classes differ from newstyle ones in this aspect).
You can use many other ways to implement a ring buffer. In Python
2.4, in particular, you should consider subclassing the new type
collections.deque
, which supplies a
"doubleended queue", allowing
equally effective additions and deletions from either
end:
from collections import deque
class RingBuffer(deque):
def _ _init_ _(self, size_max):
deque._ _init_ _(self)
self.size_max = size_max
def append(self, datum):
deque.append(self, datum)
if len(self) > self.size_max:
self.popleft( )
def tolist(self):
return list(self)
or, to avoid the if
statement when at steady
state, you can mix this idea with the idea of switching a method:
from collections import deque
class RingBuffer(deque):
def _ _init_ _(self, size_max):
deque._ _init_ _(self)
self.size_max = size_max
def _full_append(self, datum):
deque.append(self, datum)
self.popleft( )
def append(self, datum):
deque.append(self, datum)
if len(self) == self.size_max:
self.append = self._full_append
def tolist(self):
return list(self)
With this latest implementation, we need to switch only the
append
method (the tolist method
remains the same), so method switching appears to be more appropriate
than class switching.
The Reference Manual and Python in
a Nutshell sections on the standard type hierarchy and
classic and newstyle object models; Python 2.4 Library
Reference on module
collections
.

Credit: David Eppstein, Tim Peters, Alex Martelli, Wim Stolker, Kazuo Moriwaka, Hallvard Furuseth, Pierre Denis, Tobias Klausmann, David Lees, Raymond Hettinger
You need to compute an unbounded sequence of all primes, or the list of all primes that are less than a certain threshold.
To compute an unbounded sequence, a generator is the natural Pythonic approach, and the Sieve of Eratosthenes, using a dictionary as the supporting data structure, is the natural algorithm to use:
import itertools
def eratosthenes( ):
'''Yields the sequence of prime numbers via the Sieve of Eratosthenes.'''
D = { } # map each composite integer to its firstfound prime factor
for q in itertools.count(2): # q gets 2, 3, 4, 5, ... ad infinitum
p = D.pop(q, None)
if p is None:
# q not a key in D, so q is prime, therefore, yield it
yield q
# mark q squared as notprime (with q as firstfound prime factor)
D[q*q] = q
else:
# let x < smallest (N*p)+q which wasn't yet known to be composite
# we just learned x is composite, with p firstfound prime factor,
# since p is the firstfound prime factor of q  find and mark it
x = p + q
while x in D:
x += p
D[x] = p
To compute all primes up to a predefined threshold, rather than an unbounded sequence, it's reasonable to wonder if it's possible to use a faster way than good old Eratosthenes, even in the smart variant shown as the "Solution". Here is a function that uses a few speedfavoring tricks, such as a hardcoded tuple of the first few primes:
def primes_less_than(N):
# make `primes' a list of known primes < N
primes = [x for x in (2, 3, 5, 7, 11, 13) if x < N]
if N <= 17: return primes
# candidate primes are all odd numbers less than N and over 15,
# not divisible by the first few known primes, in descending order
candidates = [x for x in xrange((N2)1, 15, 2)
if x % 3 and x % 5 and x % 7 and x % 11 and x % 13]
# make `top' the biggest number that we must check for compositeness
top = int(N ** 0.5)
while (top+1)*(top+1) <= N:
top += 1
# main loop, weeding out nonprimes among the remaining candidates
while True:
# get the smallest candidate: it must be a prime
p = candidates.pop( )
primes.append(p)
if p > top:
break
# remove all candidates which are divisible by the newfound prime
candidates = filter(p._ _rmod_ _, candidates)
# all remaining candidates are prime, add them (in ascending order)
candidates.reverse( )
primes.extend(candidates)
return primes
On a typical small task such as looping over all primes up to 8,192,
eratosthenes
(on an oldish 1.2 GHz Athlon PC, with
Python 2.4) takes 22 milliseconds, while
primes_less_than
takes 9.7; so, the slight
trickery and limitations of primes_less_than
can
pay for themselves quite handsomely if generating such primes is a
bottleneck in your program. Be aware, however, that
eratosthenes
scales better. If you need all primes
up to 199,999, eratosthenes
will deliver them in
0.88 seconds, while primes_less_than
takes 0.65.
Since primes_less_than
's little
speedup tricks can help, it's natural to wonder
whether a perhaps simpler trick can be retrofitted into
eratosthenes
as well. For example, we might simply
avoid wasting work on a lot of even numbers,
concentrating on odd numbers only, beyond the initial
2
. In other words:
def erat2( ):
D = { }
yield 2
for q in itertools.islice(itertools.count(3), 0, None, 2):
p = D.pop(q, None)
if p is None:
D[q*q] = q
yield q
else:
x = p + q
while x in D or not (x&1):
x += p
D[x] = p
And indeed, erat2
takes 16 milliseconds, versus
eratosthenes
' 22, to get primes
up to 8,192; 0.49 seconds, versus
eratosthenes
' 0.88, to get primes
up to 199,999. In other words, erat2
scales just
as well as eratosthenes
and is always
approximately 25% faster. Incidentally, if you're
wondering whether it might be even faster to program at a slightly
lower level, with q = 3
to start, a while
True
as the loop header, and a q += 2
at
the end of the loop, don't worry—the slightly
higherlevel approach using
itertools
'
count
and islice
functions is
repeatedly approximately 4% faster. Other languages may impose a
performance penalty for programming with higher abstraction, Python
rewards you for doing that.
You might keep pushing the same idea yet further, avoiding multiples
of 3
as well as even numbers, and so on. However,
it would be an exercise in diminishing returns: greater and greater
complication for smaller and smaller gain. It's
better to quit while we're ahead!
If you're into one liners, you might want to study the following:
def primes_oneliner(N):
aux = { }
return [aux.setdefault(p, p) for p in range(2, N)
if 0 not in [p%d for d in aux if p>=d+d]]
Be aware that one liners, even clever ones, are generally anything
but speed demons! primes_oneliner takes 2.9 seconds
to complete the same small task (computing primes up to 8,192) which,
eratosthenes
does in 22 milliseconds, and
primes_less_than
in 9.7so,
you're slowing things down by 130 to 300 times just
for the fun of using a clever, opaque one liner, which is clearly not
a sensible tradeoff. Clever one liners can be instructive but should
almost never be used in production code, not just because
they're terse and make maintenance harder than
straightforward coding (which is by far the main reason), but also
because of the speed penalties they may entail.
While prime numbers, and number theory more generally, used to be considered purely theoretical problems, nowadays they have plenty of practical applications, starting with cryptography.
To explore both number theory and its applications, the best book is probably Kenneth Rosen, Elementary Number Theory and Its Applications (AddisonWesley); http://www.utm.edu/research/primes/ for more information about prime numbers.
Related Reading Python Cookbook 
Return to Python DevCenter.
Copyright © 2009 O'Reilly Media, Inc.