Python DevCenter


Beginning Python for Bioinformatics

Python Dictionaries

A Python dictionary offers the same benefit as a paper dictionary: it lets you quickly locate the value (definition) associated with a key (word). Dictionaries are denoted by curly braces and contain a comma-separated sequence of key:value pairs. Dictionaries are not ordered; values are accessed by their key rather than by their position in a sequence. Let's look at some of the methods supported by dictionaries.

>>> basecomplement = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'} 
>>> basecomplement.keys() 
['A', 'C', 'T', 'G'] 
>>> basecomplement.values() 
['T', 'G', 'A', 'C'] 
>>> basecomplement['A']
'T'
>>> basecomplement['C']
'G'
>>> for base in basecomplement.keys(): 
...     print "The complement of", base, "is", basecomplement[base] 
The complement of A is T 
The complement of C is G 
The complement of T is A 
The complement of G is C 
>>> for base in basecomplement: 
...     print "The complement of", base, "is", basecomplement[base] 
The complement of A is T 
The complement of C is G 
The complement of T is A 
The complement of G is C 

In this example we also introduced the concept of a for loop, which cycles over the keys of the basecomplement dictionary. Python's for loop can iterate over any sequence. In this example it assigns the first value from the list returned by keys() to the variable named base, executes the print statement, then repeats the process for each subsequent value in the list. In the second for loop example, you can see that when we simply specify "for base in basecomplement" Python defaults to looping over the basecomplement dictionary's keys.
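As a small sketch of the same loop in a form that also runs on Python 3 (an assumption; the session above uses Python 2's print statement), the items() method yields each key together with its value, so no lookup is needed inside the loop. The helper name describe_complements is hypothetical and used only for illustration.

```python
# Complement table from the example above.
basecomplement = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}

def describe_complements(table):
    """Return one sentence per key/value pair in the table."""
    lines = []
    # items() yields (key, value) pairs, so the loop needs no extra lookup.
    for base, complement in table.items():
        lines.append("The complement of %s is %s" % (base, complement))
    return lines

for line in describe_complements(basecomplement):
    print(line)
```

Returning a list of strings instead of printing inside the helper keeps the loop easy to reuse and to check.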

More User-defined Functions

The next example demonstrates one other technique we will need in our complement function: a relatively new feature of Python called the list comprehension.

>>> letters = list('CCGGAAGAGCTTACTTAG') 
>>> [basecomplement[base] for base in letters] 
['G', 'G', 'C', 'C', 'T', 'T', 'C', 'T', 'C',  
'G', 'A', 'A', 'T', 'G', 'A', 'A', 'T', 'C'] 

A list comprehension returns a list and works similarly to a for loop, but in a much more compact and efficient format. In this case it allows us to return a new list where each base in the original list of letters has been replaced with its complement, which we retrieved from the basecomplement dictionary. Let's see how we put this all together.
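To make the comparison with a for loop concrete, here is a minimal sketch that builds the same list both ways. It is written to run on Python 3 (an assumption; the session above is Python 2), and the names looped and comprehended are illustrative only.

```python
basecomplement = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
letters = list('CCGGAAGAGCTTACTTAG')

# Explicit for loop: build the result one element at a time.
looped = []
for base in letters:
    looped.append(basecomplement[base])

# List comprehension: the same result as a single expression.
comprehended = [basecomplement[base] for base in letters]

print(looped == comprehended)   # both approaches produce the same list
print(''.join(comprehended))    # join the letters back into one string
```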

>>> def complement(s): 
...     """Return the complementary sequence string.""" 
...     basecomplement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
...     letters = list(s) 
...     letters = [basecomplement[base] for base in letters] 
...     return ''.join(letters) 
>>> complement('CCGGAAGAGCTTACTTAG')
'GGCCTTCTCGAATGAATC'

Now that we've got a reverse function and a complement function, we have the building blocks for a reversecomplement function.

>>> def reversecomplement(s): 
...     """Return the reverse complement of the dna string.""" 
...     s = reverse(s) 
...     s = complement(s) 
...     return s 
>>> reversecomplement('CCGGAAGAGCTTACTTAG')
'CTAAGTAAGCTCTTCCGG'
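The reverse function is defined earlier in the article and is not repeated on this page. As a self-contained sketch, the version below substitutes a slice with a negative step (s[::-1]) for it. That stand-in is an assumption, not necessarily the earlier implementation. The code is written to run on Python 3.

```python
def reverse(s):
    """Return the string reversed (stand-in for the earlier reverse function)."""
    return s[::-1]

def complement(s):
    """Return the complementary sequence string."""
    basecomplement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join([basecomplement[base] for base in s])

def reversecomplement(s):
    """Return the reverse complement of the dna string."""
    return complement(reverse(s))

print(reversecomplement('CCGGAAGAGCTTACTTAG'))
```

Applying reversecomplement twice returns the original string, which makes a handy sanity check.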

It can also be useful to know the percentage of DNA composed of G and C bases. String objects have a count() method that returns the number of occurrences of a given substring. With that information, calculating the percentage is simple arithmetic: divide the combined count by the length of the string and multiply by 100.

>>> def gc(s): 
...     """Return the percentage of dna composed of G+C.""" 
...     gc = s.count('G') + s.count('C') 
...     return gc * 100.0 / len(s) 
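As a worked example of that arithmetic, the sketch below applies the function to the sequence used throughout this page. It is written to run on Python 3, and the local variable is renamed gc_count (a cosmetic change, not the article's spelling) so that it does not share a name with the function.

```python
def gc(s):
    """Return the percentage of dna composed of G+C."""
    gc_count = s.count('G') + s.count('C')
    # Multiplying by 100.0 forces floating-point division.
    return gc_count * 100.0 / len(s)

dna = 'CCGGAAGAGCTTACTTAG'
print(dna.count('G'), dna.count('C'), len(dna))  # 5 G's, 4 C's, 18 bases
print(gc(dna))                                   # 9 * 100.0 / 18
```

Note that an empty string would raise ZeroDivisionError, since len(s) would be zero.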

Since DNA can be divided into three-character segments (codons), a function that returns a list of codons would also be useful. Another simple calculation determines the ending point for our codons in case the length of the DNA string is not evenly divisible by three. The range() function returns a list of numbers from a beginning point up to, but not including, an ending point, incrementing by some value, in this case 3. This arithmetic progression is used inside a list comprehension, combined with string slicing, to produce a list of three-character strings.

>>> def codons(s): 
...     """Return list of codons for the dna string.""" 
...     end = len(s) - (len(s) % 3) - 1 
...     codons = [s[i:i+3] for i in range(0, end, 3)] 
...     return codons 
>>> codons('CCGGAAGAGCTTACTTAG')
['CCG', 'GAA', 'GAG', 'CTT', 'ACT', 'TAG']
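To see what the ending-point calculation does when the length is not a multiple of three, here is a small sketch using a 20-base string (the original sequence with two extra bases appended, chosen purely for illustration). It is written to run on Python 3, where range() must be wrapped in list() to display its values.

```python
def codons(s):
    """Return list of codons for the dna string."""
    end = len(s) - (len(s) % 3) - 1
    return [s[i:i+3] for i in range(0, end, 3)]

dna = 'CCGGAAGAGCTTACTTAGAT'     # 20 bases: 6 full codons plus 'AT' left over
end = len(dna) - (len(dna) % 3) - 1
print(end)                       # 20 - 2 - 1
print(list(range(0, end, 3)))    # starting index of each complete codon
print(codons(dna))               # the trailing 'AT' is dropped
```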

String slicing is similar to string indexing. Instead of retrieving a single character, string slicing allows us to retrieve sections of characters from a starting position up to, but not including, an ending position. The syntax is s[i:j], where i is the starting position and j is the ending position. So s[0:3] returns a string containing the characters in index positions 0, 1, and 2.

>>> s = 'CCGGAAGAGCTTACTTAG'
>>> s[0:3]
'CCG'
>>> s[3:6]
'GAA'
>>> s[6:9]
'GAG'
>>> s[9:12]
'CTT'
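A few more slices illustrate the "up to, but not including" rule. This is a sketch that assumes s holds the sequence used elsewhere on this page; the variable names are illustrative, and the code runs on Python 3.

```python
s = 'CCGGAAGAGCTTACTTAG'

first = s[0:3]        # characters at index 0, 1, and 2
second = s[3:6]       # characters at index 3, 4, and 5
prefix = s[:3]        # an omitted start defaults to 0
tail = s[15:]         # an omitted end defaults to len(s)
past_end = s[16:30]   # a slice running past the end is simply shortened

print(first, second, prefix, tail, past_end)
```

Because the ending position is excluded, adjacent slices such as s[0:3] and s[3:6] never overlap, and s[i:j] contains j - i characters whenever both positions fall within the string.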

Here is one final, interesting note about functions: functions themselves are objects. That means we can examine their attributes using dir(), just as we did for strings and lists. One of the more useful attributes of a function object is its documentation string, which is stored in its __doc__ attribute.

>>> dir(transcribe) 
['__call__', '__class__', '__delattr__', '__dict__', '__doc__',  
'__get__', '__getattribute__', '__hash__', '__init__', '__name__',  
'__new__', '__reduce__', '__repr__', '__setattr__', '__str__',  
'func_closure', 'func_code', 'func_defaults', 'func_dict',  
'func_doc', 'func_globals', 'func_name'] 
>>> transcribe.__doc__ 
'Return dna string as rna string.' 
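The transcribe function is defined earlier in the article. The sketch below uses an assumed body (replacing T with U) so that it is self-contained; only the docstring is taken from the output above. It is written to run on Python 3, where the func_* attribute names shown above no longer exist but __doc__ and __name__ still do.

```python
def transcribe(dna):
    """Return dna string as rna string."""
    # Assumed body for illustration; the real definition appears earlier.
    return dna.replace('T', 'U')

# A function is an object, so its attributes can be inspected directly.
print(transcribe.__name__)
print(transcribe.__doc__)
print('__doc__' in dir(transcribe))
print(callable(transcribe))
```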

Don't worry if this last example is a bit esoteric. The main point of showing it was to emphasize that Python is very powerful and consistent, that everything in Python is an object, and that objects can be inspected on the fly. The result is that as you learn Python you will find that unfamiliar objects often behave exactly as you would expect them to behave the very first time you use them. This is a powerful feeling that's not experienced often enough when using other programming languages.

We've seen how to create simple objects, like strings, lists, dictionaries, and functions. Next we're going to look at how we can create our own custom objects with properties and methods that we define.
