Beginning Python for Bioinformatics
Python Lists
Where Python strings can hold only characters, Python lists can hold anything. Python lists are ordered sequences of arbitrary Python objects, including other lists. Unlike strings, lists can also be changed in place: you can insert, delete, and replace elements in a list. Lists are written as a series of objects, separated by commas, inside square brackets. Let's look at some lists, and some operations you can perform on them.
>>> bases = ['A', 'C', 'G', 'T']
>>> bases
['A', 'C', 'G', 'T']
>>> bases.append('U')
>>> bases
['A', 'C', 'G', 'T', 'U']
>>> bases.reverse()
>>> bases
['U', 'T', 'G', 'C', 'A']
>>> bases[0]
'U'
>>> bases[1]
'T'
>>> bases.remove('U')
>>> bases
['T', 'G', 'C', 'A']
>>> bases.sort()
>>> bases
['A', 'C', 'G', 'T']
In this example we created a list of single characters that we called bases.
Then we added an element to the end, reversed the order of all the elements,
retrieved elements by their index position, removed an element with the value
'U', and sorted the elements. Removing an element from a list
illustrates a situation where we need to supply the remove()
method with an additional piece of information, namely the value that we want
to remove from the list. As you can see in the picture below, PyCrust takes
advantage of Python's ability to let us know what is required for most
operations by displaying that information in a call tip pop-up window.
[Figure: A tooltip showing usage of the 'remove' method.]
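The session above covers appending, removing, reversing, and sorting. As a brief sketch of the other operations mentioned earlier (inserting, replacing, and deleting elements, plus nesting one list inside another), something like the following should work. The names codon and pairs are illustrative and not part of the original example.

```python
# Illustrative sketch; `codon` and `pairs` are made-up names.
codon = ['A', 'G']

codon.insert(1, 'U')      # insert 'U' before index 1 -> ['A', 'U', 'G']
codon[0] = 'G'            # replace the element at index 0 -> ['G', 'U', 'G']
del codon[2]              # delete the element at index 2 -> ['G', 'U']
last = codon.pop()        # remove and return the last element -> 'U'

# Lists can hold arbitrary objects, including other lists.
pairs = [['A', 'T'], ['C', 'G']]
first_partner = pairs[0][1]   # 'T'

# remove() needs the value to remove; a value that is absent raises ValueError.
try:
    codon.remove('T')
    missing = False
except ValueError:
    missing = True
```

Note that remove() deletes only the first element that matches the given value, while del and pop() work by index position.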
We've talked about objects having methods, such as the remove() method of a
list object, and how a method performs a task and, perhaps, returns a result.
Python has another very similar feature, called a function. About the only
difference between a function and a method is that a function isn't associated
with a particular object.
Note: Whether something should be defined as a function or a method is, in part, a design choice. In fact, we're going to create several functions below and then re-define them as methods as a way of demonstrating Python's support for object-oriented programming.
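To make the distinction concrete, here is a small sketch contrasting a built-in function with a list method that does a similar job. Both sorted() and the sort() method are standard Python; the sample data and variable names are illustrative.

```python
bases = ['T', 'G', 'C', 'A']

# A function is called on its own and receives the object as an argument.
# sorted() returns a new list and leaves the original untouched.
in_order = sorted(bases)
still_unsorted = (bases == ['T', 'G', 'C', 'A'])   # True

# A method is called through the object it belongs to.
# sort() reorders the list in place and returns None.
result = bases.sort()
```

After this runs, in_order and bases hold the same sorted elements, but only the method changed the original list.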
Python Functions
Functions perform an operation on one or more values and return a result.
Python comes with many pre-defined functions, as well as the ability to define
your own functions. Let's look at a couple of the built-in functions: len()
returns the number of items in a sequence; dir() returns a list of strings
representing the attributes of an object; list() returns a new list initialized
from some other sequence.
>>> dna = 'CTGACCACTTTACGAGGTTAGC'
>>> bases = ['A', 'C', 'G', 'T']
>>> len(dna)
22
>>> len(bases)
4
>>> dir(dna)
['__add__', '__class__', '__contains__', '__delattr__',
'__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getslice__', '__gt__', '__hash__', '__init__', '__le__',
'__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__',
'__repr__', '__rmul__', '__setattr__', '__str__', 'capitalize',
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs',
'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'replace', 'rfind', 'rindex', 'rjust', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper']
>>> dir(bases)
['__add__', '__class__', '__contains__', '__delattr__',
'__delitem__', '__delslice__', '__doc__', '__eq__', '__ge__',
'__getattribute__', '__getitem__', '__getslice__', '__gt__',
'__hash__', '__iadd__', '__imul__', '__init__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__repr__',
'__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
>>> list(dna)
['C', 'T', 'G', 'A', 'C', 'C', 'A', 'C', 'T', 'T', 'T',
'A', 'C', 'G', 'A', 'G', 'G', 'T', 'T', 'A', 'G', 'C']
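The dir() listings above are long, and the exact names they contain vary between Python versions. One possible way to narrow the output to the ordinary methods is to skip every name that starts with a double underscore. The sketch below does this with a list comprehension, a feature not covered in this article; the variable names are illustrative.

```python
dna = 'CTGACCACTTTACGAGGTTAGC'
bases = ['A', 'C', 'G', 'T']

# Keep only the attribute names that do not start with two underscores.
string_methods = [name for name in dir(dna) if not name.startswith('__')]
list_methods = [name for name in dir(bases) if not name.startswith('__')]

# Built-in functions can be combined: list() yields one element per
# character, so the resulting list is as long as the original string.
same_length = (len(list(dna)) == len(dna))   # True
```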
Next, we're going to define some functions of our own that will perform useful operations on biological sequence data.
User-defined Functions
Here is the process for creating your own function in Python. The first line
begins with the keyword def, is followed by the name of the function and any
arguments (expected input values) surrounded by parentheses, and ends with a
colon. Subsequent lines make up the body of the function and must be indented.
If a string literal appears as the first statement of the body, it becomes the
documentation string for the function. A return statement, usually the last
line of the function, sends a result back to the caller.
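As a rough sketch of that anatomy, the hypothetical function below (the name count_base and its arguments are illustrative, not from this article) has a def line, a documentation string, and a return statement. The documentation string is stored on the function and can be read back through its __doc__ attribute or with the built-in help() function.

```python
def count_base(dna, base):
    """Return the number of times base occurs in the dna string."""
    return dna.count(base)

# The documentation string is available as an attribute of the function.
doc = count_base.__doc__

# Calling the function runs the body and hands back the returned value.
n = count_base('CCGGAAGAGCTTACTTAG', 'G')   # 5
```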
Let's define some functions in the PyCrust shell. Then we can try each function with some sample data and see the result returned by the function.
>>> def transcribe(dna):
...     """Return dna string as rna string."""
...     return dna.replace('T', 'U')
...
>>> transcribe('CCGGAAGAGCTTACTTAG')
'CCGGAAGAGCUUACUUAG'
In this example we created a function called transcribe that expects a
string representing a DNA sequence. Strings have a replace() method that
returns a copy of the original string with each occurrence of one substring
replaced by another. In three lines of code we've given ourselves a consistent
way to transcribe a string of DNA into RNA. Let's create another function. How
about reverse?
>>> def reverse(s):
...     """Return the sequence string in reverse order."""
...     letters = list(s)
...     letters.reverse()
...     return ''.join(letters)
...
>>> reverse('CCGGAAGAGCTTACTTAG')
'GATTCATTCGAGAAGGCC'
There are a few new things in this function that need explanation. First,
we've used an argument name of "s" instead of "dna". You can name your
arguments whatever you like in Python. It is something of a convention to use
short names based on their expected value or meaning. So "s" for string is
fairly common in Python code. The other reason to use "s" instead of "dna" in
this example is that this function works correctly on any string, not just
strings representing DNA sequences. So "s" is a better reflection of the
generic utility of this function than "dna".
You can see that the reverse function takes in a string, creates a list
based on the string, and reverses the order of the list. Now we need to put the
list back together as a string so we can return a string. Python string objects
have a join() method that joins together a list into a string, separating each
list element by a string value. Since we do not want any character as a
separator, we use the join() method on an empty string, represented by two
quotes ('' or "").
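To illustrate how the separator string affects the result, here is a short sketch of join() called on a few different separators. The sample list and variable names are illustrative.

```python
letters = ['G', 'A', 'T']

no_separator = ''.join(letters)      # 'GAT'
with_dashes = '-'.join(letters)      # 'G-A-T'
with_commas = ', '.join(letters)     # 'G, A, T'

# With an empty separator and single-character elements, list() undoes join().
round_trip = (list(no_separator) == letters)   # True
```

The string that join() is called on is the separator, and the list passed in supplies the pieces; every element of that list must itself be a string.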
To calculate the complement of a DNA sequence, we need a way to map each of the four bases to its complement. For that, we'll use another Python data structure called a dictionary.