Python DevCenter

Beginning Python for Bioinformatics

Python Classes

To create your own custom objects, you must define a sort of template, or cookie cutter, called a class. You do so in Python using the class statement, followed by the name of the class and a colon. Following this, the body of the class definition contains the properties and methods that will be available for all object instances that are based on this class.

Let's take all the functions that we've created so far and recast them as methods of a DNA class. Then we'll see how to create DNA objects based on our DNA class. While we could do all this from the Python shell, instead we will place this code into a bio.py file and show how we can use this file from the Python shell. The contents of our bio.py file, which Python calls a module, look like this.

class DNA: 
    """Class representing DNA as a string sequence.""" 
 
    basecomplement = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'} 
 
    def __init__(self, s): 
        """Create DNA instance initialized to string s.""" 
        self.seq = s 
     
    def transcribe(self): 
        """Return as rna string.""" 
        return self.seq.replace('T', 'U') 
     
    def reverse(self): 
        """Return dna string in reverse order.""" 
        letters = list(self.seq) 
        letters.reverse() 
        return ''.join(letters) 
     
    def complement(self): 
        """Return the complementary dna string.""" 
        letters = list(self.seq) 
        letters = [self.basecomplement[base] for base in letters] 
        return ''.join(letters) 
     
    def reversecomplement(self): 
        """Return the reverse complement of the dna string.""" 
        letters = list(self.seq) 
        letters.reverse() 
        letters = [self.basecomplement[base] for base in letters] 
        return ''.join(letters) 
     
    def gc(self): 
        """Return the percentage of dna composed of G+C.""" 
        s = self.seq 
        gc = s.count('G') + s.count('C') 
        return gc * 100.0 / len(s) 
 
    def codons(self): 
        """Return list of codons for the dna string.""" 
        s = self.seq 
        # Largest multiple of 3 that fits, so a trailing partial codon is dropped.
        end = len(s) - (len(s) % 3)
        codons = [s[i:i+3] for i in range(0, end, 3)]
        return codons 

Much of this should look familiar based on our existing functions. Class definitions do add a few new elements that we need to cover. Let's look at how to use this new class before exploring the extra details.

We create object instances by calling the class, much like we would call a function. The first thing we need to do is make the Python shell aware of this class definition. We do that by importing the DNA class definition from our bio.py module. Then we create an instance of the DNA class, passing in the initial string value. From that point on the object keeps track of its own sequence value, and we simply call the methods that are defined for that object.

>>> from bio import DNA 
>>> dna1 = DNA('CGACAAGGATTAGTAGTTTAC') 
>>> dna1.transcribe() 
'CGACAAGGAUUAGUAGUUUAC' 
>>> dna1.reverse() 
'CATTTGATGATTAGGAACAGC' 
>>> dna1.complement() 
'GCTGTTCCTAATCATCAAATG' 
>>> dna1.reversecomplement() 
'GTAAACTACTAATCCTTGTCG' 
>>> dna1.gc() 
38.095238095238095 
>>> dna1.codons() 
['CGA', 'CAA', 'GGA', 'TTA', 'GTA', 'GTT', 'TAC'] 

Since a class acts as a kind of template that's used to create multiple object instances, we need the ability, inside a class method, to refer to the specific object instance on which the method is called. To accommodate this need, Python automatically passes the object instance as the first argument to each method. The convention in the Python community is to name that first argument self. That's why you see self as the first argument in all the method definitions of our DNA class.
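
To make the role of self concrete, here is a minimal sketch using a trimmed-down copy of the DNA class (only transcribe() is kept, purely for illustration). It shows that calling a method on an instance is equivalent to calling the function through the class and passing the instance explicitly. The variable names via_instance and via_class are ours, not part of bio.py.

```python
class DNA:
    """Trimmed-down copy of the article's DNA class, for illustration only."""

    def __init__(self, s):
        self.seq = s

    def transcribe(self):
        """Return as rna string."""
        return self.seq.replace('T', 'U')


dna1 = DNA('CGACAAGGATTAGTAGTTTAC')

# Usual form: Python supplies dna1 as the self argument.
via_instance = dna1.transcribe()

# Explicit form: we look the function up on the class and pass the instance.
via_class = DNA.transcribe(dna1)

print(via_instance)
print(via_instance == via_class)
```

Both calls run the same function on the same object, so the two results compare equal.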

The other thing to note is the __init__() method. Python calls this specially named method when creating instances of the class. In our example, DNA.__init__ expects to receive a string argument, which we then store as a property of the object instance, self.seq.
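
As a rough sketch of what happens at instantiation, the snippet below again uses a stripped-down DNA class (an assumption for brevity, not the full bio.py module). Calling the class creates a new object and hands it, together with our argument, to __init__. Leaving the argument out raises a TypeError, because __init__ requires s. The flag name missing_arg_failed is hypothetical.

```python
class DNA:
    """Stripped-down DNA class: only the initializer is kept."""

    def __init__(self, s):
        """Create DNA instance initialized to string s."""
        self.seq = s


# Python creates the object, then calls __init__ with it and 'ACGT'.
dna = DNA('ACGT')
print(dna.seq)

# __init__ requires the string argument, so calling DNA() with none fails.
missing_arg_failed = False
try:
    DNA()
except TypeError:
    missing_arg_failed = True

print(missing_arg_failed)
```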

We made one other change when we moved our functions into class methods. We moved the basecomplement dictionary definition out of the complement() method and into the class definition. As part of the class definition, the dictionary is only created once, rather than each time the method is called. The dictionary is shared by all instances of the class, and it can be used by more than one method. This is in contrast to the seq property, for which each object instance will have its own unique value.
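
The difference between the shared dictionary and the per-instance seq value can be checked directly. The following is a small sketch, again with a cut-down DNA class that stands in for the one in bio.py. The names shared and instance_attrs are illustrative only.

```python
class DNA:
    """Cut-down DNA class: one class attribute, one instance attribute."""

    # Created once, when the class statement runs.
    basecomplement = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}

    def __init__(self, s):
        # Stored separately on each instance.
        self.seq = s


dna1 = DNA('CGAC')
dna2 = DNA('TTAG')

# Both instances, and the class itself, refer to the very same dictionary.
shared = dna1.basecomplement is dna2.basecomplement is DNA.basecomplement
print(shared)

# Each instance carries only its own seq; basecomplement lives on the class.
instance_attrs = vars(dna1)
print(instance_attrs)
print(dna1.seq, dna2.seq)
```

Because the dictionary is shared, changing an entry through one instance would be visible through all of them, so it is best treated as read-only.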

As you can see, classes provide an effective way to group related data and functionality. Let's finish our shell session by creating a few more DNA instances.

>>> dna2 = DNA('ACGGGAGGACGGGAAAATTACTAGCACCCGCATAGACTT') 
>>> dna2.codons() 
['ACG', 'GGA', 'GGA', 'CGG', 'GAA', 'AAT', 'TAC', 'TAG',  
'CAC', 'CCG', 'CAT', 'AGA', 'CTT'] 
>>> dna3 = DNA(dna1.seq + dna2.seq) 
>>> dna3.reversecomplement() 
'AAGTCTATGCGGGTGCTAGTAATTTTCCCGTCCTCCCGTGTAAACTACTAATCCTTGTCG' 
>>> dna4 = DNA(dna3.reversecomplement()) 
>>> dna4.codons() 
['AAG', 'TCT', 'ATG', 'CGG', 'GTG', 'CTA', 'GTA', 'ATT',  
'TTC', 'CCG', 'TCC', 'TCC', 'CGT', 'GTA', 'AAC', 'TAC',  
'TAA', 'TCC', 'TTG', 'TCG'] 

Even with this rudimentary class definition, manipulated from the Python shell, we can start to see Python's potential for analyzing biological data in a clear, coherent fashion, with a minimum of syntactic overhead.

Conclusion

Python is a popular, open source programming language with much to offer the bioinformatics community. At the same time, Python came late to the bioinformatics party and may never rise to the level of popularity of Perl. Choice is always a good thing, though, and Python offers a viable, reliable option for biologists and professional programmers alike. We hope this article gives you a reason to take a closer look at Python.

Additional Resources

If you like what you've seen of Python, here are some additional resources to explore.

Patrick O'Brien is an independent software developer and trainer, specializing in the Python programming language. He is the creator of PyCrust, a developer on the PythonCard project, and leader of the PyPerSyst project. He may be reached at pobrien@orbtech.com.

