Profiling and Optimizing Python

Reprofiling

The results were surprising. I was hoping for a 50 to 100 percent performance improvement. Here is the time of a parser run without the profiler:



jmjones@qiwi  9:25PM parser % time python aggregate_parser.py \
    ~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt
    0.47s user 0.01s system 99% cpu 0.481 total

Here is the parser run with the hotshot profiler:

jmjones@qiwi  5:43PM parser % time python aggregate_parser.py \
    ~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt
    0.42s user 0.49s system 87% cpu 1.042 total

The run without the profiler represents a performance boost of about 7x over the unoptimized parser. Removing the Python interpreter startup time, library imports, and such from both sides of the equation, it is actually a performance boost of about 10x. The timeit module shows a best time of 2.3 seconds for parsing a file before optimization and 0.22 seconds after. Here is the statistical data from running the parser:

In [1]: from hotshot import stats

In [2]: s = stats.load("hotshot_edi_stats")

In [3]: s.sort_stats("time").print_stats()
         11005 function calls in 0.436 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.338    0.000    0.374    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:32(is_valid_header)
     7000    0.036    0.000    0.036    0.000 /usr/lib/python2.4/string.py:248(strip)
     1001    0.035    0.000    0.428    0.000 aggregate_parser.py:57(parse_file)
     1000    0.013    0.000    0.013    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:15(__init__)
        1    0.008    0.008    0.436    0.436 aggregate_parser.py:128(main)
     1000    0.006    0.000    0.006    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:64(is_valid_footer)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:122(get_canned_aggregate_parser)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:15(__init__)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:107(add_parser)
        0    0.000             0.000          profile:0(profiler)


Out[3]: <pstats.Stats instance at 0xb798f4cc>
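
For reference, hotshot writes its timing data to a file that you load afterward, exactly as in the IPython session above. The following is a minimal sketch of that round trip; the no-op main() below is only a placeholder standing in for the parser's real entry point:

import hotshot
from hotshot import stats

def main():
    # Placeholder standing in for the real parser entry point.
    pass

prof = hotshot.Profile("hotshot_edi_stats")  # file that will hold the profile data
prof.runcall(main)                           # run the code under the profiler
prof.close()

s = stats.load("hotshot_edi_stats")          # load and report, as shown above
s.sort_stats("time").print_stats()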

The profiler shows far fewer methods than before the optimization. The majority of the time is now spent in methods that actually do something rather than in no-ops or state transitions. All of the features that had no-op placeholders are gone; they will run only when a user decides to call them.

The profiler also shows far fewer method calls. Before, there were two methods, body_seg() and segment(), that were called over 130,000 times. Function calls were fairly expensive in previous versions of Python--and I believe they still are, though to a lesser degree. This explains why 130,000 calls to even a no-op method created a performance bottleneck.
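
The call overhead is easy to see in isolation. This quick timeit sketch, which has nothing to do with the parser itself, compares 130,000 calls to a no-op function against the same loop with no call at all:

# Rough illustration of per-call overhead, unrelated to the parser code.
import timeit

setup = "def noop(): pass"
with_calls = "for i in xrange(130000): noop()"  # 130,000 no-op function calls
bare_loop  = "for i in xrange(130000): pass"    # the same loop, no calls

print timeit.Timer(with_calls, setup).timeit(number=10)
print timeit.Timer(bare_loop, setup).timeit(number=10)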

I heavily optimized the method that searches for the end of an EDI interchange (now called is_valid_footer()) but hardly touched the one that identifies the beginning of an interchange (now called is_valid_header()). is_valid_header() is now the biggest bottleneck: it takes roughly 10 times as long as the next method in the list, which happens to be the standard library's string.strip(), and accounts for about 77 percent of the total processing time. The only section of is_valid_header() that does anything is a for loop that iterates over range(106) and checks the header string for correctness. If I really wanted to optimize is_valid_header(), I would remove the for loop and either do the string check with a (shudder) regex or write a function in C and wrap it with SWIG. I may implement one of these optimizations just because of that nagging bottleneck (and for fun!), but the current rate of 2,080 2K interchanges per second including Python interpreter startup and library imports, or 4,500 2K interchanges per second excluding them, is already faster than I had hoped to take this parser.
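
The real is_valid_header() is not reproduced in this article, so the following is only a hypothetical sketch of the two approaches described above: the existing character-by-character loop over the fixed-width 106-character header, and the regex alternative. The checks and the pattern are invented for illustration:

import re

def is_valid_header_loop(segment):
    # Hypothetical sketch of a position-by-position check of the header.
    if len(segment) < 106:
        return False
    for i in range(106):
        # ... position-specific checks on segment[i] would go here ...
        pass
    return True

# Regex alternative: one precompiled pattern replaces the Python-level loop.
# This pattern is a placeholder, not the real header layout.
HEADER_RE = re.compile(r"^ISA.{103}")

def is_valid_header_regex(segment):
    return HEADER_RE.match(segment) is not None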

Just for the sake of comparison, here is the timeit run of the optimized code:

jmjones@qiwi  4:33PM parser % python aggregate_parser.py \
    ~/svn/ediplex/example_data/edi_1000_interchanges.txt
[0.4770960807800293, 0.47198605537414551, 0.43927288055419922,
0.29295921325683594, 0.21883893013000488, 0.21800088882446289,
0.2185051441192627, 0.21764492988586426, 0.23419904708862305,
0.21747803688049316] :: 0.21747803688
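
The output above is a list of ten timings followed by the best one. A minimal sketch of producing numbers in that form with timeit might look like the following; the parse_file() placeholder and file name stand in for the real parser call:

# Sketch only: time ten independent runs and report the best (lowest) one.
import timeit

def parse_file(path):
    # Placeholder standing in for the real parse_file() shown in the stats.
    pass

t = timeit.Timer("parse_file('big_edi.txt')",
                 "from __main__ import parse_file")
results = t.repeat(repeat=10, number=1)
print results, "::", min(results)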

Conclusion

I have tried to live by the "premature optimization" motto throughout the work surrounding this article. I spent very little time up front trying to predict where my code would run slowly, and made very few concessions to preemptively make it run faster. I simply used the hotshot profiler, a tool that is part of the Python standard library, found my bottlenecks, rearchitected my parser (which is probably a bit more extreme than what most folks will have to do), made some code changes, and saw a significant performance improvement. For any performance analysis of Python source code, the profiler is an indispensable tool.

Jeremy Jones is a software engineer who works for Predictix. His weapon of choice is Python.

