The results were surprising. I was hoping for a 50 to 100 percent performance improvement. Here is a timing of the parser running with no profiler:
jmjones@qiwi 9:25PM parser % time python aggregate_parser.py \
    ~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt  0.47s user 0.01s system 99% cpu 0.481 total
Here is the parser run with the profiler:
jmjones@qiwi 5:43PM parser % time python aggregate_parser.py \
    ~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt  0.42s user 0.49s system 87% cpu 1.042 total
Without the profiler, this is a performance boost of about 7X. Removing the Python interpreter startup time, library imports, and so on from both sides of the equation, this was actually a performance boost of 10X. The timeit module shows a best time of 2.3 seconds for parsing a file before optimization and 0.22 seconds after. Here is the statistical data from running the parser:
In : from hotshot import stats
In : s = stats.load("hotshot_edi_stats")
In : s.sort_stats("time").print_stats()
         11005 function calls in 0.436 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.338    0.000    0.374    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:32(is_valid_header)
     7000    0.036    0.000    0.036    0.000 /usr/lib/python2.4/string.py:248(strip)
     1001    0.035    0.000    0.428    0.000 aggregate_parser.py:57(parse_file)
     1000    0.013    0.000    0.013    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:15(__init__)
        1    0.008    0.008    0.436    0.436 aggregate_parser.py:128(main)
     1000    0.006    0.000    0.006    0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:64(is_valid_footer)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:122(get_canned_aggregate_parser)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:15(__init__)
        1    0.000    0.000    0.000    0.000 aggregate_parser.py:107(add_parser)
        0    0.000             0.000          profile:0(profiler)
Out: <pstats.Stats instance at 0xb798f4cc>
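For readers following along on a modern interpreter: hotshot was removed in Python 3, but essentially the same profile-dump-load-sort workflow is available through the standard cProfile and pstats modules. The profiled function below is a stand-in workload, not the article's parser:

```python
import cProfile
import io
import pstats

def parse_file(path):
    # Stand-in workload; the article profiles its EDI parser here.
    total = 0
    for i in range(1000):
        total += len(str(i).strip())
    return total

# Profile a call and dump the raw stats to a file, as hotshot did.
profiler = cProfile.Profile()
profiler.runcall(parse_file, "big_edi.txt")
profiler.dump_stats("edi_stats")

# Load the stats file, sort by internal time, and print the report.
stream = io.StringIO()
stats = pstats.Stats("edi_stats", stream=stream)
stats.sort_stats("time").print_stats()
print(stream.getvalue())
```

The report has the same columns shown above (ncalls, tottime, percall, cumtime), so the analysis in this article carries over directly.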
The profiler shows far fewer methods than before the optimization. The majority of time is now spent in methods that actually do things rather than in no-ops or state transitions. All of the features that had no-op placeholders are now gone; they will run only when a user decides to hook into them.
The profiler also shows far fewer method calls. Before, there were two segment() methods with over 130,000 calls. Function calls were fairly expensive in previous versions of Python--and I believe they still are, but to a lesser degree. This explains why 130,000 function calls to even a no-op created a performance bottleneck.
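The cost of a no-op call is easy to measure with timeit. This micro-benchmark is my own illustration, not from the article; it compares a million bare `pass` statements against a million calls to an empty function:

```python
import timeit

def noop():
    # An empty function, like the parser's old placeholder callbacks.
    pass

# Time 1,000,000 inline no-ops vs. 1,000,000 no-op function calls.
inline_time = timeit.timeit("pass", number=1_000_000)
call_time = timeit.timeit("noop()", globals=globals(), number=1_000_000)

print(f"inline pass: {inline_time:.4f}s")
print(f"no-op call:  {call_time:.4f}s")
```

The call version is consistently slower, purely from call overhead, which is exactly what hundreds of thousands of placeholder invocations were costing the parser.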
I heavily optimized the method that searched for the end of an EDI interchange (now called is_valid_footer()) but hardly touched the one that identified the beginning of an interchange (now called is_valid_header()). is_valid_header() is now the biggest bottleneck, taking 10 times more time than the next method in the list, which happens to be the Python standard string.strip() method--and about 77 percent of the total processing time. The only section of is_valid_header() that does anything is a for loop that iterates over range(106) and checks the header string for correctness. If I really wanted to optimize is_valid_header(), I would remove the for loop and either do the string check with a (shudder) regex or write a function in C and SWIG it. I may implement one of these optimizations just because of that nagging bottleneck (and for fun!)--but 2,080 2K interchanges per second, including Python interpreter startup and library imports, or 4,500 2K interchanges per second, excluding startup and imports, is already faster than I had hoped to take this parser.
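As a sketch of what the regex trade-off might look like: the validation rules below are invented for illustration (the article does not show is_valid_header()'s actual checks), keeping only the known facts that X12 headers are 106 characters and start with "ISA". A per-character loop and a single compiled regex can express the same check:

```python
import re

# Hypothetical rules for illustration: header is 106 characters,
# starts with "ISA", and contains only printable ASCII.
HEADER_LEN = 106
HEADER_RE = re.compile(r"ISA[\x20-\x7e]{103}")

def is_valid_header_loop(segment):
    # Character-by-character check, in the spirit of the article's
    # for loop over range(106).
    if len(segment) < HEADER_LEN or not segment.startswith("ISA"):
        return False
    for i in range(HEADER_LEN):
        if not 0x20 <= ord(segment[i]) <= 0x7e:
            return False
    return True

def is_valid_header_regex(segment):
    # One compiled-regex match replaces the whole Python-level loop.
    return HEADER_RE.match(segment) is not None
```

The regex pushes the 106 per-character comparisons from bytecode into C, which is the kind of win the author is weighing against the readability cost.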
Just for the sake of comparison, here is the timeit run of the optimized code:
jmjones@qiwi 4:33PM parser % python aggregate_parser.py ~/svn/ediplex/example_data/edi_1000_interchanges.txt
[0.4770960807800293, 0.47198605537414551, 0.43927288055419922,
 0.29295921325683594, 0.21883893013000488, 0.21800088882446289,
 0.2185051441192627, 0.21764492988586426, 0.23419904708862305,
 0.21747803688049316] :: 0.21747803688
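The harness that produces output like this is not shown in the excerpt, but a minimal stand-in with the standard timeit module (the parse function and the repeat count of ten are assumptions, matching the ten timings shown above) might look like:

```python
import timeit

def parse_file(path):
    # Hypothetical stand-in for the real parse call.
    return sum(len(segment) for segment in ["ISA*...~"] * 1000)

# Run the parse ten times, once per repetition, and report every
# timing plus the best one -- mirroring the list-then-minimum output.
times = timeit.repeat(lambda: parse_file("edi_1000_interchanges.txt"),
                      repeat=10, number=1)
print(times, "::", min(times))
```

Reporting the minimum of several repetitions, rather than an average, is the usual timeit convention: the best time is the one least polluted by other activity on the machine.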
I have attempted to live by the "premature optimization is the root of all evil" motto in the work surrounding this article. I spent very little time up front trying to predict where my code would run slow, and made very few concessions to preemptively make it run faster. I simply used the hotshot profiler, a tool that is part of the Python standard library, found my bottlenecks, rearchitected my parser (which is probably a bit more extreme than what most folks will have to do), made some code changes, and saw a significant performance improvement. For any performance analysis of Python source code, the profiler is an indispensable tool.
Jeremy Jones is a software engineer who works for Predictix. His weapon of choice is Python.