PHP DevCenter



PHP Search Engine Showdown

PHP Engines

If you're going to install a local search engine and are using PHP, you have several great PHP engines to consider. We took the leaders in the field, summarized their features (Table 1), tested them all, and found:



iSearch has an excellent range of options that covers the needs of nearly any site, yet its core functions are encrypted and effectively unchangeable. Also, in testing, the spider trapped itself in a loop or on an unreachable page every 20 minutes or so, making cron-based updates unreliable.

MnogoSearch is quite powerful and versatile, but unlike most of its PHP-minded competitors, it must be compiled before use and has the steepest learning curve. It works out of the box with every major database, including SQLite, and comes with front ends for PHP, C, and Perl. A command-line interface handles all maintenance and indexing; once you have configured it correctly, it is also useful for automation. It has a wide variety of features, including searches of your site, FTP archive searches, news article and newspaper searches, and more.

PHPDig uses a MySQL database, building a glossary of words from the pages you index. Search results are ranked by keyword density. PHPDig is overflowing with features and plugins for any format of data and has built-in index scheduling routines. Even so, and though its fame and clean code would suggest otherwise, this search engine is far from being one of the best available. Indexing is quite slow, especially in comparison with MnogoSearch or RiSearch.
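To make "ranked by keyword density" concrete, here is a minimal sketch of the general idea. It is not PHPDig's code: the function names `keywordDensity` and `rankByDensity` and the in-memory page array are assumptions made for illustration, and a real engine would read word counts from its index rather than rescanning page text.

```php
<?php
// Hypothetical helper: fraction of the words in $text that equal $keyword
// (case-insensitive). Not taken from PHPDig.
function keywordDensity(string $text, string $keyword): float
{
    // str_word_count(..., 1) returns the words of the string as an array.
    $words = str_word_count(strtolower($text), 1);
    if (count($words) === 0) {
        return 0.0;
    }
    $hits = count(array_keys($words, strtolower($keyword), true));
    return $hits / count($words);
}

// Hypothetical ranking: score every page (url => text) and sort the
// scores from highest to lowest density, keeping the URLs as keys.
function rankByDensity(array $pages, string $keyword): array
{
    $scores = [];
    foreach ($pages as $url => $text) {
        $scores[$url] = keywordDensity($text, $keyword);
    }
    arsort($scores);
    return $scores;
}
```

Ranking purely by density favors short pages that repeat a term, which is one reason density-based results can feel less relevant than those from engines that weigh more signals.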

RiSearch is powerful and has a very fast search script, designed to work with hundreds of megabytes of text data. It does not use libraries or databases but is Perl code with PHP front ends. Searching is surprisingly fast for a file-based storage back end. However, this design hurts search result relevancy, which is poorer than that of the other options. RiSearch is therefore better for finding unique phrases, like names of species, than for searching concepts.

Sphider is PHP code that uses MySQL for indexing pages. It works for sites up to 20,000 pages. It also works great as a tool for site analysis, such as finding broken links and gathering statistics about the site. It has an efficient back end and search algorithm, but its crawling methods function poorly.

Sphinx is a fast and capable full-text search engine, particularly suited for database content. It runs its own daemon (which you compile) and does not bundle a web crawler. Features include high performance, good scalability and search quality, and advanced sorting, filtering, and grouping.

TSEP's crawler takes a long time to run if the data to index is extensive. This was a problem on one server with time-out/keep-alive settings of 8/15, though adding a call to ignore_user_abort(true) at the top of indexer.php bypasses it. (Called without an argument, ignore_user_abort() only reports the current setting.)
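The workaround might look like the sketch below. Only the ignore_user_abort() call comes from the testing described above; the set_time_limit(0) line is an added assumption for servers where PHP's own execution time limit would otherwise stop a long crawl, and the rest of TSEP's indexer.php is not shown.

```php
<?php
// Hypothetical first lines of indexer.php; TSEP's own code would follow.

// Keep the script running even if the browser or web server drops the
// connection before the crawl finishes. The argument must be true:
// calling ignore_user_abort() with no argument changes nothing.
ignore_user_abort(true);

// Assumption, not part of the original fix: also remove PHP's
// execution time limit so a long crawl is not cut off.
set_time_limit(0);
```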

Table 1. Summary of leading PHP engines

| Feature | Sphider | MnogoSearch | TSEP | PHPDig | iSearch | RiSearch | Sphinx |
|---|---|---|---|---|---|---|---|
| Overall ranking | *** | *** | ** | ** | * | * | ** |
| Database | MySQL, SQLite | Several | MySQL, SQLite | MySQL | MySQL | Flat files (text) | MySQL, PostgreSQL, Flat files |
| Multilanguage support | No | Yes | Yes | No | Yes | Yes | Yes |
| Support | Medium (forum) | Very good (discussion list, forum, and paid email support) | Medium (forum) | Poor (forum) | Medium (FAQ and forum) | Medium (forum) | Good |
| User interface | Easy | Easy | Easy | Medium | Difficult | Easy | Easy |
| Customizability | High | High | High | High | Medium | Medium | High |
| PHP 5 compatible | Yes | Yes (for the interface) | Yes | Yes | Yes | Yes (requires PHP 5) | Yes |
| SQLite compatible | No | Yes | Yes | Yes | No | No | No |
| URL-free crawling | Yes | Yes | Yes | No | No | Yes | Yes |
| Install package download | 44K | 2MB | 1.5MB | 273K | 150K | 128K | ~300K |
| Installation | Medium | Very easy | Easy | Easy-Medium | Easy-Medium | Easy-Medium | Easy |
| Access needed to install | Root | Root (need to compile) | Root | Root | Root | FTP | Shell (non root) |
| Recommended file limit | High | Very high | High | High | High | Very high | Very high |
| Index speed | Very slow | ~500 in 10 seconds | ~500 in 14 seconds | Slow | Medium | ~500 in 18 seconds | 4-10 MB/sec |

Overall ranking represents the author's overall ranking of the engine, based on ease of use, power, spidering speed, and ranking relevancy.

Database lists the kind of database used for creating and storing the index.

Support refers to the customer support available for each engine: the channels through which you can ask questions about problems with installing or using the tool.

Access needed to install indicates the access you need to have on the server in order to fully install your application and index your site.

Recommended file limit indicates how many documents the search engine can handle while still running at its full capacity.

Other PHP search engines, not included in the table but listed below, are available. We do not recommend these engines as highly.

SiteSearch is a PHP engine that uses a text file database to index the information on the site. It includes several useful features, such as indexing by meta tags and multiple word search. It has several add-ons, including multilanguage support and text database support.

Simple Web Search is a script that searches a SWISH-E index. It requires SWISH-E 1.x or 2.x and PHP 3.0.8 or newer on the system, and a web server supporting PHP 3.

IndexServer is a useful plugin package that lets you perform a variety of tasks. Indexing web sites allows you to further query the final index.

Xapian is only an indexing tool, but the company also offers a web site search engine package that includes its Omega solution, which looks promising and has several interesting features. Xapian uses SWIG for PHP, so the indexer is not PHP 5 compatible. This is where BeebleX comes in. BeebleX is a search engine that uses a PHP 5-compatible Xapian extension. For more information, see Marco Tabini's thoughts on BeebleX.

Recommendations

There is no ideal PHP search engine, but our overall impression was that Sphider and MnogoSearch are the best contenders. In general, Sphider returns more accurate hits, and MnogoSearch is easier to set up.

Sphinx is a relatively new contender and shows good promise. Although Sphinx is little known and has few real-world installations so far, it is worth watching, particularly if you don't need a web crawler. Xapian is a strong engine, with support for many programming languages and an active community, but we found it difficult to set up in PHP.

Conclusion

If you want to know more about search engines, the following sites have plenty of descriptions, reviews, news, guides, how-tos, and technologies:

Michael Douma is an expert in user interface design and web-based interactive education.



