PHP Search Engine Showdown
If you're going to install a local search engine and are using PHP, you have several great PHP engines to consider. We took the leaders in the field, summarized their features (Table 1), tested them all, and found:
iSearch has an excellent range of options that suit nearly any site, but its core functions are encrypted and cannot practically be modified. In testing, the spider also trapped itself in a loop or on an unreachable page roughly every 20 minutes, which makes cron-based updates unreliable.
MnogoSearch is quite powerful and versatile, but unlike most of its PHP-minded competitors, it must be compiled before use and has the steepest learning curve. It works out of the box with every major database, including SQLite, and comes with front ends for PHP, C, and Perl. A command-line interface handles all maintenance and indexing; once you have configured it correctly, it is also useful for automation. It has a wide variety of features, including searches of your site, FTP archive searches, news article and newspaper searches, and more.
PHPDig uses a MySQL database, building a glossary with words from the pages you index. The search result displays the pages ranked by keyword density. Though PHPDig's fame and clean code would suggest otherwise, this search engine is far from being one of the best available. The indexing speed is quite slow, especially in comparison with MnogoSearch or RiSearch. It's overflowing with features and plugins for any format of data and has built-in index scheduling routines.
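To make the idea of a glossary ranked by keyword density concrete, here is a minimal sketch in plain PHP. It is not PHPDig's code: the function names, the in-memory arrays (standing in for MySQL tables), and the sample pages are all assumptions made for illustration, and density is taken simply as occurrences of the keyword divided by the page's total word count.

```php
<?php
// Hypothetical sample pages standing in for crawled documents.
$pages = [
    'about.html'   => 'Search engines index pages. A search returns pages.',
    'contact.html' => 'Contact us by email or phone.',
    'search.html'  => 'Search search search tips.',
];

/** Split text into lowercase word tokens. */
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Build the glossary: word => [page => number of occurrences]. */
function build_glossary(array $pages): array
{
    $glossary = [];
    foreach ($pages as $url => $text) {
        foreach (tokenize($text) as $word) {
            $glossary[$word][$url] = ($glossary[$word][$url] ?? 0) + 1;
        }
    }
    return $glossary;
}

/** Rank pages containing $keyword by density (occurrences / total words). */
function rank_by_density(string $keyword, array $glossary, array $pages): array
{
    $scores = [];
    foreach ($glossary[strtolower($keyword)] ?? [] as $url => $count) {
        $scores[$url] = $count / count(tokenize($pages[$url]));
    }
    arsort($scores); // highest density first, page keys preserved
    return $scores;
}

$glossary = build_glossary($pages);
print_r(rank_by_density('search', $glossary, $pages));
```

In this toy data, search.html (3 of 4 words, density 0.75) ranks ahead of about.html (2 of 8 words, density 0.25), and contact.html is not returned at all. Ranking purely by density favors short, repetitive pages, which is one reason density alone is a weak relevancy signal.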
RiSearch is powerful and has a very fast search script, designed to work with hundreds of megabytes of text data. It does not use libraries or databases but is Perl code with PHP front ends. For a file-based storage back end, RiSearch searches surprisingly quickly. However, that speed comes at the cost of search result relevancy, which is poorer than in the other engines. It is therefore better for finding unique phrases, like names of species, than for searching concepts.
Sphider is PHP code that uses MySQL for indexing pages. It works for sites up to 20,000 pages. It also works great as a tool for site analysis, such as finding broken links and gathering statistics about the site. It has an efficient back end and search algorithm, but its crawling methods function poorly.
Sphinx is a fast and capable full text search engine, particularly suited for database content. It runs its own daemon (which you compile) and does not have any web crawlers bundled. Features include high performance, good scalability and search quality, advanced sorting, filtering, and grouping.
TSEP's crawler takes a long time to run when there is a lot of data to index. This was a problem on one server with time-out/keep-alive settings of 8/15, though adding ignore_user_abort(true) to the top of indexer.php works around it.
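The workaround amounts to a line or two at the very start of the indexer script. The sketch below shows what that might look like; the surrounding layout of indexer.php is assumed rather than taken from TSEP's source, and the set_time_limit() call is an extra assumption for servers where PHP's own execution limit, not the connection time-out, ends the crawl.

```php
<?php
// Hypothetical first lines of indexer.php (assumed layout, not TSEP's actual source).

// Keep the script running even if the browser disconnects or the web
// server drops the connection. The argument matters: called with no
// argument, ignore_user_abort() only reports the current setting.
ignore_user_abort(true);

// Assumption: the crawl may also outlast PHP's max_execution_time,
// so lift that limit for this script only.
set_time_limit(0);

// ... the rest of the indexer follows ...
```

With this in place the crawl continues server-side after the HTTP connection closes, so you need to check the index or logs to confirm it finished rather than wait on the browser.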
Table 1. Summary of leading PHP engines
|Database||MySQL, SQLite||Several||MySQL, SQLite||MySQL||MySQL||Flat files (text)||MySQL, PostgreSQL, Flat files|
|Support||Medium (forum)||Very good (discussion list, forum, and paid email support)||Medium (forum)||Poor (forum)||Medium (FAQ and forum)||Medium (forum)||Good|
|PHP 5 compatible||Yes||Yes (for the interface)||Yes||Yes||Yes||Yes (requires PHP 5)|
|Install package download||44K||2MB||1.5MB||273K||150K||128K||~300K|
|Access needed to install||Root||Root (need to compile)||Root||Root||Root||FTP||Shell (non root)|
|Recommended file limit||High||Very high||High||High||High||Very high||Very high|
|Index speed||Very slow||~500 in 10 seconds||~500 in 14 seconds||Slow||Medium||~500 in 18 seconds||4-10 MB/sec|
Overall ranking represents the author's overall ranking of the engine, based on ease of use, power, spidering speed, and ranking relevancy.
Database lists the kind of database used for creating and storing the index.
Support describes the customer support available for each engine, and the channels through which you can ask questions about installing or using the tool.
Access needed to install indicates the access you need to have on the server in order to fully install your application and index your site.
Recommended file limit indicates how many documents the search engine can handle while still running at full capacity.
Other PHP search engines, not included in the table but listed below, are available. We do not recommend these engines as highly.
SiteSearch is a PHP engine that uses a text file database to index the information on the site. It includes several useful features, such as indexing by meta tags and multiple word search. It has several add-ons, including multilanguage support and text database support.
Simple Web Search is a script that searches a SWISH-E index. It requires SWISH-E 1.x or 2.x and PHP 3.0.8 or newer on the system, and a web server supporting PHP 3.
IndexServer is a useful plugin package that lets you perform a variety of tasks. Once it has indexed a web site, you can run further queries against the resulting index.
Xapian is only an indexing tool, but the company also offers a web site search engine package that includes its Omega solution, which looks promising and has several interesting features. Xapian uses SWIG for PHP, so the indexer is not PHP 5 compatible. This is where BeebleX comes in. BeebleX is a search engine that uses a PHP 5-compatible Xapian extension. For more information, see Marco Tabini's thoughts on BeebleX.
There is no ideal PHP search engine, but our overall impression was that Sphider and MnogoSearch are the best contenders. In general, Sphider returns more accurate hits, and MnogoSearch is easier to set up.
Sphinx is a relatively new contender, and shows good promise. Although Sphinx is little known and has few real-world installations so far, it is worth checking in on in the future, particularly if you don't need a web crawler. Xapian is a strong engine, with support for many programming languages, and an active community, but we found it difficult to set up in PHP.
If you want to know more about search engines, the following sites have plenty of descriptions, reviews, news, guides, how-tos, and technologies:
Michael Douma is an expert in user interface design and web-based interactive education.