Building a Simple Search Engine with PHP
The Search Interface
Of course, users will not be able to work with the MySQL database directly.
Therefore, we'll create another PHP script that provides an HTML form to query
the database. This works just like any other search engine. The user enters a
word in a textbox, hits Enter, and receives a page of results linked to the
appropriate pages. The result order depends on the number of times a keyword
appears in each document. The search.php script is listed
below.
<?php
/*
 * search.php
 *
 * Script for searching a database populated with keywords by the
 * populate.php-script.
 */
print "<html><head><title>My Search Engine</title></head><body>\n";
if( isset($_POST['keyword']) && $_POST['keyword'] !== '' )
{
    /* Connect to the database and select the "test" schema: */
    $db = mysqli_connect("localhost","root","secret","test")
        or die("ERROR: Could not connect to database!");
    /* Get timestamp before executing the query: */
    $start_time = getmicrotime();
    /* Set $keyword and $results. Escape the keyword and force the
     * result limit to an integer between 1 and 20 to minimize the
     * risk of executing unwanted SQL commands: */
    $keyword = mysqli_real_escape_string( $db, $_POST['keyword'] );
    $results = isset($_POST['results']) ? max(1, min(20, (int) $_POST['results'])) : 5;
    /* Execute the query that performs the actual search in the DB: */
    $result = mysqli_query( $db, " SELECT p.page_url AS url,
                            COUNT(*) AS occurrences
                            FROM page p, word w, occurrence o
                            WHERE p.page_id = o.page_id AND
                            w.word_id = o.word_id AND
                            w.word_word = '$keyword'
                            GROUP BY p.page_id
                            ORDER BY occurrences DESC
                            LIMIT $results" )
        or die("ERROR: Query failed!");
    /* Get timestamp when the query is finished: */
    $end_time = getmicrotime();
    /* Present the search-results, escaping everything that is echoed
     * back into the HTML page: */
    print "<h2>Search results for '".htmlspecialchars($_POST['keyword'], ENT_QUOTES)."':</h2>\n";
    for( $i = 1; $row = mysqli_fetch_assoc($result); $i++ )
    {
        $url = htmlspecialchars($row['url'], ENT_QUOTES);
        print "$i. <a href='".$url."'>".$url."</a>\n";
        print "(occurrences: ".$row['occurrences'].")<br><br>\n";
    }
    /* Present how long it took to execute the query: */
    print "query executed in ".number_format($end_time-$start_time, 3)." seconds.";
}
else
{
    /* If no keyword is defined, present the search page instead: */
    print "<form method='post'> Keyword:
           <input type='text' size='20' name='keyword'>\n";
    print "Results: <select name='results'><option value='5'>5</option>\n";
    print "<option value='10'>10</option><option value='15'>15</option>\n";
    print "<option value='20'>20</option></select>\n";
    print "<input type='submit' value='Search'></form>\n";
}
print "</body></html>\n";
/* Simple function for retrieving the current timestamp in seconds,
 * with microsecond precision: */
function getmicrotime()
{
    return microtime(true);
}
?>
The script may be called with or without the keyword argument.
If it's defined, the script searches for that word in the database. It will
also show the length of time it took to process the query. Otherwise, the
script presents the search page instead. That page will resemble Figure 1.

Figure 1 - our simple search page
Let's search on the keyword linux. Our dataset produces results similar to Figure 2.

Figure 2 - the search results page
As expected, onlamp.com appears first on the result page because the keyword linux appears more frequently on this site than on the others. A search for java would probably put onjava.com at the top, and xml would most likely generate the most hits for xml.com. Also note that we've limited the results to five, so only the pages with the most occurrences are shown.
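The query in search.php builds its SQL by pasting the user's input into a string. As a hedged alternative, here is a minimal sketch that sends the keyword and the result limit as bound parameters through PDO. To stay runnable on its own, it assumes the pdo_sqlite extension and uses a throwaway in-memory SQLite database instead of MySQL. The function name search_pages(), the simplified table definitions, and the sample rows are made up for illustration; only the table and column names inside the query come from the article.

```php
<?php
// Hypothetical helper: runs the article's ranking query with bound parameters.
function search_pages(PDO $db, string $keyword, int $limit): array
{
    $stmt = $db->prepare(
        "SELECT p.page_url AS url, COUNT(*) AS occurrences
           FROM page p, word w, occurrence o
          WHERE p.page_id = o.page_id
            AND w.word_id = o.word_id
            AND w.word_word = :keyword
          GROUP BY p.page_id
          ORDER BY occurrences DESC
          LIMIT :limit"
    );
    // The values travel separately from the SQL text, so quotes in the
    // keyword cannot change the structure of the query.
    $stmt->bindValue(':keyword', $keyword, PDO::PARAM_STR);
    $stmt->bindValue(':limit', max(1, $limit), PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Throwaway in-memory database with made-up rows (assumption: simplified
// versions of the page, word and occurrence tables).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT)");
$db->exec("CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT)");
$db->exec("CREATE TABLE occurrence (word_id INTEGER, page_id INTEGER)");
$db->exec("INSERT INTO page VALUES (1, 'http://a.example/'), (2, 'http://b.example/')");
$db->exec("INSERT INTO word VALUES (1, 'linux'), (2, 'java')");
// 'linux' occurs three times on page 1 and once on page 2.
$db->exec("INSERT INTO occurrence VALUES (1, 1), (1, 1), (1, 1), (1, 2), (2, 2)");

$hits = search_pages($db, 'linux', 5);
foreach ($hits as $i => $row) {
    printf("%d. %s (occurrences: %d)\n", $i + 1, $row['url'], $row['occurrences']);
}
```

The page with the most occurrences comes back first, just as in the results page. Pointing the same function at MySQL should mostly be a matter of changing the PDO connection string, although that has not been tested against the article's database.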
Speeding Up the Database
As the bottom of the results page shows, the query took 0.393 seconds to execute. While this may not seem like an incredibly long time, it adds up as the database and the number of searches grow. Fortunately, since we're using a database, there's a very simple solution: add an index.
CREATE INDEX word_word_ix ON word (word_word);
This will create an index in the word table on the
word_word column. Since all of our searches start with this
column, the database will find the appropriate pages much more quickly. To
prove this point, we will search for the keyword linux again, to see if we
gained any performance. See Figure 3.

Figure 3 - searching with an index
Nice. It took 0.028 seconds, a saving of 0.365 seconds per query, which makes the search about 14 times faster. If this engine handled an average of 1,000 queries per hour, this would mean a savings of about 146 minutes per day (0.365 seconds times 24,000 queries is 8,760 seconds).
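Timing the query is one way to see the effect of the index. Another is to ask the database how it plans to run the keyword lookup, before and after the index exists. The sketch below does this under stated assumptions: it uses PDO with an in-memory SQLite database (the pdo_sqlite extension) and SQLite's EXPLAIN QUERY PLAN, not the article's MySQL setup, and the helper name lookup_plan() and the sample rows are made up. MySQL has its own EXPLAIN statement for the same purpose, with different output.

```php
<?php
// Throwaway in-memory SQLite database; the table, column and index names
// follow the article, the rows are invented for the demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT)");
$db->exec("INSERT INTO word (word_word) VALUES ('linux'), ('java'), ('xml')");

// Hypothetical helper: returns the planner's description of how it would
// look up a single keyword in the word table.
function lookup_plan(PDO $db): string
{
    $rows = $db->query(
        "EXPLAIN QUERY PLAN SELECT word_id FROM word WHERE word_word = 'linux'"
    )->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$plan_before = lookup_plan($db);   // no index yet: the table is scanned
$db->exec("CREATE INDEX word_word_ix ON word (word_word)");
$plan_after = lookup_plan($db);    // the plan should now mention word_word_ix

print "before: $plan_before\n";
print "after:  $plan_after\n";
```

The exact wording of the plan differs between SQLite versions, so the useful signal is simply whether the index name shows up in the second line.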
Summary
As shown in this article, useful search engines can be built pretty simply. Without much hassle, you could develop this concept further to handle multiple keywords, boolean operators, stop words, and other features you find in many commercial search facilities. It would also be interesting to populate the database further with a few hundred megs of data. Would the speed still be reasonable? Probably. One thing we could be absolutely sure of, however, is that for an intranet of a mid-sized company with just a few dozen searches per hour, this solution can offer stunning performance with minimal setup.
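As one possible starting point for the multiple-keyword and stop-word ideas mentioned above, here is a small sketch. Everything in it is an assumption rather than part of the article's scripts: the helper name parse_keywords(), the tiny stop-word list, and the guess that words are stored in lowercase. It only prepares the keyword list and a matching IN (...) condition for a prepared statement; ranking pages across several keywords is left open.

```php
<?php
// Hypothetical helper, not part of the article's scripts.
function parse_keywords(string $query, array $stop_words = ['a', 'an', 'and', 'of', 'the']): array
{
    // Lowercase the input and split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    // Drop stop words and duplicates, then reindex the array from 0.
    return array_values(array_unique(array_diff($words, $stop_words)));
}

$keywords = parse_keywords("The history of Linux and Java");

// One "?" placeholder per keyword, to be bound when the statement runs.
// An empty keyword list would need separate handling before querying.
$placeholders = implode(', ', array_fill(0, count($keywords), '?'));
$where = "w.word_word IN ($placeholders)";

print implode(' ', $keywords) . "\n";
print $where . "\n";
```

For the sample input this leaves the keywords history, linux and java, with three placeholders in the condition.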
Whether you're planning to develop a large-scale commercial search engine, or
are just playing around, http://www.robotstxt.org/wc/robots.html
offers lots of helpful and interesting reading on this topic. For example, it
describes the use of the standardized robots.txt file, which every
Internet spider should use to determine what it can and can't do on a specific
site. Please read and follow the rules if you don't control the sites you
want to search.
I wish you good luck and look forward to getting a visit from your spider soon. :)
Daniel Solin is a freelance writer and Linux consultant whose specialty is GUI programming. His first book, SAMS Teach Yourself Qt Programming in 24 hours, was published in May, 2000.