PHP DevCenter

Building a Simple Search Engine with PHP

by Daniel Solin
10/24/2002

A little while ago, I was working on an intranet site for a mid-sized company. As the site grew in both size and popularity, the client asked me to add a search feature. Since one of the rules of the intranet was that all application logic had to be written in-house, using an existing open source engine was not an option.

Within a day, the engine was largely complete, and the result turned out better than expected. With PHP, MySQL, and a few simple techniques, small projects like this are quite manageable. This article presents a cut-down version of that search engine. I hope it encourages you to develop an engine that suits your particular needs, with exactly the features you want.

Database Design and Logic

We'll use MySQL as the database backend to store our search data. It's possible to shell out to Unix commands such as grep and find instead, but that would mean running the search engine on the machine hosting the files, and it would make it harder to index pages served from a database. We'll tackle the database first.

The database for the search engine consists of three tables: page, word, and occurrence. page holds all indexed web pages, and word holds all of the words found on the indexed pages. The rows in occurrence correlate words to their containing pages: each row represents one occurrence of one particular word on one particular page. The SQL for creating these tables is shown below.

-- ENGINE= replaces the older TYPE= keyword, which MySQL 5.5 and later
-- no longer accept.

-- One row per indexed web page.
CREATE TABLE page (
   page_id int(10) unsigned NOT NULL auto_increment,
   page_url varchar(200) NOT NULL default '',
   PRIMARY KEY (page_id)
) ENGINE=MyISAM;

-- One row per distinct word found on the indexed pages.
CREATE TABLE word (
   word_id int(10) unsigned NOT NULL auto_increment,
   word_word varchar(50) NOT NULL default '',
   PRIMARY KEY (word_id)
) ENGINE=MyISAM;

-- One row per occurrence of a word on a page.
CREATE TABLE occurrence (
   occurrence_id int(10) unsigned NOT NULL auto_increment,
   word_id int(10) unsigned NOT NULL default '0',
   page_id int(10) unsigned NOT NULL default '0',
   PRIMARY KEY (occurrence_id)
) ENGINE=MyISAM;

While page and word hold actual data, occurrence acts only as a reference table. By joining occurrence with page and word, we can determine which pages contain a word, as well as how many times the word occurs. Before that, though, we need some data.
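
As a rough preview of where this is heading, the join described above could look something like the sketch below. It is only an illustration, not necessarily the query the finished engine uses: the sample rows, the intranet.example URLs, and the search term 'php' are all made up, and the rows are inserted by hand here only so that the query has something to return.

```sql
-- Hypothetical sample data, entered by hand for illustration only.
INSERT INTO page (page_id, page_url) VALUES
   (1, 'http://intranet.example/index.html'),
   (2, 'http://intranet.example/news.html');

INSERT INTO word (word_id, word_word) VALUES
   (1, 'php'),
   (2, 'search');

-- 'php' appears twice on page 1 and once on page 2;
-- 'search' appears once on page 2.
INSERT INTO occurrence (word_id, page_id) VALUES
   (1, 1), (1, 1), (1, 2), (2, 2);

-- Which pages contain 'php', and how many times?
SELECT p.page_url, COUNT(*) AS occurrences
FROM word w
JOIN occurrence o ON o.word_id = w.word_id
JOIN page p ON p.page_id = o.page_id
WHERE w.word_word = 'php'
GROUP BY p.page_id, p.page_url
ORDER BY occurrences DESC;
```

With these sample rows, the query lists index.html with 2 occurrences, followed by news.html with 1. Because occurrence stores one row per hit, a plain COUNT(*) per page is enough to rank the results.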
