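1. mnoGoSearch is a search engine that integrates with PHP, much like DataparkSearch (dpsearch). The two share a code base (DataparkSearch began as a fork of mnoGoSearch), and mnoGoSearch is convenient to use from PHP.
Below are the mnoGoSearch installation steps as posted on the mnoGoSearch message board. They are already quite complete; if a required package is still missing, install it with apt-get install XXX.
Original post: http://www.mnogosearch.org/board/message.php?id=19573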
Posted by
Kostas Paganelis
2007-09-07 13:37:17
install mnogosearch-3.3.4 with php extension module
If they aren't already installed:
sudo apt-get install apache2-mpm-prefork
sudo apt-get install mysql-server
1. sudo apt-get install zlib1g-dev
2. sudo apt-get install libmysqlclient15-dev
Inside the mnogosearch-3.x.x directory:
3. ./install.pl (build shared libraries - default settings)
or
./configure --prefix=/usr/local/mnogosearch --bindir=/usr/local/mnogosearch/bin --sbindir=/usr/local/mnogosearch/sbin --sysconfdir=/usr/local/mnogosearch/etc --localstatedir=/usr/local/mnogosearch/var --libdir=/usr/local/mnogosearch/lib --includedir=/usr/local/mnogosearch/include --mandir=/usr/local/mnogosearch/man --enable-shared --enable-static --enable-syslog --without-docs --enable-pthreads --disable-dmalloc --enable-parser --enable-mp3 --enable-file --enable-http --enable-ftp --enable-htdb --enable-news --with-mysql
4. make
5. sudo make install
Install PHP with mnogosearch support
The packages below may already be installed. If not, install them:
x. sudo apt-get install build-essential flex
x. sudo apt-get install libxml2-dev
x. sudo apt-get install g++
x. sudo apt-get install apache2-prefork-dev
6. ./configure \
--disable-debug \
--disable-rpath \
--enable-bcmath \
--enable-calendar \
--enable-maintainer-zts \
--enable-embed=shared \
--enable-force-cgi-redirect \
--enable-ftp \
--enable-inline-optimization \
--enable-magic-quotes \
--enable-memory-limit \
--enable-pic \
--enable-safe-mode \
--enable-sockets \
--enable-track-vars \
--enable-trans-sid \
--enable-wddx \
--with-db \
--with-regex=system \
--with-pear \
--with-xml \
--with-xmlrpc \
--with-zlib \
--with-mysql=/usr \
--with-gd \
--enable-mbstring \
--with-apxs2=/usr/bin/apxs2 \
--with-mnogosearch
7. make
The step below is needed only so that httpd.conf contains at least one line (httpd.conf must not be empty).
8. sudo gedit /etc/apache2/httpd.conf and write inside "LoadModule mod_xmlent /usr/lib/apache2/modules/mod_xmlent.so". After the installation you can remove the line.
9. sudo make install
10. sudo cp php.ini-dist /usr/local/lib/php.ini
11. sudo gedit /etc/apache2/mods-enabled/php5.load and write inside "LoadModule php5_module modules/libphp5.so"
12. sudo gedit /etc/apache2/mods-enabled/php5.conf and write inside:
<IfModule php5_module>
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
</IfModule>
13. sudo gedit /usr/local/lib/php.ini and set the parameters you want
create database and user for mnogosearch
edit mnogosearch/etc/indexer.conf (define at least Server and DBAddr)
copy mnogosearch/etc/stopwords.conf-dist to mnogosearch/etc/stopwords.conf
sudo cp mnogosearch/etc/langmap.conf-dist mnogosearch/etc/langmap.conf
run mnogosearch/sbin/indexer -Ecreate in order to create the database structure
run the indexer
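Before running the indexer, it can help to confirm that indexer.conf really has active (uncommented) Server and DBAddr lines. The following is a minimal sketch in Python, not part of mnoGoSearch; the function name missing_directives and the sample text are made up for illustration, and the case-insensitive comparison of command names is an assumption.

```python
def missing_directives(conf_text, required=("DBAddr", "Server")):
    """Return the required commands that have no active (uncommented) line."""
    found = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        # The first token of an active line is the command name.
        # Assumption: command names are compared case-insensitively.
        found.add(line.split()[0].lower())
    return [name for name in required if name.lower() not in found]


# Hypothetical config text used only for this illustration.
sample = """\
# DBAddr mysql://user:pass@localhost/old/
DBAddr mysql://user:pass@localhost/mnogosearch/?dbmode=single
Server http://localhost/
"""

if __name__ == "__main__":
    print(missing_directives(sample))
```

An empty list means both commands are present; anything else names what still has to be added.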
make the PHP extension module
The step below may not be needed:
x. sudo apt-get install autoconf
14. run phpize in the extension module directory (1.96)
The step below may not be needed:
x. sudo apt-get install re2c
15. ./configure --with-mnogosearch
16. make
17. Then you have mnogosearch.so and mnogosearch.la in the modules directory. Move them to your PHP extension directory (look at the extension_dir value in your php.ini):
sudo cp modules/mnogosearch.so /usr/local/lib/php/extensions/
18. edit php.ini
19. add the line: extension = mnogosearch.so in the extension section
- restart apache
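A quick way to double-check the php.ini edit is to look for an uncommented extension line. This is only a sketch in Python under stated assumptions: the helper name extension_enabled is hypothetical, and it relies on php.ini treating lines that start with ";" as comments.

```python
import re


def extension_enabled(ini_text, name="mnogosearch.so"):
    """Return True if ini_text has an uncommented 'extension = <name>' line."""
    pattern = re.compile(
        r'^\s*extension\s*=\s*"?' + re.escape(name) + r'"?\s*(;.*)?$'
    )
    # A line starting with ";" is a comment, so it never matches the pattern.
    return any(pattern.match(line) for line in ini_text.splitlines())


if __name__ == "__main__":
    print(extension_enabled("extension = mnogosearch.so"))
    print(extension_enabled(";extension = mnogosearch.so"))
```

Reading the real file would look like extension_enabled(open("/usr/local/lib/php.ini").read()), using the path from step 10.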
After that, configure the file search.htm from mnogosearch-php-3.2.11 in any way you want.
At first it gave me no results, but when I commented out most of the search options I got results normally (e.g. categories etc.; I hadn't configured the indexer to index the pages by categories or tags).
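mnoGoSearch configuration notes:
Note: configuring mnoGoSearch is somewhat simpler than dpsearch; only one config file needs to be edited.
1. The DBAddr part
2. The Document sections part
3. The Server part
Also make sure that the database and the config file use the same charset.
This part is the section that configures the database (DB).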
###########################################################################
# DBAddr <URL-style database description>
# Options (type,host,database name,port,user and password)
# to connect to SQL database.
# Should be used before any other commands.
# Has global effect for whole config file.
# Format:
#DBAddr <DBType>:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/[?dbmode=mode]
#
# ODBC notes:
#Use DBName to specify ODBC data source name (DSN)
#DBHost does not matter,use "localhost".
#
# Currently supported DBType values are
# mysql,pgsql,mssql,oracle,ibase,db2,mimer,sqlite.
#
# MySQL users can specify path to Unix socket when connecting to localhost:
# mysql://foo:bar@localhost/mnogosearch/?socket=/tmp/mysql.sock
#
# If you are using PostgreSQL and do not specify hostname,
#e.g. pgsql://user:password@/dbname/
# then PostgreSQL will not work via TCP,but will use Unix socket.
#
# You may also select database mode of word storage.
# When "single" is specified,all words are stored in the same table.
# If "multi" is selected,words will be located in different tables.
# "multi" mode is usually faster but requires more tables.
# Default mode is "single".
# DBAddr mysql://root:123456@localhost/mnogosearch/?dbmode=blob
RemoteCharset utf-8
// Target database: the indexed search data is stored here
DBAddr mysql://root:123456@localhost/test1/?dbmode=single&setnames=utf8
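To see how the pieces of a DBAddr value map onto the documented format <DBType>:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/[?dbmode=mode], the URL can be taken apart with Python's standard urllib.parse. This is an illustrative sketch only; parse_dbaddr is a hypothetical helper, not something mnoGoSearch provides, and mnoGoSearch's own parser may differ in edge cases.

```python
from urllib.parse import urlsplit, parse_qs


def parse_dbaddr(dbaddr):
    """Split a DBAddr-style URL into its documented components."""
    parts = urlsplit(dbaddr)
    # parse_qs returns lists of values; keep the first value of each option.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "type": parts.scheme,            # e.g. mysql, pgsql, sqlite
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,              # None when no port is given
        "database": parts.path.strip("/"),
        "options": options,              # e.g. dbmode, setnames, socket
    }


info = parse_dbaddr(
    "mysql://root:123456@localhost/test1/?dbmode=single&setnames=utf8"
)

if __name__ == "__main__":
    for key, value in info.items():
        print(key, "=", value)
```

Comparing the setnames option with the charset set in the config is one way to check the charset consistency mentioned earlier.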
// The database you want to search. This line is needed only when doing a DB search (htdb).
HTDBAddr mysql://root:123456@localhost/test2/?dbmode=single&setnames=utf8
// When you use DB search, the following settings are needed
HTDBList "SELECT ID FROM tablename WHERE status = 'y' AND (tag <> '' OR name <> '' OR description <> '')"
HTDBDoc "SELECT name,tag,description FROM tablename WHERE status = 'y' AND ID = $2 AND (tag <> '' OR name <> '' OR description <> '')"
Server htdb:/dbName/
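The interplay of HTDBList, HTDBDoc and Server htdb:/dbName/ can be imitated outside mnoGoSearch. The sketch below uses Python's sqlite3 with an in-memory table instead of MySQL. It assumes, based on my reading of the htdb feature, that each ID returned by the list query becomes a URL under htdb:/dbName/ and that $1, $2, ... stand for the path parts of that URL, so $2 is the ID. The table contents are invented for the example.

```python
import sqlite3

# In-memory stand-in for the searched database (test2 in the config above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename "
    "(ID INTEGER PRIMARY KEY, name TEXT, tag TEXT, description TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO tablename (ID, name, tag, description, status) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alpha", "news", "first row", "y"),
        (2, "beta", "", "", "n"),   # filtered out: status is not 'y'
        (3, "", "", "", "y"),       # filtered out: every text field is empty
    ],
)

NON_EMPTY = "(tag <> '' OR name <> '' OR description <> '')"


def htdb_list(db):
    """Mimic HTDBList: one htdb URL per row selected by the list query."""
    rows = db.execute(
        f"SELECT ID FROM tablename WHERE status = 'y' AND {NON_EMPTY}"
    )
    return [f"htdb:/dbName/{row[0]}" for row in rows]


def htdb_doc(db, url):
    """Mimic HTDBDoc: the second path part of the URL plays the role of $2."""
    path_parts = url[len("htdb:/"):].strip("/").split("/")
    doc_id = int(path_parts[1])  # path_parts[0] is $1 (dbName), [1] is $2
    return db.execute(
        "SELECT name, tag, description FROM tablename "
        f"WHERE status = 'y' AND ID = ? AND {NON_EMPTY}",
        (doc_id,),
    ).fetchone()


urls = htdb_list(conn)
docs = {url: htdb_doc(conn, url) for url in urls}

if __name__ == "__main__":
    print(urls)
    print(docs)
```

Running the two queries by hand like this is a cheap way to confirm that the WHERE clauses select the rows you expect before starting a full index run.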
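// When crawling pages you need to configure sections, that is, which parts of a page get indexed. You can also define your own sections; see the commented example at the bottom.
Note: an htdb search also needs this block configured.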
#######################################################################
# Document sections.
#
# Format is:
#
# Section <string> <number> <maxlen> [clone] [sep] [{expr} {repl}]
#
# where <string> is a section name and <number> is section ID
# between 0 and 255. Use 0 if you don't want to index some of
# these sections. It is better to use different sections IDs
# for different documents parts. In this case during search
# time you'll be able to give different weight to each part
# or even disallow some sections at a search time.
# <maxlen> argument contains a maximum length of section
# which will be stored in database.
# "clone" is an optional parameter describing whether this
# section should affect clone detection. It can
# be "DetectClone" or "cdon",or "NoDetectClone" or "cdoff".
# By default,url.* section values are not taken in account
# for clone detection,while any other sections take part
# in clone detection.
# "sep" is an optional argument to specify a separator between
# parts of the same section. It is a space character by default.
# "expr" and "repl" can be used to extract user defined sections,
# for example pieces of text between the given tags. "expr" is
# a regular expression,"repl" is a replacement with $1,$2,etc
# meta-characters designating matches "expr" matches.
# Standard HTML sections: body,title
// body and title exist on every page; they are the standard page sections.
Section body 1 256
Section title 2 128
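// For an htdb search, add the fields returned by the query (and comment out the other options such as body and title), for example as written below:
tag, name and description are the fields to be searched as keywords,
while NoSupported has its <number> set to 0, which means it is not a searchable field: the engine will not search NoSupported at query time.
Section tag 1 128
Section name 2 128
Section description 3 1024
Section NoSupported 0 1024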
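The Section format described above (name, ID between 0 and 255, maximum stored length, then optional arguments) is easy to validate mechanically. The following Python sketch is an illustration only; parse_section is a hypothetical helper, and it uses shlex to split quoted arguments, which may not match mnoGoSearch's own tokenizer exactly.

```python
import shlex


def parse_section(line):
    """Parse 'Section <string> <number> <maxlen> [...]' into a dict."""
    tokens = shlex.split(line)
    if len(tokens) < 4 or tokens[0].lower() != "section":
        raise ValueError(f"not a Section command: {line!r}")
    name, number, maxlen = tokens[1], int(tokens[2]), int(tokens[3])
    if not 0 <= number <= 255:
        raise ValueError(f"section ID must be between 0 and 255, got {number}")
    return {
        "name": name,
        "id": number,
        "maxlen": maxlen,
        "indexed": number != 0,   # ID 0 means the section is not indexed
        "extra": tokens[4:],      # optional clone/sep/expr/repl arguments
    }


if __name__ == "__main__":
    for text in ("Section tag 1 128", "Section NoSupported 0 1024"):
        print(parse_section(text))
```

Feeding every active Section line of a config through such a check catches fused tokens such as a missing space between the name and the ID.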
# META tags
# For example <META NAME="KEYWORDS" CONTENT="xxxx">
#
Section meta.keywords 3 128
Section meta.description 4 128
# HTTP headers example,let's store "Server" HTTP header
#
#
#Section header.server 5 64
# Document's URL parts
Section url.file 6 0
Section url.path 7 0
Section url.host 8 0
Section url.proto 9 0
# CrossWords
Section crosswords 10 0
#
# If you use CachedCopy for smart excerpts (see below),
# please keep Charset section active.
#
Section Charset 11 32
Section Content-Type 12 64
Section Content-Language 13 16
# Uncomment the following lines if you want tag attributes
# to be indexed
#Section attribute.alt 14 128
#Section attribute.label 15 128
#Section attribute.summary 16 128
#Section attribute.title 17 128
#Section attribute.face 27 0
# Uncomment the following lines if you want use NewsExtensions
# You may add any Newsgroups header to be indexed and stored in urlinfo table
#Section References 18 0
#Section Message-ID 19 0
#Section Parent-ID 20 0
# Uncomment the following lines if you want index MP3 tags.
#Section MP3.Song 21 128
#Section MP3.Album 22 128
#Section MP3.Artist 23 128
#Section MP3.Year 24 128
# Comment this line out if you don't want to store "cached copies"
# to generate smart excerpts at search time.
# Don't forget to keep "Charset" section active if you use cached copies.
# NOTE: 3.2.18 has limits for CachedCopy size,32000 for Ibase and
# 15000 for Mimer. Other databases do not have limits.
# If indexer fails with 'string too long' error message then reduce
# this number. This will be fixed in the future versions.
#
Section CachedCopy 25 64000
# A user defined section example.
# Extract text between <h1> and </h1> tags:
#Section h1 26 128 "<h1>(.*)</h1>" $1
// This is the user-defined crawl section: only the content part of the page is indexed. A regular expression can be used.
Section content 1 512 "<!--search start-->(.*)<!--search end-->" $1
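The effect of the user-defined content section can be previewed with the same expression in Python's re module. This is a sketch, not mnoGoSearch's regex engine: the DOTALL flag (so that "." also matches newlines) and truncation to 512 characters are assumptions made for the illustration, and the sample page is invented.

```python
import re

# Same expression as in the Section line; DOTALL is an assumption so that
# marked blocks spanning several lines are matched too.
EXPR = re.compile(r"<!--search start-->(.*)<!--search end-->", re.DOTALL)


def extract_content(html, maxlen=512):
    """Return the text between the search markers, cut to maxlen characters."""
    match = EXPR.search(html)
    if match is None:
        return ""
    return match.group(1)[:maxlen]


# Hypothetical page used only for this illustration.
page = (
    "<html><body>navigation"
    "<!--search start-->Hello world<!--search end-->"
    "footer</body></html>"
)

if __name__ == "__main__":
    print(extract_content(page))
```

Because (.*) is greedy, a page with several marker pairs yields everything from the first start marker to the last end marker, including the text in between.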
// Not much to say here: the Server section sets the address to crawl, which can be a single page or a whole site.
Note: an htdb search does not need this block; just comment it out.
#########################################################################
#Server [Method] [SubSection] <URL> [alias]
# This is the main command of the indexer.conf file. It's used
# to describe web-space you want to index. It also inserts
# given URL into database to use it as a start point.
# You may use "Server" command as many times as a number of different
# servers or their parts you want to index.
#
# "Method" is an optional parameter which can take one of the following values:
# Allow,Disallow,CheckOnly,HrefOnly,CheckMP3,CheckMP3Only,Skip.
#
# "SubSection" is an optional parameter to specify server's subsection,
# i.e. a part of Server command argument.
# It can take the following values:
# "page" describes web space which consists of one page with address <URL>.
# "path" describes all documents which are under the same path with <URL>.
# "site" describes all documents from the same host with <URL>.
# "world" means "any document".
# Default value is "path".
#
# To index whole server "localhost":
#Server http://localhost/
#
# You can also specify some path to index subdirectory only:
#Server http://localhost/subdir/
#
# To specify the only one page:
#Server page http://localhost/path/main.html
#
# To index whole server but giving non-root page as a start point:
#Server site http://localhost/path/main.html
#
#
# You can also specify optional parameter "alias". This example will
# index server "http://www.mnogosearch.org/" directly from disk instead of
# fetching from HTTP server:
#Server http://www.mnogosearch.org/ file:///home/httpd/www.mnogosearch.org/
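The four SubSection values described above can be read as simple URL scope rules. The Python sketch below is my approximation of those descriptions, not the indexer's actual matching code; in_scope is a hypothetical function, and details such as query strings or host name case are ignored.

```python
from urllib.parse import urlsplit


def in_scope(subsection, server_url, doc_url):
    """Approximate whether doc_url falls inside the space a Server line covers."""
    server, doc = urlsplit(server_url), urlsplit(doc_url)
    same_host = (server.scheme, server.netloc) == (doc.scheme, doc.netloc)
    if subsection == "world":
        return True                      # "any document"
    if subsection == "site":
        return same_host                 # all documents from the same host
    if subsection == "page":
        return server_url == doc_url     # exactly one page
    if subsection == "path":
        # Directory part of the Server URL, e.g. /path/main.html -> /path/
        base = server.path.rsplit("/", 1)[0] + "/"
        return same_host and doc.path.startswith(base)
    raise ValueError(f"unknown subsection: {subsection!r}")


if __name__ == "__main__":
    print(in_scope("path", "http://localhost/subdir/", "http://localhost/subdir/a.html"))
    print(in_scope("path", "http://localhost/subdir/", "http://localhost/other/a.html"))
```

Since "path" is the default, a plain Server http://localhost/subdir/ line corresponds to the first call in the usage example.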
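With this configuration in place you can run basic searches. For detailed configuration, refer to the mnoGoSearch manual. Compared with dpsearch, mnoGoSearch is better suited to building an entry-level search engine. (Editor: 李大同)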