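Coreseek is Chinese full-text search software, released as open source under the GPLv2 license. It is built on Sphinx but developed and released independently, and it specializes in Chinese search and information processing. Typical uses include vertical/industry search, forum and site search, database search, document and literature retrieval, information retrieval and data mining. It is free to download and use.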
Packages to install before building Coreseek (gcc-c++ provides g++):
yum install make gcc gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
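Before starting the builds, it can help to confirm that the compiler and autotools commands are actually on PATH. A minimal bash sketch; the function name missing_tools and the command list in the usage comment are illustrative assumptions, and library packages such as mysql-devel cannot be detected this way:

```bash
# Print every command from the argument list that is NOT found on PATH,
# one per line. Prints nothing when all commands are present.
missing_tools() {
    local tool
    local -a missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if ((${#missing[@]})); then
        printf '%s\n' "${missing[@]}"
    fi
}

# Usage (the tool list is an assumption based on the yum packages above):
#   missing_tools make gcc g++ libtool autoconf automake
```

An empty result means the build tools are in place; anything printed points to a package that still needs installing.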
cd /usr/local/src
wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz
tar -xzvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14

## Install mmseg
cd mmseg-3.2.14
./bootstrap    # warnings in the output can be ignored; any error must be fixed
./configure --prefix=/usr/local/mmseg3
make && make install
cd ..
## After installation, mmseg's dictionary and configuration files are placed in /usr/local/mmseg3/etc automatically

## Install coreseek
cd csft-3.2.14
sh buildconf.sh    # warnings in the output can be ignored; any error must be fixed
## If configure reports a MySQL-related problem, see the MySQL data source installation notes
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg \
    --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ \
    --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
make && make install
cd ..

cd /usr/local/coreseek/etc
cp sphinx-min.conf.dist sphinx.conf
vi sphinx.conf

Example contents (replace localhost, DB_USER, DB_PASSWORD and DB_NAME with your own values):

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source content
{
    type                = mysql

    sql_host            = localhost
    sql_user            = DB_USER
    sql_pass            = DB_PASSWORD
    sql_db              = DB_NAME
    sql_port            = 3306    # optional, default is 3306
    sql_query_pre       = SET NAMES utf8

    sql_query           = \
        SELECT id, title, pub_time, group_id, content FROM contents where status = '1'

    sql_attr_uint       = group_id
    sql_attr_timestamp  = pub_time

    sql_query_info      = SELECT * FROM contents WHERE id=$id
}

index content
{
    source              = content
    path                = /usr/local/coreseek/var/data/content
    docinfo             = extern
    charset_dictpath    = /usr/local/mmseg3/etc/
    charset_type        = zh_cn.utf-8
    ngram_len           = 0
}

indexer
{
    mem_limit           = 32M
}

searchd
{
    port                = 9312
    log                 = /usr/local/coreseek/var/log/searchd.log
    query_log           = /usr/local/coreseek/var/log/query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = /usr/local/coreseek/var/log/searchd.pid
    max_matches         = 1000
    seamless_rotate     = 1
    preopen_indexes     = 1
    unlink_old          = 1
}
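If you keep the example configuration as a template (say, a hypothetical sphinx.conf.tpl) instead of editing it in vi, the DB_* placeholders can be filled in from environment variables. A sketch, assuming the template contains the literal tokens DB_USER, DB_PASSWORD and DB_NAME; the SPHINX_DB_* variable names are made up for this example, and sql_host = localhost is left unchanged:

```bash
# Escape the characters sed treats specially on the replacement side of s|||:
# backslash, & (the whole match) and | (the delimiter used below).
sed_escape() {
    printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Read a template on stdin and write it to stdout with the placeholders filled in
# from the (hypothetical) SPHINX_DB_USER, SPHINX_DB_PASSWORD, SPHINX_DB_NAME variables.
render_sphinx_conf() {
    local user pass name
    user=$(sed_escape "$SPHINX_DB_USER")
    pass=$(sed_escape "$SPHINX_DB_PASSWORD")
    name=$(sed_escape "$SPHINX_DB_NAME")
    sed -e "s|DB_USER|$user|g" \
        -e "s|DB_PASSWORD|$pass|g" \
        -e "s|DB_NAME|$name|g"
}

# Usage (sphinx.conf.tpl is a hypothetical copy of the example configuration):
#   SPHINX_DB_USER=web SPHINX_DB_PASSWORD='secret' SPHINX_DB_NAME=cms \
#       render_sphinx_conf < sphinx.conf.tpl > /usr/local/coreseek/etc/sphinx.conf
```

The escaping step matters because a password containing & or | would otherwise be mangled by sed.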
Then build the index files from the configuration above:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all --rotate
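indexer --rotate builds the new index alongside the live one and then signals the searchd whose process ID is stored in pid_file, so it is intended for a daemon that is already running. The bash sketch below reads pid_file from sphinx.conf and adds --rotate only when that process is alive. The helper names are hypothetical, the simple key = value parser assumes the layout of the config above, and it should run as the same user as searchd (or root), because kill -0 fails for other users' processes:

```bash
# Print the value of the first "key = value" line in a sphinx.conf-style file.
# $1 = key, $2 = config file
sphinx_conf_value() {
    awk -v key="$1" '
        {
            line = $0
            sub(/#.*/, "", line)                      # drop comments
            if (match(line, "^[ \t]*" key "[ \t]*=")) {
                val = substr(line, RLENGTH + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)      # trim whitespace
                print val
                exit
            }
        }' "$2"
}

# Print "--all --rotate" if searchd (per pid_file) is alive, else "--all".
indexer_flags() {
    local conf=$1 pid_file pid
    pid_file=$(sphinx_conf_value pid_file "$conf")
    if [ -r "$pid_file" ]; then
        pid=$(head -n 1 "$pid_file")
    fi
    # kill -0 sends no signal; it only tests whether the process exists
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "--all --rotate"
    else
        echo "--all"
    fi
}

# Usage (flags are left unquoted on purpose so they split into two words):
#   CONF=/usr/local/coreseek/etc/sphinx.conf
#   /usr/local/coreseek/bin/indexer -c "$CONF" $(indexer_flags "$CONF")
```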
Start the search daemon (searchd):
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf
Next, create three shell scripts in the Coreseek directory (/usr/local/coreseek) to make day-to-day operation easier.
stop.sh (stop the service):
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf --stop
build.sh (rebuild the index):
#!/bin/bash
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all --rotate
start.sh (start the service):
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf
Make the scripts executable:
chmod +x start.sh
chmod +x stop.sh
chmod +x build.sh
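The three scripts differ only in the command they run, so they can also be folded into a single bash function. A sketch using the hypothetical names coreseek_ctl, CORESEEK_HOME (defaulting to /usr/local/coreseek as in this guide) and DRY_RUN (when set, the command is printed instead of executed):

```bash
# Dispatch start/stop/build to searchd or indexer with the guide's config path.
coreseek_ctl() {
    local home=${CORESEEK_HOME:-/usr/local/coreseek}
    local conf="$home/etc/sphinx.conf"
    local -a cmd
    case "$1" in
        start) cmd=("$home/bin/searchd" -c "$conf") ;;
        stop)  cmd=("$home/bin/searchd" -c "$conf" --stop) ;;
        build) cmd=("$home/bin/indexer" -c "$conf" --all --rotate) ;;
        *)
            echo "usage: coreseek_ctl {start|stop|build}" >&2
            return 2
            ;;
    esac
    if [ -n "$DRY_RUN" ]; then
        echo "${cmd[*]}"       # print the command only
    else
        "${cmd[@]}"            # run it, preserving argument boundaries
    fi
}

# To use it as a standalone script, end the file with:
#   coreseek_ctl "$@"
```

Keeping the command as an array (rather than a string) means paths with spaces would still be passed correctly.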
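After running start.sh, schedule build.sh with crontab to keep the index up to date. (Note: the data volume is small and does not change very often, so no delta index is used; the main index is simply rebuilt on a schedule. The newer CoreSeek 4.1 supports real-time indexes.)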
crontab -e
0 2 * * * sh /usr/local/coreseek/build.sh >/dev/null 2>&1
This rebuilds the index every day at 2:00 a.m. and discards the output.
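If a rebuild ever runs longer than the interval between cron runs, two indexer processes could end up working on the same index at once. A minimal guard for build.sh, sketched in bash; the function name run_exclusive and the lock path are hypothetical, and it relies on mkdir being atomic, so only one run can create the lock directory:

```bash
# Run a command only if the lock directory can be created; remove it afterwards.
# $1 = lock directory, remaining arguments = command to run.
# Returns 1 if another run holds the lock, else the command's exit status.
run_exclusive() {
    local lock_dir=$1; shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another build is still running, skipping" >&2
        return 1
    fi
    "$@"
    local status=$?
    rmdir "$lock_dir"
    return $status
}

# Possible build.sh body (lock path is a hypothetical choice):
#   run_exclusive /tmp/coreseek-build.lock \
#       /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all --rotate
```

If the script is killed before rmdir runs, the stale lock directory has to be removed by hand; flock(1) from util-linux avoids that, but is not available on every system.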
The PHP API file sphinxapi.php, which defines the SphinxClient class, is in /usr/local/src/coreseek-3.2.14/csft-3.2.14/api; copy it into your web directory.
Then search as follows:
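$s_key = trim($s_key);
// Reject quotes and semicolons. Compare with !== false: strpos() returns 0
// (which is falsy) when the character is the first one in the string.
if (strpos($s_key, "'") !== false || strpos($s_key, '"') !== false || strpos($s_key, ';') !== false) {
    exit('Illegal characters');
}
require("sphinxapi.php");
$page_nums = 20;                                // results per page
$page_index = max(1, (int)$page_index);         // current page, starting at 1
$offset_start = ($page_index - 1) * $page_nums;
$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);
$cl->SetArrayResult(true);
$cl->SetMatchMode(SPH_MATCH_ALL);
// SetLimits(offset, limit): the second argument is the number of results, not an end offset
$cl->SetLimits($offset_start, $page_nums);
$cl->SetSortMode(SPH_SORT_RELEVANCE);
$res = $cl->Query($s_key, "content");
if ($res === false) {
    exit('Search failed: ' . $cl->GetLastError());
}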
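The paging arithmetic behind SetLimits() is easy to get wrong, so here it is as a small bash function (bash to match the other examples in this guide). SetLimits($offset, $limit) takes a count as its second argument, not an end offset, and searchd keeps at most max_matches matches (1000 in the config above), so an offset at or beyond that cannot return any results. The function name page_window is hypothetical:

```bash
# Print "offset limit" for a 1-based page number.
# $1 = page, $2 = results per page, $3 = max_matches (default 1000, as in sphinx.conf)
page_window() {
    local page=$1 per_page=$2 max_matches=${3:-1000}
    if ((page < 1 || per_page < 1)); then
        echo "page and per_page must be >= 1" >&2
        return 1
    fi
    local offset=$(( (page - 1) * per_page ))
    # searchd only keeps max_matches results, so later pages are unreachable
    if ((offset >= max_matches)); then
        echo "page $page is beyond max_matches=$max_matches" >&2
        return 1
    fi
    echo "$offset $per_page"
}

# Usage:
#   page_window 3 20    # prints "40 20", i.e. SetLimits(40, 20)
```

With 20 results per page and max_matches = 1000, page 50 is the last page that can be shown; raising max_matches (on both the server and the client side) is the way to page further.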