- Abstract
- 1 Technology selection
- 1.1 ElasticSearch
- 1.2 Spring Boot
- 1.3 ik tokenizer
- 2 Environment preparation
- 3 Project architecture
- 4 Implementation effect
- 5 Concrete code implementation
- 5.1 Full-text index implementation object
- 5.2 Client configuration
- 5.3 Business code writing
- 5.4 External interface
- 5.5 Page
- 6 Summary
Abstract
For a company, the amount of data keeps growing, and finding information quickly becomes a hard problem. Computer science has a dedicated field, IR (Information Retrieval), that studies how to obtain and retrieve information. Search engines such as Baidu in China belong to this field. Implementing a search engine from scratch is very difficult, but information search matters to every company, so developers can pick one of the open source projects on the market to build their own web search engine. This article builds such an information retrieval project with ElasticSearch.
1 Technology selection
- The search engine service uses ElasticSearch
- The external web service is provided by Spring Boot web
1.1 ElasticSearch
Elasticsearch is a Lucene-based search server. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Developed in Java and released as open source under the terms of the Apache License, Elasticsearch is a popular enterprise search engine. It is widely used in cloud computing, where it delivers real-time search and is stable, reliable, fast, and easy to install and use.
Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. [1]
The most common open source search engines on the market today are ElasticSearch and Solr, both built on Lucene. ElasticSearch is the more heavyweight of the two and performs better in distributed environments; choosing between them depends on the specific business scenario and the data volume. For small amounts of data, a Lucene-based search engine service is entirely unnecessary, and searching through a relational database is enough.
1.2 Spring Boot
Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". [2]
Spring Boot is now the absolute mainstream in web development. Its advantages are not limited to development: it also performs very well in deployment, operation, and maintenance. The Spring ecosystem is so influential that mature solutions exist for almost every need.
1.3 ik tokenizer
ElasticSearch itself does not support Chinese word segmentation, so a Chinese word segmentation plugin has to be installed. Word segmentation is the foundation of Chinese information retrieval. The plugin selected here is ik: download it and put it into the plugins directory of the ElasticSearch installation.
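Once the plugin is installed, one way to check that it loaded is to call the `_analyze` API with one of the ik analyzers. The sketch below only builds such a request with the JDK 11+ HTTP client and does not send it. The node address `http://localhost:9200` and the class name `IkAnalyzeRequest` are assumptions made for illustration, and the JSON escaping covers only quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class IkAnalyzeRequest {
    /** Builds the JSON body for the _analyze API; escapes only backslashes and double quotes. */
    static String body(String analyzer, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + escaped + "\"}";
    }

    /** Builds (but does not send) a POST request to <baseUrl>/_analyze. */
    static HttpRequest request(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587\u5206\u8bcd" is the Chinese phrase for "Chinese word segmentation".
        String sample = "\u4e2d\u6587\u5206\u8bcd";
        // Assumed local node address; adjust to your own installation.
        HttpRequest r = request("http://localhost:9200", "ik_max_word", sample);
        System.out.println(r.method() + " " + r.uri());
        System.out.println(body("ik_smart", sample));
    }
}
```

Sending the same text with `ik_max_word` and then with `ik_smart` shows the difference between the fine-grained and the coarse-grained segmentation.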
2 Environment preparation
You need to install ElasticSearch and Kibana (optional), and you also need the ik word segmentation plugin.
- Install ElasticSearch from the elasticsearch official website.
- Download the IK plugin from the IK plugin github address. Note that the IK plugin version must match the ElasticSearch version you downloaded.
- Put the IK plugin into the plugins directory under the elasticsearch installation directory: create a new directory named ik, extract the downloaded plugin into it, and the plugin will be loaded automatically when ES starts.
- Build the Spring Boot project: IDEA -> New Project -> Spring Initializr.
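The plugin placement described in the steps above can be checked with a few lines of Java. This is a minimal sketch, assuming the plugin was extracted into `plugins/ik` and that the extracted archive contains the usual `plugin-descriptor.properties` file; the class name `IkPluginCheck` is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class IkPluginCheck {
    /** Returns true when <esHome>/plugins/ik contains a plugin descriptor file. */
    static boolean ikInstalled(Path esHome) {
        Path descriptor = esHome.resolve("plugins").resolve("ik")
                .resolve("plugin-descriptor.properties");
        return Files.isRegularFile(descriptor);
    }

    public static void main(String[] args) {
        // Pass the ElasticSearch installation directory as the first argument.
        Path esHome = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("IK plugin found: " + ikInstalled(esHome));
    }
}
```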
3 Project Architecture
- Get the data and segment it using the ik tokenizer
- Store the data in the ES engine
- Retrieve the stored data through ES retrieval
- Use the ES Java client to provide external services
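The flow above (tokenize, store, retrieve, serve) can be illustrated with a tiny in-memory stand-in. This is only a sketch of the idea behind an inverted index, not how ElasticSearch or ik are implemented: the class `MiniSearchPipeline` is hypothetical, and its tokenizer merely splits on non-alphanumeric characters, whereas ik performs real Chinese word segmentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MiniSearchPipeline {
    // term -> ids of the documents that contain it (the "store" step)
    private final Map<String, Set<Integer>> index = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Stand-in tokenizer: lower-cases and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Stores a document and indexes each of its terms; returns the document id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Returns the documents that contain at least one query term, in insertion order. */
    List<String> search(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : tokenize(query)) {
            ids.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) hits.add(docs.get(id));
        return hits;
    }

    public static void main(String[] args) {
        MiniSearchPipeline p = new MiniSearchPipeline();
        p.add("Building a search engine with ElasticSearch");
        p.add("Spring Boot web services");
        System.out.println(p.search("search"));
    }
}
```

In the real project the ES engine plays the role of `index`, and the Spring Boot controller exposes the equivalent of `search` over HTTP.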
4 Implementation effect
4.1 Search page
The search page simply implements a Baidu-like search box.
4.2 Search results page
<img src="https://mmbiz.qpic.cn/mmbiz_png/1flHOHZw6RuSScUib3mtpsxeUVa3BXKXMQEJD6g1GGwoydVcYBicpKHDovfFvUYUpOib6s8venq2OEoE54ky0n04Q/640?wx_fmt=png">
Clicking the first search result opens one of my personal blog posts. To avoid data copyright problems, all the data stored in the ES engine comes from my personal blog.
5 Concrete code implementation
5.1 Full-text index implementation object
The following entity class is defined from the basic information of a blog post. The most important field is the URL of each post: to view a retrieved article, the user jumps to that URL.
package com.lbh.es.entity;

import com.fasterxml.jackson.annotation.JsonIgnore;

import javax.persistence.*;

/**
 * Index mapping for this entity:
 *
 * PUT articles
 * {
 *   "mappings": {
 *     "properties": {
 *       "author": {"type": "text"},
 *       "content": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "title": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "createDate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
 *       "url": {"type": "text"}
 *     }
 *   },
 *   "settings": {
 *     "index": {
 *       "number_of_shards": 1,
 *       "number_of_replicas": 2
 *     }
 *   }
 * }
 * ---------------------------------------------------------------------
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Entity
@Table(name = "es_article")
public class ArticleEntity {
    // Database primary key; hidden from Jackson-serialized responses.
    @Id
    @JsonIgnore
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;
    @Column(name = "author")
    private String author;
    @Column(name = "content", columnDefinition = "TEXT")
    private String content;
    @Column(name = "title")
    private String title;
    @Column(name = "createDate")
    private String createDate;
    // Link to the original blog post; the search result page jumps here.
    @Column(name = "url")
    private String url;

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getCreateDate() { return createDate; }
    public void setCreateDate(String createDate) { this.createDate = createDate; }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
}
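Because `createDate` is stored as a plain string while the index mapping declares it as a date with the format `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd`, the string has to follow one of those two patterns. A minimal sketch of producing such a value with `java.time` follows; the helper class `CreateDateFormat` is hypothetical and not part of the original project.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class CreateDateFormat {
    // Matches the first alternative of the mapping format "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd".
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats a timestamp into a string suitable for the createDate field. */
    static String format(LocalDateTime time) {
        return time.format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2021, 3, 3, 9, 30, 0))); // prints 2021-03-03 09:30:00
    }
}
```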
5.2 Client Configuration
Configure the ES client in Java.
package com.lbh.es.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.ArrayList;
import java.util.List;

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Configuration
public class EsConfig {

    @Value("${elasticsearch.schema}")
    private String schema;
    @Value("${elasticsearch.address}")
    private String address;
    @Value("${elasticsearch.connectTimeout}")
    private int connectTimeout;
    @Value("${elasticsearch.socketTimeout}")
    private int socketTimeout;
    @Value("${elasticsearch.connectionRequestTimeout}")
    private int tryConnTimeout;
    @Value("${elasticsearch.maxConnectNum}")
    private int maxConnNum;
    @Value("${elasticsearch.maxConnectPerRoute}")
    private int maxConnectPerRoute;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Split the comma-separated address list into host:port entries
        List<HttpHost> hostLists = new ArrayList<>();
        String[] hostList = address.split(",");
        for (String addr : hostList) {
            String host = addr.split(":")[0];
            String port = addr.split(":")[1];
            hostLists.add(new HttpHost(host, Integer.parseInt(port), schema));
        }
        // Convert to an HttpHost array
        HttpHost[] httpHost = hostLists.toArray(new HttpHost[]{});
        // Build the connection object
        RestClientBuilder builder = RestClient.builder(httpHost);
        // Timeout configuration for the asynchronous HTTP client
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(connectTimeout);
            requestConfigBuilder.setSocketTimeout(socketTimeout);
            requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
            return requestConfigBuilder;
        });
        // Connection count configuration for the asynchronous HTTP client
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(maxConnNum);
            httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
            return httpClientBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}
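The configuration class expects `elasticsearch.address` to be a comma-separated list of `host:port` entries, for example `127.0.0.1:9200,127.0.0.1:9201` (an illustrative value, not taken from the original project). The parsing step can be sketched on its own with plain JDK types. The class `EsAddressParser` is hypothetical; unlike the code above, it trims whitespace and rejects entries without a port.

```java
import java.util.ArrayList;
import java.util.List;

class EsAddressParser {
    /** Simple value holder for one parsed entry. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses "host1:port1,host2:port2" into a list of host/port pairs. */
    static List<HostPort> parse(String address) {
        List<HostPort> result = new ArrayList<>();
        for (String entry : address.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) continue; // tolerate trailing commas
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(new HostPort(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("127.0.0.1:9200, 127.0.0.1:9201")); // prints [127.0.0.1:9200, 127.0.0.1:9201]
    }
}
```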
5.3 Business code writing
The business code covers retrieving articles: you can search for relevant information along the dimensions of article title, article content, and author.
package com.lbh.es.service;

import com.google.gson.Gson;
import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.repository.ArticleRepository;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.*;

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Service
public class ArticleService {

    private static final String ARTICLE_INDEX = "article";

    @Resource
    private RestHighLevelClient client;
    @Resource
    private ArticleRepository articleRepository;

    public boolean createIndexOfArticle() {
        Settings settings = Settings.builder()
                .put("index.number_of_shards", 1)
                .put("index.number_of_replicas", 1)
                .build();
        // {"properties":{"author":{"type":"text"},
        // "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
        // ,"title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
        // ,"createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"}
        // ,"url":{"type":"text"}
        // }}
        String mapping = "{\"properties\":{\"author\":{\"type\":\"text\"},\n" +
                "\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                ",\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                ",\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"}\n" +
                ",\"url\":{\"type\":\"text\"}\n" +
                "}}";
        CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
                .settings(settings).mapping(mapping, XContentType.JSON);
        CreateIndexResponse response = null;
        try {
            response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response != null) {
            System.err.println(response.isAcknowledged() ? "success" : "default");
            return response.isAcknowledged();
        } else {
            return false;
        }
    }

    public boolean deleteArticle() {
        DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
        try {
            AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
            return response.isAcknowledged();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    public IndexResponse addArticle(ArticleEntity article) {
        Gson gson = new Gson();
        String s = gson.toJson(article);
        // Create the index request object
        IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
        // Document content
        indexRequest.source(s, XContentType.JSON);
        // Send the HTTP request through the client
        IndexResponse re = null;
        try {
            re = client.index(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return re;
    }

    public void transferFromMysql() {
        articleRepository.findAll().forEach(this::addArticle);
    }

    public List<ArticleEntity> queryByKey(String keyword) {
        SearchRequest request = new SearchRequest();
        /*
         * Create the object that holds the search parameters: SearchSourceBuilder.
         * Compared to matchQuery, multiMatchQuery targets multiple fields. When
         * multiMatchQuery is given only one field name, it is equivalent to matchQuery;
         * when it is given several, such as field1 and field2, a document matches
         * if either field1 or field2 contains the text.
         */
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders
                .multiMatchQuery(keyword, "author", "content", "title"));
        request.source(searchSourceBuilder);
        List<ArticleEntity> result = new ArrayList<>();
        try {
            SearchResponse search = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit : search.getHits()) {
                Map<String, Object> map = hit.getSourceAsMap();
                ArticleEntity item = new ArticleEntity();
                item.setAuthor((String) map.get("author"));
                item.setContent((String) map.get("content"));
                item.setTitle((String) map.get("title"));
                item.setUrl((String) map.get("url"));
                result.add(item);
            }
            return result;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public ArticleEntity queryById(String indexId) {
        GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
        GetResponse response = null;
        try {
            response = client.get(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response != null && response.isExists()) {
            Gson gson = new Gson();
            return gson.fromJson(response.getSourceAsString(), ArticleEntity.class);
        }
        return null;
    }
}
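For readers who want to try the same search in Kibana or with curl, the query built by `multiMatchQuery(keyword, "author", "content", "title")` corresponds to a `multi_match` query in the ES query DSL. The sketch below assembles an equivalent request body by hand. The class `MultiMatchBody` is hypothetical, the escaping handles only quotes and backslashes, and the body actually sent by the Java client may carry additional default parameters.

```java
class MultiMatchBody {
    /** Escapes backslashes and double quotes; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a search body with a multi_match query over the given fields. */
    static String build(String keyword, String... fields) {
        StringBuilder sb = new StringBuilder("{\"query\":{\"multi_match\":{\"query\":\"")
                .append(escape(keyword))
                .append("\",\"fields\":[");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(fields[i])).append('"');
        }
        return sb.append("]}}}").toString();
    }

    public static void main(String[] args) {
        // Can be sent as the body of POST /article/_search
        System.out.println(build("es", "author", "content", "title"));
        // prints {"query":{"multi_match":{"query":"es","fields":["author","content","title"]}}}
    }
}
```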
5.4 External interface
The external interface is written the same way as any web application developed with Spring Boot.
package com.lbh.es.controller;

import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.service.ArticleService;
import org.elasticsearch.action.index.IndexResponse;
import org.springframework.web.bind.annotation.*;

import javax.annotation.Resource;
import java.util.List;

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@RestController
@RequestMapping("article")
public class ArticleController {

    @Resource
    private ArticleService articleService;

    // Create the article index
    @GetMapping("/create")
    public boolean create() {
        return articleService.createIndexOfArticle();
    }

    // Delete the article index
    @GetMapping("/delete")
    public boolean delete() {
        return articleService.deleteArticle();
    }

    // Index a single article
    @PostMapping("/add")
    public IndexResponse add(@RequestBody ArticleEntity article) {
        return articleService.addArticle(article);
    }

    // Copy all articles from MySQL into ES
    @GetMapping("/transfer")
    public String transfer() {
        articleService.transferFromMysql();
        return "successful";
    }

    // Search articles by keyword
    @GetMapping("/query")
    public List<ArticleEntity> query(String keyword) {
        return articleService.queryByKey(keyword);
    }
}
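A quick way to exercise the query endpoint is to call it from a small client. The sketch below only builds the request with the JDK 11+ HTTP client; the base URL `http://localhost:8080` is an assumption (the default Spring Boot port), and the class `ArticleQueryClient` is hypothetical. The keyword is URL-encoded so that Chinese search terms survive the trip.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class ArticleQueryClient {
    /** Builds the URI for GET /article/query with a URL-encoded keyword. */
    static URI queryUri(String baseUrl, String keyword) {
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/article/query?keyword=" + encoded);
    }

    /** Builds (but does not send) the GET request. */
    static HttpRequest request(String baseUrl, String keyword) {
        return HttpRequest.newBuilder(queryUri(baseUrl, keyword)).GET().build();
    }

    public static void main(String[] args) {
        // Assumed local address of the Spring Boot application.
        System.out.println(queryUri("http://localhost:8080", "spring boot"));
        // prints http://localhost:8080/article/query?keyword=spring+boot
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(...)` with `HttpResponse.BodyHandlers.ofString()`; the response body is the JSON list returned by the controller.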
5.5 Page
The page here uses Thymeleaf, mainly because the author really does not know front-end development and can only put together simple HTML5, so I casually made a page that can be displayed.
Search page
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<meta charset="UTF-8" />
/> YiyiDu
containing input and button
"font-size: 0px;" >
"center" style="margin-top: 0px;" >
<img src="../static/img/yyd.png" th:src="@{/static/img/yyd.png}" alt="100 million degrees" width="280px" class="pic" />