Postgres Full-Text Search With Hibernate 6

Hibernate

Hibernate by itself does not have full-text search support. It has to rely on database engine support or third-party solutions.

An extension called Hibernate Search integrates with Apache Lucene or Elasticsearch (there is also integration with OpenSearch).

Postgres

Postgres has had full-text search functionality since version 7.3. Although it can not compete with search engines like Elasticsearch or Lucene, it still provides a flexible and robust solution that might be enough to meet application users’ expectations—features like stemming, ranking, and indexing.

We will briefly explain how we can do a full-text search in Postgres. For more, please visit Postgres documentation. As for essential text matching, the most crucial part is the math operator @@.

It returns true if the document (object of type tsvector) matches the query (object of type tsquery).

The order is not crucial for the operator. So, it does not matter if we put the document on the left side of the operator and the query on the right side or in a different order.

For better demonstration, we use a database table called the tweet.

SQL

 

create table tweet (
        id bigint not null,
        short_content varchar(255),
        title varchar(255),
        primary key (id)
    )

With such data:

SQL

 

INSERT INTO tweet (id, title, short_content) VALUES (1, 'Cats', 'Cats rules the world');
INSERT INTO tweet (id, title, short_content) VALUES (2, 'Rats', 'Rats rules in the sewers');
INSERT INTO tweet (id, title, short_content) VALUES (3, 'Rats vs Cats', 'Rats and Cats hates each other');

INSERT INTO tweet (id, title, short_content) VALUES (4, 'Feature', 'This project is design to wrap already existed functions of Postgres');
INSERT INTO tweet (id, title, short_content) VALUES (5, 'Postgres database', 'Postgres is one of the widly used database on the market');
INSERT INTO tweet (id, title, short_content) VALUES (6, 'Database', 'On the market there is a lot of database that have similar features like Oracle');

Now let’s see what the tsvector object looks like for the short_content column for each of the records.

SQL

 

SELECT id, to_tsvector('english', short_content) FROM tweet;

Output:

Output: to_tsvector

The output shows how to_tsvcector converts the text column to a tsvector object for the ‘english‘ text search configuration.

Text Search Configuration

The first parameter for the to_tsvector function passed in the above example was the name of the text search configuration. In that case, it was the “english“. According to Postgres documentation, the text search configuration is as follows:

… full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled by text search configurations.

So, configuration is a crucial part of the process and vital to our full-text search results. For different configurations, the Postgres engine can return different results. This does not have to be the case among dictionaries for different languages. For example, you can have two configurations for the same language, but one ignores names containing digits (for example, some serial numbers). If we pass in our query the specific serial number we are looking for, which is mandatory, we won’t find any record for configuration that ignores words with numbers. Even if we have such records in the database, please check the configuration documentation for more information.

Text Query

Text query supports such operators as & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY). The first three operators do not require a deeper explanation. The <-> operator checks if words exist and if they are placed in a specific order. So, for example, for the query “rat <-> cat“, we expect that the “cat” word is going to exist, followed by the “rat.”

Examples

  • Content that contains the rat and cat:
SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Rat & cat');

Output: Rats and Cats hate each other

  • Content that contains database and market, and the market is the third word after database:
SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database <3> market');

Output: Postgres is one of the widely used databases on the market

  • Content that contains database but not Postgres:
SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database & !Postgres');

Output: On the market there is a lot of database that have similar features like Oracle

  • Content that contains Postgres or Oracle:
SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Postgres | Oracle');

Output: character varying

Wrapper Functions

One of the wrapper functions that creates text queries was already mentioned in this article, which is the to_tsquery. There are more such functions like:

  • plainto_tsquery
  • phraseto_tsquery
  • websearch_to_tsquery

plainto_tsquery

The plainto_tsquery converts all passed words to query where all words are combined with the & (AND) operator. For example, the equivalent of the plainto_tsquery('english', 'Rat cat') is to_tsquery('english', 'Rat & cat').

For the following usage:

SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ plainto_tsquery('english', 'Rat cat');

We get the result below:

Output from plainto_tsquery

phraseto_tsquery

The phraseto_tsquery converts all passed words to query where all words are combined with <-> (FOLLOW BY) operator. For example, the equivalent of the phraseto_tsquery('english', 'cat rule') is to_tsquery('english', 'cat <-> rule').

For the following usage:

SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ phraseto_tsquery('english', 'cat rule');

We get the result below:

Output: Cats rule the world

websearch_to_tsquery

The websearch_to_tsquery uses alternative syntax to create a valid text query. 

  • Unquoted text: Converts part of syntax in the same way as plainto_tsquery
  • Quoted text: Converts part of syntax in the same way as phraseto_tsquery
  • OR: Converts to “|” (OR) operator
  • -“: Same as “!” (NOT) operator

For example, the equivalent of the websearch_to_tsquery('english', '"cat rule" or database -Postgres') is to_tsquery('english', 'cat <-> rule | database & !Postgres').

For the following usage:

SQL

 

SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ websearch_to_tsquery('english', '"cat rule" or database -Postgres');

We get the result below:

Output: short_content, character varying

Postgres and Hibernate Native Support

As mentioned in the article, Hibernate alone does not have full-text search support. It has to rely on database engine support. This means that we are allowed to execute native SQL queries as shown in the examples below:

  • plainto_tsquery
Java

 

public List<Tweet> findBySinglePlainQueryInDescriptionForConfigurationWithNativeSQL(String textQuery, String configuration) {
        return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ plainto_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
    }

  • websearch_to_tsquery
Java

 

public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescriptionWithNativeSQL(String textQuery, String configuration) {
        return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ websearch_to_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList();
    }

Hibernate With posjsonhelper Library

The posjsonhelper library is an open-source project that adds support for Hibernate queries for PostgreSQL JSON functions and full-text search.

For the Maven project, we need to add the dependencies below:

XML

 

<dependency>
    <groupId>com.github.starnowski.posjsonhelper.text</groupId>
    <artifactId>hibernate6-text</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>6.4.0.Final</version>
</dependency>

To use components that exist in the posjsonhelper library, we need to register them in the Hibernate context.

This means that there must be a specified org.hibernate.boot.model.FunctionContributor implementation. The library has an implementation of this interface, that is com.github.starnowski.posjsonhelper.hibernate6.PosjsonhelperFunctionContributor.

A file with the name “org.hibernate.boot.model.FunctionContributor” under the “resources/META-INF/services” directory is required to use this implementation.

There is another way to register posjsonhelper‘s component, which can be done through programmability. To see how to do that, check this link.

Now, we can use full-text search operators in Hibernate queries.

PlainToTSQueryFunction

This is a component that wraps the plainto_tsquery function.

Java

 

public List<Tweet> findBySinglePlainQueryInDescriptionForConfiguration(String textQuery, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PlainToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
        return entityManager.createQuery(query).getResultList();
    }

For a configuration with the value 'english', the code is going to generate the statement below:

Java

 

select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ plainto_tsquery('english', ?);

PhraseToTSQueryFunction 

This component wraps the phraseto_tsquery function.

Java

 

public List<Tweet> findBySinglePhraseInDescriptionForConfiguration(String textQuery, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PhraseToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext));
        return entityManager.createQuery(query).getResultList();
        }

For configuration with the value 'english', the code is going to generate the statement below:

SQL

 

select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ phraseto_tsquery('english', ?)

WebsearchToTSQueryFunction 

This component wraps the websearch_to_tsquery function.

Java

 

public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescription(String phrase, String configuration) {
        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class);
        Root<Tweet> root = query.from(Tweet.class);
        query.select(root);
        query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new WebsearchToTSQueryFunction((NodeBuilder) cb, configuration, phrase), hibernateContext));
        return entityManager.createQuery(query).getResultList();
    }

For configuration with the value 'english', the code is going to generate the statement below:

SQL

 

select
        t1_0.id,
        t1_0.short_content,
        t1_0.title 
    from
        tweet t1_0 
    where
        to_tsvector('english', t1_0.short_content) @@ websearch_to_tsquery('english', ?)

HQL Queries

All mentioned components can be used in HQL queries. To check how it can be done, please click this link.

Why Use the posjsonhelper Library When We Can Use the Native Approach With Hibernate?

Although dynamically concatenating a string that is supposed to be an HQL or SQL query might be easy, implementing predicates would be better practice, especially when you have to handle search criteria based on dynamic attributes from your API.

Conclusion

As mentioned in the previous article, Postgres full-text search support can be a good alternative for substantial search engines like Elasticsearch or Lucene, in some cases. This could save us from the decision to add third-party solutions to our technology stack, which could also add more complexity and additional costs.

Source:
https://dzone.com/articles/postgres-full-text-search-with-hibernate-6