f in x
PostgreSQL Full-Text Search: Operative Guide to tsvector, tsquery, and Ranking
> cd .. / HUB_EDITORIALE
Sviluppo di siti web

PostgreSQL Full-Text Search: Operative Guide to tsvector, tsquery, and Ranking

[2026-06-17] Author: Ing. Calogero Bono

Have you ever searched for a product in your database with WHERE name LIKE '%shoe%' and waited seconds, only to get incomplete results — missing "sneakers" when someone types "running shoe"? That's the classic problem of raw text search in relational databases: LIKE is slow, ignores word variations, stop words, and relevance ranking. At Meteora Web, we've seen this in dozens of projects. The solution is PostgreSQL's built‑in Full‑Text Search, powered by tsvector and tsquery. It lets you implement advanced search without installing Elasticsearch or Solr. In this guide, we'll show you how it works, how to configure it, and how to fine‑tune ranking for the best results.

Why LIKE is not enough

When a customer searches for "red waterproof jacket" on your e‑commerce site, they expect to find "waterproof jackets", "red raincoat", and "water resistant jacket" too. LIKE works at the character level, not the word level: it doesn't understand that "jacket" and "jackets" share the same root, it doesn't handle stop words ("the", "a", "and"), and it can't rank results by relevance. On large tables (hundreds of thousands of rows), a LIKE query with a leading wildcard forces a full table scan, killing performance.

Sponsored Protocol

PostgreSQL has had a mature full‑text search engine since around 2005. We use it in production for clients with catalogs of over 200,000 products, and response times stay under 200 ms with proper indexing.

tsvector and tsquery: the building blocks

What is a tsvector

A tsvector is a normalized representation of text: it splits the content into lexemes (word stems) after applying stemming and removing stop words. Each lexeme is associated with its positions in the original text, which is used for proximity searches.

Example with to_tsvector:

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Notice: "the", "over" are stop words removed; "jumps" → "jump", "lazy" → "lazi".

What is a tsquery

A tsquery is the search expression normalized the same way. Use boolean operators ( & for AND, | for OR, ! for NOT) and the :* prefix for prefix matching.

SELECT to_tsquery('english', 'fox & (quick | lazy)');
-- Result: 'fox' & ( 'quick' | 'lazi' )

Use plainto_tsquery for natural language queries (turns "fox quick lazy" into 'fox' & 'quick' & 'lazi'), or phraseto_tsquery for exact phrase search.

Sponsored Protocol

Basic matching

SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'fox & quick');

This works, but it's not performant: it computes the tsvector on the fly for every row. You need an index.

GIN index: the speed secret

PostgreSQL offers GiST and GIN indexes for full‑text. GIN is almost always the better choice: faster for search, slower for writes, but unbeatable for read workloads.

-- Add a generated tsvector column (auto‑maintained)
ALTER TABLE products ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''))) STORED;

-- Create the GIN index
CREATE INDEX idx_products_search ON products USING GIN (search_vector);

Now the search becomes:

SELECT id, name, description,
       ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Note: If your data changes frequently, the GENERATED ALWAYS column is transparently updated. For bulk updates, consider a trigger or batch refresh.

Sponsored Protocol

Ranking: how to order by relevance

ts_rank and ts_rank_cd

ts_rank computes a score based on term frequency and proximity (how often terms appear and how close they are). ts_rank_cd uses document coverage (how much of the document is covered by the query terms).

To improve ranking quality, use weights on different columns. For instance, give more importance to the title than the description:

ALTER TABLE products ADD COLUMN search_vector_weighted tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name,'')), 'A') ||
    setweight(to_tsvector('english', coalesce(description,'')), 'B')
  ) STORED;

CREATE INDEX idx_products_weighted ON products USING GIN (search_vector_weighted);

SELECT id, name, description,
       ts_rank(search_vector_weighted, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector_weighted @@ query
ORDER BY rank DESC
LIMIT 20;

Weights range from A (highest) to D (lowest). setweight assigns a weight to all lexemes in that vector. ts_rank takes them into account during normalization.

Normalization options

ts_rank accepts a second integer parameter for the normalization method (document length, coverage, etc.). Default (0) accounts for length. We recommend 32 (normalize by document length divided by average) for homogeneous catalogs, or 4 (term count only) if you want absolute weight. Experiment.

Sponsored Protocol

SELECT ts_rank(search_vector_weighted, query, 32) AS rank ...

Advanced use: fuzzy search and synonyms

Prefix matching and wildcards

Use :* to match any word starting with that prefix:

SELECT to_tsquery('english', 'run:*');
-- matches 'run', 'running', 'runner', etc.

Synonyms with thesaurus

PostgreSQL supports thesaurus files for synonym mapping. Create a mapping file (e.g., english.ths) and load it. Useful for mapping "car" → "automobile" or "PC" → "computer". Configuration is complex but powerful for domain‑specific vocabularies.

Phonetic matching

If you need to handle typos, native full‑text won't suffice. Combine it with the pg_trgm extension (trigrams) for similarity search using a GiST index. That's a different topic.

Common mistakes and how to avoid them

  • Not indexing the computed column: if you compute tsvector on the fly in the query, you lose all performance benefits. Always use a materialized column + GIN index.
  • Forgetting the language configuration: always specify the correct language code ('english', 'italian', etc.). If omitted, PostgreSQL defaults to 'english'.
  • Custom stop words: if your domain uses words that the default dictionary treats as stop words (e.g., "type" in a fashion context?), create a custom dictionary.
  • Ignoring ranking normalization: very long or very short documents can distort the order. Try different normalization values.

In summary — what to do now

  1. Add a tsvector column to your searchable table (products, articles, documents).
  2. Create a GIN index on that column.
  3. Use to_tsquery or plainto_tsquery to transform user input into a tsquery.
  4. Order by ts_rank with weights and appropriate normalization.
  5. Test with real data: compare query times before and after using EXPLAIN ANALYZE.

For a deeper dive, check our pillar guide on Advanced PostgreSQL where we cover performance, JSONB, and replication. For a complete approach to Linux server administration, see our Linux guide for developers and sysadmins.

Sponsored Protocol

We use these techniques daily for our clients. They work. Try them yourself.

Ing. Calogero Bono

> AUTHOR_EXTRACTED

Ing. Calogero Bono

Ingegnere Informatico, co-fondatore di Meteora Web. Esperto in architetture software, sicurezza informatica e sviluppo sistemi scalabili.
[ Read Full Dossier ]

> METEORA_WEB // DIGITAL AGENCY

We build the digital presence your business deserves.

Websites, social media, online advertising, e-commerce and high-performance hosting, engineered with method by computer engineers in Sciacca, for all of Italy.

> MW_JOURNAL

> READ_ALL()