PostgreSQL Full-Text Search: tsvector, tsquery, and Ranking Guide • Meteora Web Agency

Have you ever searched for a product in your database with WHERE name LIKE '%shoe%' and waited seconds, only to get incomplete results — missing "sneakers" when someone types "running shoe"? That's the classic problem of raw text search in relational databases: LIKE is slow, ignores word variations, stop words, and relevance ranking. At Meteora Web, we've seen this in dozens of projects. The solution is PostgreSQL's built‑in Full‑Text Search, powered by tsvector and tsquery. It lets you implement advanced search without installing Elasticsearch or Solr. In this guide, we'll show you how it works, how to configure it, and how to fine‑tune ranking for the best results.

Why LIKE is not enough

When a customer searches for "red waterproof jacket" on your e‑commerce site, they expect to find "waterproof jackets", "red raincoat", and "water resistant jacket" too. LIKE works at the character level, not the word level: it doesn't understand that "jacket" and "jackets" share the same root, it doesn't handle stop words ("the", "a", "and"), and it can't rank results by relevance. On large tables (hundreds of thousands of rows), a LIKE query with a leading wildcard forces a full table scan, killing performance.

PostgreSQL has had a mature full‑text search engine since around 2005. We use it in production for clients with catalogs of over 200,000 products, and response times stay under 200 ms with proper indexing.

tsvector and tsquery: the building blocks

What is a tsvector

A tsvector is a normalized representation of text: it splits the content into lexemes (word stems) after applying stemming and removing stop words. Each lexeme is associated with its positions in the original text, which is used for proximity searches.

Example with to_tsvector:

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Notice: "the", "over" are stop words removed; "jumps" → "jump", "lazy" → "lazi".

What is a tsquery

A tsquery is the search expression normalized the same way. Use boolean operators ( & for AND, | for OR, ! for NOT) and the :* prefix for prefix matching.

SELECT to_tsquery('english', 'fox & (quick | lazy)');
-- Result: 'fox' & ( 'quick' | 'lazi' )

Use plainto_tsquery for natural language queries (turns "fox quick lazy" into 'fox' & 'quick' & 'lazi'), or phraseto_tsquery for exact phrase search.

Basic matching

SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'fox & quick');

This works, but it's not performant: it computes the tsvector on the fly for every row. You need an index.

GIN index: the speed secret

PostgreSQL offers GiST and GIN indexes for full‑text. GIN is almost always the better choice: faster for search, slower for writes, but unbeatable for read workloads.

-- Add a generated tsvector column (auto‑maintained)
ALTER TABLE products ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''))) STORED;

-- Create the GIN index
CREATE INDEX idx_products_search ON products USING GIN (search_vector);

Now the search becomes:

SELECT id, name, description,
       ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Note: If your data changes frequently, the GENERATED ALWAYS column is transparently updated. For bulk updates, consider a trigger or batch refresh.

Ranking: how to order by relevance

ts_rank and ts_rank_cd

ts_rank computes a score based on term frequency and proximity (how often terms appear and how close they are). ts_rank_cd uses document coverage (how much of the document is covered by the query terms).

To improve ranking quality, use weights on different columns. For instance, give more importance to the title than the description:

ALTER TABLE products ADD COLUMN search_vector_weighted tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name,'')), 'A') ||
    setweight(to_tsvector('english', coalesce(description,'')), 'B')
  ) STORED;

CREATE INDEX idx_products_weighted ON products USING GIN (search_vector_weighted);

SELECT id, name, description,
       ts_rank(search_vector_weighted, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector_weighted @@ query
ORDER BY rank DESC
LIMIT 20;

Weights range from A (highest) to D (lowest). setweight assigns a weight to all lexemes in that vector. ts_rank takes them into account during normalization.

Normalization options

ts_rank accepts a second integer parameter for the normalization method (document length, coverage, etc.). Default (0) accounts for length. We recommend 32 (normalize by document length divided by average) for homogeneous catalogs, or 4 (term count only) if you want absolute weight. Experiment.

SELECT ts_rank(search_vector_weighted, query, 32) AS rank ...

Advanced use: fuzzy search and synonyms

Prefix matching and wildcards

Use :* to match any word starting with that prefix:

SELECT to_tsquery('english', 'run:*');
-- matches 'run', 'running', 'runner', etc.

Synonyms with thesaurus

PostgreSQL supports thesaurus files for synonym mapping. Create a mapping file (e.g., english.ths) and load it. Useful for mapping "car" → "automobile" or "PC" → "computer". Configuration is complex but powerful for domain‑specific vocabularies.

Phonetic matching

If you need to handle typos, native full‑text won't suffice. Combine it with the pg_trgm extension (trigrams) for similarity search using a GiST index. That's a different topic.

Common mistakes and how to avoid them

Not indexing the computed column: if you compute tsvector on the fly in the query, you lose all performance benefits. Always use a materialized column + GIN index.
Forgetting the language configuration: always specify the correct language code ('english', 'italian', etc.). If omitted, PostgreSQL defaults to 'english'.
Custom stop words: if your domain uses words that the default dictionary treats as stop words (e.g., "type" in a fashion context?), create a custom dictionary.
Ignoring ranking normalization: very long or very short documents can distort the order. Try different normalization values.

In summary — what to do now

Add a tsvector column to your searchable table (products, articles, documents).
Create a GIN index on that column.
Use to_tsquery or plainto_tsquery to transform user input into a tsquery.
Order by ts_rank with weights and appropriate normalization.
Test with real data: compare query times before and after using EXPLAIN ANALYZE.

For a deeper dive, check our pillar guide on Advanced PostgreSQL where we cover performance, JSONB, and replication. For a complete approach to Linux server administration, see our Linux guide for developers and sysadmins.

We use these techniques daily for our clients. They work. Try them yourself.

PostgreSQL Full-Text Search: Operative Guide to tsvector, tsquery, and Ranking

Why LIKE is not enough

tsvector and tsquery: the building blocks

What is a tsvector

What is a tsquery

Basic matching

GIN index: the speed secret

Ranking: how to order by relevance

ts_rank and ts_rank_cd

Normalization options

Advanced use: fuzzy search and synonyms

Prefix matching and wildcards

Synonyms with thesaurus

Phonetic matching

Common mistakes and how to avoid them

In summary — what to do now

> AUTHOR_EXTRACTED

Ing. Calogero Bono

We build the digital presence your business deserves.

Stay in the loop

> MW_JOURNAL LATEST_LOGS

Pixel 11's flashy light show masks Google's failure to improve face unlock

Asics running shoes are up to 53 percent off on Amazon with 15 top models

77% of adults now support classroom phone ban, new Pew poll finds

Montana's 'right to try' moves fast — Europe is still standing still

Pokémon Winds & Waves leak confirms return of Legends Arceus-style free catching