Have you ever searched for a product in your database with WHERE name LIKE '%shoe%' and waited seconds, only to get incomplete results — missing "sneakers" when someone types "running shoe"? That's the classic problem of raw text search in relational databases: LIKE is slow, ignores word variations, stop words, and relevance ranking. At Meteora Web, we've seen this in dozens of projects. The solution is PostgreSQL's built‑in Full‑Text Search, powered by tsvector and tsquery. It lets you implement advanced search without installing Elasticsearch or Solr. In this guide, we'll show you how it works, how to configure it, and how to fine‑tune ranking for the best results.
Why LIKE is not enough
When a customer searches for "red waterproof jacket" on your e‑commerce site, they expect to find "waterproof jackets", "red raincoat", and "water resistant jacket" too. LIKE works at the character level, not the word level: it doesn't understand that "jacket" and "jackets" share the same root, it doesn't handle stop words ("the", "a", "and"), and it can't rank results by relevance. On large tables (hundreds of thousands of rows), a LIKE query with a leading wildcard forces a full table scan, killing performance.
Sponsored Protocol
PostgreSQL has had a mature full‑text search engine since around 2005. We use it in production for clients with catalogs of over 200,000 products, and response times stay under 200 ms with proper indexing.
tsvector and tsquery: the building blocks
What is a tsvector
A tsvector is a normalized representation of text: it splits the content into lexemes (word stems) after applying stemming and removing stop words. Each lexeme is associated with its positions in the original text, which is used for proximity searches.
Example with to_tsvector:
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
Notice: "the", "over" are stop words removed; "jumps" → "jump", "lazy" → "lazi".
What is a tsquery
A tsquery is the search expression normalized the same way. Use boolean operators ( & for AND, | for OR, ! for NOT) and the :* prefix for prefix matching.
SELECT to_tsquery('english', 'fox & (quick | lazy)');
-- Result: 'fox' & ( 'quick' | 'lazi' )
Use plainto_tsquery for natural language queries (turns "fox quick lazy" into 'fox' & 'quick' & 'lazi'), or phraseto_tsquery for exact phrase search.
Sponsored Protocol
Basic matching
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'fox & quick');
This works, but it's not performant: it computes the tsvector on the fly for every row. You need an index.
GIN index: the speed secret
PostgreSQL offers GiST and GIN indexes for full‑text. GIN is almost always the better choice: faster for search, slower for writes, but unbeatable for read workloads.
-- Add a generated tsvector column (auto‑maintained)
ALTER TABLE products ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''))) STORED;
-- Create the GIN index
CREATE INDEX idx_products_search ON products USING GIN (search_vector);
Now the search becomes:
SELECT id, name, description,
ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Note: If your data changes frequently, the GENERATED ALWAYS column is transparently updated. For bulk updates, consider a trigger or batch refresh.
Sponsored Protocol
Ranking: how to order by relevance
ts_rank and ts_rank_cd
ts_rank computes a score based on term frequency and proximity (how often terms appear and how close they are). ts_rank_cd uses document coverage (how much of the document is covered by the query terms).
To improve ranking quality, use weights on different columns. For instance, give more importance to the title than the description:
ALTER TABLE products ADD COLUMN search_vector_weighted tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(name,'')), 'A') ||
setweight(to_tsvector('english', coalesce(description,'')), 'B')
) STORED;
CREATE INDEX idx_products_weighted ON products USING GIN (search_vector_weighted);
SELECT id, name, description,
ts_rank(search_vector_weighted, query) AS rank
FROM products, to_tsquery('english', 'fox & quick') AS query
WHERE search_vector_weighted @@ query
ORDER BY rank DESC
LIMIT 20;
Weights range from A (highest) to D (lowest). setweight assigns a weight to all lexemes in that vector. ts_rank takes them into account during normalization.
Normalization options
ts_rank accepts a second integer parameter for the normalization method (document length, coverage, etc.). Default (0) accounts for length. We recommend 32 (normalize by document length divided by average) for homogeneous catalogs, or 4 (term count only) if you want absolute weight. Experiment.
Sponsored Protocol
SELECT ts_rank(search_vector_weighted, query, 32) AS rank ...
Advanced use: fuzzy search and synonyms
Prefix matching and wildcards
Use :* to match any word starting with that prefix:
SELECT to_tsquery('english', 'run:*');
-- matches 'run', 'running', 'runner', etc.
Synonyms with thesaurus
PostgreSQL supports thesaurus files for synonym mapping. Create a mapping file (e.g., english.ths) and load it. Useful for mapping "car" → "automobile" or "PC" → "computer". Configuration is complex but powerful for domain‑specific vocabularies.
Phonetic matching
If you need to handle typos, native full‑text won't suffice. Combine it with the pg_trgm extension (trigrams) for similarity search using a GiST index. That's a different topic.
Common mistakes and how to avoid them
- Not indexing the computed column: if you compute tsvector on the fly in the query, you lose all performance benefits. Always use a materialized column + GIN index.
- Forgetting the language configuration: always specify the correct language code (
'english','italian', etc.). If omitted, PostgreSQL defaults to'english'. - Custom stop words: if your domain uses words that the default dictionary treats as stop words (e.g., "type" in a fashion context?), create a custom dictionary.
- Ignoring ranking normalization: very long or very short documents can distort the order. Try different normalization values.
In summary — what to do now
- Add a tsvector column to your searchable table (products, articles, documents).
- Create a GIN index on that column.
- Use
to_tsqueryorplainto_tsqueryto transform user input into a tsquery. - Order by
ts_rankwith weights and appropriate normalization. - Test with real data: compare query times before and after using EXPLAIN ANALYZE.
For a deeper dive, check our pillar guide on Advanced PostgreSQL where we cover performance, JSONB, and replication. For a complete approach to Linux server administration, see our Linux guide for developers and sysadmins.
Sponsored Protocol
We use these techniques daily for our clients. They work. Try them yourself.