Instant Autocomplete — How We Improved our Search Queries

In an era of instant gratification, 2 seconds can feel like a lifetime. Technology’s made our lives much more convenient — but it’s also made us impatient. Very impatient.

We recently added global search to our token tracking app, Agents of I.N.U. While it worked fine, it often took 1–2 seconds to return results. It might not sound like long, but:

Painfully slow autocomplete

Not a great user experience. When it comes to autocomplete, any noticeable delay will lead to a poor user experience, particularly once it approaches 1 second.

Search Requirements

We’re searching for DeFi tokens in our database — almost 600 thousand of them. It’s not massive data in isolation, but search performance was noticeably slow.

We want to search by both name and contract address, depending on whether or not the search term was a valid hexadecimal string or not. The logic is as follows:

  • If the search term starts with 0x and is a valid hexadecimal string, assume the user is searching for a contract address. However, some tokens also have names beginning with 0x, so we want to find those also. A naive SQL implementation:
select * from tokens where contract_address like ‘0x123%’ or name ilike ‘0x123%’;
  • If the search term is not a valid contract address, then search by name:
select * from tokens where name ilike 'agent%';

Some Issues We Had to Overcome

Some Tokens Have Ridiculous Names (B-tree index was a no-go)

There’s no validation on the name field; we extract the name from the blockchain. While most token names are <100 characters, some bright sparks decided to give their tokens ridiculously long names. The longest? Almost 10,000 characters.

99.9% of tokens have a name length of 100 characters or less

While 99% of tokens have names with 50 characters or less, there are 200+ tokens with 100+ characters and 33 tokens with names of 1000+ characters. One such example is the aptly named COPYPASTA token:

https://bscscan.com/token/0xea064849f58f9C55Abf76Bf6a621CE437EE4a072

An index on the name column might seem like the obvious solution to improve query speed, but alas:

create index on token (name);------ERROR:  index row size 3688 exceeds btree version 4 maximum 2704 for index "name_ind" DETAIL:  Index row references tuple (9748,1) in relation "token". HINT:  Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing. SQL state: 54000

We store our contract addresses as citext

Citext is a case-insensitive string type in Postgres. All operations are case-insensitive by default. Smart contract addresses are case-insensitive though stored as mixed-case on the blockchain (the mixed case is used as a checksum).

What’s more, we had an index on the contract address column:

create index on token (contract_address);

While analysing performance, we noticed that the index was never used, and the database was performing a sequential scan instead. We took it for granted that the database would use the index for a left-anchored `like` search, though this is not the case.

Indexes are not used when searching for both contract address and name

Even after resolving the above issue, the database refused to use our indexes.

OR clauses make it difficult for databases to use indexes. The database defaulted to a sequential scan when searching by contract address and name. In the worst case, this resulted in some search queries taking up to 5 seconds.

Our Solution(s)

Use Generated Columns for performance improvements

While not strictly necessary, we decided to use generated columns in Postgres. Generated columns are similar to computed and virtual columns in other databases. Postgres only supports stored, generated columns — each time a row is created or updated, the computed column is calculated and stored in the database.

We created two new columns for our search query:

alter table tokenADD COLUMN search_address varchar(42) generated always as (lower(contract_address::text)) stored,ADD COLUMN search_name varchar generated always AS (lower(substring(name, 1, 256))) stored;

The search_address column converts the citext contract_address to lowercase text.

search_name truncates the name column to 256 characters. We could have gone shorter again, but it’s unlikely to have made much of a difference.

Fix Index for Search Address

As mentioned previously, a default b-tree index does not help with left-anchored like queries. However, a simple change fixes this:

create index on token (search_address COLLATE "C");

POSIX/C locale support left-anchored queries, greatly improving performance. The following query:

select * from token where search_address like '0x234%' limit 10;

executes in < 1ms.

Create GIN Index for Search Name

Now that we’ve truncated the name column to well under the limit for a b-tree, we could create the same index as we used for the search address to support left-anchored indexes when searching by name.

However, we decided that full-text searching makes more sense for our use case. Rather than just having “Shiba” return tokens that start with “Shiba”, users would likely expect to see results like “Baby Shiba”, “Agent Shiba I.N.U”, etc. How else would they search for Elon Musk’s Latest Tweet Token?

B-tree indexes were out. Trigram to the rescue!

We won’t get into the details of what trigrams are. Check outpg_trgm Postgres extension for more information. To get access to this, you’ll first need to install the extension:

create extension pg_trgm;

This extension introduces two new index types, both of which support full-text searching: GIST and GIN. GIN indexes take longer to build than GIST but are faster to query. We went with GIN indexes as we want to optimise for querying speed.

create index on token using gin (search_name gin_trgm_ops);

Now, with the new index, the following query:

select * from token where search_name like '%agent shiba%' limit 10;

executes in well under 50ms. While slower than a left-anchored b-tree index, we were happy to pay the marginal performance hit for full-text searching.

Remove the OR clause when searching for both address and name

Let’s combine both our two queries and see what happens:

select * from token 
where search_address like '0x2345%'
or search_name like '%0x2345%'
limit 10;

Uh oh! We’re back to sequential scan and the occasional slow query (though most of the time, this query runs in 10-100ms, a considerable improvement over the original).

We decided to break the query into two separate queries. We could have used a UNION ALL to isolate the queries:

select * from token 
where search_address like '0x2345%'
UNION ALL
select * from token
where search_name like '%0x2345%'
limit 10;

However, as we’re using an ORM and not crafting the SQL queries, we decided to run them as two consecutive queries:

1. Retrieve up to 10 tokens where search_address matches the search term
2. If less than 10 tokens retrieved, retrieve (10 - number retrieved) tokens where search_name matches the search term

The Results

Much snappier search

Search now runs consistently in <20ms. More importantly, the frustratingly long tail of 5 second+ searches is gone, resulting in a much more enjoyable user experience.

Try it for yourself at https://app.agentsinu.com!

Interested in learning more about how Agents of I.N.U works? Read out latest product update for an overview of our backend architecture!

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store