Postgres DB is extremely fast with a query with a certain parameter, but extremely slow for another

by Shoken   Last Updated July 12, 2019 14:06 PM

It's me again. Sorry, as you can guess, I'm very new at this, but I'm going to try and give you all the info at once this time.

I have a table called stats, where I store every action on a certain website (basically when the user clicks a link, data is saved in the db). The data is modelled with postgres as

id integer NOT NULL DEFAULT nextval('stats_id_seq'::regclass),
company integer NOT NULL,
layout integer NOT NULL,
browsername character varying(50) COLLATE pg_catalog."default",
lang character varying(5) COLLATE pg_catalog."default" NOT NULL,

there are other fields that I don't think are relevant. Basically what I want to do is see which are the most used browsers for a certain company, by counting how many pages are visited by that browser and how many users there are for a given browser.

My query is:

select browsername, count(DISTINCT phpsessid) as visitors, count(*) as pages
from public.stats
where company=1 and createdate>'2017-01-01' and createdate<'2018-12-12' 
group by browsername order by visitors desc;

the stats table contains around 80 millions rows. Now, this query is extremely fast if I set company=1 or company=222 (it's the company ID). It takes less than half a second to fetch around 20k rows. However, it is EXTREMELY slow if I set company=13549, for instance (we're talking about literal hours here). Obviously something is wrong, either in the data modelling or in the way I query.

How come there's such a difference for different companies? The DB was not done by me so I apologize if I left something useful out, and feel free to ask.

The indexes are:

CREATE INDEX stats_new_company_13549_index
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default    WHERE company = 13549

CREATE INDEX stats_new_company_14863_index
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default    WHERE company = 14863

CREATE INDEX stats_new_company_createdate_cet_index
ON public.stats USING btree
(company, date(timezone('CET'::text, createdate)))
TABLESPACE pg_default;

CREATE INDEX stats_new_company_createdate_index_1
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default;

Here's the plan for the fast query:

And here's the plan for the slow one:

Just by looking at the plans you can see it took several minutes for the second one to even query the explain.

I also noticed that after running the explain a couple of times, it got done in reasonable amounts the third time. From that moment onwards, the query would also drastically reduce its execution time, from hours to a mere 4 seconds. It's the second time it happens, and I swear I'm not crazy. If I change the company ID once again, the query takes hours again. I'm at lost here: is there some index problem?

Tags : postgresql query

Answers 1

The fast query retrieves and sorts 26229 rows - for that small number of rows, the sorting can be done in memory, so obviously this is going to be quick. First because retrieving the data only takes ~500ms and then the sorting is done in 50ms.

The slow query retrieves 560135 rows (20 times as many as the first query) but the time it wook - 149939ms - seems quite slow. Maybe your table (or index) is bloated - the number of blocks needed to read that number of rows is way too high I think.

You can run vacuum full analyze public.stats; and see if performance gets better after that.

Or maybe you simply have a very slow harddisk.

The sorting was done on disk, but that only added another 3 seconds to the total runtime.

The question why the same query is sometimes fast and sometimes quick (especially when run the second time) is more often that not answered with: caching effects. When you re-run the second query you will probably see a lot of the "shared read=318147" information turn to "shared hit=..." which means those blocks were already in the cache.

July 12, 2019 13:13 PM

Related Questions

find out postgresql query progress

Updated October 01, 2018 04:06 AM

Slow PostgreSQL query - how to improve it?

Updated January 21, 2019 16:06 PM

Is there a way to improve this query further?

Updated March 26, 2015 12:02 PM