Optimizing SQL query with two table joins and over 40 million rows in partitioned table

by Maksim Luzik   Last Updated October 17, 2018 18:06 PM

I have three tables; let's call them table1, table2, and table3.

My query is as follows:

SELECT table1.timestamp::DATE,
       table2.segment,
       COUNT(table1.timestamp::DATE) AS count,
       ABS(SUM(table1.amount)) AS sum,
       table3.category,
       table3.text
FROM table1
LEFT JOIN table3 ON table1.id = table3.id
LEFT JOIN table2 ON table1.hash = table2.hash
WHERE table1.timestamp::DATE >= '2015-01-01'
  AND table1.timestamp::DATE < '2015-02-01'
  AND table2.segment IN (1, 2)
GROUP BY table1.timestamp::DATE, table2.segment, table3.category, table3.text
ORDER BY table1.timestamp::DATE, sum DESC;

What would be the best way to optimize this query? I have indexes on the following columns:

  • table1.hash
  • table1.timestamp
  • table1.id
  • table2.hash
  • table2.segment
  • table3.id

The query currently takes over 2 minutes to complete. When table1 had 5 million rows, the same query ran in under 5 seconds, but once it grew to 40 million rows it became slow. So I partitioned table1 into partitions of about 5 million rows each, assuming that with the indexes and the partitioning the performance would improve, but nothing changed. It is as if there were no difference between the partitioned and unpartitioned table.
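For what it's worth, this is the kind of check I ran to see whether the planner was actually skipping partitions (assuming the old inheritance-based partitioning with CHECK constraints, which is what existed around this Postgres era; the exact plan output will vary):

```sql
-- With inheritance-based partitioning, constraint exclusion must be on
-- for the planner to skip partitions whose CHECK constraints rule them out:
SET constraint_exclusion = partition;

-- If the plan still scans every child partition of table1,
-- pruning is not happening for this predicate:
EXPLAIN (ANALYZE, BUFFERS)
SELECT table1.timestamp::DATE, table2.segment
FROM table1
LEFT JOIN table2 ON table1.hash = table2.hash
WHERE table1.timestamp::DATE >= '2015-01-01'
  AND table1.timestamp::DATE < '2015-02-01';
```

My understanding (an assumption on my part) is that casting the column with `::DATE` in the WHERE clause may prevent the planner from matching the predicate against the partitions' CHECK constraints, which would explain why partitioning made no difference.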

To my knowledge, the WHERE clause should be applied first to limit the result set, so that the JOIN operations only have to combine the already-reduced set of rows. But this does not seem to happen. Why not?
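One rewrite I have been considering (my own assumption, not yet verified at full scale): dropping the `::DATE` cast in the WHERE clause so the range condition applies directly to the raw timestamp column. A predicate on the bare column can use the plain index on table1.timestamp, whereas the cast version would need an expression index:

```sql
-- Equivalent date-range filter without casting the column,
-- so the existing index on table1.timestamp stays usable:
WHERE table1.timestamp >= DATE '2015-01-01'
  AND table1.timestamp <  DATE '2015-02-01'
  AND table2.segment IN (1, 2)
```

The half-open range (`>=` lower bound, `<` upper bound) selects exactly the same January 2015 rows as the casted version, including timestamps late on 2015-01-31.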
