How to correct the Postgres plan in the case of a double JOIN, ORDER BY and a LIMIT?

by Cédric R.   Last Updated October 18, 2018 11:06 AM

I have a SQL query that takes more than 4 s:

SELECT "Content"."contentId"
    FROM "UserFeed" 
    JOIN "FeedRole" ON "FeedRole"."feedId" = "UserFeed"."feedId" 
    JOIN "Permission" ON "Permission"."roleId" = "FeedRole"."roleId"
    JOIN "TagContent" ON "TagContent"."tagId" = "Permission"."tagId"
    JOIN "Content" ON "Content"."contentId" = "TagContent"."contentId"
    WHERE "UserFeed"."userId" = 600 ORDER BY "Content".title LIMIT 30;

This is due to the ORDER BY, so I created an index:

CREATE INDEX ON "Content" ("title")

but Postgres doesn't use it.
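One thing I could do to check whether the planner is able to use that index at all (purely a diagnostic, not something to keep, and the setting below is only an example) is to discourage explicit sorts for a single transaction and look at the plan again:

BEGIN;
SET LOCAL enable_sort = off;
-- same query as above, re-explained with sort nodes penalized
EXPLAIN (ANALYZE, BUFFERS)
SELECT "Content"."contentId"
    FROM "UserFeed" 
    JOIN "FeedRole" ON "FeedRole"."feedId" = "UserFeed"."feedId" 
    JOIN "Permission" ON "Permission"."roleId" = "FeedRole"."roleId"
    JOIN "TagContent" ON "TagContent"."tagId" = "Permission"."tagId"
    JOIN "Content" ON "Content"."contentId" = "TagContent"."contentId"
    WHERE "UserFeed"."userId" = 600 ORDER BY "Content".title LIMIT 30;
ROLLBACK;

If the plan then switches to a scan on the title index, the index itself is usable and the problem is one of costing/estimation.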

If I split the query into two queries:

SELECT "Permission"."tagId"
    FROM "UserFeed" 
    JOIN "FeedRole" ON "FeedRole"."feedId" = "UserFeed"."feedId" 
    JOIN "Permission" ON "Permission"."roleId" = "FeedRole"."roleId"
    WHERE "UserFeed"."userId" = 600;

(takes 81 ms)

and

SELECT "Content"."contentId"
    FROM "TagContent"
    JOIN "Content" ON "Content"."contentId" = "TagContent"."contentId"
    WHERE "TagContent"."tagId" IN (1584555, 2055170, 1868328, 1298378, 2021339, 1025073, 1254993, 1223241) ORDER BY "Content".title LIMIT 30;

(takes 85 ms)

The second query uses the index and everything is fine (less than 200 ms is sufficient for our needs).

The question is: is there a way to do that in a single query?

I tried doing it like this:

SELECT "Content"."contentId"
    FROM "TagContent"
    JOIN "Content" ON "Content"."contentId" = "TagContent"."contentId"
    WHERE "TagContent"."tagId" IN (
        SELECT "Permission"."tagId"
            FROM "UserFeed" 
            JOIN "FeedRole" ON "FeedRole"."feedId" = "UserFeed"."feedId" 
            JOIN "Permission" ON "Permission"."roleId" = "FeedRole"."roleId"
            WHERE "UserFeed"."userId" = 600
    ) ORDER BY "Content".title LIMIT 30;

It takes more than 2 s and still doesn't use my index.
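Another single-statement variant I have not benchmarked: moving the first part into a CTE. On Postgres 11 and earlier a CTE is always materialized and acts as an optimization fence, so this should at least force the tag lookup to be planned on its own, like the manual split (the CTE name user_tags is only illustrative):

WITH user_tags AS (
    SELECT "Permission"."tagId"
        FROM "UserFeed" 
        JOIN "FeedRole" ON "FeedRole"."feedId" = "UserFeed"."feedId" 
        JOIN "Permission" ON "Permission"."roleId" = "FeedRole"."roleId"
        WHERE "UserFeed"."userId" = 600
)
SELECT "Content"."contentId"
    FROM "TagContent"
    JOIN "Content" ON "Content"."contentId" = "TagContent"."contentId"
    WHERE "TagContent"."tagId" IN (SELECT "tagId" FROM user_tags)
    ORDER BY "Content".title LIMIT 30;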

I also tried recomputing the statistics, but it had no impact.

Here are the execution plans:

The big query:

"Limit  (cost=16672.63..16676.13 rows=30 width=19) (actual time=4446.668..4446.700 rows=30 loops=1)"
"  ->  Gather Merge  (cost=16672.63..16710.66 rows=326 width=19) (actual time=4446.659..4446.688 rows=30 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        ->  Sort  (cost=15672.60..15673.01 rows=163 width=19) (actual time=1542.963..1542.984 rows=11 loops=3)"
"              Sort Key: "Content".title"
"              Sort Method: quicksort  Memory: 25kB"
"              ->  Nested Loop  (cost=34.06..15666.62 rows=163 width=19) (actual time=79.554..830.419 rows=199691 loops=3)"
"                    ->  Nested Loop  (cost=33.63..15593.97 rows=163 width=4) (actual time=79.543..311.223 rows=199691 loops=3)"
"                          ->  Nested Loop  (cost=33.07..15576.16 rows=3 width=4) (actual time=31.299..79.487 rows=3 loops=3)"
"                                ->  Hash Join  (cost=32.64..15574.71 rows=3 width=4) (actual time=31.143..79.292 rows=3 loops=3)"
"                                      Hash Cond: ("FeedRole"."feedId" = "UserFeed"."feedId")"
"                                      ->  Parallel Seq Scan on "FeedRole"  (cost=0.00..9639.95 rows=467495 width=8) (actual time=0.023..37.842 rows=373996 loops=3)"
"                                      ->  Hash  (cost=32.55..32.55 rows=7 width=4) (actual time=0.088..0.088 rows=8 loops=3)"
"                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                            ->  Index Only Scan using "UserFeed_pkey" on "UserFeed"  (cost=0.43..32.55 rows=7 width=4) (actual time=0.050..0.080 rows=8 loops=3)"
"                                                  Index Cond: ("userId" = 600)"
"                                                  Heap Fetches: 8"
"                                ->  Index Only Scan using "Permission_pkey" on "Permission"  (cost=0.43..0.47 rows=1 width=8) (actual time=0.068..0.069 rows=1 loops=8)"
"                                      Index Cond: ("roleId" = "FeedRole"."roleId")"
"                                      Heap Fetches: 3"
"                          ->  Index Scan using "TagContent_tagId_idx" on "TagContent"  (cost=0.56..4.56 rows=137 width=8) (actual time=0.021..80.632 rows=74884 loops=8)"
"                                Index Cond: ("tagId" = "Permission"."tagId")"
"                    ->  Index Scan using "Content_pkey" on "Content"  (cost=0.43..0.45 rows=1 width=19) (actual time=0.002..0.002 rows=1 loops=599073)"
"                          Index Cond: ("contentId" = "TagContent"."contentId")"
"Planning time: 1.331 ms"
"Execution time: 4452.697 ms"

It appears that Postgres underestimates the number of TagContent rows (expected 137, found 74884).
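If that misestimate on "TagContent"."tagId" is what pushes the planner away from the title index, one thing that might help (untested here, and 1000 is an arbitrary target) is a larger per-column statistics sample:

ALTER TABLE "TagContent" ALTER COLUMN "tagId" SET STATISTICS 1000;
ANALYZE "TagContent";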

First part of the split:

"Gather  (cost=1033.07..16576.86 rows=7 width=4) (actual time=74.486..80.382 rows=8 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Nested Loop  (cost=33.07..15576.16 rows=3 width=4) (actual time=44.923..74.685 rows=3 loops=3)"
"        ->  Hash Join  (cost=32.64..15574.71 rows=3 width=4) (actual time=44.854..74.572 rows=3 loops=3)"
"              Hash Cond: ("FeedRole"."feedId" = "UserFeed"."feedId")"
"              ->  Parallel Seq Scan on "FeedRole"  (cost=0.00..9639.95 rows=467495 width=8) (actual time=0.026..36.309 rows=373996 loops=3)"
"              ->  Hash  (cost=32.55..32.55 rows=7 width=4) (actual time=0.065..0.065 rows=8 loops=3)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                    ->  Index Only Scan using "UserFeed_pkey" on "UserFeed"  (cost=0.43..32.55 rows=7 width=4) (actual time=0.035..0.059 rows=8 loops=3)"
"                          Index Cond: ("userId" = 600)"
"                          Heap Fetches: 8"
"        ->  Index Only Scan using "Permission_pkey" on "Permission"  (cost=0.43..0.47 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=8)"
"              Index Cond: ("roleId" = "FeedRole"."roleId")"
"              Heap Fetches: 1"
"Planning time: 0.693 ms"
"Execution time: 81.619 ms"

Second part of the split:

"Limit  (cost=0.99..269.60 rows=30 width=19) (actual time=0.085..85.202 rows=30 loops=1)"
"  ->  Nested Loop  (cost=0.99..5584336.55 rows=623690 width=19) (actual time=0.084..85.182 rows=30 loops=1)"
"        ->  Index Scan using "Content_title_idx" on "Content"  (cost=0.43..66833.22 rows=1198146 width=19) (actual time=0.034..0.294 rows=250 loops=1)"
"        ->  Index Only Scan using "TagContent_pkey" on "TagContent"  (cost=0.56..4.60 rows=1 width=4) (actual time=0.338..0.338 rows=0 loops=250)"
"              Index Cond: ("contentId" = "Content"."contentId")"
"              Filter: ("tagId" = ANY ('{1584555,2055170,1868328,1298378,2021339,1025073,1254993,1223241}'::integer[]))"
"              Rows Removed by Filter: 87"
"              Heap Fetches: 21864"
"Planning time: 0.693 ms"
"Execution time: 85.263 ms"

It use "Content_title_idx"


