Optimizing Top K in Postgres
- bob1029 - 15011 seconds ago
Lucene really does feel like magic sometimes. It was designed expressly to solve the top K problem at hyperscale. It's incredibly mature technology. You can go from zero to a billion documents without thinking too much about anything other than the amount of mass storage you have available.
Every time I've used Lucene I have combined it with a SQL provider. It's not necessarily about one or the other. The FTS facilities within the various SQL providers are convenient, but not as capable by comparison. I don't think mixing these into the same thing makes sense. They are two very different animals that are better joined by way of the document ids.
- jmgimeno - 17955 seconds ago
Maybe I'm wrong, but for this query:
SELECT * FROM benchmark_logs WHERE severity < 3 ORDER BY timestamp DESC LIMIT 10;
this index
CREATE INDEX ON benchmark_logs (severity, timestamp);
cannot be used as proposed: "Postgres can jump directly to the portion of the tree matching severity < 3 and then walk the timestamps in descending order to get the top K rows."
Postgres with this index can walk to a part of the tree with severity < 3, but timestamps are sorted only for the same severity.
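The ordering point above can be checked with a small simulation. This is only a sketch of how entries are ordered in a composite (severity, timestamp) index, not a model of what the Postgres planner does; the sample rows and the helper `top_k_by_merge` are made up for illustration.

```python
import heapq
from itertools import islice

# Hypothetical sample rows as (severity, timestamp) pairs; the values are made up.
rows = [(1, 10), (2, 50), (1, 40), (3, 60), (2, 20), (0, 30), (2, 45), (1, 5)]

# A B-tree on (severity, timestamp) keeps its entries in this composite order.
index = sorted(rows)

# The entries with severity < 3 form one contiguous range of the index...
matching = [entry for entry in index if entry[0] < 3]

# ...but timestamps are ordered only within each severity value, not across the range.
timestamps = [ts for _, ts in matching]
print(timestamps)


def top_k_by_merge(index_entries, max_severity, k):
    """Merge the per-severity runs, each read newest first, and stop after k entries."""
    runs = {}
    for severity, ts in index_entries:
        if severity < max_severity:
            runs.setdefault(severity, []).append((severity, ts))
    # Each run is ascending by timestamp, so reversed() walks it newest first.
    newest_first = [reversed(run) for run in runs.values()]
    merged = heapq.merge(*newest_first, key=lambda e: e[1], reverse=True)
    return list(islice(merged, k))


top3 = top_k_by_merge(index, max_severity=3, k=3)
print(top3)
```

Walking the severity < 3 range backwards does not yield rows in descending timestamp order. Getting the top K from this index takes either a sort of the whole range or a merge across the per-severity runs, as the hypothetical helper does.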
- davidelettieri - 20448 seconds ago
The "But Wait, We Need Filters Too" paragraph mentions a "US" filter, which is only introduced later on.
- h1fra - 9261 seconds ago
Postgres is really good at a lot of things, but it's very unfortunate that it's really bad at simple analytics. I wish there was a plugin instead of having to have N databases.
- Vadim_samokhin - 13481 seconds ago
Just in case, there is a btree_gin extension which can be used in queries combining a GIN-indexable column and a B-tree-indexable column. It doesn't solve the top-K ordering problem, though.
- JEONSEWON - 23061 seconds ago
[flagged]
- bbshfishe - 19521 seconds ago
[dead]
- tacone - 17475 seconds ago
The issue here is the row-based format. You simply can't filter on arbitrary columns with that. Either use an external warehouse or a columnar plug-in like Timescale.