PgAssistant: OSS tool to help devs understand and optimize PG performance
by justinclift
Neat! Just added it to my "Postgres Is Enough" gist:
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
Thanks!
> OpenAI helper: If you have an OpenAI account, the interface can query OpenAI to help you understand your query plan and optimize your SQL queries
How good are LLMs at optimizing queries? Do they just give basic advice like "try adding an index there", or can they do more?
I guess it would need to cross-reference the results of EXPLAIN ANALYZE with at least the DDL of the tables (including partitions, indexes, triggers, etc.), and in some cases the sizes of the tables and their read/write rates, to be able to "understand" and "reason" about what's going on and offer meaningful advice (e.g. proposing design changes for the tables).
I don't see how a generic LLM would be able to do that, but maybe another type of model trained for this task and with access to all the information it needs?
In my experience, LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
Typically they suggest 1) adding indexes and 2) refactoring the query. If you only provide the query, the model isn't aware of what indexes already exist. LLMs make assumptions about your data model that often don't hold (e.g. 3NF). Sometimes you have to give the LLM the latest docs because it's unaware of performance improvements and features in newer versions.
In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
There are other tools (RIP OtterTune) that can tune database parameters. Not something I really see any generative model helping with.
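One cheap way to ground the model is to paste the existing indexes in along with the query; the `pg_indexes` catalog view has them. A minimal sketch, with a hypothetical table name:

```sql
-- List the existing indexes on a table ('orders' is a hypothetical
-- name); useful context to include alongside the query itself so the
-- model doesn't propose indexes that already exist.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
```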
> LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
LLMs are basically at the skill level of "Googled something and tried it", which for a lot of basic things is mostly what everyone does.
If you can loop the results of the trial & error back to the model, then it does do a pretty good simulation of someone feeling their way through a problem.
The loop is what makes the approach effective, but a model by itself cannot complete the process - it at least needs an agent which can try the recommendation (or a human who will pull the lever).
My main issue with the use of LLMs for problems you’re unfamiliar with is precisely this. Trial and error is a fine way to work through a problem; you learn well from failure. But if you’re copy/pasting from something else the entire time, you’re not the one failing, and you’re not the one reasoning about why something didn’t work.
People do not learn this way, full stop. Anyone who thinks they do should try to write down, on paper, their understanding of a subject they think they have learned.
yes. my experience is that an LLM will sometimes suggest an interesting approach that breaks my pattern of thought enough to get me to a better solution. but the ones i primarily use (claude sonnet and chatgpt 4o) also seem to love suggesting CTEs, which help readability, but it depends on the RDBMS (and other things) whether they offer any real performance benefit.
that’s a long way of saying that LLMs are a tool in this space but not yet a full solution, in my opinion and experience
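The CTE point is worth illustrating. In Postgres specifically, CTEs were an optimization fence before version 12; since 12 they are inlined into the outer query by default unless you mark them MATERIALIZED, so the readability win is often free now. A sketch against a hypothetical orders table:

```sql
-- Hypothetical 'orders' table. The CTE reads nicely; since
-- PostgreSQL 12 the planner inlines it, so it usually performs the
-- same as writing the subquery inline. On PG 11 and earlier (or with
-- AS MATERIALIZED) it is computed separately, which can be slower.
WITH recent_orders AS (
    SELECT customer_id, total
    FROM orders
    WHERE created_at > now() - interval '30 days'
)
SELECT customer_id, sum(total) AS month_total
FROM recent_orders
GROUP BY customer_id;
```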
> In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
That would require people to a) learn SQL / relational algebra, one of the easiest languages in the world, and b) read the docs.
They’re non-deterministic and YMMV by design. No one can answer that question. It might save you a hundred million with some genius optimization it lucked into or, more likely, it’ll give you a seemingly equivalent query that actually returns subtly different results (easy to do with query optimizations when, e.g., you rely on inherent/default sort order for joins with groups).
They can be useful. I just recently used Claude to make a query 10x faster, though I did have to back-and-forth a bit, so you do still need to know what you're doing.
Claude has been really good at all things Postgres related
I tested numerous LLM models locally with Ollama using PgAssistant. The one that gave me the best results is codestral:latest (15B). However, ChatGPT is truly a cut above the rest.
I've used them off and on for basic stuff like "tell me what this query does". They're usually pretty good at that. Sometimes it will make suggestions that are meaningful improvements, most of the time not. Unless you specifically ask about something like "would this be better done with a left join lateral/correlated subquery?" But you kinda have to already know what you're going for and then it can be helpful.
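For readers unfamiliar with the construct mentioned above: a LEFT JOIN LATERAL lets the joined subquery reference columns from the outer table, which is the idiomatic Postgres way to do "top N per group". A sketch with hypothetical table and column names:

```sql
-- Hypothetical schema: latest 3 orders per customer. The LATERAL
-- subquery can use an index on orders (customer_id, created_at),
-- which often beats a correlated subquery in the SELECT list.
SELECT c.id, c.name, recent.total, recent.created_at
FROM customers c
LEFT JOIN LATERAL (
    SELECT total, created_at
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) recent ON true;
```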
LLMs are really, really good at creating queries. Shaving off the last ms from a complex query? Yeah, I'm sure an experienced query-optimization expert might beat it.
But you get very far by letting the LLM run a few queries to gather info about the database and its use.
They won’t understand your schema and make the wrong assumptions _but_ as long as you are checking the results and can understand the final query they can be very helpful.
Obviously there are tricks to let them better understand your schema but even using all of those it’s going to make the wrong assumptions about some columns and how they are used.
When pgAssistant sends the prompt to the LLM, it provides the DDL of the tables involved in the query (using pg_dump), the result of the ANALYZE query, and of course, the query itself.
Believe me, LLMs understand the schema very well, and in the vast majority of cases, they suggest relevant indexes and/or query rewrites.
That being said, they are still LLMs and inherently non-deterministic.
But the goal of pgAssistant is not to compete with database experts: it was built to help developers before they ask a DBA for help.
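The plan side of that context can be gathered by hand too (the schema side comes from `pg_dump --schema-only`, as described above). A minimal sketch with a hypothetical query; the estimated-vs-actual row counts and the buffer statistics are what make the plan useful to an LLM, or a human:

```sql
-- EXPLAIN ANALYZE actually executes the query and reports real
-- timings; BUFFERS adds shared-buffer hit/read counts. The query
-- shown is hypothetical.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.customer_id, sum(o.total)
FROM orders o
GROUP BY o.customer_id;
```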
i've had great luck with it. i can't say i'm any sort of database expert, but it's a lot better at this than me.
i was especially impressed that it could not only suggest changes to the queries or indexes to add, but it suggested a query be split into multiple queries at the application level which worked incredibly well.
The screenshot section in the README seems to be empty. Would've been interesting to see that. There's many tools that do similar things like https://github.com/ankane/pghero or some tools here: https://gitlab.com/postgres-ai
I think it's probably just a large file, as I remember thinking the same thing but it was loaded when I returned to the browser tab with the readme later on.
Ah, you are right. I'm on a slow connection right now and it's a 22MB gif.
(https://github.com/nexsol-technologies/pgassistant/tree/main...)
> and it's a 22MB gif
Whoa... that's a lot of data for a README! But demos are pretty important, so I guess it's worth it.
An mp4 would be a much better choice than a gif, and probably ~1.5MB.
ffmpeg -y -i pgassistant.gif -c:v libx265 -q 55 -tag:v hvc1 -movflags '+faststart' -pix_fmt yuv420p pgassistant.mp4
That takes the GIF down to 1338973 bytes (1.3M) with (to my eyes) little loss of readability.
  264516 Feb 12 11:44 pgassistant.gif
22782965 Feb 12 11:46 pgassistant.gif.raw.gif
 2120322 Feb 12 11:55 pgassistant.gif.av1-20.mp4
  245780 Feb 12 11:56 pgassistant.gif.av1-55.mp4
wget -O pgassistant.gif.raw.gif 'https://github.com/nexsol-technologies/pgassistant/blob/main...'
ffmpeg -h encoder=libaom-av1
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 20 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-20.mp4
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 55 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-55.mp4
A place to start from, at least. Note the 264516-byte gif is what's currently on the landing page; the wget command above grabs the raw file.
Is there any benefit in using libaom-av1 over libsvtav1 nowadays?
Licensing, patents (as far as I've heard anyway).
NEARLY everything can use AV1 and you don't need your clients to install a licensed codec if their OS didn't happen to include one. https://caniuse.com/av1 Far more than https://caniuse.com/hevc
libaom-av1 and libsvtav1 are both AV1, the former being the reference implementation and the latter simply far faster
Submit a PR?
I'm trying but it turns out that it's impossible to submit a useful PR with a video embed in `README.md` because the only way it works is by uploading the video through the web editor[0] and using a static user-images.githubusercontent.com URL.
[0] Which puts it under your account, obvs., and is therefore not that helpful for a PR.
Maybe just open an issue and provide the ffmpeg command then
Yeah, did that in the end.
Good point.
honest question: does anybody know a good book/tutorial/source/whatever on becoming a postgresql DBA ?
most of the material i see is written for people that want to write applications that work with postgresql, not on how to proficiently manage postgresql itself.
2nd Quadrant (bought by Enterprise DB several years ago) had a bunch of excellent YouTube tutorials on common tasks. Some of them may still be available on YouTube.
EDB has a fairly comprehensive set of self-paced trainings for sale. I went through them and thought they were really good.
The Postgres documentation is excellent, though, and while the docs are long, reading through them carefully should give you almost all the information you need for most database tasks.
Greg Smith wrote a book on PostgreSQL high performance that goes into the internals a bit and into how to analyze performance problems. If you want to be a DBA, it will probably need to be in your back pocket at some point.
The PostgreSQL docs are great and then watch the presentations from their conferences - very cool stuff.
Then practice, docker makes this easy.
Most of what I've learned has come from PG docs and various blogs on "we found this crazy thing costing us 100x memory/processing/response time/etc"
But there is a great book on the internals (very long, in depth, but not unreadable if you have a good foundation already) - https://edu.postgrespro.com/postgresql_internals-14_en.pdf
They probably have newer versions, this is just what's in my bookmarks
What sorts of skills are you looking to build? Since this post has a lot of comments about query optimization, I’d suggest query optimization is more in the realm of developers and not DBAs. Schema, query, and index design. DBAs might be more focused on replication/HA, security, fleet management, backups, DR.
Happy to help with more targeted recommendations!
> What sorts of skills are you looking to build?
I work as a system engineer / devops engineer / infrastructure engineer. I often need to keep postgresql up and running, ideally in a smooth and performant way.
I don't like tool-specific tutorials or storytelling-based articles.
I'd like to learn the core postgresql things (terminology, tasks, operations) so that I can then autonomously evaluate which tool to pick and why, rather than following the trend of the month. And of course, be able to actually do "the things".
Ideally, the next progression of this would be being able to understand postgresql performance and how to tune it (again, in a holistic manner).
Per my understanding, these are the kinds of skills a DBA should have. Some of them overlap with the skills of a decent developer (i understand that).
This is a bit older now, but seems like it covers the timeless basics:
this looks like the kind of material i'm looking for.
I'll take a closer look at the material, which already looks good.
It's sad that there does not seem to be anything past "DBA1" (that is, no DBA2 or DBA3).
All of their materials seem to be available on their file share website here:
There's some stuff for internals (mentioned in a sibling comment), as well as what seems like stuff for PG15 though I haven't looked at it.
Added this to the performance tuning section: https://github.com/Olshansk/postgres_for_everything?tab=read...
I'm surprised that both this list and the linked 'very comprehensive list' are missing pgbadger.
This reminds me of a less feature-rich version of PgAnalyze, but it's always nice to have alternative OSS options for this sort of optimization work.
I’ve wanted to make this for years. I have a doc of saved pg_stat_statement and other queries saved.
Your screenshot for issues found shows indices being created with CREATE INDEX. If someone runs that on a prod DB with a large table, they're gonna have a bad time. You should have it suggest CREATE INDEX CONCURRENTLY.
Also, if I’m being honest, it’s a little off-putting to see a tool giving advice on Postgres and not differentiating between those two.
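For anyone who hasn't hit this: a plain CREATE INDEX holds a lock that blocks writes to the table for the entire build, while the CONCURRENTLY variant does not. A sketch with hypothetical table and index names:

```sql
-- Plain CREATE INDEX blocks INSERT/UPDATE/DELETE on the table for
-- the whole build; fine in dev, painful on a large prod table.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- CONCURRENTLY builds without blocking writes. It is slower, cannot
-- run inside a transaction block, and on failure leaves an INVALID
-- index to drop, but it is the safe variant on a busy prod table.
CREATE INDEX CONCURRENTLY idx_orders_customer_2 ON orders (customer_id);
```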
Always lots to learn, appreciate the insight.
It’s interesting that Postgres doesn’t default to some of these settings, since they are clearly advisable in most cases.
I’ve long held that Postgres is more powerful than MySQL in terms of flexibility and options, but you really ought to have thoroughly read and understood the manual to run it in prod. In comparison, MySQL does a fantastic job out of the box (modulo buffer pool sizing and sync_array_size, and tbf Postgres is equally bad at the former, and the latter doesn’t matter until larger scale), but is somewhat limited in what you can do with it (emphasis on somewhat – it’s still quite adequate for nearly anyone).
That’s why I’m forever salty that all of the tech influencers and bloggers hyped up Postgres so much. Yes, it’s a great RDBMS, but the frontend devs eagerly jumping into it haven’t the slightest clue what they’re getting themselves into.
Thank you for this advice, but the tool is designed to help devs before their code runs in production.
And you don’t think someone would copy/paste what they did on dev or stage onto prod?
Well, my dev teams use Flyway, so I may not have been reasoning about this correctly. You're right, a copy-paste can happen quickly. I will modify the code according to your remark.
[flagged]
In what way?
Troll account is trolling.