PgAssistant: OSS tool to help devs understand and optimize PG performance
by justinclift
Neat! Just added it to my "Postgres Is Enough" gist:
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
Thanks!
> OpenAI helper: If you have an OpenAI account, the interface can query OpenAI to help you understand your query plan and optimize your SQL queries
How good are LLMs at optimizing queries? Do they just give basic advice like "try adding an index there", or can they do more?
I guess it would need to cross-reference the results of EXPLAIN ANALYZE with at least the DDL of the tables (including partitions, indexes, triggers, etc.), and in some cases the sizes of the tables and their read/write rates, to be able to "understand" and "reason" about what's going on and offer meaningful advice (e.g. proposing design changes for the tables).
I don't see how a generic LLM would be able to do that, but maybe another type of model trained for this task and with access to all the information it needs?
In my experience, LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
Typically they suggest 1) adding indexes and 2) refactoring the query. If you only provide the query, the model isn't aware of what indexes already exist. LLMs make assumptions about your data model that often don't hold (e.g. 3NF). Sometimes you have to give the LLM the latest docs because it's unaware of performance improvements and features in newer versions.
In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
There are other tools (RIP OtterTune) that can tune database parameters. Not something I really see any generative model helping with.
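One cheap way to ground the model is to paste the existing indexes in along with the query; the `pg_indexes` catalog view has them. A minimal sketch, with a hypothetical table name:

```sql
-- List the existing indexes on a table ('orders' is a hypothetical
-- name); useful context to include alongside the query itself so the
-- model doesn't propose indexes that already exist.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
```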
> LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
LLMs are basically at the skill level of "Googled something and tried it", which for a lot of basic things is mostly what everyone does.
If you can loop the results of the trial & error back to the model, then it does do a pretty good simulation of someone feeling their way through a problem.
The loop is what makes the approach effective, but a model by itself cannot complete the process - it at least needs an agent which can try the recommendation (or a human who will pull the lever).
My main issue with the use of LLMs for problems you’re unfamiliar with is precisely this. Trial and error is a fine way to work through a problem; you learn well from failure. But if you’re copy/pasting from something else the entire time, you’re not the one failing, and you’re not the one reasoning about why something didn’t work.
People do not learn this way, full stop. Anyone who thinks they do should try to write down, on paper, their understanding of a subject they think they have learned.
yes. my experience is that an LLM will sometimes suggest an interesting approach that breaks my pattern of thought enough to get me to a better solution. but the ones i primarily use (claude sonnet and chatgpt 4o) also seem to love suggesting CTEs, which help readability, but it depends on the RDBMS (and other things) whether they offer any real performance benefit.
that’s a long way of saying that LLMs are a tool in this space but not yet a full solution, in my opinion and experience
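The CTE point is worth illustrating. In Postgres specifically, CTEs were an optimization fence before version 12; since 12 they are inlined into the outer query by default unless you mark them MATERIALIZED, so the readability win is often free now. A sketch against a hypothetical orders table:

```sql
-- Hypothetical 'orders' table. The CTE reads nicely; since
-- PostgreSQL 12 the planner inlines it, so it usually performs the
-- same as writing the subquery inline. On PG 11 and earlier (or with
-- AS MATERIALIZED) it is computed separately, which can be slower.
WITH recent_orders AS (
    SELECT customer_id, total
    FROM orders
    WHERE created_at > now() - interval '30 days'
)
SELECT customer_id, sum(total) AS month_total
FROM recent_orders
GROUP BY customer_id;
```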
> In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
That would require people to a) learn SQL / relational algebra, one of the easiest languages in the world, and b) read the docs.
They’re non-deterministic and YMMV by design. No one can answer that question. It might save you a hundred million with some genius optimization it lucked into or, more likely, it’ll give you a seemingly equivalent query that actually returns subtly different results (easy to do with query optimizations when, e.g., you rely on inherent/default sort order for joins with groups).
They can be useful. I just recently used Claude to make a query 10x faster, though I did have to back-and-forth a bit, so you do still need to know what you're doing.
Claude has been really good at all things Postgres related
I tested numerous LLM models locally with Ollama using PgAssistant. The one that gave me the best results is codestral:latest (15B). However, ChatGPT is truly a cut above the rest.
I've used them off and on for basic stuff like "tell me what this query does". They're usually pretty good at that. Sometimes it will make suggestions that are meaningful improvements, most of the time not. Unless you specifically ask about something like "would this be better done with a left join lateral/correlated subquery?" But you kinda have to already know what you're going for and then it can be helpful.
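For readers unfamiliar with the construct mentioned above: a LEFT JOIN LATERAL lets the joined subquery reference columns from the outer table, which is the idiomatic Postgres way to do "top N per group". A sketch with hypothetical table and column names:

```sql
-- Hypothetical schema: latest 3 orders per customer. The LATERAL
-- subquery can use an index on orders (customer_id, created_at),
-- which often beats a correlated subquery in the SELECT list.
SELECT c.id, c.name, recent.total, recent.created_at
FROM customers c
LEFT JOIN LATERAL (
    SELECT total, created_at
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) recent ON true;
```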
LLMs are really, really good at creating queries. Shaving off the last ms from a complex query? Yeah, I'm sure an experienced query-optimization expert might beat it.
But you get very far by letting the LLM run a few queries to gather info about the database and its use.
They won’t understand your schema and make the wrong assumptions _but_ as long as you are checking the results and can understand the final query they can be very helpful.
Obviously there are tricks to let them better understand your schema but even using all of those it’s going to make the wrong assumptions about some columns and how they are used.
When pgAssistant sends the prompt to the LLM, it provides the DDL of the tables involved in the query (using pg_dump), the result of the ANALYZE query, and of course, the query itself.
Believe me, LLMs understand the schema very well, and in the vast majority of cases, they suggest relevant indexes and/or query rewrites.
That being said, they are still LLMs and inherently non-deterministic.
But the goal of pgAssistant is not to compete with database experts: it was built to help developers before they ask a DBA for help.
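The plan side of that context can be gathered by hand too (the schema side comes from `pg_dump --schema-only`, as described above). A minimal sketch with a hypothetical query; the estimated-vs-actual row counts and the buffer statistics are what make the plan useful to an LLM, or a human:

```sql
-- EXPLAIN ANALYZE actually executes the query and reports real
-- timings; BUFFERS adds shared-buffer hit/read counts. The query
-- shown is hypothetical.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.customer_id, sum(o.total)
FROM orders o
GROUP BY o.customer_id;
```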
i've had great luck with it. i can't say i'm any sort of database expert, but it's a lot better at this than me.
i was especially impressed that it could not only suggest changes to the queries or indexes to add, but it suggested a query be split into multiple queries at the application level which worked incredibly well.
The screenshot section in the README seems to be empty. Would've been interesting to see that. There's many tools that do similar things like https://github.com/ankane/pghero or some tools here: https://gitlab.com/postgres-ai
I think it's probably just a large file, as I remember thinking the same thing but it was loaded when I returned to the browser tab with the readme later on.
Ah, you are right. I'm on a slow connection right now and it's a 22MB gif.
(https://github.com/nexsol-technologies/pgassistant/tree/main...)
> and it's a 22MB gif
Whoa... that's a lot of data for a README! But demos are pretty important, so I guess it's worth it.
An mp4 would be a much better choice than a gif, and probably ~1.5MB.
ffmpeg -y -i pgassistant.gif -c:v libx265 -q 55 -tag:v hvc1 -movflags '+faststart' -pix_fmt yuv420p pgassistant.mp4
That takes the GIF down to 1338973 bytes (1.3M) with (to my eyes) little loss of readability.
  264516 Feb 12 11:44 pgassistant.gif
22782965 Feb 12 11:46 pgassistant.gif.raw.gif
 2120322 Feb 12 11:55 pgassistant.gif.av1-20.mp4
  245780 Feb 12 11:56 pgassistant.gif.av1-55.mp4
wget -O pgassistant.gif.raw.gif 'https://github.com/nexsol-technologies/pgassistant/blob/main...'
ffmpeg -h encoder=libaom-av1
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 20 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-20.mp4
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 55 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-55.mp4
A place to start from, at least. Note the 264516-byte gif is what's currently on the landing page; the wget command above grabs the raw file.
Is there any benefit in using libaom-av1 over libsvtav1 nowadays?
Licensing, patents (as far as I've heard anyway).
NEARLY everything can use AV1 and you don't need your clients to install a licensed codec if their OS didn't happen to include one. https://caniuse.com/av1 Far more than https://caniuse.com/hevc
libaom-av1 and libsvtav1 are both AV1, the former being the reference implementation and the latter simply far faster
Submit a PR?
I'm trying but it turns out that it's impossible to submit a useful PR with a video embed in `README.md` because the only way it works is by uploading the video through the web editor[0] and using a static user-images.githubusercontent.com URL.
[0] Which puts it under your account, obvs., and is therefore not that helpful for a PR.
Maybe just open an issue and provide the ffmpeg command then
Yeah, did that in the end.
Good point.
honest question: does anybody know a good book/tutorial/source/whatever on becoming a postgresql DBA ?
most of the material i see is written for people that want to write applications that work with postgresql, not on how to proficiently manage postgresql itself.
2nd Quadrant (bought by Enterprise DB several years ago) had a bunch of excellent YouTube tutorials on common tasks. Some of them may still be available on YouTube.
EDB has a fairly comprehensive set of self-paced trainings for sale. I went through them and thought they were really good.
The Postgres documentation is excellent, though, and while the docs are long, reading through them carefully should give you almost all the information you need for most database tasks.
Greg Smith wrote a book on PostgreSQL high performance that goes into the internals a bit and into how to analyze performance problems. If you want to be a DBA, it will probably need to be in your back pocket at some point.
The PostgreSQL docs are great and then watch the presentations from their conferences - very cool stuff.
Then practice, docker makes this easy.
Most of what I've learned has come from PG docs and various blogs on "we found this crazy thing costing us 100x memory/processing/response time/etc"
But there is a great book on the internals (very long, in depth, but not unreadable if you have a good foundation already) - https://edu.postgrespro.com/postgresql_internals-14_en.pdf
They probably have newer versions, this is just what's in my bookmarks
What sorts of skills are you looking to build? Since this post has a lot of comments about query optimization, I’d suggest query optimization is more in the realm of developers and not DBAs. Schema, query, and index design. DBAs might be more focused on replication/HA, security, fleet management, backups, DR.
Happy to help with more targeted recommendations!
> What sorts of skills are you looking to build?
I work as a system engineer / devops engineer / infrastructure engineer. I often need to keep postgresql up and running, ideally in a smooth and performant way.
I don't like tool-specific tutorials or storytelling-based articles.
I'd like to learn the core postgresql things (terminology, tasks, operations) so that I can then autonomously evaluate which tool to pick and why, rather than following the trend of the month. And of course, be able to actually do "the things".
Ideally, the next progression of this would be being able to understand postgresql performance and how to tune it (again, in a holistic manner).
Per my understanding, these are the kinds of skills a DBA should have. Some of them overlap with the skills of a decent developer (i understand that).
This is a bit older now, but seems like it covers the timeless basics:
this looks like the kind of material i'm looking for.
I'll take a closer look at the material, which already looks good.
It's sad that there does not seem to be anything past "DBA1" (that is, no DBA2 or DBA3).
All of their materials seem to be available on their file share website here:
There's some stuff for internals (mentioned in a sibling comment), as well as what seems like stuff for PG15 though I haven't looked at it.
Added this to the performance tuning section: https://github.com/Olshansk/postgres_for_everything?tab=read...
I'm surprised that both this list and the linked 'very comprehensive list' are missing pgbadger.
This reminds me of a less feature-rich version of PgAnalyze, but it's always nice to have alternative OSS options for this sort of optimization work.
I’ve wanted to make this for years. I have a doc of saved pg_stat_statement and other queries saved.
Your screenshot for issues found shows indices being created with CREATE INDEX. If someone runs that on a prod DB with a large table, they're gonna have a bad time. You should have it suggest CREATE INDEX CONCURRENTLY.
Also, if I’m being honest, it’s a little off-putting to see a tool giving advice on Postgres and not differentiating between those two.
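For anyone who hasn't hit this: a plain CREATE INDEX holds a lock that blocks writes to the table for the entire build, while the CONCURRENTLY variant does not. A sketch with hypothetical table and index names:

```sql
-- Plain CREATE INDEX blocks INSERT/UPDATE/DELETE on the table for
-- the whole build; fine in dev, painful on a large prod table.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- CONCURRENTLY builds without blocking writes. It is slower, cannot
-- run inside a transaction block, and on failure leaves an INVALID
-- index to drop, but it is the safe variant on a busy prod table.
CREATE INDEX CONCURRENTLY idx_orders_customer_2 ON orders (customer_id);
```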
Always lots to learn, appreciate the insight.
It’s interesting that Postgres doesn’t default to some of these settings, since they are clearly advisable in most cases.
I’ve long held that Postgres is more powerful than MySQL in terms of flexibility and options, but you really ought to have thoroughly read and understood the manual to run it in prod. In comparison, MySQL does a fantastic job out of the box (modulo buffer pool sizing and sync_array_size, and tbf Postgres is equally bad at the former, and the latter doesn’t matter until larger scale), but is somewhat limited in what you can do with it (emphasis on somewhat – it’s still quite adequate for nearly anyone).
That’s why I’m forever salty that all of the tech influencers and bloggers hyped up Postgres so much. Yes, it’s a great RDBMS, but the frontend devs eagerly jumping into it haven’t the slightest clue what they’re getting themselves into.
Thank you for this advice, but the tool is designed to help devs before their code runs in production.
And you don’t think someone would copy/paste what they did on dev or stage onto prod?
Well, my dev teams use Flyway, so I may not have been reasoning about this correctly. You're right, a copy-paste can happen quickly. I will modify the code according to your remark.
[flagged]
In what way?
Troll account is trolling.