Score:0

PostgreSQL 13 - Speed up pg_dump to 5 minutes instead of 70 minutes

cn flag

We use pg_dump nightly to make a snapshot of our database. For a long time we did this with a simple command:

pg_dump -Fc database_name

This takes about an hour and produces a file of more than 30 GB.

How can we speed up things?

Score:4
cn flag

Our biggest table contains raw image data in the form of a bytea column.

When we ran a simple COPY from psql on this table to stdout, it was quite fast (1 or 2 minutes), but with pg_dump it was very slow: it took ~60 minutes.

Investigating this, I stumbled across this discussion. It seems that pg_dump's compression is rather slow if the data is already compressed, as it is with image data in a bytea column, and it is better to compress outside of pg_dump (-Z0).

Additionally, we found that we can make use of our multi-core CPU (-j 10 for pg_dump and pigz for compression). So now we are doing it like this:

$ pg_dump -Z0 -j 10 -Fd database_name -f dumpdir
$ tar -cf - dumpdir | pigz > dumpdir.tar.gz
$ rm -r dumpdir

The time has dropped from ~70 minutes to ~5 minutes. Quite amazing.

You can restore it like this:

$ mkdir dumpdir
$ pigz -dc dumpdir.tar.gz | tar -C dumpdir --strip-components 1 -xf -
$ pg_restore -j 10 -Fd -O -d database_name dumpdir
Melih avatar
gh flag
We had a similar problem. I spent 2 days on pg_dump and canceled it, but I still need to take the dump. We have a 3 TB database and more than 1.5 TB of it is a bytea image table. Would the time drop that much for big tables too?
Janning avatar
cn flag
Yes, I guess it will help a lot (i.e. save hours).
plancys avatar
sd flag
How do you restore the db from this? `pg_restore -d root -j 10 -Fd dump_2022-03-24_17_47_40.tar.gz` gives me `pg_restore: error: could not open input file "dump_2022-03-24_17_47_40.tar.gz/toc.dat": Not a directory`
Melih avatar
gh flag
@Janning It took 2 days before. Now with -j 2 it takes 6.5 hours. I hope it will take even less with -j 10... thanks
Janning avatar
cn flag
The key point is not -j; what matters is -Z0 to disable compression and -Fd to dump into a directory. -j just tells pg_dump how many parallel jobs (CPUs) it should use. If you have plenty, use them all.
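For example, to use every available core (a minimal sketch; `nproc` reports the core count on Linux, and the database and directory names are the ones from the answer):

$ pg_dump -Z0 -j "$(nproc)" -Fd database_name -f dumpdir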
Gabriel Linassi avatar
at flag
@Janning The pg_dump tip helps a lot. However, I'm dumping from a remote server, so the compression doesn't help me: it runs locally only after pg_dump is done. How can I run the pigz compression on the remote server and then download only the compressed file?
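One way to do this (a sketch, assuming ssh access to the database host, pigz installed there, and that pg_dump can connect locally on that host; user, host, and paths are placeholders):

$ ssh user@dbhost 'pg_dump -Z0 -j 10 -Fd database_name -f /tmp/dumpdir'
$ ssh user@dbhost 'tar -C /tmp -cf - dumpdir | pigz && rm -r /tmp/dumpdir' > dumpdir.tar.gz

Only the pigz output crosses the network; the uncompressed dump directory never leaves the server.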
Jose Paez avatar
tl flag
I tried this approach and it **didn't work** for me. The backed-up DB was a mess, so the speed gain wasn't really worth it. I wish I could remove my upvote; I lost a lot of time on this approach.
Janning avatar
cn flag
It has been working for me for years without any problems. I guess your backup script is wrong. Just post a new question on Stack Overflow with your script and link it here.
Markus avatar
nr flag
@JosePaez Maybe the problem was the -j option. I read in the docs: _pg_dump -j uses multiple database connections; [..]. Without the synchronized snapshot feature, the different worker jobs wouldn't be guaranteed to see the same data in each connection, which could lead to an **inconsistent backup**._
Alexandr Tovmach avatar
md flag
Works like a charm, tested on ~30GB database
pkExec avatar
bt flag
Great answer, thanks. For completeness' sake, I'd add the "-p10" parameter to pigz, where 10 is the number of cores you want to limit it to. By default, it uses all cores, which may or may not be what you want.
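For example, the compression step from the answer capped at 10 threads (a sketch; adjust the number to your machine):

$ tar -cf - dumpdir | pigz -p10 > dumpdir.tar.gz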