Score:-1

Regarding how big companies set up their databases


I am trying to understand how companies that handle large volumes of database requests set up their infrastructure. I have recently written a few Python applications that store data in Postgres, and the read/write requests take time; since the data is not held in RAM, they are a massive bottleneck. This would still work with fewer than 10 users, and perhaps even 100 with queuing, but with thousands of requests, how is the data stored so that reading and writing do not slow everything down?

In a recent website project I am working on, I had the idea of reading the entire database into pandas and periodically writing its altered entries back to the database. This approach seems dangerous: if the OS were to crash and had to be restarted, data would be lost.
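For concreteness, the pattern described above can be sketched as a write-back cache. This is an illustration only, using `sqlite3` from the standard library and a hypothetical `posts` table in place of pandas and Postgres; everything changed since the last flush is lost if the process dies, which is exactly the risk mentioned:

```python
import sqlite3

# Hypothetical schema standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, "hello"), (2, "world")])

# 1. Read the whole table into an in-memory cache.
cache = {row[0]: row[1]
         for row in conn.execute("SELECT id, title FROM posts")}
dirty = set()

# 2. Serve reads and writes from RAM, tracking what changed.
cache[2] = "world, revised"
dirty.add(2)

# 3. Periodically write only the dirty rows back to the database.
#    Any crash before flush() runs loses the changes in `dirty`.
def flush():
    conn.executemany("UPDATE posts SET title = ? WHERE id = ?",
                     [(cache[i], i) for i in dirty])
    conn.commit()
    dirty.clear()

flush()
```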

Is this the approach large companies take, reading the entire database into RAM? If not, could you please advise what ways I could handle large amounts of data for, say, a blog website, so that the read/write time is reduced?

Even pointers to pages where I can educate myself further would be enough. Thanks.

Rob
*"I had the idea to read the entire database"* - That is about as efficient as ordering all books from Amazon, waiting for them to be delivered and then, after your book stacks have been ordered by the colour of the cover, discarding all books that are not about database design for dummies and designing data intensive applications. - Database schema and query design are much more important for performance with large data sets, although sufficiently oversized hardware and server side tuning will often (for a time) compensate quite well for bad and mediocre programming.
Rob
There are quite a few real-life architectures discussed at http://highscalability.com/start-here/ - not all of it will be applicable to you, and you won't find code there, but the concepts, solutions, and designs are often quite educational.
100 users is almost nothing. We handle about 200 million (rather complex) transactions daily for a couple of million users. But every single query is optimised for performance.
"Is this the approach large companies? Where they read the entire database to RAM?" - Yes, kind of. Generally speaking, if the dataset needs to be entirely in memory, you'd probably refactor the DB to work on a NoSQL system.
Score:3
fvu

Even relatively small servers are capable of executing hundreds of queries per second.

Therefore I think the first major bottleneck is improper tuning of your Postgres instance. There are a ton of tunable parameters; hop over to PGtune to get a good starting tune.
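The parameter names below are real PostgreSQL settings, but the values are purely illustrative; generate values matched to your actual hardware with PGtune rather than copying these:

```
# postgresql.conf -- illustrative values only, not a recommendation
shared_buffers = 2GB          # ~25% of RAM is a common starting point
effective_cache_size = 6GB    # planner hint: RAM available for caching
work_mem = 32MB               # memory per sort/hash operation
maintenance_work_mem = 512MB  # used by VACUUM, CREATE INDEX, etc.
```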

Then, have a look at your queries. Do you have indexes that cover most of them? Run your queries with EXPLAIN (e.g. in pgAdmin) and see how efficient they are.
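As a sketch of that workflow, against a hypothetical blog `posts` table:

```sql
-- Ask Postgres for the actual execution plan and timings.
EXPLAIN ANALYZE
SELECT id, title FROM posts WHERE author_id = 42;

-- If the plan shows "Seq Scan on posts" for a selective filter,
-- an index on the filtered column usually helps:
CREATE INDEX idx_posts_author ON posts (author_id);
```

Re-running the EXPLAIN afterwards should show an index scan instead of a sequential scan.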

Then, check whether your queries return appropriately sized result sets. There is no need to return an entire table if you just want one record; in that case, optimize your queries.
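For example, rather than pulling a whole (hypothetical) `posts` table into the application and filtering there, let the database do the filtering and limiting:

```sql
-- Fetch exactly one record by key...
SELECT id, title, body
FROM posts
WHERE id = 123;

-- ...or just the latest 10 posts for a blog index page.
SELECT id, title
FROM posts
ORDER BY created_at DESC
LIMIT 10;
```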

You see, there's a lot you can do before having to turn to more exotic, complicated and often more brittle techniques.

Regarding your comment: specifically for Postgres, the wiki has some interesting information. In any case, a healthy dose of reading the fine manual is crucial as well. Database engines are fine pieces of engineering, and you should understand a fair bit of what's going on under the hood to really manage them well. And there's some bad news: tuning your database depends heavily on its size and on how, and how efficiently, it is accessed; in real-world scenarios you will constantly keep an eye on how it performs and what actions could improve the situation. We use pgBadger for many of these analysis tasks.

Slartibartfast
Could you please recommend some online resources where I can learn more about database optimization?
