Score:0

Data Sync between 2 Datacenters

ga flag

I'm reaching out to you in the hope that some of you can provide me with fresh ideas for a current problem I'm working on. I'm planning to set up a redundant data center (including websites, mail services and databases). Here are the key points:

I have two data centers connected by a 1 Gbps link, with IPs routed using BGP. One data center acts as active, while the other serves as passive. The basic prerequisites in this regard are already in place.

However, I'm facing the challenge of keeping the data consistent and up to date on both sides. Currently, I'm using Virtuozzo Hybrid Server with Virtuozzo Storage (SDS), which provides three-fold redundant network storage. The ploop images (container files), ranging in size from a few GB to several TB, are stored on these systems.

Currently, I back up the ploops by taking snapshots and then transferring them to geographically separated storage using BackupPC. This happens once a day.

In the event of a disaster, it would be desirable to have access to more up-to-date data than what's limited to a 24-hour window. Unfortunately, due to the 1 Gbps link between the data centers, a form of live replication is not feasible.

Are there any approaches that would work better and faster than synchronizing these sometimes multi-TB files between the data centers with rsync? (Copying only the change delta? But since it's the Virtuozzo file system, there is no such mechanism I can utilize, as far as I know.)
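To make it concrete what I mean by "copy only the change delta": something along the lines of this minimal Python sketch, which hashes an image in fixed-size blocks and transfers only the blocks that differ. (Block size and the hashing scheme are assumptions for illustration, not anything Virtuozzo provides.)

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks (assumption; tune to the image layout)

def block_hashes(path):
    """Return a list of SHA-256 digests, one per fixed-size block."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).digest())
    return hashes

def changed_blocks(src_hashes, dst_hashes):
    """Indices of blocks that differ, or that only exist on the source."""
    changed = []
    for i, h in enumerate(src_hashes):
        if i >= len(dst_hashes) or dst_hashes[i] != h:
            changed.append(i)
    return changed

def apply_delta(src_path, dst_path, indices):
    """Copy only the changed blocks from src into dst, in place."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        for i in indices:
            src.seek(i * BLOCK_SIZE)
            dst.seek(i * BLOCK_SIZE)
            dst.write(src.read(BLOCK_SIZE))
```

The catch, of course, is that both sides still have to read every block to compute the hashes, which is exactly the overhead I'd like to avoid on multi-TB files.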

Perhaps some of you have interesting ideas in this regard? I would greatly appreciate your insights.

Thank you very much, Andreas

captainmish
cn flag
Have you tried something like GlusterFS or Ceph? Gluster has a geo-replication mode which could work for you.
Futureweb GmbH
ga flag
@captainmish - Virtuozzo has its own software-defined storage which it uses for virtual machines / containers, and which unfortunately does not offer such a geo-replication mode :-( But I will have a look at Gluster - maybe a solution like this could somehow be implemented ... thx
Score:2
ws flag

Are there any approaches that would work better and faster than synchronizing these sometimes multi-TB files between the data centers with rsync? (Copying only the change delta? But since it's the Virtuozzo file system, there is no such mechanism I can utilize, as far as I know.)

Erm, copying the delta is exactly what rsync does. However, there is an overhead in identifying the data to be copied.
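That identification step is the expensive part: rsync's delta algorithm uses a cheap "weak" rolling checksum so the sender can slide a window over its file one byte at a time and look for blocks the receiver already has, without rehashing each window from scratch. A rough Python sketch of the rolling part (simplified from rsync's actual checksum, shown only to illustrate why the whole file still has to be read once per side):

```python
M = 1 << 16  # modulus for the two 16-bit checksum halves (rsync-style sketch)

def weak_checksum(block):
    """Weak checksum of a block as two running sums (a, b)."""
    a = b = 0
    for byte in block:
        a = (a + byte) % M
        b = (b + a) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte: drop out_byte, add in_byte (window size n).

    Costs O(1) instead of O(n), which is what makes scanning a huge file
    for matching blocks affordable -- but the scan itself is still a full read.
    """
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b
```

So even in the best case, an rsync-style approach has to read both multi-TB images end to end on every run; it only saves network transfer, not disk I/O.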

You would be better talking to Virtuozzo about how to best replicate data than us.

Software-only storage solutions usually don't provide a way to track changes at the block-device level. ZFS, SimpliVity, and Proxmox (the latter also in PBS) can all track changes and replicate only the changed blocks, without comparing source and destination.
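The idea behind changed-block tracking is that the storage layer records which blocks each write touches, so replication can ship exactly those blocks with no compare pass at all. A toy Python sketch of the concept (illustrative only; in practice this lives inside the storage stack, e.g. ZFS snapshot deltas or PBS dirty bitmaps):

```python
class TrackedImage:
    """Toy changed-block tracking: remember every block a write touches,
    so a sync can ship only those blocks and skip any comparison."""

    BLOCK_SIZE = 4096

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = set()  # indices of blocks modified since the last sync

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.BLOCK_SIZE
        last = (offset + len(payload) - 1) // self.BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def replicate_to(self, replica):
        """Send only dirty blocks to the replica, then clear the bitmap."""
        bs = self.BLOCK_SIZE
        for i in sorted(self.dirty):
            replica.data[i * bs:(i + 1) * bs] = self.data[i * bs:(i + 1) * bs]
        sent = len(self.dirty)
        self.dirty.clear()
        return sent
```

With tracking like this, sync cost scales with the amount of change, not with the 3-4 TB image size -- which is exactly the property rsync cannot give you.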

Futureweb GmbH
ga flag
@symcean - of course we've already consulted Virtuozzo for advice ... unfortunately, they were not able to come up with a viable solution to this problem ... that's why I'm asking here, in the hope that some knowledgeable people can come up with a good idea for how to approach it ... rsync itself would work - correct - but unfortunately it gets too slow when dealing with 3-4 TB files with lots of changes spread all over the file ... I would need something similar to rsync, but which works better with very large files ... ?!
