
Remote incremental backups - timeouts

Comments

51 comments

  • Infopro
    This video posted to cPanelTV on YouTube may be of some use in understanding how the backup system works: https://www.youtube.com/watch?v=qZKlYuBOr40
  • sp3ctre69
    Not quite... my issue is that the errors are in the backup transport area. Presumably the backups are correct on the server but incorrect on the remote server. My question is: if yesterday's backup is incorrect on the remote, will this get corrected on the next transport, i.e. will it rsync all the days or just the current one? It seems that a backup transport error on one day could potentially corrupt your backup integrity unless corrected.
  • Infopro
    Have you adjusted the timeout settings? "Maximum destination timeout": enter the number of seconds the backup will attempt to run. If the backup attempt is not successful in this time, it will time out and stop.
  • sp3ctre69
    > Have you adjusted the timeout settings?

    Yes I did, they are quite high... I have 3 servers on the new system; 1 fails every time but works when I re-run it (as I said, though, when I log in to check after the initial failure, it does appear to have done the work). The other 2 servers work every time, apart from last night when all 3 failed (due to either the version update or a local network issue). Either way, the timeout is not currently my concern; it is how a potential transport issue could affect the remote copy of the incremental backups.
  • Infopro
    Do you think this may be related? Backup to Amazon S3 Doesn't Obey Retention Rules
  • sp3ctre69
    Just to clarify my question a little, consider the following few days (backup retention set to 3):

      Mon - Backup works, stored on server, remote transport works (backup files exist remotely)
      Tue - Backup works, stored on server, remote transport works (backup files exist remotely)
      Wed - Backup works, stored on server, remote transport FAILS (backup files DO NOT exist remotely)

    My question is what happens on Thursday:

      Tue - Backup works, stored on server, remote transport works (backup files exist remotely)
      Wed - Backup works, stored on server, remote transport FAILS (backup files DO NOT exist remotely) ** here is the problem
      Thu - Backup works, stored on server, remote transport works (backup files exist remotely)

    If these files do not get repopulated and only the Thursday files get copied remotely, there is still a hole in my remote backup. It looks to me like backup integrity is still OK on the server, but remotely there is still a hole unless it gets fixed in the next transport session.
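To make the two possible outcomes concrete, here is a minimal Python simulation of the scenario above. It is a hypothetical model, not cPanel's code: `simulate_remote`, its parameters, and the two transport behaviours (upload only today's backup vs. also re-upload any retained day that is missing remotely) are assumptions for illustration, and remote pruning is assumed to follow the same retention window as the local copy.

```python
from collections import deque


def simulate_remote(days, failed, retention, resync_missing):
    """Return {day: backups present on the remote after that day's run}.

    Hypothetical model, not cPanel's code:
    - the local backup always succeeds and keeps the newest `retention` days;
    - a transport on a day listed in `failed` uploads nothing;
    - resync_missing=False: a successful transport uploads only today's backup;
    - resync_missing=True: it also uploads any retained local day missing remotely;
    - the remote is then pruned to the same retention window as the local copy.
    """
    local = deque(maxlen=retention)  # oldest day drops off automatically
    remote = []
    history = {}
    for day in days:
        local.append(day)
        if day not in failed:
            to_upload = list(local) if resync_missing else [day]
            for d in to_upload:
                if d not in remote:
                    remote.append(d)
        remote = [d for d in remote if d in local]  # apply retention remotely
        history[day] = list(remote)
    return history


days = ["Mon", "Tue", "Wed", "Thu"]
print(simulate_remote(days, {"Wed"}, 3, resync_missing=False)["Thu"])  # ['Tue', 'Thu']
print(simulate_remote(days, {"Wed"}, 3, resync_missing=True)["Thu"])   # ['Tue', 'Wed', 'Thu']
```

Under the first model, Thursday leaves a permanent gap where Wednesday should be. The later test in this thread (wiping the remote and running one backup) seems to match the first model for dated directories, with the single new directory filled in completely; if that reading is right, a gap means a missing restore point rather than damage to Thursday's copy.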
  • cPanelMichael
    > If these files do not get repopulated and only the Thursday files get copied remotely, there is still a hole in my remote backup. It looks to me like backup integrity is still OK on the server, but remotely there is still a hole unless it gets fixed in the next transport session.

    Hello @sp3ctre69, it depends on whether "Strictly enforce retention, regardless of backup success." is enabled in "WHM >> Backup Configuration". You can read more about the retention behavior at:
  • sp3ctre69
    I am obviously struggling to articulate the problem here... the way the system handles failed backups is fine, the problem is the backups are running fine, and each night I get a "backup succeeded" email. Following that is the transport phase where it sends it to the remote destination... that is the bit that sometimes fails. When doing a full backup a failed transport on Wed would be fixed by a successful transport on Thurs.... but with incrementals the failed Wed would still cause problems... Of course the files are all there on the local file system, but if the remotes are wrong it doesn't help if you need to use them. Does that make more sense?
  • cPanelMichael
    > Following that is the transport phase where it sends it to the remote destination... that is the bit that sometimes fails. When doing a full backup a failed transport on Wed would be fixed by a successful transport on Thurs.... but with incrementals the failed Wed would still cause problems... Of course the files are all there on the local file system, but if the remotes are wrong it doesn't help if you need to use them.

    Hello, would you mind opening a support ticket using the link in my signature so we can take a closer look and see if this is a flaw in how remote incremental backups are transported and retained? Thank you.
  • sp3ctre69
    > Hello, would you mind opening a support ticket using the link in my signature so we can take a closer look and see if this is a flaw in how remote incremental backups are transported and retained? Thank you.

    Thanks... done, ticket number is 8760865
  • sp3ctre69
    > My issue is that the errors are in the backup transport area. Presumably the backups are correct on the server but incorrect on the remote server. My question is: if yesterday's backup is incorrect on the remote, will this get corrected on the next transport, i.e. will it rsync all the days or just the current one? It seems that a backup transport error on one day could potentially corrupt your backup integrity unless corrected.

    I have done some investigation on this, and I think the system works as I think it should... there is definitely some confusion, though, even in some of the discussions I have had with cPanel staff. Last night I wiped my remote server and reinstalled it, just in case there were issues. This left me with 7 days of backups on the server and none on the remote. I ran a backup, and the result was a full backup on the remote server, which matched the size of last night's backup on the source. Obviously on the source server the files were linked (hence incremental), but on the remote server there was nothing to link to (as they had been deleted). It has obviously dealt with this by taking them from the source server. Can someone comment on my understanding of this? I just want some comfort about the process it goes through when the previous day's backup exists on the source server but not on the remote server when using rsync incremental backups. Thanks
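One way to check the hard-linking behaviour on the remote destination is to compare inode numbers between two consecutive dated backup directories. The sketch below is an assumption-level helper (`hard_link_report` is a hypothetical name, not part of cPanel) that has to run on the machine holding the backups; it only uses `os.walk` and `os.path.samefile` from the standard library.

```python
import os


def hard_link_report(older_day, newer_day):
    """Compare two dated incremental backup directories.

    Returns (linked, separate): how many files under newer_day share an inode
    with the file at the same relative path under older_day (hard links), and
    how many are independent copies or have no counterpart in older_day.
    """
    linked = separate = 0
    for root, _dirs, files in os.walk(newer_day):
        for name in files:
            new_path = os.path.join(root, name)
            old_path = os.path.join(older_day, os.path.relpath(new_path, newer_day))
            # samefile() is True only when both paths point at the same inode
            if os.path.isfile(old_path) and os.path.samefile(old_path, new_path):
                linked += 1
            else:
                separate += 1
    return linked, separate
```

Call it with the older and newer dated directories (for example the two most recent ones under your incremental backup path). Right after a remote wipe like the one described above, every file in the first new directory has nothing older to match, so all of them count as `separate`; on later runs, unchanged files should show up as `linked`.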
  • sp3ctre69
    Interestingly, I am getting timeouts while pruning on 2 of my servers every night. When I look, the oldest folder has been pruned. Interestingly, these are 2 of my servers with the biggest sites. All timeouts are set to the maximum; I wonder if there is another one we are missing? I never had any failed transports when using full backups.
  • cPanelMichael
    > Interestingly, I am getting timeouts while pruning on 2 of my servers every night. When I look, the oldest folder has been pruned. Interestingly, these are 2 of my servers with the biggest sites. All timeouts are set to the maximum; I wonder if there is another one we are missing? I never had any failed transports when using full backups.

    I see that ticket number 8762433 is open for this issue. I'll monitor the ticket and update this thread with the outcome.

    > I ran a backup, and the result was a full backup on the remote server, which matched the size of last night's backup on the source. Obviously on the source server the files were linked (hence incremental), but on the remote server there was nothing to link to (as they had been deleted). It has obviously dealt with this by taking them from the source server. Can someone comment on my understanding of this? I just want some comfort about the process it goes through when the previous day's backup exists on the source server but not on the remote server when using rsync incremental backups.

    The remote incremental backup process (with rsync) is designed to check whether the files associated with the account backup on the cPanel server also exist on the remote backup destination. If the files do not exist on the remote backup destination (or if the files have changed), it copies the actual files. Otherwise, it uses hard links to the files that already exist on the remote backup destination. The following document is available if you'd like to read more about how account backup information is stored: Metadata for Backups - Version 66 Documentation - cPanel Documentation. Thank you.
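The copy-or-link decision described above can be sketched in Python as follows. This is a local-filesystem model under stated assumptions, not cPanel's implementation: `transport_incremental` and `_unchanged` are hypothetical names, the "remote" is represented by ordinary directories, and "changed" is approximated with rsync's default quick check (same size and modification time).

```python
import os
import shutil


def _unchanged(a, b):
    """rsync-style quick check: same size and same modification time (seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def transport_incremental(source_day, prev_remote_day, new_remote_day):
    """Build new_remote_day from source_day, hard-linking unchanged files from
    prev_remote_day (which may be None or missing) and copying everything else.
    Returns (copied, linked) as lists of relative paths."""
    copied, linked = [], []
    for root, _dirs, files in os.walk(source_day):
        rel_dir = os.path.relpath(root, source_day)
        os.makedirs(os.path.join(new_remote_day, rel_dir), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(source_day, rel)
            dst = os.path.join(new_remote_day, rel)
            prev = os.path.join(prev_remote_day, rel) if prev_remote_day else None
            if prev and os.path.isfile(prev) and _unchanged(src, prev):
                os.link(prev, dst)      # already on the remote: hard link it
                linked.append(rel)
            else:
                shutil.copy2(src, dst)  # missing or changed: copy the data
                copied.append(rel)
    return copied, linked
```

Passing a `prev_remote_day` that does not exist reproduces the situation in the earlier post: with nothing on the remote to link against, every file is copied, so the new directory ends up the size of a full backup.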
  • cPanelMichael
    Hello, to update, a couple of internal cases were opened as part of the support ticket. Internal case CPANEL-15398 was opened to ensure that the operations for the rsync backend use the default timeout value of 300 so that destination servers with extremely slow disks (e.g. disk caching is disabled) can delete directories before timing out. I'll update this thread once the resolution is published. Internal case CPANEL-15309 is open to address an issue where using a relative path (i.e. one that does not start with a slash character) as the backup directory for an Rsync transport prevents incremental backups from hard-linking on the remote destination. I'll monitor this case and update this thread with the outcome. In the meantime, the workaround is to set the backup directory path for the rsync destination to an absolute path (e.g. /home/user/path/to/). Thank you.
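A possible mechanism for the relative-path problem, assuming the transport passes the previous day's directory to rsync with `--link-dest` (the thread does not confirm this): the rsync manual states that a relative `--link-dest` directory is interpreted relative to the destination directory, so a relative backup path can point the link target at a directory that does not exist, and files are then transferred in full instead of hard-linked. The helpers below (`resolve_link_dest`, `require_absolute`) are hypothetical and only model that path rule and the suggested workaround.

```python
import posixpath


def resolve_link_dest(dest_dir, link_dest):
    """Resolve a --link-dest directory the way the rsync manual describes:
    a relative DIR is taken relative to the destination directory."""
    if posixpath.isabs(link_dest):
        return posixpath.normpath(link_dest)
    return posixpath.normpath(posixpath.join(dest_dir, link_dest))


def require_absolute(backup_dir):
    """Reject a relative rsync backup directory, mirroring the suggested workaround."""
    if not posixpath.isabs(backup_dir):
        raise ValueError(f"{backup_dir!r} is relative; use an absolute path")
    return backup_dir


print(resolve_link_dest("backups/2017-10-24", "backups/2017-10-23"))
# backups/2017-10-24/backups/2017-10-23  (a directory that does not exist)
print(resolve_link_dest("/home/user/backups/2017-10-24", "/home/user/backups/2017-10-23"))
# /home/user/backups/2017-10-23
```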
  • cPanelMichael
    Hello, To update, the resolutions associated with internal cases CPANEL-15398 and CPANEL-15309 are scheduled for inclusion with cPanel version 68. Thank you.
  • brt
    This is a nuisance, with directories being abandoned on a daily basis. Any chance this can get bumped to a 66 update? (Specifically CPANEL-15398)
  • cPanelMichael
    > This is a nuisance, with directories being abandoned on a daily basis. Any chance this can get bumped to a 66 update? (Specifically CPANEL-15398)

    There are currently no plans to backport the patch to cPanel version 66; however, the following steps are available as a temporary workaround:

    1. Open /usr/local/cpanel/Cpanel/Transport/Files/Rsync.pm via the command line in the text editor of your preference.
    2. Locate the following line:

        my $res = $self->{'rsync_obj'}->capture( { timeout => 10, tty => 1 }, "rm -rf $path" );

    3. Modify the "10" entry in this line to "300":

        my $res = $self->{'rsync_obj'}->capture( { timeout => 300, tty => 1 }, "rm -rf $path" );

    4. Save the file.

    Let us know if this helps. Thank you.
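If you prefer to script step 3 rather than edit the file by hand, a minimal Python sketch follows. It assumes the line appears in Rsync.pm exactly as quoted above, refuses to change anything otherwise, and keeps a `.orig` copy for rollback. `patch_prune_timeout` is a hypothetical helper, and because Rsync.pm is a cPanel-managed file, the change may be overwritten by a later cPanel update.

```python
import shutil

RSYNC_PM = "/usr/local/cpanel/Cpanel/Transport/Files/Rsync.pm"
OLD = '{ timeout => 10, tty => 1 }, "rm -rf $path"'
NEW = '{ timeout => 300, tty => 1 }, "rm -rf $path"'


def patch_prune_timeout(path=RSYNC_PM):
    """Change the prune timeout in the quoted capture() call from 10 to 300 seconds."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    matches = text.count(OLD)
    if matches != 1:
        # Refuse to guess if the file differs from the line quoted in the post.
        raise RuntimeError(f"expected 1 match in {path}, found {matches}")
    shutil.copy2(path, path + ".orig")  # keep the unmodified file for rollback
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text.replace(OLD, NEW))
```

Run it as root on the cPanel server; a second run raises `RuntimeError` because the 10-second line is no longer present.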
  • brt
    This hasn't helped at all. *No* backups on the remote server are deleted (and I'm obviously considering the retention settings -- they're obeyed properly on the primary server). Backups complete just fine, but I have to manually delete previous backups every single time or they just stick around.
  • cPanelMichael
    Hi @brt, cPanel version 68 includes resolutions for both cases referenced in the earlier post. It's now available in the Current build tier, and is tentatively planned for the Release build tier next week. Thank you.
  • uk01
    Hi, the errors described here seem to be the same as the ones we are having. I've tried the fix on the previous page (editing the rsync file with a higher timeout); however, the error occurred again tonight. Preview of the transport errors log:

        Unable to prune transport "xxxxxxx"
        Error pruning "/home/xxxx-incremental/2017-10-24" from "xxxxxxx": ssh slave failed: timed out

    I checked, and 2017-10-24 has been deleted even though it's telling me it timed out. Is this related to the fix in v68? And are backups OK even though this error has occurred, or are they corrupt? (Bearing in mind it's pruning the "original" folder with the hard links.)

        [2017-10-31 03:01:29 +0000] warn [cpbackup_transporter] Error pruning /home/xxxx-incremental/2017-10-24 from xxxxxx-rsync: ssh slave failed: timed out at /usr/local/cpanel/Cpanel/LoggerAdapter.pm line 27.
            Cpanel::LoggerAdapter::warn(Cpanel::LoggerAdapter=HASH(0x1bfa350), "Error pruning /home/xxxx-incremental/2017-10-24 from xxxxxx-"...) called at /usr/local/cpanel/Cpanel/Backup/Queue.pm line 690
            Cpanel::Backup::Queue::transport_backup::attempt_to_prune_destination(Cpanel::Backup::Queue::transport_backup=HASH(0x207f450), Cpanel::Transport::Files::Rsync=HASH(0x2a3e488), 7, undef, Cpanel::LoggerAdapter=HASH(0x1bfa350)) called at /usr/local/cpanel/Cpanel/Backup/Queue.pm line 226
            Cpanel::Backup::Queue::transport_backup::process_task(Cpanel::Backup::Queue::transport_backup=HASH(0x207f450), cPanel::TaskQueue::Task=HASH(0x2a866b8), Cpanel::LoggerAdapter=HASH(0x1bfa350)) called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 629
            eval {...} called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 632
            cPanel::TaskQueue::__ANON__() called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/StateFile.pm line 237
            eval {...} called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/StateFile.pm line 237
            cPanel::StateFile::Guard::call_unlocked(cPanel::StateFile::Guard=HASH(0x26c1418), CODE(0x26dd698)) called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 637
            cPanel::TaskQueue::process_next_task(cPanel::TaskQueue=HASH(0x26c01b0)) called at /usr/local/cpanel/bin/cpbackup_transporter line 151
            eval {...} called at /usr/local/cpanel/bin/cpbackup_transporter line 149
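To list which dated directories the transporter reported as unprunable (so you can then check by hand whether they actually still exist on the destination, as this post did), a small parser for the `Error pruning` warning could look like this. The regular expression is fitted only to the single log line quoted in this post, so treat it as an assumption about the log format; `prune_errors` is a hypothetical name, and the `xxxx` parts are the poster's redactions.

```python
import re

PRUNE_ERROR = re.compile(
    r"^\[(?P<time>[^\]]+)\] warn \[cpbackup_transporter\] "
    r"Error pruning (?P<path>\S+) from (?P<dest>[^:]+): "
    r"(?P<reason>.+?)(?: at /\S+ line \d+\.)?$"
)


def prune_errors(lines):
    """Yield (time, path, destination, reason) for each 'Error pruning' warning."""
    for line in lines:
        m = PRUNE_ERROR.match(line.strip())
        if m:
            yield m.group("time", "path", "dest", "reason")
```

Feed it the lines of the transport error output; each yielded `path` is a directory worth checking on the remote, since a reported timeout did not always mean the directory was left behind.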
  • brt
    This seems to be fixed now, functionally, (backups DO seem to be pruning themselves now) but I'm still getting daily emails with the pruning error (Backup Transport Error). Are there any updates on this matter (pruning backups)?
  • cPanelMichael
    > Is this related to the fix in v68? And are backups OK even though this error has occurred, or are they corrupt? (Bearing in mind it's pruning the "original" folder with the hard links.)

    The issue you described does appear to be related to internal case CPANEL-15398. The resolution is included in cPanel version 68 and ensures that the operations for the rsync backend use the default timeout setting so that destination servers with extremely slow disks (e.g. disk caching is disabled) can delete directories. I recommend updating to cPanel version 68 once it's in your build tier (or even sooner if you don't mind switching to the CURRENT build tier) to see if the issue persists.

    > And are backups OK even though this error has occurred, or are they corrupt? (Bearing in mind it's pruning the "original" folder with the hard links.)

    The error suggests the backups were not pruned. Are you suggesting some of the backup directories are in fact pruned on the remote system?

    > This seems to be fixed now, functionally, (backups DO seem to be pruning themselves now) but I'm still getting daily emails with the pruning error (Backup Transport Error). Are there any updates on this matter (pruning backups)?

    Is this server already using cPanel version 68? cPanel 68 includes the fix: Fixed case CPANEL-15398: Backups: ensure rsync operations use default timeout. Thank you.
  • brt
    @cPanelMichael - I guess I mixed up two different updates here. "CPANEL-15493: Make sure incremental dirs are removed when asked." is the one that seems to have fixed our remote server filling up. Previous backups were failing to prune before that. Now it appears they're pruning correctly, but are still reporting problems pruning.
  • cPanelMichael
    > Now it appears they're pruning correctly, but are still reporting problems pruning.

    To confirm, which version of cPanel is installed on this system? Thank you.
  • brt
    > To confirm, which version of cPanel is installed on this system? Thank you.

    66 up until *right now* -- the 68 update is currently running.
  • LBJ
    > The error suggests the backups were not pruned. Are you suggesting some of the backup directories are in fact pruned on the remote system? Is this server already using cPanel version 68? cPanel 68 includes the fix: Fixed case CPANEL-15398: Backups: ensure rsync operations use default timeout. Thank you.

    We're running v68.0.21, and on 2 of our servers we're seeing the remote prune succeed while still receiving the "Unable to prune transport" error. Best regards, LBJ
  • sp3ctre69
    Our system seemed fixed after the update, but in the last week it has started doing it again.
  • uk01
    We see the following:

      Remote destination 1
        - 3 servers running CentOS 7: work fine
        - 1 server running CloudLinux (CentOS 6): never prunes
      Remote destination 2
        - 3 servers running CentOS 7: alert as failed pruning but do actually prune
        - 1 server running CloudLinux (CentOS 6): never prunes
  • LBJ
    On cPanel Stable v68.0.21, the timeout used for the final remote prune appears to be 30 rather than 300. The logs, Rsync.pm, and the time between the primary backup finishing and the prune-failure error all point to this. For servers with well over 0.5 TB of data, a 30-second timeout doesn't always allow a prune on remote SATA (non-NVMe/SSD) drives to complete. I would imagine most users with substantial data use economical non-NVMe/SSD drives for backups. Best regards, LBJ
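To check whether a given timeout (the 10, 30 or 300 seconds discussed in this thread) is enough for a particular directory on a slow destination disk, one option is to time the deletion directly on the destination server. The sketch below is hypothetical (`timed_prune` is not a cPanel function): it runs `rm -rf` through Python's `subprocess.run` with a timeout, then checks whether the directory still exists, since several reports here show that a timeout error and a directory left behind are not the same thing. It really deletes the directory, so point it only at a backup directory you intend to prune.

```python
import os
import subprocess
import time


def timed_prune(path, timeout):
    """Delete `path` with `rm -rf`, bounded by `timeout` seconds.

    Returns (timed_out, still_exists, elapsed_seconds).
    """
    start = time.monotonic()
    try:
        subprocess.run(["rm", "-rf", path], check=True, timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True  # subprocess.run kills rm here; deletion may be partial
    elapsed = time.monotonic() - start
    return timed_out, os.path.exists(path), elapsed
```

Comparing `elapsed` for your oldest dated directory against 30 and 300 seconds gives a rough idea of whether the prune step can finish within the timeout on that hardware.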
