Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Exchange backup fails if we can't enumerate service folders #3716

Closed
ashmrtn opened this issue Jun 29, 2023 · 6 comments · Fixed by #4087
Closed

[Bug]: Exchange backup fails if we can't enumerate service folders #3716

ashmrtn opened this issue Jun 29, 2023 · 6 comments · Fixed by #4087
Assignees
Labels
backup bug Something isn't working

Comments

@ashmrtn
Copy link
Contributor

ashmrtn commented Jun 29, 2023

What happened?

The case I investigated we got a 404 when attempting to get the user's calendars. This eventually caused the backup in CI to fail when merging backup details with the following error

tester.go:39: TestBackupOpIntegrationSuite/TestBackup_Run_incrementalExchange/delete_a_folder run at 2023-06-28T23:04:14.19626645Z
      backup_integration_test.go:1273: 
          	Error Trace:	/home/runner/work/corso/corso/src/internal/operations/backup_integration_test.go:1273
          	            				/home/runner/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:112
          	Error:      	Received unexpected error:
          	            	empty location key
          	            		/home/runner/work/corso/corso/src/internal/kopia/merge_details.go:81
          	            	getting new paths
          	            		/home/runner/work/corso/corso/src/internal/operations/backup.go:637
          	            	getting updated info for entry
          	            		/home/runner/work/corso/corso/src/internal/operations/backup.go:747
          	            	merging details
          	            		/home/runner/work/corso/corso/src/internal/operations/backup.go:357
          	            	running backup
          	            		/home/runner/work/corso/corso/src/internal/operations/backup.go:226
          	Test:       	TestBackupOpIntegrationSuite/TestBackup_Run_incrementalExchange/delete_a_folder

Fix

Always attempt to parse the location info from the old entry prior to lookup. Even for older backups we should have the invariant that either the LocationRef is non-empty or the RepoRef contains enough information to generate the old LocationRef

Priority of this is a bit unclear. Technically the backup should still fail overall because we couldn't backup something (calendars in this case), but properly handling this case can lead to better error messages in the long run and better best-effort behavior

Analysis

Connecting to the repo and dumping the details for the backup immediately before the failing one shows that the details entry has all the expected information. However, we appear to be failing on getting the information when backing up info. Not being able to get the info causes us to fail to merge later because we don’t have enough to do the lookup

Inspecting the logs for the failing backup shows a 404 when getting calendars (info trimmed and formatted for legibility)

2023-06-28T23:04:24.537Z  ERROR graph/middleware.go:176 graph api error: 404 Not Found  {
	"service": "Exchange",
	"category": "events",
	"resp_status_code": "404",
	"resp_content_len": "-1",
	"resp_status": "404 Not Found",
	"method": "GET",
	"url": "https://graph.microsoft.com/v1.0/users/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/calendars?$select=id,name&$skip=10&$top=10",
	"request_content_len": "0",
	"response": "HTTP/1.1 404 Not Found
{
	\"error\":{
		\"code\":\"ErrorItemNotFound\",
		\"message\":\"The specified object was not found in the store.\",
		\"innerError\":{
			\"date\":\"2023-06-28T23:04:24\",
			\"request-id\":\"37278584-159f-4ce3-b91c-dc0db4b55ae6\",
			\"client-request-id\":\"04ab5101-d45e-4d7d-99a4-b3249058436c\"
		}
	}
}"
}

The rest of the backup appears to proceed normally from there, but the backup still fails overall during details merging. The root cause is the assumption that all exchange folders will generate collections even if they aren’t changed. Having a collection allows us to get the location of the folder even if it’s merged. The location is then passed to streamBaseEntries and added as part of the information that’s eventually propagated to the merge handler when FinishedFile is called.

When looking up items for merging, we don’t attempt to extract location from Exchange right now. This leads to us calling GetNewPathRef with an empty old location. Overall, it causes the if-block in the lookup code to return an error because the information that we found has no location and no location was passed in for the lookup.

The merge code is currently setup to not attempt parsing a LocationRef in all cases. I don’t believe there’s really a reason not to attempt it. It should be possible to define the set of backups for which there will exist a LocationRef or from which we can parse the location from the RepoRef, much like we’re doing for OneDrive and SharePoint. That alone should solve the issue.

I believe the parsing checks would go something like the following:

  • calendar
    • < backup version 2 -> parse from RepoRef
    • = backup version 2 -> source from LocationRef

  • email and contacts
    • attempt to parse from LocationRef
    • if LocationRef is empty attempt to parse from RepoRef

The above is based on the fact that for email and contacts the following changes were made:

This means that at all times either the LocationRef or the RepoRef for email and contacts should contain what we want for the LocationRef for unchanged items. Furthermore, since what will be used as the LocationRef is properly formatted (i.e. proper directory structure) when found in the RepoRef we don’t have to worry about additional code there. Also, no contacts are stored directly in the contacts prefix folder of the kopia hierarchy, so we shouldn’t have to worry about the LocationRef ever being empty unexpectedly

We could probably just treat calendars the same as well as either LocationRef will be empty or populated and I don't believe there was a time in-between where we had IDs in the RepoRef but empty LocationRef. Nor will events ever appear outside a calendar in the folder hierarchy

Corso Version?

any after ~May

Where are you running Corso?

N/A

Relevant log output

No response

@ashmrtn ashmrtn added the bug Something isn't working label Jun 29, 2023
@sub205
Copy link

sub205 commented Jun 30, 2023

I'm having a similar problem at one customer.
Backups of most (not all) exchange mailboxes fail after a long time.
Can't figure out if it's related to throttling or this error.

2023-06-30T03:54:12.566+0200 ERROR backup/backup.go:263 Backup 1 of 1 failed {"error": "[email protected]: running backup: partial success: 17866 errors occurred", "errorVerbose": "running backup: partial success: 17866 errors occurred\n\t/home/runner/work/corso/corso/src/internal/operations/helpers.go:40\[email protected]\n\t/home/runner/work/corso/corso/src/cli/backup/backup.go:235", "error_labels": {}, "resource_owner_name": "[email protected]", "resource_owner_id": "585459f9-a212-4976-a464-fbed112fd53d"}
2023-06-30T03:54:13.091+0200 ERROR cli/cli.go:181 cli execution {"error": "1 of 1 backups failed:\n· [email protected]: running backup: partial success: 17866 errors occurred\n", "errorVerbose": "1 of 1 backups failed:\n· [email protected]: running backup: partial success: 17866 errors occurred\n\n\t/home/runner/work/corso/corso/src/cli/backup/backup.go:267", "error_labels": {}}
(END)

Unfortunately, not even the partial backups are saved so i need to restart from the beginning everytime.

@ashmrtn
Copy link
Contributor Author

ashmrtn commented Jul 3, 2023

hi @sub205, sorry to hear you're having trouble getting backups to complete

Unfortunately the log output above doesn't give much insight on the cause of the error, but checking the Corso logs should have more helpful information. The location of the log for an individual run is printed by Corso when it runs. You can send the Corso log to a specific location by passing the --log-file <location> flag to Corso (where <location> is where you want the log to go). Log files are appended to if Corso is given the same file for multiple runs. The --mask-sensitive-data and --log-level <level> flags may also be useful to make posting log output for debugging easier and to get more info on what's going wrong

If you have more log output you'd like help interpreting you can contact us on GitHub or on the Corso discord server if you'd like more immediate responses

Unfortunately, not even the partial backups are saved so i need to restart from the beginning everytime

yes, this is an unfortunate consequence of how incremental backups are implemented. We have a ticket open for improving this behavior so that only items that weren't already backed up on a previous attempt are pulled on the next attempt. We haven't yet had a chance to get around to those though

@ashmrtn
Copy link
Contributor Author

ashmrtn commented Jul 21, 2023

hi @sub205,

Just checking in to see if there's anything else you need help with on the issue you reported. Sorry for the long delay in checking back in, this got lost in the stack

@sub205
Copy link

sub205 commented Jul 24, 2023

Hi @ashmrtn ,
have been very busy with other projects, too.
Thanks for looking into this!
The problem still persists and is relatively urgent.
I'll provide you with the required logs later that day.
What's the best way to send them to you?

@sub205
Copy link

sub205 commented Jul 24, 2023

Even with mask-sensitive-data i get the original mailaddress/account id, tenant-id and other stuff in the logs :(
What log-level do you need?

@ashmrtn
Copy link
Contributor Author

ashmrtn commented Jul 24, 2023

alright, I've made a ticket to get better coverage when using the --mask-sensitive-data flag to make sure we don't lose track of it

Debug level logs would be the most useful as they contain a lot more info about what's going on. Logs can either be shared on github (gist/comment/etc), emailed (my email is ashmrtnz (at) alcion.ai, obscured slightly in the hopes of better avoiding scrapers), or DM'ed to one of the Corso devs on the Corso discord. On discord the corso team has the team corso role

@ashmrtn ashmrtn self-assigned this Aug 21, 2023
@ashmrtn ashmrtn added the backup label Aug 21, 2023
aviator-app bot pushed a commit that referenced this issue Aug 23, 2023
Shuffle around some logic for details merging so that
we always attempt to extract a LocationRef from the
backup base entry that's currently being examined.

A LocationRef should always be available from either
the LocationRef field in the details entry (newer
backups) or by extracting it from the RepoRef (older
backups)

Manually tested incremental backups for exchange in
the following scenarios:
1. v0.3.0 backup (calendars use IDs in RepoRef) ->
   incremental with this patch -> incremental with
   this patch
1. v0.2.0 backup (exchange uses folder names in
   RepoRef) -> incremental with this patch ->
   incremental with this patch

The above tests should cover the cases where:
* base backup details don't have LocationRef for
  exchange items
* base backup details have LocationRef for exchange
  items

---

#### Does this PR need a docs update or release note?

- [ ] ✅ Yes, it's included
- [ ] 🕐 Yes, but in a later PR
- [x] ⛔ No

#### Type of change

- [ ] 🌻 Feature
- [ ] 🐛 Bugfix
- [ ] 🗺️ Documentation
- [ ] 🤖 Supportability/Tests
- [ ] 💻 CI/Deployment
- [x] 🧹 Tech Debt/Cleanup

#### Issue(s)

* closes #3716

#### Test Plan

- [x] 💪 Manual
- [x] ⚡ Unit test
- [ ] 💚 E2E
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backup bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants