Page MenuHomekolab.org

Bump collectionChanges even if we failed to retrieve the entry.
ClosedPublic

Authored by mollekopf on Thu, Jan 7, 3:23 PM.

Details

Summary

This ensures that we don't end up processing very large collections in
one go, thus risking running into php max_execution_time limits.
This became a problem with large folders starting to sync but then
getting disabled, which resulted in ~60k messages in the cache that were
no longer available.
Because processing took more than the default 30s execution timeout the
synchronization would get stuck in an endless loop (failing to complete
and then receiving the same sync request).

We also Only store newContentStates if we are actually sending the entry to the
device. Otherwise we will end up with entries that are supposed to be on the
device, but aren't, and thus probably aren't ever cleaned up either.

Diff Detail

Repository
rS syncroton
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mollekopf requested review of this revision.Thu, Jan 7, 3:23 PM
mollekopf created this revision.

Also, maybe we should tackle this from a different angle. If the folder has been disabled why do we even attempt to sync anything from that folder. We should tell the client that a hierarchy changed. So, maybe the problem is somewhere else, and could be fixed in a better way. We of course can still keep this solution as it prevents from timeout issues caused by other factors.

lib/ext/Syncroton/Command/Sync.php
859–860

I think we could be smarter here. For 'added' entries that throw "not found" exception we should not add them to $newContentStates.

mollekopf updated this revision to Diff 5734.Mon, Jan 11, 3:17 PM
mollekopf marked an inline comment as done.
mollekopf edited the summary of this revision. (Show Details)

Addressed comment

mollekopf added inline comments.Mon, Jan 11, 3:18 PM
lib/ext/Syncroton/Command/Sync.php
859–860

That seems useless and potentially harmful indeed.

Also, maybe we should tackle this from a different angle. If the folder has been disabled why do we even attempt to sync anything from that folder. We should tell the client that a hierarchy changed. So, maybe the problem is somewhere else, and could be fixed in a better way. We of course can still keep this solution as it prevents from timeout issues caused by other factors.

You are right that we shouldn't land in this situation in the first place, and from the code it also seems like we wouldn't. We first check if a folder is no longer available, if that is the case we return STATUS_FOLDER_HIERARCHY_HAS_CHANGED
So it seems we can only land in this situation if the folder does exist, but the entries are somehow not available.

Please note I have never managed to reproduce this exact situation, that is entirely from the logs of https://bifrost.kolabsystems.com/T404807

Potential causes could be:

  • The actual messages have been removed meanwhile (but the folder remains available)
  • Disabling and re-enabling the folder somehow makes the stored identifiers invalid (shouldn't though since they are deduced from content I think?)
  • Disabling the folder is somehow different from removing the folder, so the missing folder checks don't work.

In any case, I think we should apply this patch even if just for the first case, and in that case we can't really optimize it beyond trying each individual entry, except this:

  • Process removals first
  • Remove any removed messages from the added/changed set
  • Process modified second
  • Remove any messages that are in the added set already

But that sounds like a separate patch.

machniak accepted this revision.Tue, Jan 19, 11:07 AM
This revision is now accepted and ready to land.Tue, Jan 19, 11:07 AM
This revision was automatically updated to reflect the committed changes.