
Compress pendingdata
Closed, Public

Authored by mollekopf on Jan 7 2021, 4:21 PM.

Details

Summary

pendingdata can potentially be rather large. Because all entries are of
the form $collectionId:$uid, we also have a lot of duplication in the
string.

Large entries are problematic because with ~50k messages we already reach
the MySQL max_allowed_packet size of 1MB. Simply compressing that string
improves the situation drastically.

We allow decompression to fail for backwards compatibility (this works
because decompression checks for the relevant headers).
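
The compress-on-write, tolerate-failure-on-read idea described above can be sketched as follows. This is only an illustration in Python, not the code from the commit: the function names, the newline separator, and the use of zlib-format compression are assumptions made for the example.

```python
import zlib


def encode_pending(entries):
    """Join "$collectionId:$uid" entries and compress the result.

    The newline separator is an assumption made for this sketch.
    """
    raw = "\n".join(entries).encode("utf-8")
    return zlib.compress(raw)


def decode_pending(blob):
    """Return the list of entries from a stored value.

    Accepts both compressed values and legacy uncompressed ones:
    zlib rejects data that lacks a valid header, and in that case
    the value is treated as plain text.
    """
    try:
        raw = zlib.decompress(blob)
    except zlib.error:
        raw = blob  # legacy entry written before compression was added
    return raw.decode("utf-8").split("\n") if raw else []
```

Because many entries share the same collection ID prefix, the repeated substrings are exactly what this kind of compression handles well.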

Diff Detail

Repository
rS syncroton
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mollekopf requested review of this revision. Jan 7 2021, 4:21 PM
mollekopf created this revision.

Tested with up to 200k messages (it failed with 50k before).

I chose compression because it was the simplest solution to get the job done. We don't have the collectionId readily available without parsing an entry, and the only downside to this approach is that the cache entry is no longer directly readable text.

  1. Did you check/estimate max number of messages we can handle with default 1MB packet?
  2. This requires php-zlib extension which is not a default. We should make sure it's marked as required in packaging, I guess.
  3. I'm not sure pushing compressed data into database is safe (without e.g. base64-encoding it).

Good points.

  1. Did you check/estimate max number of messages we can handle with default 1MB packet?

No, I just tested that 200k works.
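
For a rough estimate, the ~50k figure from the summary implies about 21 bytes per uncompressed entry (1MB / 50k), ignoring the rest of the query. A small script along these lines could measure how compressed size grows with the number of entries. It is a sketch only: the ID shapes in `synthetic_entries` are hypothetical and real collection IDs and UIDs will compress differently, and "1MB" is taken here as 1 MiB.

```python
import zlib

MAX_PACKET = 1024 * 1024  # "1MB" from the discussion, assumed to mean 1 MiB


def synthetic_entries(count, collections=10):
    """Build fake "$collectionId:$uid" strings (hypothetical ID shapes)."""
    return [f"{n % collections:016x}:{n}" for n in range(count)]


def sizes(count):
    """Return (raw_bytes, compressed_bytes) for `count` synthetic entries."""
    raw = "\n".join(synthetic_entries(count)).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


def fits(count, limit=MAX_PACKET):
    """True if the compressed value alone stays below the packet limit."""
    return sizes(count)[1] < limit


if __name__ == "__main__":
    for count in (50_000, 200_000):
        raw, packed = sizes(count)
        print(count, raw, packed, fits(count))
```

Note that max_allowed_packet applies to the whole statement, so the usable size for the value itself is somewhat smaller than the limit.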

  2. This requires php-zlib extension which is not a default. We should make sure it's marked as required in packaging, I guess.

This seems to be enabled by default on CentOS 7 and Ubuntu at least, and there is no separate package or library, so there's not much we can do, I suppose.

  3. I'm not sure pushing compressed data into database is safe (without e.g. base64-encoding it).

Should be ok since pendingdata is a "longblob".
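
To make the trade-off in this exchange concrete, the sketch below compares storing the compressed bytes directly with base64-encoding them first. It is illustrative Python, not the project's code, and the helper names are made up for the example. The compressed output is arbitrary binary data, which is why a binary column type matters, while base64 keeps the value ASCII at the cost of roughly one third more space.

```python
import base64
import zlib


def storage_sizes(text):
    """Return (compressed_bytes, base64_bytes) for the given string."""
    packed = zlib.compress(text.encode("utf-8"))
    encoded = base64.b64encode(packed)
    return len(packed), len(encoded)


def is_plain_ascii(data):
    """True if the byte string contains only ASCII characters."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False
```

With a ~1MB packet limit, that extra third is space taken away from the entries themselves.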

machniak accepted this revision. Jan 19 2021, 11:06 AM
This revision is now accepted and ready to land. Jan 19 2021, 11:06 AM
This revision was automatically updated to reflect the committed changes.