Kontact stops responding to any modifications until akonadi is restarted (but keeps working other than that)
Open, HighPublic8 Story Points

Description

It apparently happens that kontact stops responding to any modifications, so i.e. marking a mail as read would no longer result in any reaction.
The application keeps responsive, so you can look at mail etc, but it seems no modifications get executed.

Sometimes akonadi recovers by itself, which results in a storm of modifications getting executed at once, sometimes it doesn't.

From initial debugging output it looks like read-only communication keeps working, but that has to be clarified.

https://support.kolabsys.com/provider?Action=AgentTicketZoom&TicketID=21644&ArticleID=77672&ZoomExpand=0#77672

mollekopf created this task.EditedJun 22 2016, 3:27 PM

One theory would be that the notification mechanism breaks down, so regular fetches etc. keep working, but kontact no longer reacts to changes.

...but that would only make sense if a restart of kontact would fix it.

mollekopf raised the priority of this task from 40 to High.EditedJun 22 2016, 3:33 PM

As a workaround it was suggested to restart akonadi on kontact start, or with a menu option, so users have at least a possibility to fix the issue themselves without restarting their system.

mollekopf set the point value for this task to 8.Jun 27 2016, 11:40 AM

https://support.kolabsys.com/provider?Action=AgentTicketZoom&TicketID=21655&ArticleID=77849&ZoomExpand=0#77849

The information provided here suggests that the communication to the akonadi server remains functional (we can see the ASAP communication for the mark as read operation),
but for some reason the UI doesn't update.

mollekopf moved this task from Backlog to Doing on the Sprint 201626 board.Jun 27 2016, 4:32 PM
mollekopf claimed this task.
mollekopf moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 28 2016, 9:56 AM

This is what we get in the debugger in the error state when trying to mark a mail as read:

kontact-1644948879 (0x7f0cb8012b60) 7421 UID STORE 86696 NOREV (+FLAGS (\SEEN)) 
kontact-1644948879 (0x7f0cb8012b60) 7421 OK DATETIME "23-Jun-2016 11:36:58 +0000" STORE completed

And this is what we expect:

165 UID STORE 3387715 NOREV (+FLAGS (\SEEN)) 
* 3387715 FETCH (REV 4) 
165 OK DATETIME "30-Jun-2016 07:10:55 +0000" STORE completed

As we can see the fetch response is missing. This indicates that akonadi failed to query for the item, which in turn could hint at a broken connection to the mysql server.

We could be chasing the wrong problem:
If the mail that this was tested with was internally already marked as read, then addFlags may have set flagsChanged to false, which would keep the changes empty, which would keep us out of the branch that sends the FETCH response, which makes the whole argument moot.

petersen moved this task from Backlog to Doing on the Sprint 201627 board.Jul 4 2016, 7:15 AM
petersen moved this task from Backlog to Doing on the Sprint 201628 board.Jul 11 2016, 9:10 AM
petersen moved this task from Backlog to Doing on the Sprint 201630 board.Jul 25 2016, 9:10 AM
petersen moved this task from Doing to Backlog on the Sprint 201630 board.
petersen moved this task from Backlog to Doing on the Sprint 201630 board.

From the latest information provided (otrs 07/20/2016 13:03:24):

  • The UID STORE +FLAGS command is successfully executed and we can see the FETCH (REV 2) response)
  • The Job Tracker also reports the ItemModifyJob as successful
  • No notifications at all are visible in the notification monitor
  • The "INSERT INTO PimItemFlagRelation ..." command is successfully executed as recorded by the querydebugger.

While we see some activity from the resource from a synchronization, there is none whatsoever in relation to the modification.
Given that this also seems to be ~30s later (derived from "kontact-1260217093 (0x7f3fe4012b20) 30945 OK DATETIME "20-Jul-2016 10:52:10 +0000" STORE completed "),
I assume this is unrelated.

This seems to indicate that the notification path is completely broken (to akonadiconsole, the resource and kontact).

The current hypothesis is:

  • The DataStore does a reestablishConnectionIfLost because we lost the connection to the mysql server while a transaction was ongoing.
  • m_transactionLevel is never reset to 0 so remains > 0
  • commitTransaction get's m_transactionLevel > 1 and thus never emits transactionComitted
  • NotificationCollector::dispatchNotification is always in transaction and thus always enqueues new notifications, and never emits them since it's waiting on transactionCommitted.

DataStore is a thread-local singleton that also instantiates NotificationCollector which would explain why this behavior is observed only for the session of kontact.
It doesn't explain why a restart of kontact isn't enough to fix the situation though it seems?

The current hypothesis is:

  • The DataStore does a reestablishConnectionIfLost because we lost the connection to the mysql server while a transaction was ongoing.
  • m_transactionLevel is never reset to 0 so remains > 0
  • commitTransaction get's m_transactionLevel > 1 and thus never emits transactionComitted
  • NotificationCollector::dispatchNotification is always in transaction and thus always enqueues new notifications, and never emits them since it's waiting on transactionCommitted.

    DataStore is a thread-local singleton that also instantiates NotificationCollector which would explain why this behavior is observed only for the session of kontact. It doesn't explain why a restart of kontact isn't enough to fix the situation though it seems?

That hypothesis is flawed because:

  • DataStore doesn't survive over different connections, so for this to be the actual cause it would have to be fixable by restarting kontact (Which would close the session and start a new one on the next start).
  • m_transactionLevel doesn't necessarily refer to any storage transactions directly, it's rather a connection state. As the reconnect to the mysql server happens transparently, the connection will still receive N COMMIT or ROLLBACK commands which should bring m_transactionLevel down again.

Given that we do see some unrelated sync operations including non-SILENT modifications in the log, but still no notifications in akonadiconsole we can conclude that this is a problem where the akonadiserver fails to send any notifications to any process.

This probably means that it is either a fault inside NotificationManager (the class that centrally receives all notifications in the akonadi server), or a problem of the dbus interface that is supposed to notify all clients.

Some additional checks that could be done:

This should return a couple of results:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.subscribers | grep kontact

This should succeed:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.DBus.Peer.Ping

Akonadiconsole subscribes in notificationmodel.cpp directly to dbus, so it's unlikely that this is the reason.

This leaves us with:

  • dbus failing (This should be detectable with qdbus)
  • NotificationManager either not emitting the notifications or not getting them in the first place
    • appendAndCompress could for some reason drop the notifications.
    • the timer could never fire for some reason.

We probably need some facilities in NotificationManager that allow for introspection (of especially mNotifications to see if notifications are piling up there)

appendAndCompress only optimizes collection modifications, so that's not it.

mollekopf added a comment.EditedJul 28 2016, 6:07 PM

With the latest version it is now possible to introspect the notificationManager with:

Get a counter of all notifications (allows to detect wether the notification manager receives any notifications at all:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.notificationCounter

Get the number of currently stored notifications (allows to see the number of notifications stuck in the queue):

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.notificationCount

Get all subscribe/unsubscribe calls as debug message:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.debugMessages

List all notifications currently queued:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.listNotifications
mollekopf moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

From the following debug output that was captured after the problem appeared, we can see that the notifications indeed seem to pile up in the NotificationManager for some reason.

The output was produced with the given commands, and a try to mark a mail as read in the middle.

Two notifications are expected for a mark as read operation (This is from my local system that doesn't show the problem):

NotificationManager::notify ( Items ((2,1,message/rfc822)) in collection 23 added flags (\SEEN) removed flags () )
NotificationManager::notify ( Items ((2,1,message/rfc822)) in collection 23 modified parts (ATR:MDNStateAttribute) )

The captured debug output:

$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.notificationCounter
35772
$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.notificationCount
27
$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.debugMessages
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_newmailnotifier_agent_6661_xnoz0K
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_jsgVQL
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_EPaixr
Fr. Jul 29 09:05:28 2016 SUBSCRIBE korgac_6607_cNbc2t
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Vi8eHi
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Z3YgcG
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Jg9Apf
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_HUZW5D
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_VJHVzE
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_VDb6uY
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_vNF3zU
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_lZXID9
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_YUoffd
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_GkulUX
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6632_Yb7eC6
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_49vCnL
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_CsfUKs
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_25m8x8
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_pphbDs
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_1ECvv1
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_Zpm2W9
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_Z4dhsz
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_ldap_resource_0_6640_BKuNHi
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_6UHP7W
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_pV588G
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6633_CjJ5qt
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_migration_agent_6658_TqDTLb
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6627_GbQRVp
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_baloo_indexer_6631_x91BE8
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_sendlater_agent_6662_70UCYr
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_B4QlZx
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6646_wlt5AD
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_kolab_resource_0_6634_j45K9O
Fr. Jul 29 09:05:34 2016 SUBSCRIBE kontact_6569_ykBad6
Fr. Jul 29 09:10:14 2016 SUBSCRIBE akonadi_newmailnotifier_agent_6661_TVYUmc
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_ULcwaO
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_7gl9Le
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_DkU1dj
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_RMUX0F
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_ZLvm5Q
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_6Euwmy
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_mCduJT
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_3BN9PF
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_3ITpV4
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_XvWa4r
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_umnx4G
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_CQejJ8
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_GceAOR
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_oO80om
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_qch6DL
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_MPDAnL
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_nbTdCA
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_yc2Fbt
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_lu3dsw
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_mWCPAJ
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_Tq50kE
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_50l3db
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_Taf6A5
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_6uqhU8
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_jylrFO
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_Iz1ZU8
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_Cv34ew
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_qmd2Ot
$ echo " mark mail as read....."
 mark mail as read.....
$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.notificationCounter
35774
$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.notificationCount
29
$ qdbus org.freedesktop.Akonadi /notifications 
org.freedesktop.Akonadi.NotificationManager.debugMessages
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_newmailnotifier_agent_6661_xnoz0K
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_jsgVQL
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_EPaixr
Fr. Jul 29 09:05:28 2016 SUBSCRIBE korgac_6607_cNbc2t
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Vi8eHi
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Z3YgcG
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_Jg9Apf
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_HUZW5D
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_VJHVzE
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_VDb6uY
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_vNF3zU
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_lZXID9
Fr. Jul 29 09:05:28 2016 SUBSCRIBE kontact_6569_YUoffd
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_GkulUX
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6632_Yb7eC6
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_49vCnL
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_CsfUKs
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_maildispatcher_agent_6647_25m8x8
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_pphbDs
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_1ECvv1
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_Zpm2W9
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_archivemail_agent_6628_Z4dhsz
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_ldap_resource_0_6640_BKuNHi
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_6UHP7W
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_pV588G
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6633_CjJ5qt
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_migration_agent_6658_TqDTLb
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6627_GbQRVp
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_baloo_indexer_6631_x91BE8
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_sendlater_agent_6662_70UCYr
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_mailfilter_agent_6650_B4QlZx
Fr. Jul 29 09:05:28 2016 SUBSCRIBE AkonadiAgentServer_6646_wlt5AD
Fr. Jul 29 09:05:28 2016 SUBSCRIBE akonadi_kolab_resource_0_6634_j45K9O
Fr. Jul 29 09:05:34 2016 SUBSCRIBE kontact_6569_ykBad6
Fr. Jul 29 09:10:14 2016 SUBSCRIBE akonadi_newmailnotifier_agent_6661_TVYUmc
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_ULcwaO
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_7gl9Le
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_DkU1dj
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_RMUX0F
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_ZLvm5Q
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_6Euwmy
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_mCduJT
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_3BN9PF
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_3ITpV4
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_XvWa4r
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_umnx4G
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_CQejJ8
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_GceAOR
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_oO80om
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_qch6DL
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_MPDAnL
Fr. Jul 29 09:55:21 2016 SUBSCRIBE kontact_6569_nbTdCA
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_yc2Fbt
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_lu3dsw
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_mWCPAJ
Fr. Jul 29 11:41:13 2016 SUBSCRIBE kontact_6569_Tq50kE
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_50l3db
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_Taf6A5
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_6uqhU8
Fr. Jul 29 12:03:52 2016 SUBSCRIBE kontact_6569_jylrFO
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_Iz1ZU8
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_Cv34ew
Mo. Aug 1 09:02:53 2016 SUBSCRIBE kontact_6569_qmd2Ot

The above output only shows 3 SUBSCRIBE commands instead of the usual 4 in previous instances....,
and it never shows an UNSUBSCRIBE command. Needs to be investigated.

We have not UNSUBSCRIBE messages, because kdepimlibs never triggers them. I added now a patch to fix this. See Differntial

knauss moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Aug 1 2016, 1:23 PM

Add a method to trigger the notifications by hand, it first displays the current status of the timer:

qdbus org.freedesktop.Akonadi /notifications org.freedesktop.Akonadi.NotificationManager.triggerTimer
The timer is currently inactive
The intervall is currently 50
petersen moved this task from Backlog to Doing on the Sprint 201632 board.Aug 8 2016, 8:06 AM

waiting for input from genua side

knauss moved this task from Doing to Backlog on the Sprint 201632 board.Aug 10 2016, 10:29 AM