Now that we have covered RCW ref counting and anonymous methods, I can finally explain the GC deadlock bug in our Outlook add-in.
Identifying the bug
While trying to identify an unrelated bug, we noticed that when our Outlook add-in is installed and the user opens an item in a new Inspector (i.e. opens the item in a new window by double-clicking it) and then closes it, the Inspector window is hidden but stays open. When our add-in is not installed, the Inspector window is fully closed and disappears from the Outlook application windows.
This behavior can be seen using Spy++ or, better, using Window Detective1: the window with the "Test Inspector bug" title is hidden but still alive.
Obviously, my first thought was that the Inspector RCW had not been released with 'Marshal.ReleaseComObject', so I ran 'GC.Collect' followed by 'GC.WaitForPendingFinalizers'. The issue persisted, so it was not going to go away by itself.
The Inspector leak is the obvious issue, but like any Outlook object leak2 it can cause all sorts of weird bugs, so it's important to solve it.
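The forced collection described above can be sketched as a small helper. This is a minimal sketch, not the add-in's actual code: the names 'GcProbe', 'ForceFullCollection' and 'SurvivesCollection' are made up for illustration, and the second 'GC.Collect' is a common addition for reclaiming objects whose finalizers have just run.

```csharp
using System;

static class GcProbe
{
    // Runs a full blocking collection, waits for finalizers (RCW finalizers
    // release their COM references there), then collects what they freed.
    public static void ForceFullCollection()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    // Returns true if the tracked object survived a forced collection,
    // which means something still roots it.
    public static bool SurvivesCollection(WeakReference tracked)
    {
        ForceFullCollection();
        return tracked.IsAlive;
    }
}
```

If an object tracked this way keeps surviving, as the Inspector did here, waiting for the GC will not help and the root has to be found.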
Finding the buggy code
Because tracking down a leaking Outlook object is not very easy and I had a 100% repro, the obvious choice was to start removing code in binary-search style to zero in on the lines of code that caused the bug.
Sure enough, it took no more than 15 minutes to find the two separate lines of code that together caused the problem.
1. Oops, forgot 'Marshal.ReleaseComObject'
Part of the add-in's functionality is to hide some ribbon buttons in the email Inspector window. To accomplish that, I created a 'customUI' XML and used it to modify the ribbon through 'IRibbonExtensibility'. Because I needed to control whether a specific button in the Inspector ribbon is visible depending on the Inspector it is in, I used the 'customUI' XML to register a 'getVisible' callback. The ribbon callback receives an 'IRibbonControl' COM instance whose 'Context' property is, in this case, an Inspector COM object wrapped in a nice RCW.
So, long story short, a rookie mistake: I didn't release the Inspector object that crossed from COM to .NET, a crossing that incremented its RCW ref count.
Fixing this is as easy as adding 'Marshal.ReleaseComObject', and of course I did, but I wanted to know why the GC didn't collect the RCW, and this is where the second line of code comes in.
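One way to make the missing release hard to forget is to wrap the "use, then release" pattern in a helper. The sketch below is an assumption-laden illustration rather than the add-in's real code: 'ComScope', 'UseAndRelease', 'OnGetVisible' and 'ShouldShow' are hypothetical names, and the ribbon callback is shown only as a comment because it needs the Office and Outlook interop assemblies.

```csharp
using System;
using System.Runtime.InteropServices;

static class ComScope
{
    // Casts 'comObject' to T, runs 'body', and always releases the RCW afterwards.
    // Returns 'fallback' when the object is null or is not a T.
    public static TResult UseAndRelease<T, TResult>(
        object comObject, Func<T, TResult> body, TResult fallback) where T : class
    {
        try
        {
            T typed = comObject as T;
            return typed != null ? body(typed) : fallback;
        }
        finally
        {
            // IsComObject guards against plain .NET objects, for which
            // ReleaseComObject would throw an ArgumentException.
            if (comObject != null && Marshal.IsComObject(comObject))
                Marshal.ReleaseComObject(comObject);
        }
    }
}

// Hypothetical ribbon callback, assuming a customUI button declared with
// getVisible="OnGetVisible" and a ShouldShow method that keeps no references:
//
// public bool OnGetVisible(IRibbonControl control)
// {
//     return ComScope.UseAndRelease<Outlook.Inspector, bool>(
//         control.Context, inspector => ShouldShow(inspector), false);
// }
```

The 'body' delegate must not store the Inspector or any child COM object it reads, otherwise the same kind of leak comes back through another door.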
2. Shouldn't the GC have collected it?
Another part of the code requires a subscription to each Inspector's 'Activate' event, so in the 'NewInspector' event of the 'Inspectors' collection I subscribe to the 'Activate' event of each new Inspector.
Because subscribing to an event requires that the Inspector instance not be collected by the GC3, I need to hold the Inspector instance while it is open and release it when it closes, so I need to subscribe to the 'Close' event as well. The 'Close' event doesn't pass any arguments, so as a shortcut I used an anonymous method to pass the inspector to the close handler, like so:
    private void OnNewInspector(Inspector inspector)
    {
        inspector.Close += () => OnInspectorClose(inspector);
    }

    private void OnInspectorClose(Inspector inspector)
    {
        Marshal.ReleaseComObject(inspector);
    }
And this is the source of the problem. As we learned, because the anonymous method uses a variable from the outer scope, a new class is generated with 'this' and the Inspector as its fields:
    private sealed class <>c__DisplayClass74
    {
        public Handler <>4__this;
        public Inspector inspector;

        public void <OnNewInspector>b__73()
        {
            <>4__this.OnInspectorClose(inspector);
        }
    }
OK, the class generated for the anonymous method captures an Inspector object, and the event delegate probably holds the generated instance, but who is holding that? Next I executed the bug repro steps and fired up WinDbg, and sure enough the anonymous method class was in memory. The '!gcroot' command revealed the following:
    DOMAIN(0CE5D9C0):HANDLE(RefCnt):11e51fd0:Root:
        12cc26e0(Outlook.InspectorEvents_SinkHelper)->
        12cc25a8(Outlook.InspectorEvents_CloseEventHandler)->
        12cc2554(OffiSync.Outlook.Common.OutlookEvents.OutlookEventsHandler+<>c__DisplayClass74)
If we backtrack, the Inspector object is held by the anonymous method instance, which is held by the Close event delegate, which is held by a sink helper instance (a part of the Outlook PIA4 that implements the event mechanism for Outlook interop), which in turn is held by a GC reference count handle (probably as part of the RCW mechanism).
So the GC couldn't collect the RCW object because it was reachable through the anonymous method subscribed to a COM event of that same object!
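The shape of that chain can be reproduced without Outlook. The sketch below is only a simulation under stated assumptions: 'FakeInspector' and 'FakeSinkHelper' are hypothetical stand-ins for the RCW and for 'InspectorEvents_SinkHelper', and a strong 'GCHandle' plays the role of the 'HANDLE(RefCnt)' root. No COM or real reference counting is involved.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for the Inspector RCW (hypothetical, no COM involved).
sealed class FakeInspector { }

// Stand-in for Outlook.InspectorEvents_SinkHelper: it stores the Close delegate.
sealed class FakeSinkHelper
{
    public Action CloseHandler;
}

static class RootChainDemo
{
    // Builds sink -> delegate -> closure -> inspector and roots the sink with a
    // strong GC handle, mimicking the HANDLE(RefCnt) root from the debugger output.
    public static GCHandle Subscribe(out WeakReference inspectorRef)
    {
        var inspector = new FakeInspector();
        inspectorRef = new WeakReference(inspector);

        var sink = new FakeSinkHelper();
        sink.CloseHandler = () => GC.KeepAlive(inspector); // lambda captures inspector
        return GCHandle.Alloc(sink, GCHandleType.Normal);
    }

    // Forces a full collection and reports whether the target is still alive.
    public static bool IsAliveAfterFullGc(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return reference.IsAlive;
    }
}
```

While the handle returned by 'Subscribe' is allocated, 'IsAliveAfterFullGc' keeps reporting the fake inspector as alive no matter how often it is called. Only after 'Free' is called on the handle does the whole chain become eligible for collection, which is the part that never happened in the add-in.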
Solution
The simple solution is to create a dedicated class that acts the same as the class generated for the anonymous method, with a single addition: clear the Inspector reference (you can also unsubscribe from the event altogether) so that the Inspector is no longer reachable by the GC.
Actually, I went a bit more extreme and wrote the COM event handling myself, because it's more efficient and a bit simpler for multiple events.
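The dedicated class from the simple solution might look roughly like this. It is a sketch, not the code that shipped: 'IInspectorEvents' is a hypothetical stand-in for the event side of the Outlook Inspector so the example compiles without the interop assemblies, 'FakeInspector' exists only to exercise it, and the 'release' callback is where a real add-in would call 'Marshal.ReleaseComObject'.

```csharp
using System;

// Hypothetical stand-in for the event side of Outlook's Inspector.
public interface IInspectorEvents
{
    event Action Close;
}

// Plays the role of the compiler-generated closure class, plus the one addition:
// it clears its Inspector reference (and unsubscribes) when Close fires.
public sealed class InspectorCloseSubscription
{
    private IInspectorEvents _inspector;
    private readonly Action<IInspectorEvents> _release;

    public InspectorCloseSubscription(IInspectorEvents inspector, Action<IInspectorEvents> release)
    {
        _inspector = inspector;
        _release = release;
        _inspector.Close += OnClose;
    }

    public bool IsActive { get { return _inspector != null; } }

    private void OnClose()
    {
        IInspectorEvents inspector = _inspector;
        if (inspector == null) return;   // already handled

        _inspector = null;               // break the chain back to the Inspector
        inspector.Close -= OnClose;      // drop the delegate held by the event sink
        _release(inspector);             // real add-in: Marshal.ReleaseComObject
    }
}

// Test double used only to exercise the class without Outlook.
public sealed class FakeInspector : IInspectorEvents
{
    public event Action Close;

    public void RaiseClose()
    {
        Action handler = Close;
        if (handler != null) handler();
    }
}
```

Once 'OnClose' has run, the subscription object may still be referenced for a while, but it no longer leads back to the Inspector, so the RCW is not kept reachable through its own event.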
Summary
The Inspector RCW object was captured by an anonymous method subscribed to a COM event.
The GC holds a reference to the target of a COM event subscription via its reference count mechanism.
The missing COM release prevented the RCW from deterministically reaching a zero ref count and removing the GC handle that held it.
Because the reference count was never decremented to zero, the GC kept the reference chain alive.
It's a nice GC deadlock.
1. Most developers are familiar with Spy++ from developing complex Windows UI applications. Window Detective is an open source alternative that is easier to install and has a nice ability to group windows by the process they belong to, which is very useful in my scenario.
2. A great blog post on this subject can be found here.
3. Event subscriptions hold a reference from the subscribed event delegate to the target method instance, so unless the instance that holds the event is kept in memory, it will be collected by the GC.
4. Actually, in this case it's an AIA and not a PIA, as it was not generated by Microsoft, but PIA is a much more familiar term.