Developing Outlook add-in is hard

I'm a little swamp so it's hard to find time to write a blog post, but recently I have worked on an issue that's worth the time to blog about.
Here's a frustrating yet optimistic example of why writing code inside a third party complex system is hard.
 

Started with a hang

I have received a report that for one of the users Outlook froze with 100% CPU spike after about 3 days of regular usage.
Fortunately I was able to get two memory dump of outlook process at that time, but what I saw was not really encouraging.
The first one is disappointing. Using '!runaway' command I know that the thread of interest is 0 – the main thread of outlook:

0:000> kb      
ChildEBP  RetAddr  Args to Child                    
002efd60  759b06eb 002efe2c 00000000 00000000 user32!NtUserPeekMessage+0x15      
002efd88  759b60dc 002efe2c   00000000 00000000 user32!_PeekMessage+0x88      
*** ERROR:  Symbol file could not be found.  Defaulted to export symbols for MSO.DLL -       
002efdb4  5b1d91fa 002efe2c 00000000 00000000 user32!PeekMessageA+0x143      
*** ERROR:  Symbol file could not be found.  Defaulted to export symbols for OUTLOOK.EXE -       
WARNING: Stack  unwind information not available. Following frames may be wrong.      
002efddc  2f5ef04d 002efe2c 00000000 00000000 MSO!Ordinal7300+0x29      
002efe04  2f6d032d 002efe2c 00000000 00000000 OUTLOOK!StdCoCreateInstance+0x73bec      
...      

0:000> dd  002efe2c       
002efe2c  00151a6a 00000456  00161bd0 00000000      
002efe3c  27ef61c0 0000076f 00000074 00000001      

 
The main thread is just pumping the message loop and the current message is 0x456 which means it's a custom message, so not much to go with here…
 
The second memory dump is a little more interesting:

0:000> kb      
ChildEBP  RetAddr  Args to Child                    
002eeadc  759a61cf 75a10230 00000001 759a6cee kernel32!InterlockedIncrement+0xd      
002eeae8  759a6cee 863e9520 312d6550 312d6548 user32!_BeginIfHookedUserApiHook+0xf      
002eeb4c  759a77c4 00000000 30462bf4 00151a6a user32!UserCallWinProcCheckWow+0x6f      
002eebac  759a788a 30462bf4 00000000 002eebcc user32!DispatchMessageWorker+0x3bc      
002eebbc  304d5f08 312d6548   312d6548 002eec00 user32!DispatchMessageW+0xf      
WARNING: Stack  unwind information not available. Following frames may be wrong.      
002eebcc  30d487dd 312d6548 00000000 00000000 WWLIB !FMain+0xd1961      
002eec00  30a24569 002efaac 00000000 00000001 WWLIB!wdCommandDispatch+0x3d346a      
002efaa4  30bfcaf7 0000010f 00000056 00000000 WWLIB!wdCommandDispatch+0xaf1f6      
...      

0:000> dd  312d6548       
312d6548  00151a6a 00000456  00360c04 00000000      
312d6558  27f1c053 00000814 00000033 00000000      

 
Here we can see the main thread is pushing a message into the message loop, and it's the same message code! So my assumption was that the 100% CPU is caused by a loop between OUTLOOK and WWLIB playing around the message loop. Investigating into who is WWLIB revealed that it's the word rich text editor that outlook is using in it's inspectors (like new email windows, post item, etc.) and looking at the logs I realized that the hang occurred after an inspector window was closed, but not right after, another inspector window needs to be open for the hang to manifest.
We have a lead!
 

Continued with a crash

A day or two after the hang report our QA had managed to crash Outlook by performing an operation that resulted in closing the active inspector window (I'm jumping a little as the first analysis is not very interesting). A quick look into "Event Viewer" showed a familiar faulting module 'WWLIB' but process crash dump revealed it was "Access Violation" so no much help there.
But the important part, that changed everything, it was reproducible!
So I started playing around to pin-point what was causing the crash and not very surprisingly I have managed to reproduce the hang as well. Not sure what exactly causes the hang to became a crash, it has something to do with the folder or the item currently selected, but it's not really the point here.
 

Analysis

After a short time I stripped the code to the bare minimum that still resulted in the hang:
1. Outlook 2007 (2003 doesn't have WWLIB, 2010 doesn't repro)
2. Have process wide keyboard hook using 'SetWindowsHookEx' [ref 1] [ref 2]
3. Hook Ctrl+Enter keyboard combination event to close the current active inspector AND don't continue the event (CallNextHookEx)
 

Repro steps:
1. Open outlook
2. Open new mail inspector
3. Set focus on the body word rich text editor of the new mail
4. Type 'a' and QUICKLY hit Ctrl+Enter
5. Open another new mail inspector (not really important, just for completeness)
 

Points:
1. It happens only with Ctrl+Enter, hooking any other key combinations does not result in hang.
2. It happens only if before pressing Ctrl+Enter there was a keyboard event that resulted in change to the characters appearing in the editor (adding/deleting), but the delay between must not be above 2-3 sec, after that everything fine.
3. The code must not propagate the keyboard event to the next handler, if does it will not result in hang.
 

Solution

On Ctrl+Enter keyboard event in the hook don't close the inspector directly but delay its closing by posting the close command to the main thread after a short delay, meanwhile propagate the keyboard event to the next hook to let the word control 'handle' it.
 

Conclusions

  1. Couldn't have identified the real reason for the issue without a consistent repro.
  2. It's very important to understand what happened before process hang/crash as it's not always textbook divide by zero error.
  3. Having memory dumps didn't really help to identify the core issue but helped to verify it was related.
  4. The code, basically, does nothing wrong, there is no way a developer could have prevented it.
  5. Developing Outlook add-in is hard.
     

Sources:

  1. Windows Hooks in the .NET Framework
  2. Using Hooks from C#

2 comments on “Developing Outlook add-in is hard

  1. […] * Outlook API has limited support for keyboard handling, implementing process level hook for keyboard messages is the best option we found though is also can be dangerous. […]

  2. […] Outlook API has limited support for keyboard handling, implementing process level hook for keyboard messages is the best option we found though is also can be dangerous. ↩ […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s