Each instance of a Windows Azure service role runs its own monitor to gather instance-specific diagnostic data. The problem that immediately presents itself is knowing exactly what is being collected, where the data is being saved, and how to retrieve it for inspection. The purpose of this blog post is to illuminate these areas a little better.
So let's start at the beginning… When you create a new Windows Azure Web Role, Visual Studio automatically adds a boilerplate WebRole.cs file to your project. By default, the OnStart() method of the WebRole is overridden with an implementation that starts the Windows Azure Diagnostic Monitor, which collects Windows Azure trace logs, IIS 7.0 logs, and Windows Diagnostic infrastructure logs out of the box.
The argument to the static Start method of the DiagnosticMonitor class is the name of the configuration setting, defined in the ServiceConfiguration.cscfg file, that holds the Windows Azure storage connection string.
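As a rough sketch, the generated boilerplate looks something like this (the setting name "DiagnosticsConnectionString" is the SDK-generated default at the time of writing; your project may use a different name):

    using Microsoft.WindowsAzure.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Start the diagnostic monitor, passing the name of the configuration
            // setting that holds the storage connection string.
            DiagnosticMonitor.Start("DiagnosticsConnectionString");

            return base.OnStart();
        }
    }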
When the value of the connection string is “UseDevelopmentStorage=true”, the Development Fabric uses the local Development Storage to simulate storage in the cloud. In staging or production, this string would instead point to the RESTful storage endpoints and would contain your Windows Azure storage AccountName and AccountKey.
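As a sketch, the relevant setting inside the role's ConfigurationSettings element of ServiceConfiguration.cscfg might look like this (the production value shown is only a placeholder format):

    <ConfigurationSettings>
      <!-- Local development: the Development Fabric simulates cloud storage. -->
      <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" />
      <!-- In staging or production, point at a real storage account instead, e.g.
           value="DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey" -->
    </ConfigurationSettings>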
We can inspect Blob storage to find the collected diagnostic information: the “wad-control-container” holds the diagnostic configuration for each role instance, while transferred logs land in containers and tables of their own, such as “wad-iis-logfiles” for the IIS logs. Run your favorite Windows Azure storage exploration tool; in my example, I am using the Windows Azure Storage Explorer from the CodePlex site. You can use this tool to download a container and its contents to your local file system for further analysis.
We can also augment the collected diagnostic data to include other data sources.
Let’s say you’re also interested in capturing failed IIS and ASP.NET requests. You can augment the data that Windows Azure is already capturing by adding a <traceFailedRequests> element to the <tracing> section of <system.webServer> in web.config. You can control the path(s) of the page(s) to be tracked, set the verbosity to a tracing level appropriate for your circumstances, and filter the general areas of coverage, such as Authentication, Security, and so on. An example (a rough sketch; the path, areas, and failure definition are placeholders you would adapt) might look like this:
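    <system.webServer>
      <tracing>
        <traceFailedRequests>
          <add path="*.aspx">
            <traceAreas>
              <add provider="ASP" verbosity="Verbose" />
              <add provider="ASPNET"
                   areas="Infrastructure,Module,Page,AppServices"
                   verbosity="Verbose" />
              <add provider="WWW Server"
                   areas="Authentication,Security,Filter,StaticFile,Compression,Cache,RequestNotifications"
                   verbosity="Verbose" />
            </traceAreas>
            <failureDefinitions statusCodes="400-599" />
          </add>
        </traceFailedRequests>
      </tracing>
    </system.webServer>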
We can also collect Windows event logs by adding an XPath expression identifying the event sources to be captured to the WindowsEventLog.DataSources property of the diagnostic monitor configuration object.
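A minimal sketch, assuming the configuration object is built in OnStart() and then passed to DiagnosticMonitor.Start (the transfer period and the "DiagnosticsConnectionString" setting name are assumptions you would adjust):

    // Start from the default diagnostic configuration.
    DiagnosticMonitorConfiguration config =
        DiagnosticMonitor.GetDefaultInitialConfiguration();

    // XPath expression selecting the event source(s) to capture;
    // here, everything written to the Application event log.
    config.WindowsEventLog.DataSources.Add("Application!*");
    config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(5.0);

    // Start the monitor with the augmented configuration.
    DiagnosticMonitor.Start("DiagnosticsConnectionString", config);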
It is possible that a hardware or software defect might be causing mysterious or intermittent failures. Fortunately, we can also configure our instances to collect full or partial crash dumps by calling the static EnableCollection method of the Microsoft.WindowsAzure.Diagnostics.CrashDumps type: passing true captures complete crash dumps, while passing false collects mini dumps.
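A minimal sketch, assuming the call is made in OnStart() before the diagnostic monitor is started:

    // Request full crash dumps for this role instance;
    // pass false to collect mini dumps instead.
    CrashDumps.EnableCollection(true);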
Although the path may be slightly more illuminated now, there are still many dark areas beyond our present location. In my opinion, there is still much work to be done in tooling and in making this data usable in real-world scenarios. It is trivial to sift through a dozen or so entries from a single service instance, but finding what you are looking for in the potentially massive amount of data collected by many simultaneous instances of a busy, high-volume application is another matter entirely. Several parties are working to provide solutions in this space, but there are no clear leaders at this time.