Dump Files - Additional Information
Possibilities and Limitations of Using System Dumps to Understand Unexpected System Behavior
Background and History
Despite every effort to prevent it, it seems sadly inevitable that bugs sneak into a product during software development. Some are easy to find and fix, e.g. when the product simply and repeatably fails to do something it is supposed to do. Others are far subtler or annoyingly hard to reproduce, and others again are more severe and cause what is commonly known as a “crash”.
Generally, a “crash” is caused by an application accessing memory that is not accessible to it, or by doing something else that the operating system (OS) deems inappropriate. In technical terms, both cases are called an “exception”.
When an exception occurs and the application does not handle it, the application is shut down, and the operating system, depending on its version, will often show a more or less informative message indicating that something went wrong. The user perceives this as a “crash”.
Unfortunately, the mere information that an application has crashed is rarely enough for the developer to find and fix the cause. To address this, the operating system can create what is called a dump file. Depending on its settings, a dump file can contain anything from a simple execution stack, showing which functions were being executed when the crash occurred, to a complete image of the memory used by the application at the time of the crash:
(...) What is a dump file?
A dump file is a snapshot of an app at the point in time the dump is taken. It shows what process was executing and what modules were loaded. If the dump was saved with heap information, the dump file contains a snapshot of what was in the app's memory at that point in time. Opening a dump file with a heap in a debugger is like stopping at a breakpoint in a debug session. Although you cannot continue execution, you can examine the stacks, threads, and variable values of the app at the time the dump occurred. (from the Microsoft MSDN)
A developer, when provided with the information in these dump files, has a much higher chance of locating and understanding the cause of the crash:
(...) Dumps are primarily used for debugging issues that occur on machines that the developer doesn’t have access to. For example, you can use a dump file from a customer's machine when you can’t reproduce the customer's crash or hang on your machine. (from the Microsoft MSDN)
For many years, Microsoft offered developers a service where Windows would collect all dump files and optionally upload them to a Microsoft server, from which developers could download them for analysis.
Sadly, this service was shut down due to enormous bandwidth and storage requirements, as well as growing privacy concerns.
To compensate for the loss of this service, Microsoft provided developers with the means to control the creation of dump files from within the application. Until then, an application had no control whatsoever over this process once a crash had been triggered. While the actual dump files are still written by the OS, the application can now control, for example, where the file is written and, to some degree, define application-specific messages and behavior.
Usage within the vsmStudio
The vsmStudio catches any exception and instructs the OS to write a full dump to a file. The OS stores this file in the application's logfile/dumps sub-directory. The dump file's name contains the date and time along with extra information that allows the developers to match dump files with the software version used and to detect recurring issues. Depending on the setup, vsmStudio will either show a message (in non-production environments) or restart automatically.
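The exact naming scheme used by vsmStudio is not documented here. The following C++ sketch only illustrates the idea of putting the build version, a timestamp and the reason for the dump into the file name. The function name makeDumpFileName and the format vsmStudio_<version>_<YYYYMMDD-HHMMSS>_<reason>.dmp are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical naming scheme (an assumption, not the actual vsmStudio format):
//   vsmStudio_<version>_<YYYYMMDD-HHMMSS>_<reason>.dmp
// The version lets developers match a dump to the build that wrote it; the
// reason (e.g. "crash", "deadlock") helps to spot recurring issues.
std::string makeDumpFileName(const std::string& version,
                             const std::tm& when,
                             const std::string& reason)
{
    char stamp[32];
    // Most significant field first, so names of one build sort chronologically.
    const std::size_t written =
        std::strftime(stamp, sizeof(stamp), "%Y%m%d-%H%M%S", &when);
    assert(written == 15);  // "YYYYMMDD-HHMMSS" has 15 characters for 4-digit years
    return "vsmStudio_" + version + "_" + std::string(stamp, written) + "_" +
           reason + ".dmp";
}
```

In practice `when` would be the current local time, e.g. obtained via std::localtime. Because the timestamp starts with the year, dump names from the same build sort chronologically when sorted alphabetically, and all dumps of one build can be grouped by the version part.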
Additionally, the vsmStudio can detect other problems, such as endless loops and deadlocks, and will trigger the OS to write dump files in these cases as well.
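How vsmStudio detects endless loops and deadlocks is not described here. A common technique for this is a heartbeat watchdog, sketched below in C++ under that assumption: the monitored thread calls beat() regularly, a separate monitor thread calls check(), and if no heartbeat arrives within the timeout, a callback is invoked, which in a real application would request the dump. The class Watchdog and its members are hypothetical; the optional time parameters only exist so the logic can be tested without real waiting.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical heartbeat watchdog (an assumption, not vsmStudio's actual design).
// If the monitored thread stops calling beat() for longer than `timeout`, it is
// presumed stuck (endless loop or deadlock) and onStall is invoked once.
class Watchdog {
public:
    using Clock = std::chrono::steady_clock;

    Watchdog(Clock::duration timeout, std::function<void()> onStall,
             Clock::time_point start = Clock::now())
        : timeout_(timeout),
          onStall_(std::move(onStall)),
          lastBeat_(start.time_since_epoch().count())
    {
        assert(timeout_ > Clock::duration::zero());
    }

    // Called regularly by the monitored thread, e.g. once per main-loop iteration.
    void beat(Clock::time_point now = Clock::now())
    {
        lastBeat_.store(now.time_since_epoch().count());
        reported_.store(false);
    }

    // Called periodically by the monitor thread; returns true if a stall was
    // reported by this call. A stall is reported only once until the next beat().
    bool check(Clock::time_point now = Clock::now())
    {
        const Clock::time_point last{Clock::duration{lastBeat_.load()}};
        if (now - last < timeout_ || reported_.exchange(true))
            return false;
        onStall_();  // a real application would request the dump here
        return true;
    }

private:
    Clock::duration timeout_;
    std::function<void()> onStall_;
    std::atomic<Clock::rep> lastBeat_;  // atomics: beat() and check() run on different threads
    std::atomic<bool> reported_{false};
};
```

Reporting a stall only once per missed heartbeat keeps a stuck thread from producing a flood of dump files while it stays stuck.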
As the vsmStudio currently does not implement an automatic notification and upload system, because most servers have no or only inadequate internet access, we rely heavily on the operators uploading the dump files to the dump upload server:
http://www.vsmStudio.tv/upload.aspx
Once a dump file has been uploaded, it is analyzed by the vsmStudio development team. When the cause of the dump has been found, measures are taken to prevent this crash from recurring in future builds.
Limitations
In some cases, analyzing a dump file does not yield a conclusive finding. The reasons for this can be manifold. Firstly, dump files may be corrupt or contain no data. This may be caused by the dump writing process itself, which is part of the OS and therefore outside our control. Secondly, the server on which the dump was written may run an outdated OS. This can cause the debugging tools to refer to wrong or missing “symbol files”, which makes reconstructing the call stack impossible. Thirdly, dump files from older, outdated vsmStudio builds may differ so much in source code from our current development state that the dumped data can no longer be related to the current code. Finally, some causes of dumps are so obscure that even the most experienced and determined developer cannot identify the original cause, which is sometimes further obscured by the time that has passed.
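The first limitation, corrupt or empty dump files, can at least partially be screened for before uploading. The sketch below assumes user-mode dump files of the kind written by the Windows function MiniDumpWriteDump, which start with a 32-byte header whose first four bytes are the ASCII signature MDMP and whose third 32-bit field holds the number of data streams. The function names looksLikeMinidump and fileLooksLikeMinidump are hypothetical, and passing this check does not prove that a dump is complete or analyzable; a truncated file can still carry a valid header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Assumed layout: user-mode dump files begin with a 32-byte header.
// Offset 0: signature "MDMP"; offset 8: number of streams (little-endian uint32).
constexpr std::size_t kHeaderSize = 32;

std::uint32_t readLE32(const std::vector<unsigned char>& b, std::size_t off)
{
    assert(off + 4 <= b.size());
    return std::uint32_t(b[off]) | (std::uint32_t(b[off + 1]) << 8) |
           (std::uint32_t(b[off + 2]) << 16) | (std::uint32_t(b[off + 3]) << 24);
}

// Plausibility check only: true if the bytes start with a non-empty dump header.
bool looksLikeMinidump(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() < kHeaderSize)
        return false;  // empty or truncated file
    if (bytes[0] != 'M' || bytes[1] != 'D' || bytes[2] != 'M' || bytes[3] != 'P')
        return false;  // wrong signature, not a dump file
    return readLE32(bytes, 8) > 0;  // at least one data stream
}

// Reads only the header of a file on disk; a missing file yields false.
bool fileLooksLikeMinidump(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> header(kHeaderSize);
    in.read(reinterpret_cast<char*>(header.data()),
            static_cast<std::streamsize>(kHeaderSize));
    header.resize(static_cast<std::size_t>(in.gcount()));
    return looksLikeMinidump(header);
}
```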
Summary
Dump files are an essential tool for finding bugs in software, and receiving them is vital for the evolution of the software. Only with this feedback are we able to respond quickly with fixes or workarounds. Yet not every dump file contains enough information to allow a conclusive statement.