Finding memory bugs in Win32 applications with Valgrind

DRAFT
Dan Kegel
November 2009
Google

Dynamic Analysis

Writing good software is hard. Programmers must always be on the lookout for bugs. One of the many techniques developed to help programmers do this is called dynamic analysis. In dynamic analysis, programs are run in a special environment that watches for flaws such as invalid pointers, undefined values, or race conditions.

The classic dynamic analysis tool is Purify. It's good, but expensive. The free software world developed a similar tool, Valgrind, which is also good, but its Windows port is not yet ready.

However, Linux can run Windows apps via the Wine compatibility layer, and Valgrind is compatible with Wine. This article investigates whether Valgrind+Wine can actually replace some of the uses of Purify.

Tools

Valgrind needs a patch or two to work well with Wine; http://wiki.winehq.org/Valgrind describes how to build it.

Wine needs to be built after installing Valgrind so it can use Valgrind's macros to decorate its heap operations -- and the /usr/include/valgrind/valgrind.h file must be new enough to contain the lines

          /* Wine support */
          VG_USERREQ__LOAD_PDB_DEBUGINFO = 0x1601

This happens automatically if you install Valgrind with --prefix /usr, but since I prefer to install it into /usr/local/valgrind-NNNN, I usually do

          cd /usr/include; sudo mv valgrind valgrind.orig; sudo ln -s /usr/local/valgrind-NNNN/include/valgrind valgrind

You can check that this was done properly by verifying that Wine's include/config.h defines HAVE_VALGRIND_MEMCHECK_H, and that Valgrind gives good symbolic stacks for errors in win32 code.
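The header and config.h checks described above are easy to script. The following is a minimal sketch, assuming a POSIX shell; the function names valgrind_header_ok and wine_config_ok are made up for illustration and are not part of Wine or Valgrind. It does not cover the symbolic-stack check, which still has to be done by eye.

```shell
# Sketch only: sanity-check the Valgrind/Wine build prerequisites.
# Function names are illustrative assumptions, not an existing tool.

# Succeeds if the given valgrind.h declares the Wine PDB client request.
valgrind_header_ok() {
    grep -q 'VG_USERREQ__LOAD_PDB_DEBUGINFO' "$1"
}

# Succeeds if Wine's generated config.h actually defines the macro
# (a commented-out "#undef" line does not count).
wine_config_ok() {
    grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_VALGRIND_MEMCHECK_H' "$1"
}

# Example use, run from the top of the Wine build tree:
#   valgrind_header_ok /usr/include/valgrind/valgrind.h || echo "valgrind.h too old"
#   wine_config_ok include/config.h || echo "Wine was built without Valgrind support"
```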

Once you have Wine installed, run Wine's conformance tests to verify that it is operating properly. (A few test failures are ok; see http://test.winehq.org for current test results from other users.)

Once both Valgrind and Wine were installed and working, I ran Wine's conformance tests under Valgrind as described at http://wiki.winehq.org/Valgrind to see what kind of background warnings to expect from wine and the operating system under Valgrind. This takes about five hours. You can see my results at http://kegel.com/wine/valgrind/logs/ for comparison.

I put together a suppression file for uninteresting warnings at http://winezeug.googlecode.com/svn/trunk/valgrind/valgrind-suppressions and filed bugs against Wine for a number of these problems; see http://bugs.winehq.org/buglist.cgi?quicksearch=valgrind .

With the tools more or less working properly, I moved on to trying them out on a real win32 project.

Chromium, Valgrind, and Win32

When the Chromium project needed to add dynamic analysis to its continuous build and test rig, it used Purify on Win32, and Valgrind on Linux and Mac. To investigate whether Chromium could use Valgrind for Win32, I wrote two scripts: one to extract the tests, and one to run them.

Building and Extracting Tests

Here are the steps I follow to bring up the test environment:
  1. Set up Chromium build environment
    http://dev.chromium.org describes how to build Chromium on Windows. It takes about a day to install Visual Studio and the needed SDKs and hotfixes, and sadly, there is no easy way to automate this.
  2. Do a debug build of the Chrome solution
    (This takes about 500 minutes on a Core 2 Duo with 1 GB of RAM, 80 minutes on the same machine with 4 GB of RAM, and about 20 minutes on an 8-core box with 12 GB of RAM and two hard drives.)
  3. Run the tests on Windows with the cygwin shell script http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-runtests.sh . The results are placed in the logs subdirectory.
  4. Check the results of the script
    If any tests misbehave, add lines to the list of expected failures in the script with the following tags: (You can see the list with "sh chromium-runtests.sh --list-failures".)
  5. Package the tests
    Run http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-savetests.sh . This saves a sizeable subset of Chromium's test suite (roughly everything but the layout and ui tests) together with .pdb files into an archive file. (I have a prebuilt copy at http://kegel.com/wine/chromium/chromium-tests.tar.bz2 for people who don't want to set up a chromium build environment.)
  6. Verify the archived tests work properly on Windows
    Unpack that archive in a different directory of the windows machine, run the tests again, and verify that no tests fail. (If they do, either there's a flaky test that needs to be fixed or disabled, or the script that archives the tests is buggy and needs fixing.)
  7. Verify the archived tests work properly on Linux
    Copy the archive to the Linux system where Valgrind and Wine are installed. (Because chromium-runtests.sh directly references the Valgrind suppressions file mentioned above, it's easiest to grab a copy of the winezeug tree with the command "svn checkout http://winezeug.googlecode.com/svn/trunk/ winezeug", then cd to winezeug/testsuites/chromium and unpack the tarball there.)
    Then run "chromium-runtests.sh" again to see how the tests did on Wine (without Valgrind).
  8. Check the results of the script
    Triage any misbehaving tests by adding lines to the list of expected failures with the following tags: If any problems turn out to be bugs in either the test scripts or in the Chromium testcases, fix those (or file a chromium bug). If any seem to be Wine's fault, file bugs against Wine for those, and add URLs for the Wine bug reports to the tags; "sh chromium-runtests.sh --list-failures-html" outputs the expected failure list in HTML with hyperlinks to the bug reports.
  9. As the Wine community fixes problems, remove the corresponding lines from the list of expected failures.
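Steps 4 and 8 boil down to comparing the tests that failed against a list of expected failures. The real list lives inside chromium-runtests.sh; the sketch below assumes, purely for illustration, a separate text file with one entry per line in the form "test-name tag [bug-url]". The file layout and both function names are hypothetical.

```shell
# Sketch only: triage failures against an expected-failure list.
# Assumed (hypothetical) list format, one entry per line:
#   <test-name> <tag> [bug-url]

# is_expected_failure LIST TEST
# Succeeds if TEST appears in the first column of LIST.
is_expected_failure() {
    awk -v t="$2" '$1 == t { found = 1 } END { exit !found }' "$1"
}

# new_failures LIST
# Reads failed test names on stdin, one per line, and prints only
# those that are not already recorded in LIST.
new_failures() {
    while IFS= read -r t; do
        is_expected_failure "$1" "$t" || printf '%s\n' "$t"
    done
}
```

Anything new_failures prints is a candidate for a bug report and a new line in the list; an empty result means the run matched expectations.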

Using the tools

With the tools installed and the tests validated, I ran the tests under Valgrind with the command "sh chromium-runtests.sh --valgrind". This subset of tests takes about two hours to run under Valgrind on a Core 2 Duo, unless there is a crash: handling a crash takes about two gigabytes of RAM and many minutes, because the PDB-reading code in both Wine and Valgrind is relatively inefficient.

The Chromium team runs Valgrind on the Linux and Mac tests continuously. When Valgrind finds a flaw, the developer dealing with it either fixes it immediately, or files a bug, then either adds a suppression to one of the valgrind suppression files, or disables the test (in the case of a crash). The goal is to keep the tree green so that any new failures stand out vividly.

The same strategy is followed when running Valgrind on the Windows tests; either fix the problems (if you can), or file bugs and suppress the error or disable the test, such that all the tests pass under Valgrind without any reported warnings. Then work can continue, and the problems recorded in the suppressions files and expected failure list can be fixed offline.
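For readers who have not written a suppression before, here is a hedged sketch of what adding one looks like. The block layout (a name, then Memcheck:Cond, then one or more fun: frames) is Valgrind's standard suppression syntax; the helper name add_cond_suppression and the file and suppression names in the usage comment are invented for this example, and the _strlen frame is borrowed from the warning quoted later in this article.

```shell
# Sketch only: append a Memcheck "conditional jump" suppression to a file.
# add_cond_suppression FILE NAME FUNCTION
#   FILE      suppressions file to append to (created if missing)
#   NAME      free-form label for the suppression
#   FUNCTION  innermost stack frame to match
add_cond_suppression() {
    cat >> "$1" <<EOF
{
   $2
   Memcheck:Cond
   fun:$3
}
EOF
}

# Example use (file and label are placeholders):
#   add_cond_suppression local.supp win32_strlen_wordwise _strlen
#   valgrind --suppressions=local.supp ...
```

Running Valgrind with --gen-suppressions=all prints a ready-made block for each reported error, which is usually a better starting point than writing the frames by hand.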

Issue: memcheck unit tests not ported

Valgrind's memcheck has a nice set of unit tests, but it's hard to run them on Valgrind+Wine to see whether Valgrind works as it should there.

The test suite should be ported to Visual C++ and used to find and fix problems in the Valgrind+Wine combination. If it doesn't have tests for C++, those need to be added.

Issue: detecting overruns

Valgrind replaces the Linux heap with one that puts guard bytes before and after each allocation, and marks those guard bytes as inaccessible. At the moment, nothing does this when running win32 apps under Wine+Valgrind; Wine does inform Valgrind about the allocations made by the NT heap, but it doesn't put any guard bytes around them.

So, how can we achieve this on Wine? The Wine philosophy is to look for how Windows likes to do things, and sure enough, there are ways to tell the Windows kernel to do similar things. http://technet.microsoft.com/en-us/library/cc736347(WS.10).aspx describes the gflags utility, used to set registry flags that end up controlling the NtGlobalFlags field of each process's PEB.

It would probably suffice to implement the FLG_HEAP_ENABLE_TAIL_CHECK bit of NtGlobalFlags in Wine, and add a Valgrind annotation for the guard bytes. This would improve memory error detection in Wine even without Valgrind, too.

Issue: detecting double frees

Valgrind's replacement heap also keeps freed blocks out of circulation for a while, marking them completely inaccessible so that references to freed blocks are caught.

Wine's implementation of the win32 system heap should probably provide this feature, possibly triggered by the FLG_HEAP_DISABLE_COALESCING NtGlobalFlags bit.
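As a rough illustration of what gflags would end up storing, the two flag bits named above can be combined into a single value. The numeric values below are the ones published in Microsoft's gflags documentation; the function name combine_flags is hypothetical, and nothing here implies that Wine currently honors these bits.

```shell
# Sketch only: OR together global flag bits into one hex value.
# Bit values as documented for gflags; the helper name is an assumption.
FLG_HEAP_ENABLE_TAIL_CHECK=0x10
FLG_HEAP_DISABLE_COALESCING=0x200000

# combine_flags FLAG...
# Prints the bitwise OR of all arguments as an 8-digit hex number.
combine_flags() {
    value=0
    for flag in "$@"; do
        value=$(( $value | $flag ))
    done
    printf '0x%08x\n' "$value"
}

# Example use:
#   combine_flags "$FLG_HEAP_ENABLE_TAIL_CHECK" "$FLG_HEAP_DISABLE_COALESCING"
```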

Issue: userspace heaps getting in the way

Visual C++'s C runtime library may have its own heap implementation on top of the Win32 system heap. Likewise, applications may override the default ::operator new and malloc with their own heap implementations. All of these can get in the way of the valgrind-aware system heap described above.

Valgrind should probably intercept calls to ::operator new, malloc, and friends, and redirect them straight to the win32 system heap (but see below).

However, it may be sufficient to run Chromium with the system heap selected by setting CHROME_ALLOCATOR=winheap, as described in "Page Heap for Chrome". That's next on my list of things to try.
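Since Wine passes the Unix environment through to the Windows process, trying this should only require setting the variable for one command. A minimal sketch, with a hypothetical wrapper name and a placeholder test binary:

```shell
# Sketch only: run a single command with Chromium's allocator override set.
# The wrapper name is an assumption; CHROME_ALLOCATOR=winheap is the
# setting mentioned in the text.
run_with_winheap() {
    env CHROME_ALLOCATOR=winheap "$@"
}

# Example use (your_test.exe is a placeholder):
#   run_with_winheap wine your_test.exe
```

Using env keeps the setting scoped to that one command, so other tests in the same shell session still run with the default allocator.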

Issue: detecting delete/delete[]/free mismatches

On Linux, Valgrind's replacement heap marks each block with a type so mismatches between the various allocation types can be detected.

Valgrind should probably intercept calls to ::operator new, malloc, and friends, and wrap them with a type label before redirecting them to the win32 system heap.

Issue: valgrind.h doesn't compile with Visual C++

Win32 apps can't currently include valgrind.h, so they can't use Valgrind's client hooks to inform Valgrind about functions expected to generate exceptions, etc.

valgrind.h should be ported to Visual C++; see Valgrind bug 210935.

Issue: spurious warnings from _strlen

Library routines that process strings a word at a time seem to generate false warnings like

          Conditional jump or move depends on uninitialised value(s)
             at 0x401086: _strlen (in /home/dank/test2.exe)

Valgrind has custom versions of these functions for Linux.

Valgrind should probably have custom versions of these functions for win32 as well; see valgrind bug 190660. Until then, using the debugging C runtime library may be a sufficient workaround.

Issue: Slowdown from _chkstk?

Win32 apps that have large stack variables issue a large number of calls to _chkstk(), apparently all of which trigger Valgrind warnings (at least when --smc-check=all is given). (See kb100775.) These calls may also interfere with --track-origins.

Valgrind could know about _chkstk() at a lower level than it currently does; see John Reiser's note from 2008.

Issue: Incomplete pdb file support in Valgrind

Some small test programs don't get symbolic stack dumps in Valgrind because of incomplete pdb file support.

Valgrind should copy the fix for this issue from Wine. See valgrind bug 211529.