Tags
code, coverage, fossil, fossil-scm, gcov, regression, test, testing
I like and recommend fossil for source code management.
I have been contributing some formal regression testing to the development effort, primarily by writing new test cases, beginning with the JSON support feature, which previously had no formal regression tests.
I’ve written some tests following a “black box tester” style, and now I want to peek inside the box to see how much of the actual implementation my tests are covering. Fortunately, there are tools to do this. Let’s see how it is done.
Fossil itself is built with a large array of core features exposed through a command line interface (CLI) as well as through web pages served by its built-in web server. As designed, each feature is exercised by a single run of the fossil command. Even the web server (visible through the fossil server and fossil ui commands) operates by spawning a sub-process for each page to fetch, which runs the fossil http command and follows the CGI protocol. (Of course, I'm simplifying a complex topic there, but the complexity I'm ignoring doesn't matter for this article.)
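To make that concrete, a single page can be fetched with no server running at all by handing fossil http one request on standard input. This is just a sketch, not from the test suite; the repository path here is hypothetical:
$ printf 'GET /timeline HTTP/1.0\r\n\r\n' | fossil http ~/fossil/repo.fossil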
The formal test suite operates as a collection of Tcl scripts that run fossil and fossil http with various commands (many of which are test- commands intended only for use from the test suite; see fossil help -t for a list of them) and then inspect the results.
Either way, the question of how much code is used can only be answered by a process that gathers statistics over many individual runs of the fossil command. It may not exactly be a surprise that the existing tools for monitoring execution are fully aware of this, and by default accumulate statistics across many runs of the program.
Measuring Code Coverage
The question I want to answer is known as “code coverage”. In other words, if I run a sequence of commands, which parts of the source code were actually executed, or “covered”, by that sequence? Regression test suites can be scored by what percentage of the code is covered. Of course, some lines of code are inherently difficult to cover with test cases, such as code that handles rare external conditions, which makes achieving anything close to 100% coverage noteworthy for a program of any significant size and complexity.
Actually measuring coverage can be done in one of two ways. The easy way is to insert instrumentation into the compiled code that increments a count as each line is reached. This is easy to build, but it does require that the application be recompiled in order to insert the instrumentation, and the extra work required to load and store the counters will affect performance. This is easy because the C compiler (GCC) already has a feature that adds instrumentation code, and there is a companion tool (gcov) that analyzes the resulting statistics and creates reports showing how often or if at all each line of code was executed.
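To see the easy way in miniature, here is the whole cycle for a hypothetical one-file program, demo.c, outside of fossil entirely:
$ gcc -O0 -g -fprofile-arcs -ftest-coverage -c demo.c   # writes demo.gcno beside demo.o
$ gcc -fprofile-arcs -o demo demo.o                     # links in the gcov runtime
$ ./demo                                                # each run adds counts to demo.gcda
$ gcov demo.c                                           # writes annotated demo.c.gcov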
The much harder way is to run the unmodified program in a virtual machine that is designed to keep the counters. This is similar to tools like Valgrind, which uses the same virtual machine technique to track and verify every memory access, confirming that a program only uses memory it is permitted to use. This approach is difficult primarily because the heavy lifting must be done in the simulator, and no one appears to have created a Valgrind plug-in for coverage; the closest is a tool for analyzing call graphs and their impact on CPU caching. Valgrind is also Linux-only. For Windows, there is a newer tool, DynamoRIO, which has a very lightly documented coverage plugin. Its better-documented face is DrMemory, which aims at the same niche as Valgrind. I have Valgrind on my Ubuntu box and DrMemory installed on Windows, but have not yet investigated the DynamoRIO engine or its coverage tool.
Setting up for coverage
Prerequisite Tools
First, you need to be able to build fossil, which requires a suitable C compiler and the usual suspects among programmer's tools, starting with make. On Windows, I've used MinGW and MSYS for builds. On Linux, you need the usual suite of developer tools. The key features MinGW supplies are GCC and gcov.
I would like to be able to test on Windows, and am pleasantly surprised to find that the test suite largely works the same on Windows, at least if run from the bash provided by MSYS.
With fossil built, you still need a working installation of Tcl. A full-featured desktop Linux distro likely has Tcl already. If not, sudo apt-get install tcl will generally do the trick.
In addition to Tcl, the test suite implicitly assumes you have the SQLite integration module installed in your Tcl. This is not typical for a stock distro, so you are likely to need to add it. On Ubuntu, sudo apt-get install libsqlite3-tcl will solve that. If building fossil itself to use Tcl, but not via the “stubs” mechanism that loads the available Tcl library at run time, you might also need the tcl-dev package.
You will also need a JSON parser for Tcl; json.test assumes you have a specific one, available via teacup install json with ActiveTcl on Windows.
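A quick sanity check of all three prerequisites: each line below should print a version number, and will error out if the corresponding piece is missing.
$ echo 'puts $tcl_version' | tclsh
$ echo 'puts [package require sqlite3]' | tclsh
$ echo 'puts [package require json]' | tclsh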
Building Out of Tree
The basic recipe is: first, with fossil's repository open in ~/fossil/work, build out of tree:
$ W=~/fossil/work
$ B=~/fossil/build
$ mkdir $B; cd $B
$ CFLAGS="-O0 -g" $W/configure   # with options to taste
$ make "TCC=gcc -fprofile-arcs -ftest-coverage"
to get $B/fossil freshly built. See $W/configure --help for a list of all the options that are supported. Setting CFLAGS while running configure keeps the optimizer turned off, which makes evaluating coverage much easier, at the cost of additional slowdown.
For the record, my test build was configured as:
$ CFLAGS="-O0 -g" ../fossil4/configure --with-miniz \
    --with-openssl=none --json --with-tcl --with-tcl-stubs \
    --with-tcl-private-stubs --with-th1-docs --with-th1-hooks
which gives me all the major features except SSL support.
As usual, in any build folder, head -1 config.log will reveal the exact command line used to configure that build.
Test Out of Tree
You can and should run tests from out of the tree. The tests could be run in a build folder, or entirely separately. But given the way the coverage data gets written, it will be simplest to run the tests in the build folder.
We’ll start with some shell variables that we will assume point to the appropriate folders:
$ W=~/fossil/work
$ F=./fossil
With those definitions, the most direct way to run the entire test suite is as follows:
$ tclsh $W/test/tester.tcl $F -prot -quiet -verbose
The normally immense amount of output will all be captured in a file named prot as a result of the -prot option. The terminal window will only display failing tests, thanks to the -quiet option. With the -verbose option, prot will contain a record of every fossil command line, all of the output from each command, and lots of details related to the test cases.
With code coverage turned on, the test suite will take a long time to run. You might want to start by running a few smaller test cases just to get a sense of what it will be like.
Since I am primarily concerned about JSON support, I’ll start with just running that test package:
$ tclsh $W/test/tester.tcl $F json -prot -quiet -verbose
Process the statistics
The gcov command has a lot of options, most of which help point it at the raw data and source code it needs to produce its principal report: annotated source code with candidate executable source lines identified, displaying counts for the number of times each line has executed. As a handy feature for finding uncovered lines, it flags those with “#####” rather than a count of 0 in the annotation. Gcov also produces a summary report that shows the percentage of lines covered in each file it processed, as well as over all the files it processed.
A command like this:
$ rm *.gcov
$ gcov bld/*.gcda
will run gcov over all the captured counter data, and write a large number of annotated source files to the current directory. As written, it assumes that the current directory is the root of the out-of-tree build, the same folder where ../fossil/configure was run, and where the instrumented fossil.exe was placed.
This requirement is because the .gcda files contain paths to the related source code relative to the build folder. Command line options and environment variables can be used to override the saved paths, but it is much simpler to run from the correct folder.
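If you do need to run from elsewhere, gcov's -o (object directory) option points it at the folder holding the .gcno and .gcda files; for example, assuming the layout used here:
$ gcov -o bld ../fossil4/src/json.c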
Results
JSON Feature Only
For just the JSON tests, analyzing only the explicitly JSON-related source files with gcov bld/[jc]son*.gcda, I see the following summary:
$ tclsh $W/test/tester.tcl $F -prot -quiet -verbose
....
$ gcov bld/[jc]son*.gcda
File '../fossil4/src/cson_amalgamation.c'
Lines executed:54.58% of 2072
Creating 'cson_amalgamation.c.gcov'
File '../fossil4/src/json.c'
Lines executed:81.10% of 947
Creating 'json.c.gcov'
File 'c:/programs/mingw/include/stdio.h'
Lines executed:50.00% of 10
Creating 'stdio.h.gcov'
File '../fossil4/src/json_artifact.c'
Lines executed:65.47% of 223
Creating 'json_artifact.c.gcov'
File '../fossil4/src/json_branch.c'
Lines executed:75.00% of 172
Creating 'json_branch.c.gcov'
File '../fossil4/src/json_config.c'
Lines executed:84.31% of 51
Creating 'json_config.c.gcov'
File '../fossil4/src/json_diff.c'
Lines executed:76.79% of 56
Creating 'json_diff.c.gcov'
File '../fossil4/src/json_dir.c'
Lines executed:56.41% of 117
Creating 'json_dir.c.gcov'
File '../fossil4/src/json_finfo.c'
Lines executed:71.64% of 67
Creating 'json_finfo.c.gcov'
File '../fossil4/src/json_login.c'
Lines executed:64.65% of 99
Creating 'json_login.c.gcov'
File '../fossil4/src/json_query.c'
Lines executed:57.14% of 28
Creating 'json_query.c.gcov'
File '../fossil4/src/json_report.c'
Lines executed:64.42% of 104
Creating 'json_report.c.gcov'
File '../fossil4/src/json_status.c'
Lines executed:41.18% of 68
Creating 'json_status.c.gcov'
File '../fossil4/src/json_tag.c'
Lines executed:58.64% of 220
Creating 'json_tag.c.gcov'
File '../fossil4/src/json_timeline.c'
Lines executed:57.19% of 278
Creating 'json_timeline.c.gcov'
File '../fossil4/src/json_user.c'
Lines executed:67.48% of 163
Creating 'json_user.c.gcov'
File '../fossil4/src/json_wiki.c'
Lines executed:67.41% of 270
Creating 'json_wiki.c.gcov'
Lines executed:63.38% of 4945
$
The bottom line is that my JSON test cases are covering about 63% of the code implementing the feature.
I’m slightly surprised, given that my tests so far have not attempted to exercise all the edge cases, that the coverage is so high. But clearly there is room to improve.
The Full Banana
The next trick is to let the entire test suite run, which I suspect will take all night. I’ll run it sandwiched between two calls to date to find out.
$ date ; tclsh $W/test/tester.tcl $F -prot -quiet -verbose ; date
Wed Feb 3 20:10:06 PST 2016
test json-cap-POSTenv-name FAILED (knownBug)!
test json-wiki-diff-diff FAILED (knownBug)!
test json-ROrepo-2-2 FAILED (knownBug)!
test json-ROrepo-2-3 FAILED (knownBug)!
ERROR: ADDED f3
ADDED f4
DELETE f2
"fossil undo" is available to undo changes to the working checkout.
WARNING: local edits lost for f2
WARNING: 1 merge conflicts
test merge_multi-4 FAILED (knownBug)!
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
test merge_renames-5 FAILED (knownBug)!
test th1-tcl-9 FAILED!
RESULT: fossil.exe 3 {test-th-render --open-config {C:\Users\Ross\Documents\tmp\fossil4\test\th1-tcl9.txt}}
***** Final results: 1 errors out of 31719 tests
***** Considered failures: th1-tcl-9
***** Ignored results: 6 ignored errors out of 31719 tests
***** Ignored failures: json-cap-POSTenv-name json-wiki-diff-diff json-ROrepo-2-2 json-ROrepo-2-3 merge_multi-4 merge_renames-5
Thu Feb 4 04:12:11 PST 2016
Interesting. It ran for 8 hours, 2 minutes, 5 seconds. (A fossil built at the usual -O2 optimization level and without the instrumentation completes the same suite on this PC in under 15 minutes. I really should time that as a point of reference; I'm basing that estimate on having run it multiple times and expecting it to take a while, but not long enough to go get lunch.)
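Getting a real number rather than an estimate is easy enough; assuming an uninstrumented release build at a hypothetical ~/fossil/release/fossil, something like:
$ time tclsh $W/test/tester.tcl ~/fossil/release/fossil -prot -quiet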
Also interesting to note for further investigation is the unexpected failure of the test identified as th1-tcl-9. That test was not failing with fossil configured normally.
A quick repeat of the previous gcov command on only the core JSON code shows that running the whole suite did not add any additional heat on the JSON-only code: it still summarizes as “Lines executed:63.38% of 4945”. This is not at all surprising, since I know that until I wrote json.test there were no test cases that deliberately exercised the JSON feature.
$ gcov bld/*.gcda
File '../fossil4/src/add.c'
Lines executed:83.70% of 319
Creating 'add.c.gcov'
....
File '../fossil4/src/bisect.c'
Lines executed:0.00% of 218
Creating 'bisect.c.gcov'
....
File '../fossil4/src/lookslike.c'
Lines executed:93.90% of 164
Creating 'lookslike.c.gcov'
....
File '../fossil4/src/sqlite3.c'
Lines executed:40.68% of 46270
Creating 'sqlite3.c.gcov'
....
File '../fossil4/src/zip.c'
Lines executed:0.00% of 261
Creating 'zip.c.gcov'
Lines executed:38.12% of 91494
I’ve elided most of the 126 individual file reports, leaving behind just a few to show typical values and the overall range of coverage. I’ll undoubtedly look at this in more detail, with an eye to what sorts of additional test cases should be written to improve the coverage of the whole.
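One way to hunt for the least-covered files is to sort the summary by percentage. This is just a sketch of the idea, and assumes gcov's File/Lines output format shown above:
$ gcov bld/*.gcda |
  awk '/^File/ { f = $2 } /^Lines executed/ { print $2, f }' |
  sort -t: -k2 -n | head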
Conclusions
For a first crack at testing the JSON features, the current version of json.test does surprisingly well at covering the related code.
For the whole of fossil, however, there is clearly room for improvement.
SQLite itself comprises fully half of the executable lines of code in fossil. It has a separate test suite that achieves very high coverage, so it should likely be excluded from these statistics.
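Back-of-the-envelope from the summary above: sqlite3.c is 46,270 of the 91,494 lines at 40.68% covered, so excluding it leaves roughly 16,000 of the remaining 45,224 lines covered, or about 35%. Rerunning gcov without the SQLite counter data would give the exact figure; a sketch:
$ gcov $(ls bld/*.gcda | grep -v sqlite3)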
There are a number of fossil features that have zero or near zero coverage. These are low-hanging fruit and should get test cases.
Turning off optimizations and turning on coverage instrumentation caused a test case to fail that had previously passed. That should be investigated, although it is likely to be a false positive of some form.
Version Tested
These results are from my rberteig-json-test branch, based on checkin 9f45c8b6e0. While these runs were under Windows (via MSYS bash), I have no reason to suspect the results would be significantly different under Linux. Perhaps I'll fire my Ubuntu VM back up and find out.
(Written with StackEdit.)