Tags
code, coverage, fossil, fossil-scm, gcov, regression, test, testing
I like and recommend fossil for source code management.
I have been contributing some formal regression testing to the development effort, primarily by writing new test cases, beginning with the JSON support feature, which previously had no formal regression tests.
I’ve written some tests following a “black box tester” style, and now I want to peek inside the box to see how much of the actual implementation my tests are covering. Fortunately, there are tools to do this. Let’s see how it is done.
Fossil itself is built with a large array of core features exposed through a command line interface (CLI) as well as through web pages served by its built-in web server. As designed, each feature is exercised by a single run of the fossil command. Even the web server (visible through the fossil server and fossil ui commands) operates by spawning a sub-process for each page to fetch, which runs the fossil http command and follows the CGI protocol. (Of course, I'm simplifying a complex topic there, but the complexity I'm ignoring doesn't matter for this article.)
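To make that concrete, a single page can be fetched with no server running at all by handing fossil http one request on standard input. This is just a sketch, not from the test suite; the repository path here is hypothetical:
$ printf 'GET /timeline HTTP/1.0\r\n\r\n' | fossil http ~/fossil/repo.fossil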
The formal test suite operates as a collection of Tcl scripts that run fossil and fossil http with various commands (many of which are test- commands intended only for use from the test suite; see fossil help -t for a list of them) and then inspect the results.
Either way, the question of how much code is used can only be answered by a process that gathers statistics over many individual runs of the fossil command. It may not exactly be a surprise that the existing tools for monitoring execution are fully aware of this, and by default accumulate statistics across many runs of the program.
Measuring Code Coverage
The question I want to answer is known as “code coverage”. In other words, if I run a sequence of commands, which parts of the source code were actually executed, or “covered”, by that sequence? Regression test suites can be scored by what percentage of the code is covered. Of course, some lines of code are inherently difficult to cover with test cases, such as code that handles rare external conditions, which makes achieving anything close to 100% coverage noteworthy for a program of any significant size and complexity.
Actually measuring coverage can be done in one of two ways. The easy way is to insert instrumentation into the compiled code that increments a count as each line is reached. This is easy to build, but it does require that the application be recompiled in order to insert the instrumentation, and the extra work required to load and store the counters will affect performance. This is easy because the C compiler (GCC) already has a feature that adds instrumentation code, and there is a companion tool (gcov) that analyzes the resulting statistics and creates reports showing how often or if at all each line of code was executed.
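To see the easy way in miniature, here is the whole cycle for a hypothetical one-file program, demo.c, outside of fossil entirely:
$ gcc -O0 -g -fprofile-arcs -ftest-coverage -c demo.c   # writes demo.gcno beside demo.o
$ gcc -fprofile-arcs -o demo demo.o                     # links in the gcov runtime
$ ./demo                                                # each run adds counts to demo.gcda
$ gcov demo.c                                           # writes annotated demo.c.gcov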
The much harder way is to run the unmodified program in a virtual machine that is designed to keep the counters. This is similar to tools like Valgrind, which uses the same virtual machine technique to track and verify every memory access, confirming that a program only uses memory it is permitted to use. This approach is difficult primarily because the heavy lifting must be done in the simulator, and no one appears to have created a Valgrind plug-in for coverage; the closest is a tool for analyzing call graphs and their impact on CPU caching. Valgrind is also Linux-only. For Windows, there is a newer tool, DynamoRIO, which has a very lightly documented coverage plugin. Its better-documented face is DrMemory, which aims at the same niche as Valgrind. I have Valgrind on my Ubuntu box and DrMemory installed on Windows, but have not yet investigated the DynamoRIO engine or its coverage tool.
Setting up for coverage
Prerequisite Tools
First, you need to be able to build fossil, which requires a suitable C compiler and the usual suspects among programmer's tools, starting with make. On Windows, I've used MinGW and MSYS for builds. On Linux, you need the usual suite of developer tools. The key features MinGW supplies are GCC and gcov.
I would like to be able to test on Windows, and am pleasantly surprised to find that the test suite largely works the same on Windows, at least if run from the bash provided by MSYS.
With fossil built, you still need a working installation of Tcl. A full-featured desktop Linux distro likely has Tcl already. If not, sudo apt-get install tcl will generally do the trick.
In addition to Tcl, the test suite implicitly assumes you have the SQLite integration module installed in your Tcl. This is not typical for a stock distro, so you are likely to need to add it. On Ubuntu, sudo apt-get install libsqlite3-tcl will solve that. If building fossil itself to use Tcl, but not via the “stubs” mechanism that loads the available Tcl library at run time, you might also need the tcl-dev package.
You will also need a JSON parser for Tcl; json.test assumes you have a specific one, available via teacup install json with ActiveTcl on Windows.
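A quick sanity check of all three prerequisites: each line below should print a version number, and will error out if the corresponding piece is missing.
$ echo 'puts $tcl_version' | tclsh
$ echo 'puts [package require sqlite3]' | tclsh
$ echo 'puts [package require json]' | tclsh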
Building Out of Tree
The basic recipe is: first, with fossil's repository open in ~/fossil/work, build out of tree:
$ W=~/fossil/work
$ B=~/fossil/build
$ mkdir $B; cd $B
$ CFLAGS="-O0 -g" $W/configure   # with options to taste
$ make "TCC=gcc -fprofile-arcs -ftest-coverage"
to get $B/fossil freshly built. See $W/configure --help for a list of all the options that are supported. Setting CFLAGS while running configure keeps the optimizer turned off, which makes evaluating coverage much easier, at the cost of additional slowdown.
For the record, my test build was configured as:
$ CFLAGS="-O0 -g" ../fossil4/configure --with-miniz \
    --with-openssl=none --json --with-tcl --with-tcl-stubs \
    --with-tcl-private-stubs --with-th1-docs --with-th1-hooks
which gives me all the major features except SSL support.
As usual, in any build folder, head -1 config.log will reveal the exact command line used to configure that build.
Test Out of Tree
You can and should run tests from out of the tree. The tests could be run in a build folder, or entirely separately. But given the way the coverage data gets written, it will be simplest to run the tests in the build folder.
We’ll start with some shell variables that we will assume point to the appropriate folders:
$ W=~/fossil/work
$ F=./fossil
With those definitions, the most direct way to run the entire test suite is as follows:
$ tclsh $W/test/tester.tcl $F -prot -quiet -verbose
The normally immense amount of output will all be captured in a file named prot as a result of the -prot option. The terminal window will only display failing tests, thanks to the -quiet option. With the -verbose option, prot will contain a record of every fossil command line, all of the output from each command, and lots of details related to the test cases.
With code coverage turned on, the test suite will take a long time to run. You might want to start by running a few smaller test cases just to get a sense of what it will be like.
Since I am primarily concerned about JSON support, I’ll start with just running that test package:
$ tclsh $W/test/tester.tcl $F json -prot -quiet -verbose
Process the statistics
The gcov command has a lot of options, most of which help point it at the raw data and source code it needs to produce its principal report: annotated source code with candidate executable source lines identified, displaying counts for the number of times each line has executed. As a handy feature for finding uncovered lines, it flags those with “#####” rather than a count of 0 in the annotation. Gcov also produces a summary report that shows the percentage of lines covered in each file it processed, as well as over all the files it processed.
A command like this:
$ rm *.gcov
$ gcov bld/*.gcda
will run gcov over all the captured counter data, and write a large number of annotated source files to the current directory. As written, it assumes that the current directory is the root of the out-of-tree build, the same folder where ../fossil/configure was run, and where the instrumented fossil.exe was placed.
This requirement is because the .gcda files contain paths to the related source code relative to the build folder. Command line options and environment variables can be used to override the saved paths, but it is much simpler to run from the correct folder.
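If you do need to run from elsewhere, gcov's -o (object directory) option points it at the folder holding the .gcno and .gcda files; for example, assuming the layout used here:
$ gcov -o bld ../fossil4/src/json.c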
Results
JSON Feature Only
For just the JSON tests, analyzing only the explicitly JSON-related source files with gcov bld/[jc]son*.gcda, I see the following summary:
$ tclsh $W/test/tester.tcl $F -prot -quiet -verbose
....
$ gcov bld/[jc]son*.gcda
File '../fossil4/src/cson_amalgamation.c'
Lines executed:54.58% of 2072
Creating 'cson_amalgamation.c.gcov'
File '../fossil4/src/json.c'
Lines executed:81.10% of 947
Creating 'json.c.gcov'
File 'c:/programs/mingw/include/stdio.h'
Lines executed:50.00% of 10
Creating 'stdio.h.gcov'
File '../fossil4/src/json_artifact.c'
Lines executed:65.47% of 223
Creating 'json_artifact.c.gcov'
File '../fossil4/src/json_branch.c'
Lines executed:75.00% of 172
Creating 'json_branch.c.gcov'
File '../fossil4/src/json_config.c'
Lines executed:84.31% of 51
Creating 'json_config.c.gcov'
File '../fossil4/src/json_diff.c'
Lines executed:76.79% of 56
Creating 'json_diff.c.gcov'
File '../fossil4/src/json_dir.c'
Lines executed:56.41% of 117
Creating 'json_dir.c.gcov'
File '../fossil4/src/json_finfo.c'
Lines executed:71.64% of 67
Creating 'json_finfo.c.gcov'
File '../fossil4/src/json_login.c'
Lines executed:64.65% of 99
Creating 'json_login.c.gcov'
File '../fossil4/src/json_query.c'
Lines executed:57.14% of 28
Creating 'json_query.c.gcov'
File '../fossil4/src/json_report.c'
Lines executed:64.42% of 104
Creating 'json_report.c.gcov'
File '../fossil4/src/json_status.c'
Lines executed:41.18% of 68
Creating 'json_status.c.gcov'
File '../fossil4/src/json_tag.c'
Lines executed:58.64% of 220
Creating 'json_tag.c.gcov'
File '../fossil4/src/json_timeline.c'
Lines executed:57.19% of 278
Creating 'json_timeline.c.gcov'
File '../fossil4/src/json_user.c'
Lines executed:67.48% of 163
Creating 'json_user.c.gcov'
File '../fossil4/src/json_wiki.c'
Lines executed:67.41% of 270
Creating 'json_wiki.c.gcov'
Lines executed:63.38% of 4945
$
The bottom line is that my JSON test cases are covering about 63% of the code implementing the feature.
I’m slightly surprised, given that my tests so far have not attempted to exercise all the edge cases, that the coverage is so high. But clearly there is room to improve.
The Full Banana
The next trick is to let the entire test suite run, which I suspect will take all night. I’ll run it sandwiched between two calls to date to find out.
$ date ; tclsh $W/test/tester.tcl $F -prot -quiet -verbose ; date
Wed Feb 3 20:10:06 PST 2016
test json-cap-POSTenv-name FAILED (knownBug)!
test json-wiki-diff-diff FAILED (knownBug)!
test json-ROrepo-2-2 FAILED (knownBug)!
test json-ROrepo-2-3 FAILED (knownBug)!
ERROR: ADDED f3
ADDED f4
DELETE f2
"fossil undo" is available to undo changes to the working checkout.
WARNING: local edits lost for f2
WARNING: 1 merge conflicts
test merge_multi-4 FAILED (knownBug)!
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
UPDATE f1
"fossil undo" is available to undo changes to the working checkout.
test merge_renames-5 FAILED (knownBug)!
test th1-tcl-9 FAILED!
RESULT: fossil.exe 3 {test-th-render --open-config {C:\Users\Ross\Documents\tmp\fossil4\test\th1-tcl9.txt}}
***** Final results: 1 errors out of 31719 tests
***** Considered failures: th1-tcl-9
***** Ignored results: 6 ignored errors out of 31719 tests
***** Ignored failures: json-cap-POSTenv-name json-wiki-diff-diff json-ROrepo-2-2 json-ROrepo-2-3 merge_multi-4 merge_renames-5
Thu Feb 4 04:12:11 PST 2016
Interesting. It ran for 8 hours, 2 minutes, 5 seconds. (A fossil built at the usual -O2 optimization level and without the instrumentation completes the same suite on this PC in under 15 minutes. I really should time that as a point of reference; I'm basing that estimate on having run it multiple times and expecting it to take a while, but not long enough to go get lunch.)
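Getting a real number rather than an estimate is easy enough; assuming an uninstrumented release build at a hypothetical ~/fossil/release/fossil, something like:
$ time tclsh $W/test/tester.tcl ~/fossil/release/fossil -prot -quiet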
Also interesting to note for further investigation is the unexpected failure of the test identified as th1-tcl-9. That test was not failing with fossil configured normally.
A quick repeat of the previous gcov command on only the core JSON code shows that running the whole suite did not add any additional heat on the JSON-only code: it still summarizes as “Lines executed:63.38% of 4945”. This is not at all surprising, since I know that until I wrote json.test there were no test cases that deliberately exercised the JSON feature.
$ gcov bld/*.gcda
File '../fossil4/src/add.c'
Lines executed:83.70% of 319
Creating 'add.c.gcov'
....
File '../fossil4/src/bisect.c'
Lines executed:0.00% of 218
Creating 'bisect.c.gcov'
....
File '../fossil4/src/lookslike.c'
Lines executed:93.90% of 164
Creating 'lookslike.c.gcov'
....
File '../fossil4/src/sqlite3.c'
Lines executed:40.68% of 46270
Creating 'sqlite3.c.gcov'
....
File '../fossil4/src/zip.c'
Lines executed:0.00% of 261
Creating 'zip.c.gcov'
Lines executed:38.12% of 91494
I’ve elided most of the 126 individual file reports, leaving behind just a few to show typical values and the overall range of coverage. I’ll undoubtedly look at this in more detail, with an eye to what sorts of additional test cases should be written to improve the coverage of the whole.
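One way to hunt for the least-covered files is to sort the summary by percentage. This is just a sketch of the idea, and assumes gcov's File/Lines output format shown above:
$ gcov bld/*.gcda |
  awk '/^File/ { f = $2 } /^Lines executed/ { print $2, f }' |
  sort -t: -k2 -n | head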
Conclusions
For a first crack at testing the JSON features, the current version of json.test does surprisingly well at covering the related code.
For the whole of fossil, however, there is clearly room for improvement.
SQLite itself comprises fully half of the executable lines of code in fossil. It has a separate test suite that achieves very high coverage, so it should likely be excluded from these statistics.
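Back-of-the-envelope from the summary above: sqlite3.c is 46,270 of the 91,494 lines at 40.68% covered, so excluding it leaves roughly 16,000 of the remaining 45,224 lines covered, or about 35%. Rerunning gcov without the SQLite counter data would give the exact figure; a sketch:
$ gcov $(ls bld/*.gcda | grep -v sqlite3)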
There are a number of fossil features that have zero or near zero coverage. These are low-hanging fruit and should get test cases.
Turning off optimizations and turning on coverage instrumentation caused a test case to fail that had previously passed. That should be investigated, although it is likely to be a false positive of some form.
Version Tested
These results are from my rberteig-json-test branch, based on checkin 9f45c8b6e0. While these runs were under Windows (via MSYS bash), I have no reason to suspect the results would be significantly different under Linux. Perhaps I'll fire my Ubuntu VM back up and find out.
(Written with StackEdit.)