Tags

configuration, cron, crontab, fossil, hosting, web

I’ve mentioned before that Cheshire likes to use Fossil for version control. Most of our need is for internal use only, but occasionally it is handy to have a server under our control that hosts a repository.

After a recent move to hosting at DreamHost, I decided to find out how difficult it would be to host a fossil repository on their server. In short, it doesn’t seem to be difficult at all.

Background

Fossil is a light-weight but surprisingly powerful Software Configuration Management (SCM) system.

Perhaps its most attractive feature to smaller developer teams is its low ceremony philosophy. Attractive to me is its emphasis on not forgetting anything, even the things you wish it would forget.

Fossil itself is a single compiled executable, with very light dependencies on third-party libraries, so installation often involves little more than placing the executable in a suitable folder.

Creating and managing a repository for a single developer involves learning only two new commands (fossil new and fossil open) beyond the two or three that are needed for normal work. (The most common commands I use are fossil checkin, fossil update, and fossil status. Others that are useful include fossil ui, fossil changes, fossil extra, fossil set, and fossil bisect.)

The built-in GUI operates as a web server, and provides lots of capability for exploration of the timeline of changes, as well as a complete trouble ticket system and a documentation wiki.

Extending from a single copy of a private repository to a distributed system where copies are located on remote machines is straightforward, and largely requires only the use of the fossil clone command to get a second copy, and a way to place that copy on a second machine where it can be accessed by fossil sync.

My goal with this article is to show how to use hosting at DreamHost as that second machine.

DreamHost

DreamHost is one of many, many players in the space of shared web hosting. Their services extend upward to overlap with cloud computing providers and in many ways with rack space and colocation services as well.

Even their entry-level shared hosting accounts come with FTP and SSH access to a server running Linux on a 64-bit platform. Unlike many, they provide GCC for compilation, and allow custom CGI services to be written and used. There are some reasonable limits on CPU and memory consumption, but fossil itself is rather lightweight in both respects.

There isn’t a precompiled binary of fossil that will run on their server, but a tarball ready to build from is only one wget command away.

$ mkdir downloads; cd downloads
$ wget https://www.fossil-scm.org/download/fossil-src-1.35.tar.gz
$ cd ..; mkdir build; cd build
$ tar xf ../downloads/fossil-src-1.35.tar.gz

Then building it is as simple as the usual ./configure and make recipe:

$ cd fossil-1.35
$ ./configure; make
....
$

After this churns along for a while, it should finish with no errors, and leave fossil freshly built in the build/fossil-1.35 folder. To confirm that it runs at all, ask it for its version details:

$ ./fossil version -v
This is fossil version 1.35 [3aa86af6aa] 2016-06-14 11:10:39 UTC
Compiled on Jun 24 2016 18:20:54 using gcc-4.6.3 (64-bit)
SQLite 3.13.0 2016-05-18 10:57:30 fc49f556e4
Schema version 2015-01-24
zlib 1.2.3.4, loaded 1.2.3.4
SSL (OpenSSL 1.0.1 14 Mar 2012)
UNICODE_COMMAND_LINE
DYNAMIC_BUILD
$

I don’t have root access, so I didn’t attempt make install. But since installation is not exactly required as long as fossil can be found, that isn’t an issue. For convenience I did make a copy of fossil in ~/bin/fossil. If I expected to do more at the command line with it, I’d be tempted to put ~/bin on my PATH, but I haven’t bothered to do that yet.
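If ~/bin does end up on PATH, a fragment like the following in the login shell's startup file would do it. This is a minimal sketch, assuming a Bourne-style shell such as bash. Which startup file to put it in (~/.bash_profile, for example) is an assumption that depends on how the account's shell is set up.

```shell
# Prepend ~/bin to PATH unless it is already there, so that repeated
# logins do not keep growing the variable.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;              # already present, nothing to do
    *) PATH="$HOME/bin:$PATH" ;;     # not present, put it first
esac
export PATH
```

With that in place, a plain fossil at the prompt should find the copy in ~/bin without needing the bin/ prefix.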

Configuring for CGI access

CGI at DreamHost is easy to configure, with nearly any folder inside your site’s web folder tree able to serve executable content if desired. As configured out of the box, any folder will handle executable files that it recognizes as CGI scripts. You can also modify that through .htaccess files.

I’ve created a folder named ~/publicrepo which will hold the repositories that will be accessible. To serve these repositories from my fully hosted domain, I set up a simple fossil as CGI script named repo.cgi:

#!/home/XXX/bin/fossil
directory: /home/XXX/publicrepo
notfound: /repo.cgi/
repolist

This is an ordinary CGI script, written using the #! notation to invoke fossil to parse and execute it. The available clauses are not entirely documented beyond the directory: and notfound: used here. Specifically, repolist is not documented outside of the source code. With that clause in the file, a URL that ends in a / but names no repository will produce a list of all the *.fossil files in the named directory. The notfound: clause will redirect any mistyped repository names back to the list as well.

This is an easy alternative to maintaining such a list by hand, but do note the small security implication that every served repository is linked and easily discovered.

With the script as shown, the repository file named ~/publicrepo/example.fossil will be accessible from my hosted site as http://example.com/repo.cgi/example, and any URL that does not match a repository will produce the directory listing.
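The four-line CGI script can also be generated rather than typed by hand, which is handy when setting up more than one site. The following is a sketch only: the function name write_repo_cgi and its arguments are hypothetical, and it assumes (as is usual for CGI under Apache) that the web server only runs files with the execute bit set, and that the script sits at the top of the site so that notfound: can point at /repo.cgi/.

```shell
# write_repo_cgi FOSSIL_BIN REPO_DIR OUT_FILE
# Writes a fossil CGI script with the same four lines shown earlier,
# then marks it executable so the web server will run it.
# The function name and its arguments are hypothetical.
write_repo_cgi() {
    fossil_bin=$1
    repo_dir=$2
    out_file=$3
    # Assumes the script is served from the top of the site.
    script_name=$(basename "$out_file")
    {
        printf '#!%s\n' "$fossil_bin"
        printf 'directory: %s\n' "$repo_dir"
        printf 'notfound: /%s/\n' "$script_name"
        printf 'repolist\n'
    } > "$out_file" &&
    chmod 755 "$out_file"
}
```

A call might look like write_repo_cgi /home/XXX/bin/fossil /home/XXX/publicrepo ~/example.com/repo.cgi, where ~/example.com stands in for whatever folder serves your site.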

Polite Robots Should Look Away

Just in case a link to our repositories gets into the wild (as it will when this post is published, for instance) we should likely suggest to robots that they not index the content of our fossil repositories. Your mileage may vary, and there may even be good reasons to allow some indexing.

Also, fossil itself has some built-in defenses against crawlers that attempt to request ZIPs and tarballs of every revision, something that is certainly wasteful of our server’s resources and likely of very little value to an index.

To head the robots and web crawlers off at the pass, we’ll be sure to include a reference in our demo site’s robots.txt file:

User-agent: *
Disallow: /repo.cgi/

The list is short and sweet because we don’t have very much to say other than any polite robot will stay off of our repositories.

A Sample Repository

None of this is useful without a repository sitting there to serve. This article will stop short of providing tools to automate the creation of new repositories solely from a web interface. This can be done, as is amply demonstrated by Chiselapp, and might be the subject of a future post.

So at a server prompt, we create a new repository to share:

[aardwolf]$ bin/fossil new -A Ross publicrepo/example.fossil
project-id: bca537e4ad6be3ffc8dd8f956a10507a8a93fdac
server-id:  47f597979c1779d43e20c112849829e6f34025db
admin-user: Ross (initial password is "0c34b8")
[aardwolf]$ bin/fossil user -R publicrepo/example.fossil default Ross
[aardwolf]$

And now, browsing to /repo.cgi/example opens the web interface to that repository. Notice that it is not configured at all. You will want to log in, then visit Admin -> Configuration to set the project name and Admin -> Users to change the password of the new admin-user, and likely adjust most of the rest of the configuration.

Note that I used the -A option to fossil new to name the admin-user, and that I also use fossil user default to set that user as the repository’s default user. I did this because the user name associated with your DreamHost hosting account would be leaked in the published repository otherwise. On the principle that it is better to not leak that user name (which DreamHost by default keeps fairly private) that seemed like the right choice. As of fossil version 1.35, it appears that the mechanism for figuring out the default user will not be able to guess right if the fossil user default command is not used to store it in the repository explicitly.

Cloning

Cloning an existing repository that is accessible from the internet is easy. Log in to your DH server via SSH, then issue the fossil clone command:

[aardwolf]$ bin/fossil clone -A Ross http://rberteig@chiselapp.com/user/rberteig/repository/dotfonter/ publicrepo/dotfonter.fossil
password for rberteig:
remember password (Y/n)? Y
Round-trips: 2   Artifacts sent: 0  received: 110
Clone done, sent: 604  received: 79085  ip: 2607:f1c0:836:fc00:ed9f:9a2e:8aca:ee4d
Rebuilding repository meta-data...
  100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: a3298980f9dcd4631af35e0e3ce0e3d571fbd078
server-id:  fd96e93a776434b7616eff76270e3d7698c18a8b
admin-user: Ross (password is "43d2fb")
[aardwolf]$ bin/fossil user -R publicrepo/dotfonter.fossil default Ross
[aardwolf]$

Here I cloned a repository hosted at Chiselapp and used a username and password I have so that the clone has the ability to push changes back to the copy at Chiselapp. We’ll use that in the next section.

You don’t have to push at all (useful if cloning something like the official fossil sources where you likely don’t have permission) or even sync at all, of course.

Configuring timely autosync

For some projects, simply having a copy outside the building is sufficient. Developer changes will get pushed to the exterior copy as long as fossil remote-url points to the external copy, and fossil set autosync is on. Multiple developers can share that single copy, using fossil update to keep their local copy and work space up to date.
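On a developer's machine, the two settings mentioned here can be applied from inside an open checkout. This is a sketch under assumptions: the function name enable_autosync is hypothetical, fossil is assumed to be on the developer's PATH, and the example URL reuses the repo.cgi address and the Ross user from earlier in this post.

```shell
# Path to the fossil executable; override it if fossil lives elsewhere.
FOSSIL="${FOSSIL:-fossil}"

# enable_autosync URL
# Run inside an open checkout: records URL as the default remote, then
# turns autosync on for this repository.
# The function name is hypothetical.
enable_autosync() {
    "$FOSSIL" remote-url "$1" &&
    "$FOSSIL" settings autosync on
}
```

For the sample repository that would be something like enable_autosync http://Ross@example.com/repo.cgi/example, after which fossil may ask for that user's password.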

But sometimes you want to keep more than one external copy, or to mirror a project from elsewhere. To do that best, you need the exterior copy to also synchronize with one (or more, but that is more advanced) other repositories hosted even further away.

Since DreamHost also provides access to cron jobs, this is a simple feature to accomplish.

The easy way is to arrange to run fossil all sync occasionally from your hosting account user, and that is exactly the kind of thing a cron job is designed to do. The easiest way to get the arcane syntax of a cron job right is to let the DH control panel do the work for you.

Log in to your DH control panel, then find “Cron Jobs” in the left menu bar under “Goodies”. Click “Add a Cron Job”, then fill out the form.

If you have multiple hosted sites using separate users (the recommended way to do it) be sure to pick the right user from the list of users. Note that only “shell users” are listed. You had to have a shell user to get this far, of course.

Type something friendly in the Title box. This will show in the control panel’s list of cron jobs.

The command I use is ~/bin/fossil all sync >sync.log 2>&1 which keeps a copy of the most recent output from the sync, overwriting it each time the job runs. There are other ways to do it if you want to be more careful about logging, or get errors by email, or more. That is really the topic of a lifetime of playing with shell scripts, and won’t fit into this paragraph.
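As one example of being more careful about logging, a small wrapper can append to the log rather than overwrite it, and record when each run happened and how it ended. This is only a sketch of one approach, not what I actually run; the function name run_logged is hypothetical.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Appends a UTC timestamp, the command's combined stdout and stderr, and
# its exit status to LOGFILE, then returns that exit status.
# The function name is hypothetical.
run_logged() {
    logfile=$1
    shift
    printf '=== %s: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" >> "$logfile"
    "$@" >> "$logfile" 2>&1
    status=$?
    printf '=== exit status %s\n' "$status" >> "$logfile"
    return "$status"
}
```

Saved in a script that ends with run_logged "$HOME/sync.log" "$HOME/bin/fossil" all sync, this could stand in for the one-line command in the cron job. The log then grows without bound, so it would need trimming from time to time.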

The only other option I changed in the form is to set “When to run” to “Custom”, then pick “Every 10 Minutes” from the Minutes box. That is the easiest way to get a job that runs every 10 minutes, every day, all day.
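For the curious, the arcane syntax behind “Every 10 Minutes” is a step value in the minutes field of a crontab line. The sketch below prints such a line. It assumes the control panel produces a standard five-field crontab entry, which I have not confirmed, and the function name cron_every_n_minutes is hypothetical.

```shell
# cron_every_n_minutes N COMMAND
# Prints a crontab line that runs COMMAND at every Nth minute of every
# hour, every day. N should divide 60 evenly for the spacing to be regular.
# The function name is hypothetical.
cron_every_n_minutes() {
    printf '*/%s * * * * %s\n' "$1" "$2"
}

cron_every_n_minutes 10 '~/bin/fossil all sync >sync.log 2>&1'
# prints: */10 * * * * ~/bin/fossil all sync >sync.log 2>&1
```

If crontab -l is available to your shell user, its output can be compared against this line to see what the panel actually installed.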

You can verify that your job is running by looking for the file sync.log to appear, and check its content to make sure it contains no errors.

For the two repositories listed in this post, the output will show some transactions with Chiselapp, and simply list example.fossil as synced because it has no remote URL to sync with.

Note that this technique depends on fossil’s internally maintained list of repositories. The advantage of this is that the list will be extended as you add new repositories by cloning or by syncing them manually the first time. The disadvantage is that the list itself is difficult to observe from outside the command prompt. The list of repositories known to fossil all can be seen with the command fossil all ls.

[aardwolf]$ ~/bin/fossil all ls
/home/XXX/publicrepo/dotfonter.fossil
/home/XXX/publicrepo/example.fossil
[aardwolf]$

It might be wise to use a shell script that explicitly lists the repositories to be synced as a sequence of fossil sync -R ... commands instead.
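A minimal sketch of such a script follows. The function name sync_repos is hypothetical, and the FOSSIL variable defaults to the ~/bin/fossil copy made earlier.

```shell
# Path to the fossil executable; override it if fossil lives elsewhere.
FOSSIL="${FOSSIL:-$HOME/bin/fossil}"

# sync_repos REPO...
# Syncs each named repository file with its stored remote URL, one at a
# time. Keeps going after a failure and returns non-zero if any sync failed.
# The function name is hypothetical.
sync_repos() {
    failed=0
    for repo in "$@"; do
        printf '== %s\n' "$repo"
        "$FOSSIL" sync -R "$repo" || failed=1
    done
    return "$failed"
}
```

For the repositories in this post the call would be sync_repos "$HOME/publicrepo/dotfonter.fossil", leaving out example.fossil since it has no remote URL to sync with. The trade-off is the reverse of fossil all sync: the list is easy to read, but a new repository is not synced until someone adds it to the script.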

Summary

I like and use fossil, and wanted to try hosting a repository at DreamHost. This article describes what I did to try that experiment.

You can see these repositories in action at our demo site.

Watch for more posts about parking fossil on DreamHost as I find new tricks to talk about.

If you have a project involving embedded systems, micro-controllers, electronics design, audio, video, or more we can help. Check out our main site and call or email us with your needs. No project is too small!

+1 626 303-1602
Cheshire Engineering Corp.
710 S Myrtle Ave, #315
Monrovia, CA 91016

(Written with StackEdit.)

Words from Cheshire Engineering Corp.

~ Things we talk about

Fun with DreamHost: Fossil
