Friday, January 2, 2009

Notes from Underground, Part 1

For those following d-devel, you may notice that I've recently been working on improving one of the cornerstones of Debian infrastructure; the Debian Archive Kit, or dak for short. Most DDs and DMs don't notice dak exists expect when trying to determine why their latest upload was rejected, and then yelling at the powers that be. I'm here to shead some light on this mythicial beast.

First off, a quick history lesson:

dak (also known as projectb) is a replacement for Debian's original archive software, known simply as dinstall. dinstall itself was a fairly large perl script that does what dak process-unchecked/process-accepted does today. James Troup did a fairly decent summary of dinstall, and its issues

James Troup's README.new-incoming (from dak's git repo):

The old system:
---------------
o incoming was a world writable directory

o incoming was available to everyone through http://incoming.debian.org/

o incoming was processed once a day by dinstall

o uploads in incoming had to have been there > 24 hours before they
were REJECTed. If they were processed before that and had
problems they were SKIPped (with no notification to the maintainer
and/or uploader).

dak's first commits were in 2000, and rolled out onto ftp-master.d.o sometime in 2001 or 2002 (I can't find an exact date for this). Since then, dak is also used on security.d.o, and on backports.org (fun fact for bpo people; the dak installation there is now up to date, and tracking git's tip).

So now that you know the history lesson, what specificially does dak do is the next question. Simply put, dak is the glue that binds the rest of the Debian's backends together; both britney and wanna-build/buildd depend on it. It handles management of uploads to the archive, handles stable release updates, as so forth. It is also the only Debian archive software that uses an actual database backend, and scales fairly well handling over 10,000 packages, and 12 architectures. Unfortunately, there are also a lot of issues with dak as it stands.

Sections of the code base have bitrotted over the years; legacy and legacy-mixed support have died, the import-archive function is shot (more so now than ever, see below), the test suite is non-functional (never a good sign), the docs are out of date, and in many places non-existant, doing a release (both point and full) requires editing the database and so forth.

In addition, dak, while written in python, is written in a fairly procedural style, and and some very ugly code in some places. For instance, the original Debian Maintainer code was handled by having the uid's in the database prefixed by dm: vs having a flag somewhere, and had some hardcoded variables like checking for "unstable", as well as quite a few bugs which caused interesting behavior when uploading to a non-unstable suite such as experimental or one of the proposed queues. (for those of curious, I recommend checking the dak git tree to see what the old DM code looked like, and then aside from the design, find the two major bugs which caused a lot of the weirdness with DMs). It should be stated that the last merge from redid the DM code and design sanely using the new update framework.

These issues have lead to the genesis of the dak v2 project, which is an attempt to replace dak with a module, rewritten from the ground up to be more secure and modular, although its not gotten very far as of writing. I personally don't believe that the current iteration of dak is so bad as scrapping and rewritting is necessary. Instead, I've been working to implement v2 features in dak by aggressive refactoring and cleanup, with the hope of negating the need for a rewrite.

So now thats out of the way, I bet you probably are interested in my .plan for dak. Well, lets go over I've implemented so far.

* An update database framework for dak, which will allow for easy database upgrade and migration, vs the "does it work yet?" approach to applying schema updates. Simply type dak update-db, and your done!

* 822 formatted output for queues (http://ftp-master.debian.org/new.822); this information is now used on DDPO pages

* Rewriting DM management code to have more of a brain than the previous implementation.

What's next on the TODO list

* Content file generation from the database (part of removal of apt-ftparchive, but thats another blog post ;-)).

Oh, as a side note to my current readers, my blog has changed names to "Notes from Underground", after one of my favorite novels, and futher in reference to exploring the mysterious underground that is Debian's backend code. We're also now on Planet Debian :-).

1 comment:

kevix said...

I made this years ago, so it might not be up-to-date. http://mysite.verizon.net/kevin.mark/newdebian2.png