Archive for the ‘misc’ Category.

Paddle a kayak on Wednesday nights

This is fun and it’s good for you: Every Wednesday night at Bay Creek Paddling Center on Irondequoit Bay, paddle a kayak around a 2-mile course, get timed, set a new personal record, eat a hot dog.

Newsletter cartoons
has a pretty good selection of cartoons suitable for business presentations. You can browse by category; see, for example, security cartoons.
The artist, Ted Goff, licenses his work at various rates that depend on whether the use is for a presentation, newsletter, magazine, etc.

Exposing Digital Forgeries by Detecting Duplicated Image Regions

Dartmouth TR2004-515:

We describe an efficient technique that automatically detects duplicated regions in a digital image. This technique works by first applying a principal component analysis to small fixed-size image blocks to yield a reduced dimension representation. This representation is robust to minor variations in the image due to additive noise or lossy compression. Duplicated regions are then detected by lexicographically sorting all of the image blocks. We show the efficacy of this technique on credible forgeries, and quantify its robustness and sensitivity to additive noise and lossy JPEG compression.

[via Simson Garfinkel]
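To make the sorting idea concrete, here is a rough Python sketch. It is not the paper's implementation: it skips the principal component analysis entirely and compares raw pixel blocks, so it only catches exact copies. The function names, the block size, the `min_pairs` threshold, the shift-counting step, and the synthetic test image are all assumptions of this sketch.

```python
import random
from collections import Counter


def most_common_shift(img, block=4, min_pairs=8):
    """Return the (dy, dx) shift shared by the most identical block pairs, or None.

    img is a list of rows of integer pixel values.
    """
    h, w = len(img), len(img[0])
    blocks = []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            pixels = tuple(img[y + j][x + i] for j in range(block) for i in range(block))
            blocks.append((pixels, y, x))
    # Lexicographic sort: identical blocks end up next to each other.
    blocks.sort()
    shifts = Counter()
    for (p1, y1, x1), (p2, y2, x2) in zip(blocks, blocks[1:]):
        if p1 == p2:
            shifts[(y2 - y1, x2 - x1)] += 1
    if not shifts:
        return None
    shift, count = shifts.most_common(1)[0]
    # Many block pairs with the same shift suggest a copied region.
    return shift if count >= min_pairs else None


def make_forgery(size=32, patch=8, src=(2, 3), dst=(20, 17), seed=0):
    """Build a random test image and paste a copy of one patch elsewhere in it."""
    rng = random.Random(seed)
    img = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    for j in range(patch):
        for i in range(patch):
            img[dst[0] + j][dst[1] + i] = img[src[0] + j][src[1] + i]
    return img


if __name__ == "__main__":
    print("shift of duplicated region:", most_common_shift(make_forgery()))
```

On the synthetic image the reported shift is the distance between the pasted patch and its source. Exact matching like this would break as soon as the forged image is recompressed, which is the problem the reduced-dimension representation in the paper is meant to address.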

A neat hack

A colleague of mine discovered that a scripting error had caused a few months of his Apache access logs (compressed with gzip) to get transferred in FTP ASCII mode before being archived to DVD. He asked whether there was any hope for recovery.

Those FTP transfers corrupted about 0.4% of the input bytes. Because every bit counts in a compressed file, these errors send the gzip/inflate decompressor “into the woods” pretty quickly, and every error disrupts the expansion of everything afterwards. The output turns to unrecognizable gibberish almost immediately. The decompressor itself usually doesn’t know it’s lost until the final CRC check. (There are few illegal states along the way; if there were many, that would mean there is redundancy in the compressed data, and a compressor’s job is to find redundancy and squeeze it out.)
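To see how little damage it takes, here is a small Python sketch. Everything in it is an assumption made for illustration: the synthetic log lines, the helper names (`make_log`, `mangle`, `inflate_prefix`), and especially the damage model, which simply rewrites every CR byte as LF. Real ASCII-mode damage depends on the FTP client and server involved. The sketch compresses a fake log, damages it, and reports how much of the original text survives decompression.

```python
import gzip
import zlib


def make_log(n: int) -> bytes:
    """Build n synthetic access-log lines (made-up data, not a real log)."""
    return b"".join(
        b'10.0.%d.%d - - [12/Jul/2004:10:%02d:%02d -0400] '
        b'"GET /page/%d.html HTTP/1.1" 200 %d\n'
        % (i % 256, i * 7 % 256, i % 60, i * 13 % 60, i * 31 % 997, 1000 + i * 17 % 9000)
        for i in range(n)
    )


def mangle(blob: bytes) -> bytes:
    """Assumed damage model (illustrative only): every CR byte becomes LF."""
    return blob.replace(b"\r", b"\n")


def inflate_prefix(blob: bytes) -> bytes:
    """Decompress as much as possible, stopping quietly at the first zlib error."""
    d = zlib.decompressobj(31)  # wbits=31 selects the gzip container
    out = bytearray()
    try:
        for i in range(0, len(blob), 64):
            out += d.decompress(blob[i:i + 64])
    except zlib.error:
        pass
    return bytes(out)


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    original = make_log(2000)
    blob = gzip.compress(original, mtime=0)
    damaged = mangle(blob)
    changed = sum(x != y for x, y in zip(blob, damaged))
    recovered = inflate_prefix(damaged)
    print(f"compressed bytes changed: {changed} of {len(blob)}")
    print(f"output bytes before first mismatch: "
          f"{common_prefix_len(original, recovered)} of {len(original)}")
```

Under this damage model roughly one compressed byte in 256 changes, and the text that comes out stops matching the original soon after the first changed byte.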

The state of the art among the numerous “zip file repair programs” out there seems to concentrate on only two easy fixes (please correct me if I’m wrong):

  • Fix incorrect CRCs/checksums so that users won’t get an error message any more.
    This doesn’t repair any data, but it does recover from trivial file truncation or extension, which must happen occasionally to somebody (else why would this function be helpful?).
  • Skip over archive members with corrupt data and find other members that are not corrupt. This is useful if the cause of corruption is a bad block on the hardware medium.

Neither of these does anything to improve corrupted data.

In the general case, solving this problem by brute-force search through all possible repairs is not feasible; unless the file is small, it’ll still be running when the lights go out on the universe. It turns out, though, that if the data has some structure, that’s enough to prune most of the search tree, and prioritize the rest, so that the highest-probability possibilities are tried first.
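A toy version of the pruning idea, in Python, might look like the following. It is a sketch under loud assumptions, not the actual recovery procedure: the damage model (every CR byte was rewritten as LF, so each LF in the damaged file was originally either CR or LF), the log-line regular expression, the `MAX_LINE` limit, and all function names are made up for illustration. It walks the ambiguous bytes in order, tries both possibilities, and abandons a branch as soon as the decompressor raises an error or the output stops looking like log lines. It only prunes; it makes no attempt to rank candidates by probability.

```python
import gzip
import re
import zlib
from typing import Optional

# Assumed shape of a valid log line (hypothetical, matches make_log below).
LINE_RE = re.compile(
    rb'\d{1,3}(\.\d{1,3}){3} - - '
    rb'\[\d{2}/[A-Z][a-z]{2}/\d{4}(:\d{2}){3} [+-]\d{4}\] '
    rb'"[A-Z]+ \S+ HTTP/1\.[01]" \d{3} \d+'
)
MAX_LINE = 200  # assumed upper bound on the length of one log line


def make_log(n: int) -> bytes:
    """Build n synthetic access-log lines (made-up data, not a real log)."""
    return b"".join(
        b'10.0.%d.%d - - [12/Jul/2004:10:%02d:%02d -0400] '
        b'"GET /page/%d.html HTTP/1.1" 200 %d\n'
        % (i % 256, i * 7 % 256, i % 60, i * 13 % 60, i * 31 % 997, 1000 + i * 17 % 9000)
        for i in range(n)
    )


def advance(d, tail: bytes, chunk: bytes) -> Optional[bytes]:
    """Feed chunk to decompressor d; return the new partial line, or None to prune."""
    try:
        out = tail + d.decompress(chunk)
    except zlib.error:
        return None
    if d.unused_data:  # the stream ended before the input did
        return None
    *lines, tail = out.split(b"\n")
    if len(tail) > MAX_LINE or not all(LINE_RE.fullmatch(line) for line in lines):
        return None
    return tail


def repair(mangled: bytes) -> Optional[bytes]:
    """Depth-first search over CR/LF choices for every LF byte in mangled."""
    cands = [i for i, b in enumerate(mangled) if b == 0x0A]
    bounds = cands + [len(mangled)]
    d0 = zlib.decompressobj(31)  # wbits=31 selects the gzip container
    tail0 = advance(d0, b"", mangled[:bounds[0]])
    if tail0 is None:
        return None
    # Each entry: (next candidate index, decompressor snapshot, partial line, repaired prefix)
    stack = [(0, d0, tail0, mangled[:bounds[0]])]
    while stack:
        k, d, tail, fixed = stack.pop()
        if k == len(cands):
            if d.eof:  # whole stream decoded, gzip CRC and length checks passed
                return fixed
            continue
        segment = mangled[cands[k] + 1:bounds[k + 1]]
        for guess in (b"\r", b"\n"):
            d2 = d.copy()  # snapshot so the other guess starts from the same state
            tail2 = advance(d2, tail, guess + segment)
            if tail2 is not None:
                stack.append((k + 1, d2, tail2, fixed + guess + segment))
    return None


if __name__ == "__main__":
    original = make_log(400)
    blob = gzip.compress(original, mtime=0)
    mangled = blob.replace(b"\r", b"\n")
    fixed = repair(mangled)
    print("ambiguous bytes:", mangled.count(b"\n"))
    print("recovered:", fixed is not None and gzip.decompress(fixed) == original)
```

Without the structure test, a wrong guess would survive until the final CRC check and the search would have to consider every combination of guesses. With it, most wrong guesses die within a line or two of output.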

Apache access logs have plenty of structure, so my colleague got back a close match to his original data. I’ve documented the process (look here for slightly more detail) to offer hope to others in difficult cases of critical data in otherwise hopelessly damaged files. Unfortunately, it’s not a turn-key process: each case requires a certain amount of tuning based on the cause of the corruption and the structure of the data.

OS Demo Perils

A great OS demo anecdote from Bryan Cantrill that starts with the true observation that:

One of the downsides of being an operating systems developer is that the demos of the technology that you develop often suck. (“Look, it boots! And hey, we can even run programs and it doesn’t crash!”)

and continues on with a story about a core dump in front of customers.

Summer Reading

I am taking these on vacation:

  • Grant Comes East (book)
    by Newt Gingrich, William Forstchen.
    Volume 2 of an alternate history of the U.S. Civil War.
    I found Volume 1 (Gettysburg) engaging, even though I am not a Civil War buff.

  • Telluride Sessions (Audio CD)
    by Bela Fleck (banjo), Sam Bush (mandolin), Jerry Douglas (dobro), Mark O’Connor (violin), and Edgar Meyer (bass). My favorite virtuosi.

Here’s a 10% discount at Amazon if bought by August 3, 2004.

DTrace, DProbes, LTT comparison

Daniel Berrangé: A Comparison of features for the current generation of operating system trace tools (Solaris 10 and patched Linux)

[via Bryan Cantrill]

Mars Rover Image Interfaces

MIT Technology Review:
Mars Rover Image Interfaces

[Thanks to Dave Winer for the link.]

IT Infrastructure Library, Self Assessment

The ITIL (IT Infrastructure Library) and ITSM Directory web site has many other useful pointers, including some links to ISO 17799 material.
“The IT Infrastructure Library, ITIL (®), is a series of documents that are used to aid the implementation of a framework for IT Service Management (ITSM).”

Dann Sheridan: ITIL Self Assessment:

The self assessment should be used continuously to measure the progress toward the initial ITIL implementation objectives. It takes about an hour and a half for one person to complete. Relatively speaking, it’s a good way to get a quick read on the state of your IT processes.

See also ITIL Survival. (I also picked up that link from Dann Sheridan.)

Risk Aversion kills LifeLog

DARPA’s project to capture, store and interpret mounds of data collected on or about a person is basically a good idea, and something like it will happen one way or another on the road to augmenting human abilities with technology.
Wired News: Pentagon Kills LifeLog Project says it’s done. I deeply sympathize with critics over privacy concerns, but can we look at the merits and avoid knee-jerk reactions to every government project?

[See also Techdirt: DARPA Ditches Backup Brain Plans]