Long form birth certificate: Turn off image enhancement when you’re trying to make a point

When the White House tried to put the long form birth certificate controversy to bed, it should have exercised more care in scanning and publishing the image.

The PDF file it published was produced by some process that included image enhancement of most of the text. As a result, instead of containing a single simple color scan of the document, the PDF file contains a large color JFIF (JPEG) scan with text deemphasized, plus several separate monochrome bitmaps to fill in the text. That maximizes contrast and enhances readability, but it does raise questions about how much the file might have been edited.
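One way to see this layered structure for yourself is to scan the raw PDF bytes for image XObject dictionaries. Here is a minimal sketch in Python (the function name and the naive regex are mine; a PDF that packs its objects into compressed object streams would defeat this and needs a real parser, such as pdfimages):

```python
import re

def list_image_filters(pdf_bytes: bytes) -> list[str]:
    """Naively scan raw PDF bytes for image XObject dictionaries
    and report each image's compression filter."""
    filters = []
    for match in re.finditer(rb"<<[^>]*/Subtype\s*/Image[^>]*>>", pdf_bytes):
        f = re.search(rb"/Filter\s*/(\w+)", match.group(0))
        filters.append(f.group(1).decode() if f else "raw")
    return filters
```

Run it over the downloaded file’s bytes: a DCTDecode entry is a JPEG (the big color scan), while the monochrome text bitmaps would appear under a lossless filter such as FlateDecode, CCITTFaxDecode, or JBIG2Decode.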

The whitened areas in the color image still show some remnants of what was there previously.

It is left as an exercise for the reader to determine what software would perform text enhancement by splitting the original scan into separate layers or subimages. I am certainly curious. The PDF file’s properties state that it was produced on a Macintosh (probably via “Print to PDF” from some scanning or editing program).

[I have been surprised to find little commentary on this problem with the image file. I thought I was to be the first, but I see that there is a less technical mention of this at hotair.com now.]

If the White House really wants to put the controversy to rest competently, it should walk the piece of paper over to a scanner again, set it to maximum resolution and minimum enhancement, select TIFF or PNG (both are lossless) as the output format, and publish that. Please!

See below for the subimages in the White House PDF.


The color subimage, extracted by hand with “vi” and rotated with jpegtran for your viewing pleasure



The monochrome subimages, extracted with pdfimages; you can also do this in Adobe Acrobat via “Advanced > Export All Images…”, but that applies some unwanted scaling










“Ask Jeeves Acquisition of Bloglines”

The 37signals article “Exit Interview: Ask Jeeves’ acquisition of Bloglines” is an interesting retrospective look at the demise of Bloglines. It is of particular interest to those of us who followed a typical user’s path through RSS aggregators (for me: Radio Userland, Bloglines, and finally Google Reader, with some dabbling in AmphetaDesk and FeedDemon).

Where Monroe County Residents Are Moving

This is interesting:

More than 10 million Americans moved from one county to another during 2008. The map below visualizes those moves. [Click on the image to go to the interactive map at Forbes.com, then: ] Click on any county to see comings and goings: black lines indicate net inward movement, red lines net outward movement.

Source: Internal Revenue Service data. The IRS only reports inter-county moves for more than 10 people, so some moves are not shown on this map.

Where Monroe County Residents Are Moving (click to go to interactive site)

APNIC triggers final IPv4 address block distributions

APNIC announces:

APNIC received the following IPv4 address blocks from IANA in February 2011 and will be making allocations from these ranges in the near future: 39/8 and 106/8.

Please be aware, this will be the final allocation made by IANA under the current framework and will trigger the final distribution of five /8 blocks, one to each RIR, under the agreed “Global policy for the allocation of the remaining IPv4 address space”.

After these final allocations, each RIR will continue to make allocations according to their own established policies.
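For scale, each of those /8 blocks covers 2**24 addresses, which Python’s standard ipaddress module will happily confirm:

```python
import ipaddress

# Each /8 that IANA hands to an RIR covers 2**24 = 16,777,216 addresses.
for prefix in ("39.0.0.0/8", "106.0.0.0/8"):
    block = ipaddress.ip_network(prefix)
    print(prefix, "->", block.num_addresses, "addresses")
```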

Diigo takes over Furl

For a few years I have used Furl as my personal bookmarking tool.
Del.icio.us had a better user interface and published much more pleasant RSS and HTML, but it lacked one feature — cached copies of web content.

Now, Diigo is taking over Furl. The takeover was announced a week ago; Furl is no longer accepting new bookmarks, and my old data is now migrating into Diigo (probably without cached content, but we’ll see).

I’m hopeful about its personal usefulness: Diigo does support cached pages, and it seems to be pretty flexible in its other connections to the world. There’s also yet another superfluous social-networking database, which I’ll be ignoring.

CA certificate forged via MD5 collision

Security researchers have successfully forged a CA certificate, allowing them to produce new certificates that will be trusted by every browser:

MD5 considered harmful today
Creating a rogue CA certificate
Alexander Sotirov, Marc Stevens,
Jacob Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, Benne de Weger

We have identified a vulnerability in the Internet Public Key Infrastructure (PKI) used to issue digital certificates for secure websites. As a proof of concept we executed a practical attack scenario and successfully created a rogue Certification Authority (CA) certificate trusted by all common web browsers. This certificate allows us to impersonate any website on the Internet, including banking and e-commerce sites secured using the HTTPS protocol.

Our attack takes advantage of a weakness in the MD5 cryptographic hash function that allows the construction of different messages with the same MD5 hash. This is known as an MD5 “collision”. Previous work on MD5 collisions between 2004 and 2007 showed that the use of this hash function in digital signatures can lead to theoretical attack scenarios. Our current work proves that at least one attack scenario can be exploited in practice, thus exposing the security infrastructure of the web to realistic threats.

Now that it’s been demonstrated once, it won’t be long before someone malevolent does it too. One hopes that, by then, software rejecting MD5-based signatures will have been widely distributed, and certificates bearing them will have been retired.

Update: Of course, the day of doom can be postponed by getting every MD5-cert-issuing CA to immediately update its software to (a) stop issuing MD5-signed certificates, or, at least, (b) make its certificate serial numbers unpredictable. The latter is a fairly small change. Is it too much to hope for?
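Point (b) is small indeed. A chosen-prefix collision requires the attacker to predict the exact bytes the CA will sign, and a random serial number destroys that predictability. A sketch of what it could look like (illustrative only, not any CA’s actual code):

```python
import secrets

def random_serial() -> int:
    """Draw an unpredictable X.509 certificate serial number.

    RFC 5280 caps serials at 20 octets, so a 159-bit value always
    fits as a positive integer; the top bit is pinned so the encoded
    length stays constant.
    """
    return secrets.randbits(158) | (1 << 158)
```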

Program verification is not a silver bullet

Adam Shostack points out that Eric Drexler is excited about the interesting AMS survey article “Formal Proof” (Hales) because of the prospects for program verification.

I think the article is better at inspiring hope about computer verification of proofs than about proof system verification of computers.

For balance one should read the still-persuasive May 1979 paper
“Social processes and proofs of theorems and programs” (De Millo, Lipton, Perlis).
The abstract:

It is argued that formal verifications of programs, no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics. Furthermore the absence of continuity, the inevitability of change, and the complexity of specification of significantly many real programs make the formal verification process difficult to justify and manage. It is felt that ease of formal verification should not dominate programming language design.

(Note that there are lots of copies cached around the web if you don’t have an ACM Portal subscription.)

Desk Checking

Ole Eichhorn has written a great essay on “the lost art of desk checking,” sharing how slow and painful experiences with debugging led to habits of deliberate and careful pre-planning and checking.

My own parallel experiences: Okay, I’m going to date myself here too. I’m also 49 years old, but I didn’t start programming until senior high. My first experiences were with BASIC on a Xerox Sigma 7 (thanks, Xerox) and a Wang 2200B. Not much learned there.

I learned more during summer vacations, when I paid real money to the University of Rochester to use their mainframe. I discovered that my first APL programs actually worked. I tried my hand at IBM 360 assembly language programming, but debugging was expensive – each assemble/link/run cost over $2. So I started editing the binary object decks on a keypunch instead, reducing the cost of a link/run to something under 80 cents.

While I followed the technology curve and have all the modern development environment power tools, there’s nothing like designing cleanly and understanding what’s going on. To quote Eichhorn:

To write code I just look at my screen and start typing, and to fix code, I just look at my screen some more and type some more. So now, finally, I’m done with desk checking, right?

Wrong.

I desk check everything. Thoroughly.

And this, to me, is a major league black art which is lost to all those who didn’t have to hand-punch cards and wait a week for their deck to run. It is a lost art, but an essential art, because all the tools which make entering code and editing code and compiling code and running code faster don’t make your code better.

MessageLabs versus GMail

MessageLabs mail forwarders have been unwilling to talk to GMail servers at least since Saturday 2008-03-29, with a mix of TCP “connection refused” and SMTP “421 Service Temporarily Unavailable”.

Perhaps it’s related to the flurry of articles about GMail CAPTCHA cracking three weeks ago and the resulting surge of spam.

Whatever the reason, it’s a painful outage.
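For the record, here is roughly how to probe this sort of thing yourself; a sketch, with the target host left to the reader (any MX host from the affected domain’s DNS will do):

```python
import smtplib

def smtp_banner(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Open an SMTP session and return the NOOP reply, or the failure.

    A refused TCP connection, a timeout, and a 421 greeting (which
    smtplib raises as SMTPConnectError) all land in the except branch.
    """
    try:
        conn = smtplib.SMTP(host, port, timeout=timeout)
        code, msg = conn.noop()
        conn.quit()
        return f"{code} {msg.decode(errors='replace')}"
    except (OSError, smtplib.SMTPException) as exc:
        return f"failed: {exc}"
```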

Followup, Tuesday 2008-04-01 7:48am EDT:

MessageLabs appears to be talking to GMail’s servers again. New messages are flowing. I haven’t yet seen the messages that queued up during the outage.

More detail on the Google CAPTCHA-cracking botnet.

blog backup online – out of beta

I’ve been using the blogbackuponline beta since last April.

It just works.

Now it’s out of beta. I recommend it. (I’d recommend it even if Techrigy didn’t offer a small incentive to share the experience.)