Long form birth certificate: Turn off image enhancement when you’re trying to make a point

When the White House tried to put the long form birth certificate controversy to bed, it should have exercised more care in scanning and publishing the image.

The PDF file it published was produced by some process that included image enhancement of most of the text. As a result, instead of containing a single simple color scan of the document, the PDF file contains a large color JFIF (JPEG) scan with text deemphasized, plus several separate monochrome bitmaps to fill in the text. That maximizes contrast and enhances readability, but it does raise questions about how much the file might have been edited.
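For readers who want to check a file's structure themselves, here is a minimal Python sketch (not the tool used for this post) that tallies the image objects in a PDF by compression filter. It assumes an older-style PDF such as version 1.3, where object dictionaries are stored uncompressed, so a raw byte scan can see them. The function name `count_image_filters` is purely illustrative. `DCTDecode` is the PDF filter name for JPEG data.

```python
import re
from collections import Counter

# Assumption: object dictionaries are plain text in the file (true for
# PDF 1.3-style files, not for files that pack objects into object streams).
OBJECT_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")
FILTER_RE = re.compile(rb"/Filter\s*\[?\s*/(\w+)")


def count_image_filters(pdf_bytes: bytes) -> Counter:
    """Tally the first compression filter of each image object in raw PDF bytes."""
    counts = Counter()
    for match in OBJECT_RE.finditer(pdf_bytes):
        # Look only at the dictionary, i.e. everything before the stream data.
        header = match.group(1).split(b"stream", 1)[0]
        if not IMAGE_RE.search(header):
            continue
        filt = FILTER_RE.search(header)
        counts[filt.group(1).decode("ascii") if filt else "none"] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for name, total in count_image_filters(handle.read()).items():
            print(f"{name}: {total}")
```

A PDF holding one simple scan would typically report a single image. A file built the way this one is described would report one `DCTDecode` image plus several images under other filters.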

The whitened areas in the color image still show some remnants of what was there previously.

It is left as an exercise for the reader what software would perform text enhancement by creating separate layers or subimages from the original scan. I certainly am curious. The PDF file’s properties state that it was produced on a Macintosh (probably “Print to PDF” from some scanning or editing program).
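The properties mentioned above live in the PDF's document information dictionary as entries such as `/Producer` and `/Creator`. The following is a rough Python sketch for reading them without a PDF viewer. It assumes the values are stored as simple literal strings in parentheses and are not compressed, hex-encoded, or nested. The helper name `info_strings` is made up for this example.

```python
import re


def info_strings(pdf_bytes: bytes,
                 keys=("Producer", "Creator", "CreationDate", "ModDate")) -> dict:
    """Return the literal-string values of selected Info entries in raw PDF bytes."""
    found = {}
    for key in keys:
        # Matches "/Key (value)", allowing backslash escapes inside the value.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for key, value in info_strings(handle.read()).items():
            print(f"{key}: {value}")
```

These fields are ordinary text inside the file, so whatever this prints is a hint about the producing software, not proof of it.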

[I have been surprised to find little commentary on this problem with the image file. I thought I would be the first to point it out, but I see that hotair.com now has a less technical mention of it.]

If the White House really wants to put the controversy to rest competently, it should walk the piece of paper over to a scanner again, set it to maximum resolution and minimum enhancement, select TIFF or PNG (both can store a scan losslessly) as the output format, and publish that. Please!

See below for the subimages in the White House PDF.


The color subimage, extracted by hand with “vi” and rotated with jpegtran for your viewing pleasure
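The hand extraction can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not the procedure used above: it assumes the JPEG is stored as an unencrypted `DCTDecode` stream and that the literal bytes `endstream` do not occur inside the image data. The function name `carve_jpeg` is illustrative, and the rotation step done with jpegtran is not covered.

```python
from typing import Optional

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def carve_jpeg(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the first DCTDecode (JPEG) stream found in raw PDF bytes, or None."""
    marker = pdf_bytes.find(b"/DCTDecode")
    if marker < 0:
        return None
    start = pdf_bytes.find(SOI, marker)
    if start < 0:
        return None
    end = pdf_bytes.find(b"endstream", start)
    if end < 0:
        return None
    # Trim back to the last end-of-image marker before "endstream".
    eoi = pdf_bytes.rfind(EOI, start, end)
    if eoi < 0:
        return None
    return pdf_bytes[start:eoi + len(EOI)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        jpeg = carve_jpeg(handle.read())
    if jpeg is None:
        sys.exit("no DCTDecode stream found")
    with open(sys.argv[2], "wb") as out:
        out.write(jpeg)
```

Run it with an input PDF path and an output JPEG path as the two arguments.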



The monochrome subimages, extracted with pdfimages. You can also do it with Adobe Acrobat, using “Advanced>Export All Images…”, but that does some unwanted scaling.










4 Comments

  1. I really would like to know what scanning/editing suite might result in this kind of contrast enhancement placed in multiple layers. It would also be interesting to examine the resolutions of all the layers. Comments welcome!

  2. The most plausible suggestion I’ve heard so far is that this is how Adobe Acrobat’s “Optimize Scanned PDF” command works. That’s a testable hypothesis. However, would that produce a file with metadata saying “PDF Producer: Mac OS X 10.6.7 Quartz PDFContext” and “PDF Version 1.3 (Acrobat 4.x)”? [Those suggest Mac print drivers, not Adobe Acrobat.]

    • Great question. Version 5.0.3 of “Preview”, which comes with “Mac OS 10.6.7”, does scan directly to PDF.

      Using anything else to modify the PDF file’s content will change the creator, date/time stamp, etc.

      Therefore, one possibility is that the “Scan Optimization” was done during the scanning process. My Epson Scan software does not have that capability, and it would be very helpful if someone could prove it’s possible with Preview and another scanner.

      It’s also important to realize the basic PDF header info (aka Metadata) does not keep track of combined PDF sources and editing (i.e. an audit trail). In other words, if you combine several sources/pages into a single PDF file, you will not see a string of sources such as Preview, Acrobat, Illustrator, Photoshop, etc. unless you look into the Advanced Metadata which can also be misleading.

      Another possibility is that someone hacked the header info (date, etc.), which is easy to do with a text editor. That would be very difficult to prove, but if it could be proven, it would be a smoking gun.

      Perhaps another plausible explanation is that a Mac OS “Automator” workflow was used?

  3. David W says:

    ABBYY FineReader OCR will do this.
