Clear and Compact Storage

The easiest way to digitize printed documentation is by scanning. In archival storage, good image quality is important: this implies high spatial resolution (many pixels per inch) and good intensity resolution (many bits per pixel). However, this creates large files, which in turn take up more disk space and lengthen Internet download times. By picking the right scanning format and storage medium for each type of image, file size can be minimized without sacrificing much clarity. With the recent dramatic increase in file storage capacity and Internet bandwidth, it makes sense to err on the side of higher resolution rather than smaller file size.
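
To get a feel for the trade-off, the uncompressed size of a scan can be worked out from the page size, the resolution, and the bits per pixel. The short Python sketch below assumes an American letter-sized page and ignores any file header or compression; the function name is made up for this example.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan covering the given area."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes

line_art = raw_scan_bytes(8.5, 11, 300, 1)   # 1 bit per pixel at 300 dpi
greyscale = raw_scan_bytes(8.5, 11, 600, 8)  # 8 bits per pixel at 600 dpi

print(line_art)               # 1051875 bytes, about 1 MB
print(greyscale)              # 33660000 bytes, about 34 MB
print(greyscale // line_art)  # 32
```

Doubling the resolution quadruples the pixel count, and going from 1 to 8 bits per pixel multiplies the size by another 8, hence the factor of 32.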

All text and most line drawings are best scanned as "line art" with one bit per pixel. Standard-sized text can usually be scanned at 300 dpi (dots per inch). This corresponds to the resolution of early laser printers. Intricate diagrams and small text should be scanned at 600 dpi. If the file is instead saved as "greyscale", meaning 4 to 8 bits of intensity resolution, the effective resolution is higher. The "brightness" setting for the scan is critical for line art scans. Try a few test scans, zoom in, and inspect the result carefully. If the scan is too light, information will be missing. If it is too dark, lines and letters will start "bleeding".
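
The effect of the brightness setting can be imitated in software by thresholding a greyscale scan. The sketch below is only an illustration: the pixel values are invented, and a real scanner's brightness control may not map onto a single threshold this simply.

```python
def to_line_art(grey_rows, threshold):
    """Convert 8-bit greyscale rows (0 = black, 255 = white) to 1-bit rows.

    A pixel becomes black (1) when it is darker than the threshold, so a
    higher threshold behaves like a darker scan.
    """
    return [[1 if value < threshold else 0 for value in row] for row in grey_rows]

def black_count(rows):
    """Number of black pixels in a 1-bit image."""
    return sum(sum(row) for row in rows)

# Invented sample row: paper (230), a faint thin stroke (150), paper,
# a solid two-pixel stroke (40, 40), paper.
sample = [[230, 150, 230, 40, 40, 230]]

print(to_line_art(sample, 100))  # too light: the faint stroke is lost
print(to_line_art(sample, 180))  # both strokes kept, paper stays white
print(to_line_art(sample, 240))  # too dark: the paper itself turns black
```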

When scanning half-tone images, it is important to "de-screen" the image. If not de-screened, ugly moiré patterns occur. Some scanners offer built-in de-screening, but this will often blur text. The best way to de-screen an image on a page is to scan the page at higher than normal resolution, say, 600 dpi greyscale, select the half-tone image(s), and use a Gaussian Blur to blur the half-tone dots. Experiment to find the optimum blur. Afterwards, invert the selection and increase the contrast for the text. When done, reduce the image to your normal resolution, 300 dpi in this example.
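
The blur-then-reduce step can be sketched in plain Python. This is a toy version under stated assumptions: the half-tone patch is a made-up checkerboard of 2x2-pixel dots, the blur radius (sigma) of 2 pixels is an arbitrary choice, and an image editor's Gaussian Blur may differ in how it treats edges.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at about 3 sigma."""
    radius = max(1, round(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_line(values, kernel):
    """Blur one row or column; edge pixels are repeated past the border."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + k - radius, 0), last)]
            for k, w in enumerate(kernel))
        for i in range(len(values))
    ]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_line(row, kernel) for row in image]
    cols = [blur_line(col, kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def halve_resolution(image):
    """Average each 2x2 block, e.g. 600 dpi down to 300 dpi."""
    return [
        [(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
         for x in range(0, len(image[0]) - 1, 2)]
        for y in range(0, len(image) - 1, 2)
    ]

def spread(image, margin):
    """Lightest minus darkest pixel, ignoring a border of `margin` pixels."""
    inner = [v for row in image[margin:-margin] for v in row[margin:-margin]]
    return max(inner) - min(inner)

# Made-up half-tone patch: 2x2-pixel dots in a checkerboard, i.e. 50% grey.
halftone = [[255 if (x // 2 + y // 2) % 2 else 0 for x in range(32)]
            for y in range(32)]

plain = halve_resolution(halftone)                       # dots survive
smooth = halve_resolution(gaussian_blur(halftone, 2.0))  # dots merge into grey

print(spread(plain, 4), spread(smooth, 4))
```

Without the blur, the reduced image still consists of pure black and white dots; with it, the interior of the patch becomes an almost uniform mid-grey.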

If you use an image editing program such as Photoshop or equivalent, defects in the scan can be touched up. Use the arbitrary rotation function to make the page rectilinear. You can also crop white space around the image. Keep in mind that if you scan most of a page, most laser printers will not be able to print it in its entirety: they need a margin of up to 1/2" on each side. If you want to handle this case, you can shrink the image so that it is no larger than 7.5" by 10" (for American "letter"-sized paper). If you do shrink the image, you may need to scan at high resolution (say, 600 dpi), do the shrink, then convert to 300 dpi.
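
The arithmetic behind the shrink can be sketched as follows, assuming a full letter-sized scan and the 7.5" by 10" printable area mentioned above. The function names are invented for the example.

```python
def fit_scale(width_in, height_in, max_width_in=7.5, max_height_in=10.0):
    """Scale factor (never above 1) that fits a page into the printable area."""
    return min(1.0, max_width_in / width_in, max_height_in / height_in)

def shrunk_pixels(width_in, height_in, scale, dpi):
    """Pixel dimensions of the shrunken page at the given output resolution."""
    return round(width_in * scale * dpi), round(height_in * scale * dpi)

scale = fit_scale(8.5, 11.0)                         # width is the limiting side
effective_dpi = 600 / scale                          # 600 dpi scan squeezed into less paper
final_size = shrunk_pixels(8.5, 11.0, scale, 300)    # after converting to 300 dpi

print(round(scale, 3))       # 0.882
print(round(effective_dpi))  # 680
print(final_size)            # (2250, 2912)
```

A 600 dpi scan shrunk this way carries about 680 pixels per printed inch, which leaves ample detail for the final conversion to 300 dpi.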

The best current formats for saving line art are the tiff format and the png format. They have built-in data compression and handle sharp-edged images well. The jpeg format tends to make line art fuzzy. A tip for reducing the line art file size: eliminate black speckles or other noise. A clean white background compresses very well. A problem with gif is that an image saved as 300 dpi will often come out as 72 dpi when downloaded. No pixels are lost, but the image appears huge on screen, which confuses some browsers. The most compatible all-around format is the Adobe Acrobat (pdf) format.
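
The speckle tip is easy to check with Python's zlib module, which implements Deflate, the compression method used inside png files. The sketch below is a rough stand-in for a real image file: the page is a raw 1-bit bitmap with no image header, and the speckle count of 2000 is an arbitrary choice.

```python
import random
import zlib

WIDTH_BYTES, HEIGHT = 100, 800  # 800 x 800 pixels, 8 pixels per byte, 0 = white

def blank_page():
    """An all-white 1-bit page."""
    return bytearray(WIDTH_BYTES * HEIGHT)

def add_speckles(page, count, seed=0):
    """Return a copy of the page with `count` random black pixels set."""
    rng = random.Random(seed)
    noisy = bytearray(page)
    for _ in range(count):
        bit = rng.randrange(len(noisy) * 8)
        noisy[bit // 8] |= 0x80 >> (bit % 8)
    return noisy

clean = blank_page()
speckled = add_speckles(clean, 2000)

clean_size = len(zlib.compress(bytes(clean), 9))
speckled_size = len(zlib.compress(bytes(speckled), 9))

print(clean_size, speckled_size)
```

The clean page shrinks to a tiny fraction of its 80,000 raw bytes, while the speckled page comes out many times larger, even though the speckles cover well under one percent of the pixels.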

Photographs should be scanned in a multi-bit per pixel greyscale format (or color, if needed). The best format for storing pictures (not line drawings) is the jpeg format. Experiment with the different compression settings available, and choose the most efficient one that does not compromise the image quality. Don't scan and save a black and white image in color: it increases the file size.

If you want to save a multi-page scanned document into a single file, the Adobe Acrobat pdf format is recommended. This requires either buying the full Acrobat program or using one of the other, often free, pdf creator packages available. Most pdf readers (which are free) have nice options for printing oddball sized images. Acrobat does its own data compression, so you can import uncompressed scans (such as tiff or bmp files) directly into Acrobat, and the result will be about as compact as with gif compression. Acrobat is also the medium of choice for distributing Postscript or EPS files.

When combining line-art images into a single pdf file, saving each page as a tiff (.tif) file with the page number as part of the file name saves time and allows multiple pages to be imported into Acrobat at one time. Acrobat 4.0 allows up to 50 images to be imported at once. For documents over 50 pages, make pdf files of chunks of 50 pages or less, then combine them into one pdf file.
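
A small sketch of the naming and batching scheme follows. The file stem "manual" and the 120-page document are invented for the example; the batch size of 50 is the Acrobat 4.0 limit mentioned above. Zero-padding the page number keeps the files in page order when sorted by name.

```python
def page_filename(stem, page, total_pages):
    """File name with a zero-padded page number, e.g. manual_p007.tif."""
    width = len(str(total_pages))
    return f"{stem}_p{page:0{width}d}.tif"

def batches(items, size=50):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

names = [page_filename("manual", n, 120) for n in range(1, 121)]
groups = batches(names)

print(names[0], names[-1])       # manual_p001.tif manual_p120.tif
print([len(g) for g in groups])  # [50, 50, 20]
```

Each group becomes one intermediate pdf file, and the intermediate files are then combined in order.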

Multi-page scans can also be bundled into a single file by creating a .zip archive, or a tar archive on UNIX. These are not as convenient for the end user as an Acrobat file, since most decompressors dump the individual page files on the hard disk, leaving a mess to clean up later. Also, the file compression of these programs isn't needed, since gif or jpeg files are already compressed.
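
The last point can be checked with Python's zipfile module by building the same archive with and without compression. In this sketch, pseudo-random bytes stand in for already-compressed image data, which is an assumption made to keep the example self-contained; the page file names are likewise invented.

```python
import io
import random
import zipfile

def archive_size(pages, method):
    """Size in bytes of an in-memory .zip archive holding the given pages."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        for name, data in pages.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())

# Stand-in for three already-compressed page images of 20,000 bytes each.
rng = random.Random(0)
pages = {
    f"page{n:02d}.jpg": bytes(rng.getrandbits(8) for _ in range(20000))
    for n in range(1, 4)
}

payload = sum(len(data) for data in pages.values())
stored = archive_size(pages, zipfile.ZIP_STORED)      # bundle only
deflated = archive_size(pages, zipfile.ZIP_DEFLATED)  # bundle and compress

print(payload, stored, deflated)
```

Both archives come out slightly larger than the pages themselves, and compressing makes essentially no difference, so storing the files uncompressed loses nothing.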

Those with a lot of time on their hands can scan technical documentation, use OCR (optical character recognition) to convert the text to ASCII characters, then use a page layout program or sophisticated word processor to essentially re-typeset the documentation. This requires skill, time, and good software, but it is the most space-efficient way to store documentation.