Thursday, December 20, 2012

Vectorizing scans

I am always unhappy with the way my scanned handwritten documents look. Pencil looks especially bad. I have previously used the fantastic Inkscape to trace bitmaps and create vectorized versions of my scans, with success. But the process is somewhat cumbersome, and I tend to forget how to do it between the times I need it.

The result is worth it, though. Black is real black, and since it produces vector graphics the edges are not pixelated, even if you zoom in.

Original scan, grey and pixelated.

After vectorizing: black and white, with no visible pixelation (you may still see some pixels here, because I converted the image from pdf to png).

Here is a way to do it with one command from a terminal.

Install the tools

Inkscape uses potrace to trace bitmaps. Luckily, potrace is available as a standalone program. You need to install it. I used the package manager fink to do that:
$>fink install potrace

You also need pdfimages to extract the images from the original pdf. It is included in xpdf, which I also installed with fink:
$>fink install xpdf

mkbitmap is distributed with potrace, so it comes along when you install potrace. I did not need to install ghostscript (gs), but I might have installed it earlier with other programs. If you get complaints about it, you'll need to install it as well.
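Before running anything, you can ask the shell whether each of the four tools used below can be found. A minimal sketch, assuming a POSIX shell; it only reports what is present and does not install anything:

```shell
#!/bin/sh
# Check that every tool the vectorize script calls can be found on PATH.
missing=0
for tool in pdfimages mkbitmap potrace gs; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=1
  fi
done
if [ "$missing" -eq 1 ]; then
  echo "Install the missing tools before running the script."
fi
```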

Create a script

The process of conversion is as follows:
- Extract images from the original pdf
- Threshold the images to get black and white pixels only
- Trace the bitmap images to vectors and generate pdf files
- Merge the pdf files into one single pdf

Here is the script that does all of that:
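#!/bin/sh
# Usage: vectorize NAME   (reads NAME.pdf, writes NAME_vect.pdf)
set -e  # stop at the first step that fails
# Create ppm images, one per pdf page (a scanned page usually holds one image).
# Note: some pdfimages versions write greyscale images as .pgm; adjust the pattern if so.
pdfimages "$1.pdf" "$1_conv_temp"
# Create pbm images (no grey levels) from the ppm; -t sets the threshold.
mkbitmap -t 0.48 "$1"_conv_temp*.ppm
# Create pdf files from the pbm images, one per original page.
# -r sets the resolution (dpi) used to size the pdf pages.
potrace -b pdf -r 150 "$1"_conv_temp*.pbm
# Combine the several pdf files into one.
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile="$1_vect.pdf" "$1"_conv_temp*.pdf

# Clean up the temporary files.
rm "$1"_conv_temp*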
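The threshold passed to mkbitmap (`-t 0.48`) decides which grey pencil strokes end up black and which become white, and the best value depends on the scan. A sketch for comparing a few thresholds on a single page before settling on one: `try_thresholds` is a hypothetical helper, and the four values are arbitrary examples around 0.48. It assumes you have already run the pdfimages step by hand, for example `pdfimages original.pdf original_conv_temp`.

```shell
#!/bin/sh
# Write one preview bitmap per threshold for a single extracted page,
# so the results can be compared before choosing -t for the script.
# try_thresholds is a hypothetical helper; the values are only examples.
try_thresholds() {
  page=$1                               # e.g. original_conv_temp-000.ppm
  for t in 0.40 0.45 0.48 0.55; do
    # -o names the output, e.g. original_conv_temp-000_t0.48.pbm
    mkbitmap -t "$t" -o "${page%.*}_t$t.pbm" "$page"
  done
}
```

Load it into your shell (for example with `. ./try_thresholds.sh`, if you saved it under that name), run `try_thresholds original_conv_temp-000.ppm`, and open the resulting `.pbm` files side by side. Then put the value you like best into the mkbitmap line of the script.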

Copy the script above to a new file, which you can call, for example, vectorize.
Then make the script executable with:
$>chmod +x vectorize
Make it accessible from anywhere by adding the script's location to your PATH.
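One way to do that, as a sketch: it assumes the script file is called `vectorize` and sits in the current directory, that `~/bin` is where you want to keep personal scripts, and that your login shell reads `~/.profile` (bash reads `~/.bash_profile` instead when that file exists, so put the line there in that case).

```shell
#!/bin/sh
# Assumption: ~/bin is the chosen home for personal scripts.
bindir="$HOME/bin"
mkdir -p "$bindir"
# Copy the script there if it is in the current directory.
if [ -f vectorize ]; then
  cp vectorize "$bindir/" && chmod +x "$bindir/vectorize"
fi
# Append an export line to ~/.profile only if bindir is not on PATH yet.
case ":$PATH:" in
  *":$bindir:"*) echo "$bindir is already on PATH" ;;
  *) echo "export PATH=\"$bindir:\$PATH\"" >> "$HOME/.profile"
     echo "Added $bindir to PATH in ~/.profile; open a new terminal." ;;
esac
```

The PATH change only takes effect in new login shells, which is why the message asks you to open a new terminal.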


Once you are set up, run the script in a terminal:
$>vectorize original
and your file original.pdf will be vectorized into original_vect.pdf. The original file is left unchanged.
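To vectorize a whole folder of scans at once, the same command can run in a loop. A sketch, assuming `vectorize` is on your PATH; `vectorize_all` is a hypothetical name. It skips files that are already results (ending in `_vect.pdf`) and originals whose `_vect.pdf` already exists:

```shell
#!/bin/sh
# Vectorize every PDF in the current directory (hypothetical helper).
vectorize_all() {
  for f in *.pdf; do
    [ -e "$f" ] || continue            # no PDFs: the pattern stays unexpanded
    case "$f" in
      *_vect.pdf) continue ;;          # output of an earlier run
    esac
    name=${f%.pdf}                     # the script expects the name without .pdf
    if [ -e "${name}_vect.pdf" ]; then
      echo "skipping $f (${name}_vect.pdf exists)"
    else
      vectorize "$name"
    fi
  done
}
```

After loading the function into your shell, `cd` into the folder with the scans and run `vectorize_all`.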
