Scanning books with an MFC-5895CW

I found an MFC-5895CW in Ryman’s today, discounted to 95 pounds. The MFC-5895CW is a multifunction printer/scanner/fax machine. What attracted me, though, is that it has a document feeder for scanning. I’ve been looking for a way to scan in my books (even if the process is destructive). As an aside, I’m told that book scanning services are common (and cheap) in Japan; it’s a shame it’s such a pain to get the books there…

For my first effort I decided to use a book I was planning to throw out anyway: “Digital System Design with VHDL” by Mark Zwolinski (sorry Mark, you were a great lecturer but I just don’t find myself doing much VHDL these days…).

To start with, I removed the front and back cover.

I then cut along the spine using a Stanley Knife:

Unfortunately, as I cut into the book I seem to have moved the knife nearer to the spine (something to avoid next time).

After slicing the whole thing up, it’s ready to scan!

The scanner can cope with about 60 pages at a time. However, it’s not a duplexing scanner, so once you’ve scanned one side you need to reinsert the stack of pages to scan the other side.

The MFC-5895CW scans directly to PDF. It creates files with a basename followed by a two-digit number (01, 02, 03 etc.). It will also scan directly to a USB stick, which is rather neat.
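To make the naming scheme concrete, here's a small sketch that builds the file names. The function name scan_file is made up for illustration; the basename 010111 and the upper-case .PDF extension are the ones my script further down expects, and yours may differ.

```shell
#!/usr/bin/env bash
# Build the name of the Nth scan file from a basename.
# scan_file is a hypothetical helper, not something the scanner provides.
scan_file() {
  local base=$1 n=$2
  printf '%s%02d.PDF\n' "$base" "$n"
}

# List the file pairs for the first three batches (01+02, 03+04, 05+06).
for ((i = 1; i <= 5; i += 2)); do
  echo "$(scan_file 010111 "$i") + $(scan_file 010111 "$((i + 1))")"
done
```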

So, after scanning you’re left with a series of PDF files on a USB stick. Each odd-numbered file and the even-numbered file after it form a pair: the front sides and the back sides of the same stack of pages. You now need to join all these together.
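The fiddly part is the page order. The sketch below prints where each side of each sheet ends up, assuming the stack is flipped over as a whole before the second pass, so the back sides come out last sheet first (this is why my script reverses the second file). The names page_map, first.pdf and second.pdf are only for illustration.

```shell
#!/usr/bin/env bash
# For a batch of N sheets, print which page of which file holds each side.
# Assumption: the whole stack was flipped before the second pass, so the
# back sides were scanned in reverse sheet order.
page_map() {
  local n=$1 k
  for ((k = 1; k <= n; k++)); do
    printf 'sheet %d: front=first.pdf p%d, back=second.pdf p%d\n' \
      "$k" "$k" "$((n - k + 1))"
  done
}

page_map 3
```

If your feeder or the way you reinsert the stack gives a different order, the reversal step is the thing to change.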

I used pdftk on Linux to do this. Here’s my bash script (you’ll probably need to change the basename if you use it). It assumes it’s being run in the same directory as the input files.

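basename="010111"

# Each pass handles one pair: file i (front sides) and file i+1 (back sides).
for ((i=1; i<=99; i+=2))
do
  file1=$basename$(printf "%02d" "$i").PDF
  file2=$basename$(printf "%02d" "$((i+1))").PDF
  fileout=$basename$(printf "%02d" "$i")join.pdf

  # Stop at the first missing pair instead of running on to 99.
  [ -f "$file1" ] && [ -f "$file2" ] || break
  echo "file1: $file1"
  echo "file2: $file2"

  mkdir join
  cd join || exit 1
  # The back sides were scanned last sheet first, so reverse them.
  pdftk "../$file2" cat end-1 output second.pdf
  # Split both files into single pages named 0001_A.pdf, 0001_B.pdf, ...
  pdftk "../$file1" burst output %04d_A.pdf
  pdftk second.pdf burst output %04d_B.pdf
  rm second.pdf
  # The glob expands in sorted order, which interleaves fronts and backs.
  pdftk *.pdf cat output "../$fileout"
  cd ..
  rm -rf join
done

# Join the per-batch files (they sort by batch number) into one document.
pdftk *join.pdf cat output complete.pdf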
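Recent versions of pdftk also have a shuffle operation that takes one page at a time from each input, which should do the reverse-and-interleave step in a single command. Here's a sketch of how one pair could be collated that way. I haven't run the book through it; collate_pair and the DRY_RUN switch are my own hypothetical additions, and with DRY_RUN=1 the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Collate one front/back pair using pdftk's shuffle operation.
# "Bend-1" feeds the back-side file in reverse page order.
# collate_pair and DRY_RUN are illustrative names, not part of pdftk.
collate_pair() {
  local fronts=$1 backs=$2 out=$3
  local cmd=(pdftk A="$fronts" B="$backs" shuffle A Bend-1 output "$out")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # show the command without running pdftk
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 collate_pair 01011101.PDF 01011102.PDF 01011101join.pdf
```

This would replace the burst-and-glob steps for each pair; the final join of the *join.pdf files stays the same.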

It all works pretty well for the most part. Some of the pages came out a little askew:

This may be due to my poor cutting, not having set the feeder correctly, or the generally dog-eared nature of the book.

Diagrams came out pretty well:

You can see some compression artifacts, though. It could also do with some post-processing, perhaps to increase the contrast. I might try the next book at 300 dpi (I should probably also use the black and white scanning mode).