Recently, I’ve taken it upon myself to archive some of my grandmother’s pictures of our family. Archiving photos is a daunting task for anyone who does it correctly. Hundreds of images (many of them with important information on the back, such as names and dates) need to be scanned, with each front matched to its back, and cataloged in some way. I am not a person with a lot of time on my hands, so I came up with a more automated solution to archiving by modifying some GIMP plug-ins to help me with most of the tedious parts. This is the first part of a series dealing with archiving and genealogy.
Scanning photos is the best way to archive them. A good quality scanner (preferably one with the ability to scan negatives if you have a lot of them) is essential. Scanning with the appropriate settings is of the utmost importance. You want to scan the images with enough fidelity that you never have to scan them again (the originals might not exist for later generations to redo your archiving work). A lot of people online have asked what quality they should scan images at for their family archive. The answer is always going to be a trade-off between several things:
Format: Pick an uncompressed (or losslessly compressed) file format! Lossy compression leaves artifacts in the image. If you are unsure what this means, you have probably come across a lot of JPEG files that are of poor quality, or that look “dirty”. This is because most JPEG (.jpg, .jpeg) images use lossy compression. The best format to use for archiving is TIFF.
File Size: The better the quality, the more hard disk space is required. Scans can be very large, hundreds of megabytes depending on the image, especially when using TIFF format.
Dots per inch (DPI): I follow the Washington State Library’s suggestions for best digital practices: 300 to 600 DPI. The upper end might not always be possible due to file size limitations, but I try not to scan at less than 300 DPI. If you want to blow the images up to larger sizes for printing later, higher DPI settings should be used. There is a point of diminishing returns on this, especially if you go over the actual hardware limit of your scanner. Some scanners offer insanely large DPI settings, but these generally exceed what the hardware can actually resolve. Anything above the hardware limit is a digital interpolation of your image data. Imagine zooming into a picture on your computer to, say, 500% of its normal size. Where does the data come from when scaling beyond 100%? The computer adds new pixels to the image and sets their values based on the surrounding pixels. No new detail is captured, so the files get larger without your archived images getting any better.
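To get a feel for the trade-off between DPI and file size, here is a rough Python sketch that estimates the raw pixel data in an uncompressed scan. It is only an estimate: it assumes 24-bit color (3 channels at 8 bits each) unless told otherwise and ignores TIFF headers and metadata. The function name and the example dimensions are my own choices for illustration.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Estimate the raw pixel data size of a scan in bytes.

    Assumes an uncompressed image; TIFF headers and metadata are ignored.
    """
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # Hypothetical example: a full flatbed of roughly 8.5 x 11.7 inches.
    for dpi in (300, 600, 1200):
        size = uncompressed_scan_bytes(8.5, 11.7, dpi)
        print(f"{dpi} DPI: about {size / 1_000_000:.0f} MB")
```

Because the pixel count grows with the square of the DPI, doubling the DPI roughly quadruples the file size.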
When scanning, it is a waste of time to scan one image at a time. What I do is fill the entire scanner bed with images, then scan all of them at once. When I first started doing this, I would separate the individual images by hand, but that quickly got out of control. I found a free GIMP script that could separate and save all the images for me. I modified it to add TIFF support and the ability to append something to the filename (more on this in a minute). First, check out our other post on how to install the software (GIMP) as well as the script and plug-in.
This plug-in works great, but sometimes needs a little bit of help. Go into all the “white space” between the images and use the “Rectangle select” tool to highlight the areas between the images. When this is selected, hit the “delete” or “backspace” button on your keyboard to delete everything in the rectangle. Once there is a good grid of whitespace between all the images (it is best to try to remove all the ‘white space’ from the original scanned image) you can run my modified plug-in to automatically separate the individual images.
First you need to install the script. Open GIMP, and go to “Edit → Preferences → Folders → Scripts”. This will show you the directory to download the following script to: Adams Divide.scm
The video below shows how to use the script on a set of scanned photos:
When you click on the script name, a new window will pop up with a lot of options. Here’s a description of everything:
Select Threshold – This works the same way as the threshold of the “Fuzzy select tool”. It helps select the white space between images. You can adjust this to a higher value to attempt to get closer crops for each picture, but generally I leave it between 10 and 15.
Size Threshold – Leave this at 100%
Abort Limit – This is the maximum number of images it will try to create. You should enter the number of individual images you scanned in this particular image you’ve opened.
Background Sample Corner – This will select a color from one of the corners. Pick whichever corner is white space. The plug-in will use this color with the Threshold selected above to try to find all the white space between the images. Just make sure that whichever corner you choose is completely white. Specifically, it samples the color at the position given by the two offset values below.
Background sample offset X and Y – This can be used to make sure you hit white space when selecting your background sample color. (Example: 5 pixels up and 5 pixels left of the bottom right corner of the image should be white if you use the default settings.)
Run Deskew – This will automatically run the deSkew plug-in if it is installed correctly.
Save and Close Extracted Images – This will automatically save the resulting individual images to the directory set below.
Save Directory – Where you want to save the images.
Save File Type – This defaults to TIFF, but can be changed if you want.
Save File Base Name, Save File Start Number, and Append To Filename – These help you save the resulting individual files sequentially.
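To make the background sampling options more concrete, here is a small Python sketch of the idea. This is not the script’s actual code: the corner names, the function names, and the per-channel comparison are my own simplifying assumptions about how a corner, an offset, and a threshold could fit together.

```python
def sample_point(width, height, corner, offset_x=5, offset_y=5):
    """Return the (x, y) pixel located offset_x/offset_y inside the given corner.

    `corner` is a hypothetical label such as "top-left" or "bottom-right".
    """
    x = offset_x if "left" in corner else width - 1 - offset_x
    y = offset_y if "top" in corner else height - 1 - offset_y
    return x, y


def is_background(pixel, background, threshold=12):
    """True if every color channel is within `threshold` of the background color.

    This is a simplification; GIMP's own selection logic may differ.
    """
    return all(abs(p - b) <= threshold for p, b in zip(pixel, background))


if __name__ == "__main__":
    # 5 pixels up and 5 pixels left of the bottom right corner of a 100 x 80 image.
    print(sample_point(100, 80, "bottom-right"))
    # A slightly off-white pixel still counts as background at threshold 12.
    print(is_background((250, 248, 251), (255, 255, 255)))
```

If the sampled corner is not truly white (for example, if a photo overlaps it), the background color will be wrong and the white space will not be found, which is why the offsets are worth checking.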
The “Append To Filename” option exists so you can match up the fronts and backs of images by having them saved with the same base name and number. For instance, if the front of a given image is “IMAGE0007.tiff”, the back might be called “IMAGE0007_back.tiff”. More on this in a future post.
Be careful to always change the “Save File Base Name” or the “Save File Start Number”. This script will overwrite previously saved files without asking for your approval.
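The naming scheme and the overwrite risk can be sketched in Python. This is not part of the GIMP script; the function names are hypothetical, and the four-digit zero padding is only inferred from the “IMAGE0007.tiff” example above. The second function lists which files in a directory would be overwritten, so you can check before running the script.

```python
from pathlib import Path


def scan_filename(base, number, append="", ext="tiff", digits=4):
    """Build a name like IMAGE0007.tiff or IMAGE0007_back.tiff.

    The zero padding width is an assumption based on the example filenames.
    """
    return f"{base}{number:0{digits}d}{append}.{ext}"


def existing_conflicts(save_dir, base, start, count, append="", ext="tiff"):
    """Return the planned filenames that already exist in save_dir."""
    save_dir = Path(save_dir)
    names = [scan_filename(base, n, append, ext) for n in range(start, start + count)]
    return [name for name in names if (save_dir / name).exists()]


if __name__ == "__main__":
    # Front and back of the same photo share a base name and number.
    print(scan_filename("IMAGE", 7))
    print(scan_filename("IMAGE", 7, append="_back"))
    # Check the current directory for files that 6 new scans would overwrite.
    print(existing_conflicts(".", "IMAGE", start=1, count=6))
```

If the conflict list is not empty, raise the start number or change the base name before running the script.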
Scan multiple images at once on the scanner at 300-600 DPI and save the resulting image in TIFF format.
Download and install the suggested GIMP deSkew plug-in and “Divide Images” script.
Open this image in GIMP and create a white grid around the individual pictures by using the “Rectangle select” tool and pressing “delete”.
Run the “Adams Divide Scanned Images” script and play with the settings to get your selections correct. This will automatically “chop up” your big image into a bunch of individual ones.
Repeat the steps above for the backs of the images if there is text there, but append something to the filename in the “Divide” script to differentiate the new files from the previous images.