Document Imaging Glossary
Archive – A digital record of historic documents, important but not often referred to. Business records more than 30 years old would be a good example.
Backfile – Existing paper files.
Backfile Conversion – Converting existing paper files to digital images.
Backup – Imaged documents are stored on a server, but what if lightning strikes the building? Your files should be backed up onto CD, DVD, or magnetic tape and stored at a different site.
Bar Code – Not just for the grocery store anymore, bar codes can be read by scanners to identify documents or direct the scanner to perform a particular task. A bar code printed at the top of a form can hold information such as the document type or a customer number.
Boolean search – A search that combines terms with AND and OR, for example “invoice AND Smith” or “Smith OR Smyth.” This is far more sophisticated than a simple word search and allows for some serious research.
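To make the AND / OR idea concrete, here is a minimal Python sketch, assuming each document’s text is already available (for instance from OCR). The document IDs and text are made up. Each single-word search returns a set of document IDs; AND is then the intersection of two sets (documents containing both words) and OR is the union (documents containing either).

```python
# Made-up mini collection: document ID -> text (e.g. from OCR).
docs = {
    "inv-001": "Invoice for Smith, March 2004",
    "inv-002": "Invoice for Jones, March 2004",
    "ltr-003": "Letter to Smith about the lease",
}

def words(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words, ignoring commas."""
    return set(text.lower().replace(",", " ").split())

def search(term: str) -> set[str]:
    """IDs of the documents that contain a single word."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in words(text)}

# AND keeps documents found by both searches (set intersection);
# OR keeps documents found by either search (set union).
smith_and_invoice = search("smith") & search("invoice")
smith_or_jones = search("smith") | search("jones")

print(sorted(smith_and_invoice))  # ['inv-001']
print(sorted(smith_or_jones))     # ['inv-001', 'inv-002', 'ltr-003']
```

Real systems typically build a word index ahead of time instead of reading every document on each search, but the AND / OR logic is the same.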
Concurrent users – The number of people who can use a document management system at the same time. Not to be confused with Seats (see below).
Day Forward – A scanning and/or Document Management system that starts capturing documents from a particular date onward rather than converting existing records.
De-speckle – Quite often, when we image a document printed on colored paper, the color comes through as speckles. The speckles can be removed if it is important, but usually it isn’t worth the extra time.
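De-speckling is done by the scanning software. As an illustration only, the Python sketch below uses a deliberately simple rule (an assumption, not how any particular product works): in a black-and-white image stored as rows of 1s (black) and 0s (white), a black pixel with no black neighbors is treated as a speckle and turned white. Commercial software typically uses more sophisticated filters and lets you set how large a speckle may be.

```python
def despeckle(image: list[list[int]]) -> list[list[int]]:
    """Return a copy of a black-and-white image (1 = black, 0 = white) with
    isolated black pixels removed. Assumed rule: a black pixel is a speckle
    if none of its eight neighbors is black."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]  # leave the original untouched
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            black_neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        black_neighbors += image[nr][nc]
            if black_neighbors == 0:
                cleaned[r][c] = 0  # isolated dot: erase it
    return cleaned

page = [
    [1, 0, 0, 0, 0],  # lone dot in the corner: a speckle
    [0, 0, 0, 1, 1],  # 2x2 block: part of real content, kept
    [0, 0, 0, 1, 1],
]
print(despeckle(page))  # [[0, 0, 0, 0, 0], [0, 0, 0, 1, 1], [0, 0, 0, 1, 1]]
```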
Document – A document differs from a record in that it can be added to or changed. For instance, a monthly ledger sheet is a document until your Accountant closes out the month, at which time it becomes a Record. Your system had better know the difference.
Document / Records Management – We have seen all sorts of things called Document or Records Management, usually by people selling something that isn’t. Real Document / Records Management is a software program that allows for logical storage and retrieval of your records according to parameters you determine. Your computer’s file system, for instance, is not a document management system.
It may seem confusing at first, but only because you get to make up the rules and almost anything is possible. This is not a do-it-yourself project.
Document Prep – If you can’t feed a piece of paper through a photocopy machine, you can’t run it through a scanner. Staples have to be removed and torn paper taped. With some projects document prep is a major expense; with others it hardly exists. That is why you should distrust anyone willing to give you a flat price for scanning your documents, particularly if they haven’t laid eyes on them.
DPI – Dots per inch, a scanner setting that controls the resolution of the image. The usual setting for office documents is 200 DPI; a detailed color photograph might be scanned at 600 DPI. The higher the DPI, the sharper the images, the more storage they take up (doubling the DPI quadruples the number of dots on the page), and the slower the imaging process.
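The storage side of that trade-off is simple arithmetic: the pixels across equal the width in inches times the DPI, and likewise for the height, so doubling the DPI quadruples the number of pixels. The Python sketch below computes the raw, uncompressed size of a page, assuming 1 bit per pixel for black and white and 24 bits per pixel for color. Actual files are much smaller because TIFF and PDF compress the image, so treat these numbers as relative, not as disk-space estimates.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw size of a scanned page in bytes, before any compression.

    Pixels across = width in inches x DPI; pixels down = height in inches x DPI.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8  # 8 bits per byte

# 8.5" x 11" letter page, black and white (1 bit per pixel)
print(uncompressed_bytes(8.5, 11, 200, 1))   # 467500
print(uncompressed_bytes(8.5, 11, 400, 1))   # 1870000 (2x the DPI, 4x the size)
# The same page as a 24-bit color scan at 600 DPI
print(uncompressed_bytes(8.5, 11, 600, 24))  # 100980000
```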
File Format – Once scanned, your documents can be stored in any of several formats. TIFF with Group 4 compression is the most common for black-and-white documents you don’t want to alter. PDF files are often used for OCR’ed documents or for electronic brochures. Any good Document Management system can handle several different file formats at once. Get advice as to which format is best for which use.
Fuzzy search – What if you want to search for a name and you aren’t sure whether it is spelled Smith or Smyth? Or you are sure of the month and year but have no idea of the day? Use an asterisk (a wildcard) in place of the part you aren’t sure of, such as Sm*th or 03/*/2004. This is far more sophisticated than a simple word search. Sometimes loosely called fuzzy logic.
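Here is a minimal Python sketch of asterisk matching, assuming the asterisk stands for “any run of characters” and the whole value must match. Strictly speaking this is wildcard matching; some systems also offer approximate matching that finds near-spellings without a wildcard, which this sketch does not attempt. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern where * means 'any characters' into a case-insensitive
    regular expression. Everything other than * is matched literally."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)

def wildcard_search(pattern: str, values: list[str]) -> list[str]:
    """Return the values that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [v for v in values if regex.fullmatch(v)]

names = ["Smith", "Smyth", "Smithers", "Jones"]
dates = ["03/01/2004", "03/17/2004", "04/17/2004"]

print(wildcard_search("Sm*th", names))      # ['Smith', 'Smyth']
print(wildcard_search("03/*/2004", dates))  # ['03/01/2004', '03/17/2004']
```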
Image Capture – Making a digital image of a document using a scanner and software.
Indexing – Attaching identification to an image or group of images; in essence, a filing system. This could be a name, date, location, transaction number, customer ID, or any combination of these. Without an Index, images are nothing more than a random collection of files that is impossible to search. Imaging your files without an Index would be like dumping your files in a heap on the floor. Indexes are entered by way of the Document Management or search software. Indexing can be completely automatic, as with zonal recognition or bar codes, or labor-intensive manual entry.
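An index is essentially a table of fields pointing at image files. The Python sketch below assumes a hypothetical set of index fields (customer ID, document type, date) and made-up file names; looking documents up means filtering the entries on whatever combination of fields you know.

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    """Hypothetical index fields attached to one scanned image file."""
    image_file: str
    customer_id: str
    doc_type: str
    date: str  # YYYY-MM-DD

index = [
    IndexEntry("img_0001.tif", "C-1001", "invoice", "2004-03-01"),
    IndexEntry("img_0002.tif", "C-1002", "invoice", "2004-03-17"),
    IndexEntry("img_0003.tif", "C-1001", "contract", "2004-04-02"),
]

def find(**criteria: str) -> list[str]:
    """Return the image files whose index fields match every given criterion."""
    return [
        entry.image_file
        for entry in index
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]

print(find(customer_id="C-1001"))                      # ['img_0001.tif', 'img_0003.tif']
print(find(customer_id="C-1001", doc_type="invoice"))  # ['img_0001.tif']
```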
Large Format – Imaging large maps, drawings, or photographs. Most production scanners handle anything from a business card to 11”x17”; a large format scanner will handle anything up to 36”x54”, and some even larger.
Licensing – One does not own software; one licenses it. In addition to the license, you will usually pay a yearly fee for updates and maintenance.
Modules – When you purchase a document management system you get the basic capabilities. But if, for instance, you want to give your people access from remote sites, or people at other locations access through the web, you buy another module. The intent is that you buy only what you need, and add capabilities as you want them.
Native Format – Storing a Word document as a Word document and an Excel spreadsheet as an Excel spreadsheet, rather than converting them to another format such as an image. Not all Document Management software does this, but that is what native format means.
OCR – Optical Character Recognition. This feature converts the text in a scanned image into machine-readable text so the content can be searched or edited.
Operator – A person who scans, indexes, or performs QC.
QC Station – Every system needs a QC (quality control) person to make sure that images are good and that they are indexed properly. OCR’ed documents and documents on dark colored paper in particular need QC work: OCR engines are never 100% accurate, so a QC person has to correct the text, and colored paper can make capturing a clean image very difficult.
Record – A record is a fixed, unchanging history. A scanned record is as legal in a court of law as the original piece of paper was, SO LONG AS IT WAS KEPT IN THE RIGHT FORMAT AND KEPT SAFE. If your only scanned copy of a document is in an OCR’ed version, forget it. It is a working document and not a legal record.
Redact – Do you ever have to black out names or Social Security numbers in documents before they can be released? It’s a tedious process that can largely be automated.
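When the document has been OCR’ed, finding the numbers to black out can be automated with pattern matching. The Python sketch below assumes Social Security numbers are written in the usual 123-45-6789 form and simply replaces their digits in the text. On the image itself, redaction also means blacking out the pixels, and the released copy must not keep the hidden text underneath; that part is left to your imaging software.

```python
import re

# Assumes SSNs appear in the common 123-45-6789 form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text: str) -> str:
    """Replace every digit of each SSN-shaped number with X."""
    return SSN_PATTERN.sub(lambda m: re.sub(r"\d", "X", m.group()), text)

print(redact_ssns("Employee: J. Smith, SSN 123-45-6789, hired 2004."))
# Employee: J. Smith, SSN XXX-XX-XXXX, hired 2004.
```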
Retention – How long do you have to keep your files? Financial, HR, and sales records all have different retention cycles. Retention can be managed as an automated or a manual process, but either way it is far faster than with paper files.
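Automated retention boils down to date arithmetic against a schedule. In the Python sketch below, the retention periods are placeholders, not recommendations; your actual schedule depends on your industry and jurisdiction and should come from your legal or records adviser. A record’s disposal date is its closing date plus its retention period, with February 29 falling back to February 28.

```python
from datetime import date

# Placeholder retention periods in years (hypothetical, not legal advice).
RETENTION_YEARS = {"financial": 7, "hr": 5, "sales": 3}

def add_years(d: date, years: int) -> date:
    """Same month and day `years` later; February 29 becomes February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def disposal_date(record_type: str, closed_on: date) -> date:
    """Earliest date a record may be destroyed under the schedule above."""
    return add_years(closed_on, RETENTION_YEARS[record_type])

def due_for_disposal(record_type: str, closed_on: date, today: date) -> bool:
    return today >= disposal_date(record_type, closed_on)

print(disposal_date("financial", date(2004, 3, 31)))                   # 2011-03-31
print(disposal_date("hr", date(2004, 2, 29)))                          # 2009-02-28
print(due_for_disposal("sales", date(2004, 3, 31), date(2006, 1, 1)))  # False
```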
Scanner – The piece of hardware that digitizes your files. Duplex scanners capture both sides of a document at once and can usually be set to discard images of blank pages. On production scanners the feeder works a bit like a photocopy machine’s. A scanner is a wonderful bit of technology, but it is just a machine; without software, it just sits there.
Seats – The number of users who have been granted the right to access files on the Document Management system. Not to be confused with Concurrent Users, concurrent being the operative word in that case.
Server – The computer that contains your scanned documents and controls the Document Management software. The computers of the staff who have been granted access to these files are connected to the server in one way or another.
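Storage Media – The hardware that stores your files. A floppy disk is one example of a storage medium. By far the most common storage in use today is RAID (redundant array of independent disks): basically a group of hard drives that the server manages as one unit, usually set up so that the failure of a single drive does not mean lost files.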
Web-Based – Do you want people in several locations to have access to your documents? There are several ways to do this, but usually the simplest and cheapest is to buy the system’s web module (see Modules).
Workstation – The computer that an Operator works at.
Zonal Recognition – Running OCR on a specific area (zone) of a series of documents to capture specific data. The documents have to be consistent, with the information in the same place on each form. This can be an extremely useful tool, particularly with financial documents.
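Once a page has been OCR’ed, zonal recognition amounts to keeping only the words that fall inside a rectangle. The Python sketch below assumes a hypothetical OCR output format (each word with an x, y position in pixels) and made-up zone coordinates for where an invoice number sits on one particular form; real OCR engines report positions in their own formats. It also shows why the forms have to be consistent: if the invoice number moves, the zone misses it.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    """One word from a hypothetical OCR engine, positioned in pixels
    from the top-left corner of the page."""
    text: str
    x: int
    y: int

# Made-up zone where this form always prints the invoice number:
# (left, top, right, bottom) in pixels.
INVOICE_NO_ZONE = (1200, 100, 1600, 180)

def read_zone(words: list[OcrWord], zone: tuple[int, int, int, int]) -> str:
    """Join, left to right, the words whose position lies inside the zone.
    Sorting only by x assumes the zone holds a single line of text."""
    left, top, right, bottom = zone
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(w.text for w in sorted(inside, key=lambda w: w.x))

page_words = [
    OcrWord("ACME", 100, 120),
    OcrWord("Invoice", 1000, 130),
    OcrWord("No.", 1100, 130),
    OcrWord("48213", 1250, 130),
    OcrWord("Total", 1000, 900),
]
print(read_zone(page_words, INVOICE_NO_ZONE))  # 48213
```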
