When we work with documents created in computer software and then saved to the PDF format (i.e. native PDF files), the files have all the necessary information encoded in them. Our software can read this information and extract it into an output file.
In contrast, a scanned PDF file is simply an image, so there is no stored information for our program to read. In such cases, we rely on an OCR (Optical Character Recognition) engine. OCR technology extracts data from the file by scanning and recognizing the alphanumeric characters in the document. Unfortunately, it cannot recognize image portions, so these graphics will either be dropped from the conversion or produce inaccurate text.
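The native-versus-scanned distinction above can be illustrated with a small sketch. This is not how Able2Extract decides internally; it is a simplified heuristic that assumes some PDF reader has already returned the embedded text of each page as a string. The function names `needs_ocr` and `classify_pages` and the `min_chars` threshold are hypothetical and chosen only for illustration.

```python
def needs_ocr(page_text: str, min_chars: int = 20) -> bool:
    """Return True when a page has too little embedded text to count as native."""
    # Count only letters and digits, so stray whitespace is not treated as text.
    meaningful = sum(1 for ch in page_text if ch.isalnum())
    return meaningful < min_chars


def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page 'native' or 'scanned' based on its embedded text."""
    return ["scanned" if needs_ocr(t, min_chars) else "native" for t in page_texts]


# Hypothetical input: the embedded text of three pages, as returned by a PDF reader.
pages = ["Invoice 2041: total due 310.00 USD by 30 June", "", "  \n "]
print(classify_pages(pages))  # ['native', 'scanned', 'scanned']
```

A page labeled "scanned" here has no usable embedded text, so it would need to go through OCR. The threshold is arbitrary, and a real document may mix native and scanned pages.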
This article refers to Able2Extract Professional.