
Are all PDF documents the same?


No. PDF documents can be created in several ways. PDFs generated from an electronic source, such as a Word document, a computer-generated report, or spreadsheet data, have an internal structure that can be read and interpreted. These "generated" PDF documents already store each character as an electronic character code, so conversion can rely on those codes and produce reliable output.
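As a rough illustration of the difference, the sketch below guesses whether a PDF carries character data by looking for font and image markers in its raw bytes. The function name `classify_pdf_bytes` and its return labels are hypothetical, and this is not how the Able2Doc or Able2Extract products inspect files. The heuristic is deliberately crude: PDFs that compress their object dictionaries, or scanned PDFs that already carry an OCR text layer, can be misclassified.

```python
import re


def classify_pdf_bytes(data: bytes) -> str:
    """Crude guess at how a PDF was produced, based on its raw bytes.

    Assumption: font and image dictionaries are stored uncompressed,
    which is common but not guaranteed.
    """
    # Generated PDFs normally declare fonts for the characters they contain.
    has_fonts = re.search(rb"/Font\b", data) is not None
    # Scanned pages are normally stored as image objects.
    has_images = re.search(rb"/Subtype\s*/Image\b", data) is not None

    if has_fonts:
        return "text-based"   # character codes are likely present
    if has_images:
        return "image-only"   # likely a scan; OCR would be needed
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python classify_pdf.py some_file.pdf
    with open(sys.argv[1], "rb") as handle:
        print(classify_pdf_bytes(handle.read()))
```

A result of "text-based" only suggests that character codes exist somewhere in the file; it does not show that every page is free of scanned images.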

PDF documents can also be created by scanning a paper document into electronic format. A scanned document is really just a "picture" of the words on the page. To convert a scanned document into an editable format, OCR (optical character recognition) software must analyze the "image" of each character and match it to an electronic character. Because of this, it is much harder to ensure that the character "recognized" by the OCR software is the character that actually appears on the scanned document. The quality of OCR output is affected by factors such as poor image quality, a mixture of fonts in the scanned document, and italicized or underlined text, which can obscure the shape of individual characters.
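To show why image quality matters, here is a toy sketch of matching a character image against known shapes. It is a simplified illustration only, not the recognition method used by any particular OCR product. The 3x5 bitmaps in `TEMPLATES` and the functions `distance` and `recognize` are made up for this example.

```python
# Each template is a tiny 3x5 bitmap: "1" is ink, "0" is blank paper.
TEMPLATES = {
    "I": ("010", "010", "010", "010", "010"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def distance(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


# A cleanly scanned "T" matches its template exactly.
clean_t = ("111", "010", "010", "010", "010")

# A poorly scanned "T" whose crossbar has faded at both ends.
faded_t = ("010", "010", "010", "010", "010")

print(recognize(clean_t))  # prints "T"
print(recognize(faded_t))  # prints "I": the faded scan is misread
```

Losing just two pixels turns the "T" into a perfect match for "I", which is the kind of error that poor scans, mixed fonts, and italic or underlined text make more likely.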

This article refers to Able2Doc, Able2Doc Professional, Able2Extract and Able2Extract Professional.

