Siaar Group Inspiring Online

Published on Monday, October 17, 2011 at 2:37:00 PM

All about PDF and OCR documents

Today it occurred to me that I should write something about PDF documents, scanned images, and OCR software, because I have found that many people don’t even know the term OCR. So let’s discuss them briefly.

Are all PDF documents the same?

No! PDF documents are not all the same, because they are created in a variety of ways. PDFs that are generated from an electronic source, say a Word document, a computer-generated report, or spreadsheet data, all have an internal structure that can be read and interpreted. These “generated” PDF documents already contain characters stored as electronic character codes, so the text can be recognized easily. Conversion from such a PDF can therefore rely on these character codes and provide reliable output.

PDF documents can also be created by scanning pages into electronic format. A scanned file is really just an “image” of the words contained in the document. To convert a scanned document into an editable format, OCR software is required to analyze the “picture” of each character and match it to an electronic character. Because of this, it is much more difficult to ensure that the character “recognized” by the OCR software is the character on the scanned document. The quality of OCR output is affected by factors such as poor image quality, a mixture of fonts, and italicized or underlined text. A document that has been folded, or that is very old and has turned brown, may also blur the shape of individual characters.
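To make the idea of matching a “picture” to a character concrete, here is a minimal sketch in Python. It is not how any real OCR product works: the tiny 3x5 bitmaps, the `TEMPLATES` table, and the `similarity` and `recognize` functions are all made up for illustration. It only shows the general principle of comparing a scanned shape against known shapes and picking the closest one.

```python
# Toy 3x5 bitmaps standing in for an OCR engine's reference glyphs.
# These templates are hypothetical; real OCR software uses far richer models.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "C": ["###", "#..", "#..", "#..", "###"],
}


def similarity(a, b):
    """Fraction of pixels on which two equally sized bitmaps agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph):
    """Return (character, score) for the template closest to the glyph."""
    scored = ((ch, similarity(glyph, tpl)) for ch, tpl in TEMPLATES.items())
    return max(scored, key=lambda pair: pair[1])


clean_o = ["###", "#.#", "#.#", "#.#", "###"]
faded_o = ["###", "#..", "#.#", "#.#", "###"]    # one pixel lost in scanning
damaged_o = ["###", "#..", "#..", "#.#", "###"]  # two pixels lost

for name, glyph in [("clean", clean_o), ("faded", faded_o), ("damaged", damaged_o)]:
    char, score = recognize(glyph)
    print(f"{name}: read as {char!r} (score {score:.2f})")
```

In this toy setup the clean and slightly faded shapes are both read as “O”, but the damaged one is closer to the “C” template and gets misread. That is the same effect a blurred, folded, or discolored page has on real OCR output.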

So not all PDF files are of the same type either?

As noted above, there is more than one way to create a PDF document. One method is to use a scanner, or a similar machine, that takes an image of a document and stores this image as an electronic PDF file. A scanner, or a photocopier with scanning abilities, does not recreate each character of every word when it creates this scanned image; it simply takes a “picture” of the page. This image is then turned into a PDF document by software that integrates with the scanner or photocopier, and the result is a “scanned” PDF document. A variety of scan-to-PDF software on the market today can assist with this.

The alternative to a scanned PDF document is a created PDF document: one that begins as an electronic document, say a Word document, and is then converted into PDF using PDF creation software. In most cases, the PDF creation software takes information from the structure of the Word document, such as character information and word placement information, and retains it in the created PDF. As such, a created PDF has much more internal structure than a scanned PDF. For a scanned PDF, Optical Character Recognition software is required to electronically identify each character on a page and convert it into a usable format. Essentially, it extracts text from an image.
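One rough way to tell the two kinds of PDF apart is to check whether any text can be pulled out of each page directly. The sketch below is only a heuristic under stated assumptions: the function names and the `min_chars` threshold of 20 are my own choices for illustration, and `read_page_texts` assumes the third-party `pypdf` package is installed. Note also that a scanned PDF which has already been run through OCR may carry a hidden text layer and would be labelled “generated” here.

```python
def page_kind(extracted_text, min_chars=20):
    """Label one page by how much text could be extracted from it.

    min_chars is an arbitrary cutoff chosen for this sketch.
    """
    visible = "".join(extracted_text.split())  # drop all whitespace
    return "generated" if len(visible) >= min_chars else "scanned"


def classify_pdf(page_texts, min_chars=20):
    """Summarize a document from the extracted text of each page."""
    kinds = {page_kind(text, min_chars) for text in page_texts}
    if not kinds:
        return "empty"
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed"


def read_page_texts(path):
    """Extract text per page. Assumes the pypdf package is available."""
    from pypdf import PdfReader  # imported here so the rest runs without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Usage with hand-written page texts instead of a real file:
print(classify_pdf(["Quarterly report for the sales team", "Totals are in the table below"]))
print(classify_pdf(["", "  \n"]))
```

With a real file, the call would look like `classify_pdf(read_page_texts("some-file.pdf"))`, where the file name is a placeholder. Pages that come back as “scanned” are the ones that would need OCR software before their text can be edited.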

OCR, OCR software: this term always comes up when we read about converting PDF files and scanned files to text, so what is OCR?

OCR stands for Optical Character Recognition, a visual recognition process that turns a printed or scanned image into an electronic character-based file. A document that is scanned and converted into a PDF provides the basis on which character recognition software can interpret each character image in the PDF or image file and assign it an electronic character, which can then be placed in an editable format such as a text or Word document.

Now you know about PDF documents, scanned documents, and OCR software, so it is your turn to discover which software is good for converting PDF and scanned files into editable text. Which OCR software do you use or recommend to our readers?
Published by IzajAhmed Shaikh.

I am Mr. IzajAhmed Shaikh, a computer professional and pro blogger from Shahabad, Karnataka, India. My basic qualifications are a B.Sc. and an M.C.M. from the University of Pune, formerly known as Poona University. I also like to write articles based on my personal experiences.