Former Member
Sep 22, 2005 at 01:29 PM

Text extraction from scanned documents in KM


Please let me know if there are any partner or consulting solutions that implement the following scenario:

- an organization has a large number of scanned documents in file storage,

- the electronic versions (.doc originals) of these scanned documents often cannot be found, or the scanned and electronic versions differ,

- we want to be able to find the scanned documents with TREX full-text search, just like ordinary documents in KM.

Can we do some kind of preprocessing? When a document is uploaded to a KM repository, it would be checked, and if it is a scanned document, it would be transferred to an external application. That application would analyse the document, extract its content as text and return it to KM, where the text would be added to the document's properties (such as the brief description) and used for indexing.
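To make the flow concrete, here is a rough Python sketch of the preprocessing step. It is only an illustration under assumptions: `Document`, `is_scanned`, `preprocess_upload`, the property names and the MIME-type check are all hypothetical and are not part of the KM or TREX APIs. The external text extraction application is represented by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumption: image MIME types are treated as "scanned". A real check would
# probably need more (for example, image-only PDF files).
SCANNED_MIME_TYPES = {"image/tiff", "image/png", "image/jpeg"}


@dataclass
class Document:
    """Hypothetical stand-in for a document being uploaded to a repository."""
    name: str
    mime_type: str
    content: bytes
    properties: Dict[str, str] = field(default_factory=dict)


def is_scanned(doc: Document) -> bool:
    """Decide whether the upload looks like a scanned document."""
    return doc.mime_type in SCANNED_MIME_TYPES


def preprocess_upload(
    doc: Document,
    extract_text: Callable[[bytes], str],
    max_description: int = 200,
) -> Document:
    """Run the external extractor on scanned documents only.

    The extracted text is stored in hypothetical properties so that an
    indexer could pick it up; other documents pass through unchanged.
    """
    if not is_scanned(doc):
        return doc
    text = extract_text(doc.content)
    doc.properties["extracted_text"] = text
    # Keep a short excerpt as the brief description.
    doc.properties["description"] = text[:max_description]
    return doc
```

In this sketch, `extract_text` would wrap the call to the external application, and the indexer would need to be configured to read the property that holds the extracted text.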

Thank you,

Sergey