Solved: TREX does not search PDF files

Former Member · ‎10-05-2004

Hi,

we have another problem with TREX 6.0.

Our file repository is working fine and search works for .txt files, but it does not work for PDF files. Our PDF files are indexed correctly, but a search returns no results for files of this type.

What can we do?

Kind regards

Thomas

Former Member · ‎01-10-2005

Your situation may already be solved. However, a few details were missing from the description: How many PDFs were being indexed? What was the size of the files? Did you check the TREX Monitor to ensure all the PDFs had been sent through the entire system? In the crawler monitor, did it report the number of files you believe should be in the index? By default, TREX holds documents in a queue for 30 minutes between processes unless you either reset this property or flush the queue.

There is a document, TREXRecomenations, which gives some very good tips regarding file size and other common settings. For PDF it states:

You want to index very large documents in PDF format from Adobe. These documents are not being indexed because they fail to pass the preprocessing stage.

Limitation: PDF is a complicated file format to preprocess. Typically PDF files larger than 15 MB cause problems. The time taken for preprocessing and filtering rises to over an hour and the process delivers bad results.

Recommendation: You should avoid the indexing and processing of PDF files that are larger than 15 MB.

If you cannot find this document, let me know and I can forward it to you.
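To see whether any files in a repository exceed the 15 MB figure quoted above, something like the following could be run against the folder being crawled. This is only a minimal sketch in Python, not part of TREX: the function name `find_large_pdfs` and the `LIMIT_BYTES` constant are made up for illustration, and it assumes the repository is reachable as an ordinary file system path.

```python
import os
import sys

# 15 MB, the threshold from the recommendation quoted above.
# (Assumption: 1 MB is taken as 1024 * 1024 bytes.)
LIMIT_BYTES = 15 * 1024 * 1024


def find_large_pdfs(root, limit=LIMIT_BYTES):
    """Return (path, size_in_bytes) for every PDF under root larger than limit.

    Hypothetical helper for illustration; results are sorted largest first.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Usage: python find_large_pdfs.py /path/to/repository
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_large_pdfs(start):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

Files listed by this check are candidates to split or to exclude from the crawl. Files below the limit can of course still fail for other reasons.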

Former Member · ‎01-08-2005

Hi,

It happened to me as well at first. I uploaded 4 PDF documents to the portal. TREX ran successfully (according to the queue status), but the search returned no documents. Classification was somehow working as expected. I then uploaded another PDF document and the result was still the same. The problem was somehow resolved after a .txt document was uploaded and a reindex request was submitted and run.

I still don't know what the cause of the problem is, but I'm happy that it is working now.

Good luck to all of you

Former Member · ‎10-13-2004

Hi everybody, same problem here.

When I create a very simple PDF file (only a few words) from a *.txt file, it is indexed perfectly and a search for the words in the PDF file finds it. But this only works for such simple files. When I take a manual from SAP and try to index it, it does not work.

When I search for part of the title, the document is shown, but with "No document excerpt available". So I assume that TREX is not able to read the document, not even parts of it. But the document contains text that should be indexed.

Does anybody have an idea how this problem may be fixed?

Eik

KarstenH · ‎10-05-2004

Hi Thomas,

PDFs indexed correctly but no PDFs found, right?

Do the PDFs by any chance contain scans or faxes or the like?

Then they contain only bitmaps and no text, and only their properties, titles and descriptions will be indexed.

Or do they maybe contain scans and OCRed text as hidden text?

Then please read this post/thread:

Regards,

Karsten
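A quick way to get a first hint on the scan question above is to look at the raw bytes of the file. The following is only a rough sketch in Python and has nothing to do with how TREX itself filters PDFs: the function names are made up for illustration, and the check is a heuristic. It assumes that a PDF with a text layer normally references at least one `/Font` resource, while a pure scan carries only image objects. Newer PDFs can pack their dictionaries into compressed object streams (`/ObjStm`), in which case the bytes tell us nothing and the sketch answers "unknown".

```python
def classify_pdf_bytes(data):
    """Very rough guess at whether a PDF carries extractable text.

    Heuristic only (assumption, see lead-in): returns one of
    "likely-text", "likely-image-only" or "unknown".
    """
    if b"/Font" in data:
        # Font resources are visible, so there is probably a text layer
        # (this includes scans with hidden OCR text).
        return "likely-text"
    if b"/ObjStm" in data:
        # Dictionaries may be hidden in compressed object streams.
        return "unknown"
    if b"/Subtype /Image" in data or b"/Subtype/Image" in data:
        # Images but no fonts: probably a scan or fax without text.
        return "likely-image-only"
    return "unknown"


def classify_pdf_file(path):
    """Read a file from disk and apply the byte-level heuristic."""
    with open(path, "rb") as handle:
        return classify_pdf_bytes(handle.read())
```

A result of "likely-image-only" would match the first case above (only properties, titles and descriptions get indexed). For a reliable answer, open the file in a PDF viewer and try to select or copy text.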


Accepted Solutions (1)


Answers (3)

