TREX does not search PDF files

Former Member

Hi,

we have another problem with TREX 6.0.

Our file repository is working fine, and search works for .txt files, but not for PDF files. Our PDF files are indexed correctly, but a search returns no results for this kind of file.

What can we do?

Kind regards

Thomas

Accepted Solutions (1)

Former Member

Your situation may already be solved. However, some details were missing from your description: 1) How many PDFs were being indexed? 2) What was the size of the files? 3) Did you check the TREX Monitor to ensure that all the PDFs had been sent through the entire system? 4) In the crawler monitor, did it report the number of files you expect to be in the index? By default, TREX holds documents in a queue for 30 minutes between processes unless you either change this property or flush the queue.
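To answer the question about the expected number of files, one option is to count the documents in the repository yourself and compare that figure with what the crawler monitor reports. The following is a minimal Python sketch, assuming the file repository is reachable as a local or mounted directory; the function name `count_by_extension` is hypothetical and not part of TREX.

```python
import sys
from collections import Counter
from pathlib import Path


def count_by_extension(root: Path) -> Counter:
    """Count regular files below root, keyed by lower-cased extension ('' if none)."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumption: the repository root is passed as the first argument (default: cwd).
    repo_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for ext, n in count_by_extension(repo_root).most_common():
        print(f"{ext or '(none)':10} {n}")
```

A mismatch between this count and the crawler monitor only tells you where to look; it does not by itself explain why PDFs are missing from the search results.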

There is a document, TREXRecomenations, which gives some very good tips regarding file size and other common settings. For PDF, it states:

You want to index very large documents in PDF format from Adobe. These documents are not being indexed because they fail to pass the preprocessing stage.

Limitation: PDF is a complicated file format to preprocess. Typically PDF files larger than 15 MB cause problems. The time taken for preprocessing and filtering rises to over an hour and the process delivers bad results.

Recommendation: You should avoid the indexing and processing of PDF files that are larger than 15 MB.

If you cannot find this document, let me know and I can forward it to you.
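The 15 MB limit quoted from the recommendations document can be checked before the crawler ever sees the files. The following minimal Python sketch lists PDFs above the threshold so they can be excluded or split. It assumes the repository is reachable as a local or mounted directory and reads "MB" as 2^20 bytes (the recommendation does not say which); the name `oversized_pdfs` is hypothetical.

```python
import sys
from pathlib import Path

# 15 MB threshold from the TREX recommendation.
# Assumption: "MB" is read as 2**20 bytes; use 15 * 10**6 for decimal units.
MAX_PDF_BYTES = 15 * 2**20


def oversized_pdfs(root: Path, limit: int = MAX_PDF_BYTES) -> list:
    """Return (path, size_in_bytes) pairs for PDFs below root larger than limit, largest first."""
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > limit:
                hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    repo_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, size in oversized_pdfs(repo_root):
        print(f"{size / 2**20:8.1f} MiB  {path}")
```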

Answers (3)

Former Member

Hi,

It happened to me too at first. I uploaded four PDF documents to the portal. TREX ran successfully (according to the queue status), but the search returned no documents. Classification somehow worked as expected. I then uploaded another PDF document and the result was still the same. The problem was somehow resolved after a .txt document was uploaded and a reindex request was submitted and run.

I still don't know what caused the problem, but I'm happy that it is working now.

Good luck to all of you

Former Member

Hi everybody, same problem.

When I create a very simple PDF file (only a few words) from a *.txt file, it is indexed perfectly, and a search for the words in the PDF file returns that file. But this only works for these simple files. When I take a manual from SAP and try to index it, it does not work.

When I search for part of the title, the document is shown, but with "No document excerpt available". So I assume that TREX is not able to read the document, not even parts of it. But the document contains text that should be indexed.

Does anybody have an idea how this problem may be fixed?

Eik

Former Member

Hi,

we solved the problem by installing the newest TREX patch available.

Regards

Thomas

KarstenH
Advisor

Hi Thomas,

which exact SP, patch, and hotfix (HF) would that be?

Just for the record of this thread...

Thanks, Karsten

KarstenH
Advisor

Hi Thomas,

PDFs indexed correctly but no PDFs found, right?

Do the PDFs by any chance contain scans or faxes or the like?

Then they contain only bitmaps and no text, so only their properties, titles and descriptions will be indexed.
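If you are not sure whether a PDF is a scan, a rough check is possible outside TREX: PDFs with a text layer normally declare font resources (`/Font`), while pure scans or faxes usually consist of images only. The sketch below is a crude Python heuristic under that assumption, not a PDF parser. It does not handle encrypted files or filters other than Flate, and a scanned PDF with an OCR text layer will be reported as having fonts. The function name `pdf_references_fonts` is hypothetical.

```python
import re
import sys
import zlib
from pathlib import Path

# Stream bodies start after the "stream" keyword plus CRLF or LF; the lookbehind
# skips the "stream" inside "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")


def pdf_references_fonts(data: bytes) -> bool:
    """Crude heuristic: PDFs with a text layer reference font resources (/Font),
    pure scans usually do not. Flate-compressed streams are inflated so that
    fonts declared inside compressed object streams are also found."""
    if b"/Font" in data:
        return True
    for match in STREAM_START.finditer(data):
        start = match.end()
        end = data.find(b"endstream", start)
        if end == -1:
            break
        try:
            inflated = zlib.decompressobj().decompress(data[start:end])
        except zlib.error:
            continue  # not Flate-compressed (or damaged): skip this stream
        if b"/Font" in inflated:
            return True
    return False


if __name__ == "__main__":
    for name in sys.argv[1:]:
        has_fonts = pdf_references_fonts(Path(name).read_bytes())
        verdict = "fonts found" if has_fonts else "no fonts: probably bitmap only"
        print(f"{verdict:32} {name}")
```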

Or do they maybe contain scans and OCRed text as hidden text?

Then please read this post/thread:

Regards,

Karsten

Former Member

Hi Karsten,

no, the PDF files do not include OCRed text. We did some testing with PDF files from SAP (the installation instructions for the Enterprise Portal).

So if we do a search, there are no results.

Kind regards

Thomas

KarstenH
Advisor

Hi Thomas,

1) Which Patch Level of TREX are you on? At least TREX 6.0, SP1, Patch5 I hope...

2) As it was not the "bitmap in PDF" case, which is rather frequent, this will be hard to resolve without looking at your system.

=> Can you please open a support message to SAP concerning the issue?

Thanks,

Karsten

Former Member

Hi Karsten,

TREX Monitor says: Server-Status|| Version: 6.0.1.2.0 Build: 601206542

I assume this means we're using version 6.0 SP1 Patch 2?

Regards

Thomas

KarstenH
Advisor

Hi Thomas,

there is no known issue in that TREX version that would cause your problem.

This means:

You will have to open a support message to SAP and have somebody look at the system.

Regards, Karsten