Jul 01, 2009 at 10:58 AM

Indexing Word Document files using TREX in ABAP

Hello All,

We are using TREX search from ABAP code.

The TREX server runs on a separate host, and we call the TREX function modules via RFC.

The procedure we follow is:

1) Create Index using TREX_EXT_CREATE_INDEX

2) Index the data using TREX_EXT_INDEX (and call TREX_EXT_OPTIMIZE after indexing)

3) Search using TREX_EXT_SEARCH_DOCUMENTS

We can index and search successfully when the index holds plain text data. In that case the index data is:

doc_key = 'Test1'.

doc_type = 'A'. "A for ASCII

doc_langu = 'EN'.

content = 'Text String'.

With doc_type = 'A', both indexing and searching work.

However, we want to use TREX to search Word documents.

We read the document content into a character string (the document's binary data is converted to a character string) and pass it to the TREX index in the content field.

This is the input we pass:

doc_key = 'Test1'.

doc_type = 'F'. "We also tried other values such as 'B' and 'E'

doc_langu = 'EN'.

mime_type = 'application/msword'.

content = lv_data_str.

lv_data_str holds the document content as a string.

When we pass 'B' as the document type, no UTF conversion takes place, but the TREX function module TREX_EXT_INDEX internally converts the content to binary.

At this point all the data is lost.

If we pass any document type other than 'A' (ASCII) or 'B', we get a code page conversion error.
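
To illustrate the kind of loss described above, here is a minimal sketch in Python (not ABAP, and not the TREX API). It only shows the general effect of treating binary bytes as character data: a strict decode fails, a lenient decode silently alters the bytes, and a text-safe encoding such as Base64 round-trips them unchanged. The sample bytes are a hypothetical stand-in for the start of a binary .doc file, and the helper names are made up for this example. Whether TREX_EXT_INDEX accepts Base64 in the content field for a given doc_type is an assumption that is not verified here.

```python
import base64

# Hypothetical sample: the 8-byte signature that binary .doc (OLE2) files start with.
raw = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def strict_decode_fails(data: bytes) -> bool:
    """Return True if the bytes cannot be read as UTF-8 text."""
    try:
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        # Comparable in spirit to a code page conversion error.
        return True


def lossy_round_trip(data: bytes) -> bytes:
    """Decode leniently (invalid bytes become U+FFFD), then encode again."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def base64_round_trip(data: bytes) -> bytes:
    """Carry the bytes inside an ASCII-only string and recover them."""
    text = base64.b64encode(data).decode("ascii")
    return base64.b64decode(text)


if __name__ == "__main__":
    print("strict decode fails:", strict_decode_fails(raw))
    print("lenient round trip preserved bytes:", lossy_round_trip(raw) == raw)
    print("Base64 round trip preserved bytes:", base64_round_trip(raw) == raw)
```

The point of the sketch is only that a character string is not a safe container for arbitrary binary content unless the bytes are first encoded into plain characters.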

Can anyone explain what input TREX_EXT_INDEX needs for Word documents so that they are indexed correctly?

Thanks,

Anand