Solved: Training example based taxonomy

Former Member · 03-25-2004

I created a file system repository containing some text files of keywords in order to provide an initial training set for a taxonomy. According to the Classification Inbox, "text/plain", "text" and "plain" have been included among the keywords for most of the categories created. How do I stop this from happening, and is it possible to remove keywords that have been incorrectly learned from documents that have since been automatically classified? Is there a "do not use these words" list that these can be added to?

I have to use the example-based taxonomy as this is a project requirement.

Thanks,

-Richard

D021954 · ‎04-26-2004

No, unfortunately it's not possible to influence the extracted keywords directly. I assume that the mentioned technical keywords are attributes of your documents.

But this only occurs when the documents contain very little content text. Could this be the case in your example? If so, sample-based classification will not work anyway.

Regards Matthias

Former Member · ‎06-24-2004

Update:

I've since been involved in a project where we demonstrated both Query and Example based taxonomies on the same content.

Query based taxonomy took a lot longer to set up and we needed to have several reviews of queries and results to get something suitable, but it has the advantage of allowing rules based on CM properties.

Example based taxonomy was nice and quick to set up once we had secured example documents (don't underestimate this - it took weeks). We eventually took a large document that summarised every category they wanted to use, broke it up into individual documents of a couple of paragraphs each and fed these into TREX. We found it was pretty accurate with about 6 lines of text to work on.
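For anyone wanting to script the splitting step, here is a minimal sketch in Python. It is not what we actually ran and it does not touch TREX at all; it only prepares one small text file per category. The "## Category name" heading marker, the function names and the output folder are all assumptions made for illustration, and the 6-line threshold is just the rough figure mentioned above.

```python
import re
from pathlib import Path

# Assumption: rough lower bound taken from the "about 6 lines" observation.
MIN_LINES = 6


def split_summary(text):
    """Split a summary document into {category name: body text}.

    Assumes (hypothetically) that each category section starts with a
    line of the form '## Category name'. Text before the first heading
    is ignored.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def write_examples(sections, out_dir, min_lines=MIN_LINES):
    """Write one .txt example file per category.

    Returns the names of categories whose text has fewer than
    min_lines non-blank lines, so they can be reviewed by hand.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    too_short = []
    for name, body in sections.items():
        # Build a file-system-safe file name from the category name.
        safe = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
        (out / f"{safe}.txt").write_text(body + "\n", encoding="utf-8")
        non_blank = [ln for ln in body.splitlines() if ln.strip()]
        if len(non_blank) < min_lines:
            too_short.append(name)
    return too_short
```

Typical use would be `write_examples(split_summary(Path("summary.txt").read_text()), "examples")`, then checking the returned list for categories that probably need more text before they go into the training repository.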

-R

Former Member · ‎04-26-2004

Hi!

Yes, it's picking up attributes as keywords, normally the filename, extension and MIME type. The example documents contained just 3 to 6 keywords and synonyms that we wanted to base the category on as the initial training set.

How small can a sample document be before it picks up attributes?

Even when we've obtained real documents, it sometimes learns the wrong words from the text, such as the company name or the author. Even if we manually reclassify documents, this does not appear to affect the keywords it's using, so the next time similar documents are added (i.e. company documents with the company name on them) they go back into the same category again.

Cheers,

-Richard
