Training example based taxonomy

Former Member

I created a file system repository containing some text files of keywords in order to provide an initial training set for a taxonomy. According to the Classification Inbox, "text/plain", "text" and "plain" have been included among the keywords for most of the categories I created. How do I stop this from happening, and is it possible to remove keywords that have been incorrectly learned from documents that have since been automatically classified? Is there a "do not use these words" list they can be added to?

I have to use the example-based taxonomy as this is a project requirement.

Thanks,

-Richard

Accepted Solutions (1)

D021954
Advisor

No, unfortunately it's not possible to influence the extracted keywords directly. I assume that the mentioned technical keywords are attributes of your documents.

But this only occurs when the documents contain very little content text. Could that be the case in your example? If so, sample-based classification won't work anyway!

Regards Matthias

Answers (2)

Former Member

Update:

I've since been involved in a project where we demonstrated both Query and Example based taxonomies on the same content.

Query based taxonomy took a lot longer to set up and we needed to have several reviews of queries and results to get something suitable, but it has the advantage of allowing rules based on CM properties.

Example-based taxonomy was nice and quick to set up once we had secured example documents (don't underestimate this; it took weeks). We eventually took a large document that summarised every category they wanted to use, broke it up into individual documents of a couple of paragraphs each, and fed these into TREX. We found it was pretty accurate with about 6 lines of text to work on.
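
A rough way to automate that splitting step is sketched below in Python. It assumes (hypothetically) that each category section in the summary document starts with a heading line such as `== Category Name ==` and that paragraphs are separated by blank lines; the function names are illustrative only, and nothing here talks to TREX. The resulting files would simply be dropped into the repository used for training.

```python
import re
from pathlib import Path

# Hypothetical convention: each category section starts with a heading line
# of the form "== Category Name ==". Adjust to match your source document.
HEADING = re.compile(r"^==\s*(.+?)\s*==\s*$")


def split_summary(text: str, paras_per_doc: int = 2) -> dict[str, list[str]]:
    """Split a summary document into small per-category example texts."""
    sections: dict[str, list[str]] = {}
    current = None          # category currently being read
    buffer: list[str] = []  # lines collected for that category

    def flush():
        if current is not None:
            # Paragraphs are separated by one or more blank lines.
            joined = "\n".join(buffer)
            paras = [p.strip() for p in re.split(r"\n\s*\n", joined) if p.strip()]
            # Group consecutive paragraphs into chunks of `paras_per_doc`.
            chunks = ["\n\n".join(paras[i:i + paras_per_doc])
                      for i in range(0, len(paras), paras_per_doc)]
            sections.setdefault(current, []).extend(chunks)

    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            current, buffer = m.group(1), []
        else:
            buffer.append(line)  # text before the first heading is ignored
    flush()
    return sections


def write_examples(sections: dict[str, list[str]], out_dir: str) -> None:
    """Write each chunk to its own .txt file, e.g. finance_01.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for category, chunks in sections.items():
        slug = re.sub(r"\W+", "_", category).strip("_").lower() or "category"
        for n, chunk in enumerate(chunks, start=1):
            (out / f"{slug}_{n:02d}.txt").write_text(chunk + "\n", encoding="utf-8")
```

For example, `write_examples(split_summary(Path("summary.txt").read_text()), "examples")` would write one small text file per chunk; `summary.txt` and `examples` are placeholder names, and `paras_per_doc` controls how much text each example document gets.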

-R

Former Member

Hi!

Yes, it's picking up attributes as keywords, normally the filename, extension and MIME type. As the initial training set, the example documents contained just 3 to 6 keywords and synonyms we wanted to base each category on.

How small can a sample document be before it picks up attributes?

Even when we've obtained real documents, it sometimes learns the wrong words from the text, such as the company name or the author. Even if we manually reclassify documents, this doesn't appear to affect the keywords it uses, so the next time similar documents are added (i.e., company documents with the company name on them) they go back into the category again.

Cheers,

-Richard

gabriel_candrian
Explorer

Hi Richard,

One possibility that comes to mind is to start with a query-based taxonomy, using as queries the keywords from the small training documents you started with. Then, once a certain number of documents have been classified, you can switch the taxonomy to "training-based" in the Index Administration iView. The more documents that have been correctly classified by query into your categories, the better the training-based classification should work.
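
The bootstrap idea can be illustrated with a toy Python sketch: classify documents with simple keyword queries first, then collect the matches per category as candidate training examples. The category names, keywords and function names are hypothetical, matching is plain whole-word lookup rather than TREX query syntax, and in practice both the query-based classification and the switch to training-based happen inside KM/TREX, not in a script like this.

```python
import re

# Hypothetical keyword queries per category, standing in for the 3-6
# keywords/synonyms originally put into the small training documents.
# Keywords are assumed to be single words (multi-word phrases won't match).
QUERIES = {
    "Finance": ["invoice", "budget", "expense"],
    "HR": ["recruitment", "payroll", "leave"],
}


def classify_by_query(text, queries):
    """Return the categories whose keywords appear in the text as whole words."""
    words = set(re.findall(r"\w+", text.lower()))
    return [cat for cat, kws in queries.items()
            if any(k.lower() in words for k in kws)]


def bootstrap_training_sets(docs, queries, min_docs=1):
    """Group query-classified document IDs per category as training candidates."""
    training = {cat: [] for cat in queries}
    for doc_id, text in docs.items():
        for cat in classify_by_query(text, queries):
            training[cat].append(doc_id)
    # Only categories with enough examples are worth training from.
    ready = {cat: ids for cat, ids in training.items() if len(ids) >= min_docs}
    return training, ready
```

Categories that end up with few or no matches are the ones that would still need hand-picked example documents before switching.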

regards

Gabriel

KarstenH
Advisor

Hi Gabriel,

what would be the advantage over the taxonomy training iView? There you search for documents with which to train the nodes of an example-based taxonomy, and the advanced search interface offers basically the same possibilities for a single query as the Taxonomy Maintenance UI for query-based taxonomies (QBT).

Generally speaking, though, I'd use QBT anyway in a case where the criteria seem to be so focused on single keywords.

Regards,

Karsten

gabriel_candrian
Explorer

Hi Karsten,

There is no advantage over the taxonomy training iView; it's just that if there are a lot of documents, manually classifying the training documents would be cumbersome. I would also use QBT instead of TBT (training-based taxonomy), but Richard said TBT is a project requirement.

regards

Gabriel