Training Example-Based Taxonomy with 'keywords.doc'

Hi All,

I am attempting to train my example-based taxonomy nodes using an existing hierarchy which contains 'keywords.doc' files in each of the nodes. These files simply contain 6-10 words that are key to classifying content into that node.

I have found that after classification these 'keywords.doc' files may have been classified into up to 3 separate nodes. I would have expected these files not to be classified along with the rest of my content, but simply to be used as examples and to remain firmly in the nodes to which they were manually assigned before the classification run.

Am I misunderstanding this, or are there other steps I need to perform to get this to work correctly?

Rgds,

Marc

1 Answer

  • May 13, 2004 at 02:35 PM

    Hi Marc,

    Only files that are manually classified to a node, either in the details menu or through the Classification Inbox, will be exclusive to that node and also be considered training documents.

    As you are uploading the files with the structure, they should, as you say:

    a) be classified to the node they come in

    b) never be declassified from it, unless manually

    c) be automatically classified to any other nodes they fit

    That is the principle, as it is supposed to work.
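To make rules a) to c) concrete, here is a minimal Python sketch of how such a classification run would treat a manually placed file. It is only a model of the principle described above, not TREX or portal code: the `Node` class, the `classify` and `memberships` functions and the `fits` callback are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical taxonomy node; not a real TREX/KM object."""
    name: str
    manual: set = field(default_factory=set)     # assigned by hand (training documents)
    automatic: set = field(default_factory=set)  # added by a classification run


def classify(doc, nodes, fits):
    """Run one classification pass for `doc` over all nodes.

    `fits(doc, node)` is an assumed stand-in for the engine's decision
    whether the document matches the node.
    """
    for node in nodes:
        if doc in node.manual:
            # Rules a) and b): a manual assignment is kept and never
            # removed by an automatic run.
            continue
        if fits(doc, node):
            # Rule c): the document is also classified to any other node it fits.
            node.automatic.add(doc)
        else:
            node.automatic.discard(doc)


def memberships(doc, nodes):
    """Names of all nodes that currently contain `doc`."""
    return sorted(n.name for n in nodes if doc in n.manual or doc in n.automatic)
```

With a 'keywords.doc' placed manually in one node and a `fits` function that also matches a second node, `memberships` returns both nodes, which corresponds to the behaviour Marc observed.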

    Anything different you will have to specify in the Classification Inbox.

    Regards, Karsten

    • Hi Marc,

      If you want to be that specific about your terms, you are better off with a query-based taxonomy (QBT).

      In the example-based case, the linguistic tool in TREX decides for itself what belongs together. Your example "health and beauty" is not likely to be identified as a noun phrase or adverb noun phrase. "The beauty of health", "health's beauty", "healthy beauty" or, away from the example, "business case" are more likely to be identified as coherent phrases.

      After all, QBT is for specifying every last detail of the criteria yourself, while EBT is a more "organic" approach that is more automated but consequently less controllable.

      QBT is strictly Boolean, and everything you specify will be applied exactly as written. The methods behind EBT yield, as completely independent research in the field of artificial intelligence shows, at most 80-90% correctness from a human perspective.
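To illustrate what "strictly Boolean" means for a term like "health and beauty", here is a small Python sketch of exact phrase matching combined with AND / OR / NOT conditions. This is not the TREX query syntax; `tokens`, `contains_phrase` and `matches` are hypothetical helpers that only show the idea that a document either meets the criteria or does not.

```python
import re


def tokens(text):
    """Lower-case word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean query: every `all_of` phrase, at least one `any_of`
    phrase (if any are given), and no `none_of` phrase."""
    return (
        all(contains_phrase(text, p) for p in all_of)
        and (not any_of or any(contains_phrase(text, p) for p in any_of))
        and not any(contains_phrase(text, p) for p in none_of)
    )
```

A text containing the exact words "health and beauty" satisfies `all_of=["health and beauty"]`, while "the beauty of health" does not. That is the kind of control you give up with an example-based taxonomy.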

      Regards,

      Karsten