Former Member

TREX follows URL

Hi,

I have a "jump page", which is dynamically generated and contains a long list of links to the items I would like to index. I'm trying to point TREX at the jump page and have it follow all the URLs on the page and index all the items. However, TREX seems to index only the jump page itself and does not follow any of the links on it. Is there a way to make that happen?

I have also tried creating a web address with a URL like http://www.mydomain.com/departments/sales/ (with no index.html or any other file name), and that seems to crawl a number of other files within that directory. What is the difference between the two scenarios?

Currently, I'm doing this by creating a Web Address and having the index point to that web address as a data source. I've read other threads about creating a Web Repository and/or HTTP Systems. Would that make a difference?

Please help. Thanks.

Michael

1 Answer

  • Best Answer
    Posted on Sep 07, 2007 at 10:38 AM

    Dear Michael,

    There's a restriction regarding hyperlinks. The crawler can follow hyperlinks on a web page only if they are contained in the HTML source of the page as anchor tags, like <a href="…">. Hyperlinks embedded in JavaScript or any other scripting language are not followed by the crawler, so their targets are not indexed.
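
To check which links on a jump page are visible as plain anchor tags, something like the following sketch could help. It is not TREX code and does not reproduce the TREX crawler's behavior; it only uses Python's standard `html.parser` to list the `href` values that appear as `<a>` tags in the page source. The sample page, the `openItem` function, and the item paths are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href values of <a> tags found in the static HTML source."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only plain anchor tags count; script content is never parsed as tags.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html):
    """Return the hrefs a source-only HTML parser can see."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical jump page: item 1 is a plain anchor, item 2 is reachable
# only through JavaScript.
SAMPLE_PAGE = """
<html><body>
<a href="/items/1.html">Item 1</a>
<script>
  function openItem(id) { window.location.href = "/items/" + id + ".html"; }
</script>
<span onclick="openItem(2)">Item 2</span>
</body></html>
"""

print(static_links(SAMPLE_PAGE))
```

In this sample only `/items/1.html` is listed. The second item is reachable only through script, which is the kind of link described above as not being followed.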

    Best Regards, Peter

    • Former Member

      Hi Michael,

      Some things that could be causing your problem:

      1) Make sure the website you are crawling doesn't have a robots.txt that disallows the subpages you want to crawl.

      2) Make sure your jump page doesn't forbid crawlers from following links. Look for a meta tag with the name "robots" and make sure it is set to "index,follow":

      <meta name="robots" content="index,follow" />

      If the meta tag doesn't exist, it might help to add it with the values shown above.

      3) Check the crawler configuration in the Knowledge Management settings in your portal. Make sure the crawler you are using has an appropriate maximum depth set (how deep depends on how far you want to follow links; 2 or 3 is a good starting point). You might also have to check the option "Follow Redirects on Web-Sites", depending on whether your website uses redirects.

      4) I've never tried this with just a website, only with Web Repositories, so it might in fact make a difference. If none of the above works, I suggest configuring a Web Repository for your site.
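
Points 1) and 2) can be checked outside the portal with a short script. The following is a minimal sketch using only the Python standard library. It follows the general robots.txt and robots meta tag conventions and makes no claim about how TREX evaluates them. The user agent name `ExampleCrawler`, the robots.txt content, and the paths are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Check a URL against robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaReader(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def page_allows_following(html):
    """True unless the page carries a 'nofollow' or 'none' robots directive."""
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return not ({"nofollow", "none"} & reader.directives)


# Hypothetical robots.txt that blocks the item pages but not the jump page.
ROBOTS_TXT = "User-agent: *\nDisallow: /items/\n"

print(allowed_by_robots_txt(ROBOTS_TXT, "ExampleCrawler",
                            "http://www.mydomain.com/items/1.html"))
print(page_allows_following('<meta name="robots" content="index,follow" />'))
```

To test a live site, fetch its robots.txt and the jump page source and pass them in as strings. A page without any robots meta tag is treated here as allowing links to be followed, which is the usual default.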

      Hope any of this helps,

      Esther
