Former Member

TREX follows URL


I have a "jump page", which is dynamically generated with a big list of links referring to a set of items I would like to index. I'm trying to point TREX at the jump page and have it follow all the URLs on the page and index all the items. However, TREX seems to be indexing only the jump page itself and not following any of the links on it. Is there a way to make that happen?

I have also tried creating a web address with a URL like (with no index.html or any file name), and that seems to be able to crawl a number of other files within that directory. What is the difference between the two scenarios?

Currently, I'm doing this by creating a Web Address and having the index point to that web address as a data source. I've read other threads about creating a Web Repository and/or HTTP System; will that make a difference?

Please help. Thanks.




1 Answer

  • Best Answer
    Posted on Sep 07, 2007 at 10:38 AM

    Dear Michael,

    There's a restriction regarding hyperlinks: the crawler can only follow hyperlinks that are contained in the source of the HTML page as embedded <a href="…"> tags. Hyperlinks embedded in JavaScript or any other scripting language are not followed by the crawler and hence do not get indexed.
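    As an illustration (a sketch I'm adding, not from the original answer), assuming a hypothetical jump page:

    ```html
    <!-- Followed: a plain anchor present in the HTML source -->
    <a href="item1.html">Item 1</a>

    <!-- NOT followed: the target URL only exists at runtime, via script -->
    <a href="#" onclick="window.location='item2.html'">Item 2</a>
    <script>
      // Links written into the page by script are invisible to the crawler
      document.write('<a href="item3.html">Item 3</a>');
    </script>
    ```

    If the jump page is generated dynamically, make sure the generated links end up as plain <a href> anchors in the HTML that is served, not as links built client-side.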

    Best Regards, Peter


    • Former Member

      Hi Michael,

      Some things that could be causing your problem:

      1) Make sure the website you are crawling doesn't have a robots.txt that disallows the subpages you want to crawl.
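      For example (an illustrative fragment, assuming the jump page links into a hypothetical /jump/items/ directory), a robots.txt like this would let the crawler fetch the jump page itself but block every linked item:

      ```
      User-agent: *
      Disallow: /jump/items/
      ```

      Fetch robots.txt from the root of the site in a browser and check whether any Disallow line matches the pages you expect to be indexed.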

      2) Make sure your jump page doesn't forbid crawlers from following links. Look for a meta tag with the name "robots" and make sure it is set to "index,follow":

      <meta name="robots" content="index,follow" />

      If the meta tag doesn't exist, it might help to create it with the parameters mentioned above.
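      For contrast (an illustrative snippet I'm adding, not from the original comment), a page carrying the tag below would itself be indexed, but the crawler would not follow any of its links, which matches the symptom described:

      ```html
      <meta name="robots" content="index,nofollow" />
      ```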

      3) Check the crawler configuration in the Knowledge Management settings of your portal. Make sure the crawler you are using has an appropriate maximum depth set (depending on how far you want to follow links, 2 or 3 is a good starting point). You might also have to check the option "Follow Redirects on Web-Sites", depending on whether your website uses redirects.

      4) I've only ever tried this with Web Repositories, never with just a web address, so it might in fact make a difference. If none of the above works, I suggest configuring a Web Repository for your site.

      Hope any of this helps,

