If your scan or results didn't turn out as expected, see whether the answer is here.

## Crawl finishes with only one link reported

A quick test: switch off javascript and cookies in your browser, then try to reload your page. If you don't see your web page as expected, then your website requires one or both of these things to be enabled.

These options are under the settings and options for your site, under the Advanced tab. The first thing to try is to switch the user-agent string to Googlebot (this is the first item in Advanced settings; you should be able to select Googlebot from the drop-down list). If that doesn't work, switch to one of the 'real' browser user-agent strings, i.e. Safari or Firefox. (A rough sketch of this experiment, outside Scrutiny, appears at the end of this post.)

Scrutiny now has a tool to help diagnose this failure to get off the ground. It may anticipate the problem and offer you the diagnostic window after the attempted crawl. If you declined that offer or didn't see it, you can still access the tool from the Tools menu, 'Detailed analysis of starting url'. (This tool is available from the menu whether or not the crawl was successful.) It shows many things, including a browser window loaded with the page that Scrutiny received, the html code itself, and details of the request / response.

## Pages time out / the web server stops responding / 509 / 429 / 999 status codes

Some servers will respond to many simultaneous requests, but some will have trouble coping, or may deliberately stop responding if being bombarded from the same IP. The solution used to involve introducing a delay. Since version 8, Scrutiny handles this much more elegantly: there's now a control above the threads slider allowing you to specify a maximum number of requests per minute. You don't need to do any maths; it's not 'per thread'. Scrutiny will calculate things according to the number of threads you've set (and using a few threads will help to keep things running smoothly). It will reduce the number of threads if appropriate for your specified maximum requests. (A conceptual sketch of this kind of shared throttle also appears at the end of this post.)

If your server is simply being slow to respond, you can increase the timeout.

The 999 is specific to LinkedIn as far as we're aware, and they seem to be quite successful at blocking automated checkers and bots. The only sensible way to stop those codes, if you really don't want them in your results, is to set up a rule to ignore or not check them.

## Scrutiny appears to be crawling many more pages than exist / scanning pages without getting closer to completion

There are a few reasons why Scrutiny might be in a loop. The most likely is that there's some kind of session id or tracking id in the querystring, making every url appear unique, including repeated visits to the same page. This is likely with forums / discussion boards. The easy solution to this may be to use the 'ignore querystrings' setting.
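To make that last point concrete, here's a minimal Python sketch of the idea behind 'ignore querystrings'. It's an illustration of the technique, not Scrutiny's actual code, and the forum URL and `sid` parameter are made up:

```python
# Sketch (not Scrutiny's code): stripping the query collapses
# session-id variants of the same page into one canonical URL,
# so the crawler recognises repeat visits instead of looping.
from urllib.parse import urlsplit, urlunsplit

def canonical(url: str, ignore_querystrings: bool = True) -> str:
    """Return the key used for 'have we already crawled this?' checks."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    if ignore_querystrings:
        query = ""  # every session-id variant now maps to the same key
    return urlunsplit((scheme, netloc, path, query, ""))

seen = set()
for url in [
    "https://forum.example.com/thread/42?sid=a1b2c3",
    "https://forum.example.com/thread/42?sid=d4e5f6",  # same page, new session id
]:
    key = canonical(url)
    if key not in seen:
        seen.add(key)
        print("crawl:", key)
    else:
        print("skip duplicate:", url)
```

With the setting on, both URLs produce the same key, so the second visit is treated as a repeat rather than a brand-new page.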
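On the user-agent experiment described under 'Crawl finishes with only one link reported': you can reproduce it outside Scrutiny with a few lines of Python. This is only a sketch; the target URL is a placeholder for your own starting url, and the user-agent strings are the standard public ones:

```python
# Sketch: fetch the same page with different user-agent strings
# to see whether the server answers differently for each.
import urllib.error
import urllib.request

USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Safari": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
               "AppleWebKit/605.1.15 (KHTML, like Gecko) "
               "Version/16.5 Safari/605.1.15"),
}

def probe(url: str, name: str, ua: str) -> None:
    """Fetch `url` with one user-agent and report what came back."""
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            print(f"{name:>10}: HTTP {resp.status}, {len(body)} bytes")
    except urllib.error.HTTPError as err:
        print(f"{name:>10}: HTTP {err.code}")
    except urllib.error.URLError as err:
        print(f"{name:>10}: failed ({err.reason})")

# Placeholder URL -- substitute your own starting url.
for name, ua in USER_AGENTS.items():
    probe("https://www.example.com/", name, ua)
```

If the Googlebot request gets a full page while the browser strings get an empty or cut-down one (or vice versa), the server is varying its response by user-agent, which explains a one-link crawl.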