The Real Reasons a Website Is Not Fully Indexed
Disallow: /news
This rule forbids all search engines from crawling anything under the /news path.
Third: the pages have not been visited.
First: blocking by the robots.txt file. If the site logs show spiders crawling pages every day but the site is still not fully indexed, check the site's robots file. As webmasters know, before crawling a site a spider first looks for a robots file to see whether the site has pages that should not be crawled. Check whether the file blocks part of the site, because many site owners write robots files incorrectly, which prevents the site from being fully indexed. Novice webmasters who do not know how to write the file can use the robots feature of love Shanghai Webmaster Tools, which can check whether the file is written correctly or generate one from the paths you want to block.
Second: blocking by the robots meta tag. While diagnosing a site, you may find that the actual indexing rate is very low: some column pages are indexed, but others are not, even after original articles have been added to them. Checking the page code then shows that the page uses a noindex tag, which tells spiders not to index it. A column page carrying this tag will clearly not be indexed, and its updated content will not be indexed however high its quality. The nofollow tag, in turn, tells spiders that the links on the page pass no weight. If a page's links on the site carry nofollow, that plainly tells search engines the page has no value. If a site cannot be fully indexed, check its meta tags for wrongly placed directives.
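The meta tag check described above can be sketched in Python with the standard library's html.parser module. This is a minimal illustration, not a full audit tool: the class name RobotsMetaParser, the helper robots_directives, and the sample HTML are all hypothetical and do not come from the article.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Only the generic "robots" meta tag is handled in this sketch.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, used only for illustration.
SAMPLE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)

found = robots_directives(SAMPLE)
print("noindex" in found, "nofollow" in found)
```

Running a check like this over the column pages that fail to get indexed shows quickly whether a stray noindex or nofollow directive is present.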
If a robots file containing the rule Disallow: /news is placed on a site, spiders will not crawl the content under the news path, so updates to the news directory will not be indexed. However many articles the site publishes, and even though the access log shows spiders visiting every day, that content is still not indexed. When the news content matters to the site, a wrongly written robots file is the hidden cause of the site not being fully indexed.
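The effect of such a rule can be checked with Python's standard urllib.robotparser module. The sketch below is illustrative only: the User-agent line is an assumption (the article shows only the Disallow rule), and the helper is_crawlable and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "User-agent: *" line is assumed;
# the article itself shows only the Disallow rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URLs, used only for illustration.
news_ok = is_crawlable(ROBOTS_TXT, "https://example.com/news/article-1.html")
about_ok = is_crawlable(ROBOTS_TXT, "https://example.com/about.html")
print(news_ok, about_ok)  # prints: False True
```

Testing the site's important URLs this way before publishing a robots file helps catch a rule that blocks more than intended.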
The first thing to do after a site goes live is to submit its URL to the search engines. Once a search engine receives the submission, it schedules spiders to crawl the site. Unfortunately, spiders do not always index every page, and the actual indexing rate is often very low. What keeps a site from being fully indexed? Is it the site structure, the site's optimization, or its weight? This article reveals the real reasons.
If you add