How Baidu's Original Spark Program Identifies Duplicate Content
As the analysis above shows, duplicate content hurts spider crawling. To keep the Internet ecosystem healthy and protect the collective interests of high-quality original sites, the search engine penalizes scraper sites, reposted content, and spam sites that profit by collecting others' original work and diluting its value. So how does the search engine identify duplicate content?
Duplicate content can occur across different sites or within a single site. A CMS that has not been optimized for SEO may make the same page reachable at more than one URL; that is duplication, and when the spider encounters it, it must decide which page matters. It first checks the robots file: if the address is disallowed there, it stops crawling that duplicate URL. If not, it continues fetching; if during the crawl it meets a meta tag marking the page noindex, it treats the page as not intended for spiders and stops there as well. This ordering reduces the spider's crawling effort, so even when content is duplicated, the spider captures only the most valuable version.
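The decision order described above can be sketched as follows. This is a simplified illustration under my own assumptions, not Baidu's actual crawler logic; the function names, rule format, and regex are all hypothetical.

```python
# Simplified sketch of the crawl decision a spider might apply to a
# duplicate URL: robots rules first, then an in-page noindex meta tag.
# Illustrative only -- not any search engine's real implementation.
import re

def is_disallowed(url_path: str, robots_disallow: list[str]) -> bool:
    """True if any Disallow prefix from robots.txt matches the URL path."""
    return any(url_path.startswith(rule) for rule in robots_disallow)

def has_noindex(html: str) -> bool:
    """True if the page carries a robots noindex meta tag."""
    return re.search(r'<meta[^>]+name=["\']robots["\'][^>]+noindex',
                     html, re.IGNORECASE) is not None

def crawl_decision(url_path: str, html: str, robots_disallow: list[str]) -> str:
    if is_disallowed(url_path, robots_disallow):
        return "skip: blocked by robots.txt"   # checked before any fetch
    if has_noindex(html):
        return "skip: meta noindex"            # found only after fetching
    return "crawl and index"

print(crawl_decision("/print/article-1", "<html></html>", ["/print/"]))
print(crawl_decision("/article-1?sort=asc",
                     '<meta name="robots" content="noindex,follow">', []))
print(crawl_decision("/article-1", "<html>unique content</html>", []))
```

Note that the robots check happens before the URL is fetched at all, while the noindex meta tag is only seen after the page has been downloaded, which is why blocking in robots saves more crawl effort.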
A webmaster cannot be sure which version of a duplicate page the spider will crawl, the search engine gives no guarantee, and the page returned for different users' searches may differ; the engine may also favor one duplicate over another. Will the page shown to searchers be the one you most want shown, the one that brings you the highest traffic and conversion rate? There is no way to know, and duplicate pages also disperse a site's weight. The remedies are to block the duplicates in the robots file, or to add a canonical tag that transfers weight to the preferred version.
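Both remedies are one-line fragments. The paths and URL below are placeholders, not recommendations for any specific site:

```
# robots.txt -- block a duplicate URL pattern (example path)
User-agent: *
Disallow: /print/
```

```html
<!-- on each duplicate page, point to the preferred version (example URL) -->
<link rel="canonical" href="https://example.com/article-1">
```

Blocking in robots prevents the duplicates from being crawled at all, while the canonical tag lets them be crawled but consolidates their ranking signals onto the version you choose.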
Identifying duplicate information content
To clean up the Internet, Baidu launched its "Original Spark Program" on a large scale. The program is enforced rigorously: a dedicated page has been set up for it, and high-quality sites are invited to join. We are now in an era flooded with duplicated "content" and spam, so how does the search engine behind the Spark Program identify duplicate content?
A search engine wants to provide users with high-quality content. When users search for something, the engine applies filtering mechanisms to drop duplicate content rather than show many copies of the same result. If a website contains a large amount of duplicate content, it may get caught in this filtering process, which hurts the site.
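One common family of duplicate-filtering techniques compares documents by overlapping word shingles. The sketch below uses plain Jaccard similarity with an arbitrary threshold; production engines use scalable variants such as simhash or minhash, and the exact method Baidu uses is not public.

```python
# Minimal near-duplicate filter: k-word shingles + Jaccard similarity.
# Threshold and shingle size are illustrative choices, not known values.
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All overlapping k-word windows of the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a | b else 1.0

def is_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold

original  = "search engines filter duplicate content to keep results useful"
scraped   = "search engines filter duplicate content to keep results useful today"
unrelated = "a guide to baking sourdough bread at home from scratch"

print(is_duplicate(original, scraped))    # nearly identical text
print(is_duplicate(original, unrelated))  # no shared shingles
```

Because shingles overlap, appending or tweaking a few words barely moves the similarity score, which is why lightly edited scraped copies are still caught.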
Before crawling a site, the search engine budgets how many pages it expects to fetch. If the site contains many duplicate pages, the spider still retrieves them one by one; even though they are filtered out of the returned results, they waste the site's crawl budget and reduce the chance that other, higher-quality pages get crawled. On top of that, identical pages disperse the site's weight across the copies, so the weight each individual page obtains drops significantly.
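The budget effect can be shown with a toy simulation. This is not any engine's real scheduler; the fixed per-site budget and the URL list are invented for illustration:

```python
# Toy crawl-budget model: each fetched URL costs one unit of budget,
# duplicates included, so duplicate URLs crowd out unique pages.
def crawl(urls: list[str], content_of: dict[str, str], budget: int) -> list[str]:
    """Fetch up to `budget` URLs in order; return the distinct contents seen."""
    seen_content = []
    for url in urls[:budget]:        # budget is spent per fetch,
        body = content_of[url]       # whether or not it is a duplicate
        if body not in seen_content:
            seen_content.append(body)
    return seen_content

pages = {
    "/a":          "article A",
    "/a?ref=home": "article A",   # duplicate URL, same content
    "/a/print":    "article A",   # another duplicate
    "/b":          "article B",
    "/c":          "article C",
}
print(crawl(list(pages), pages, budget=3))
```

With a budget of three fetches, all three fetches land on copies of article A, and articles B and C are never crawled at all; with no duplicates, the same budget would have covered three distinct articles.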