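Part 1: Baidu
What is Baiduspider?
Baiduspider is an automated program run by the Baidu search engine. It visits HTML pages on the Internet and builds an index database so that users can find your site's pages through Baidu search.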
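How much load does Baiduspider put on a website's server?
Baiduspider automatically adjusts its crawl rate to the server's capacity. After crawling continuously for a while, it pauses so as not to add to the load. Under normal circumstances, therefore, Baiduspider will not put excessive pressure on your server.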
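Why does Baiduspider keep crawling my site?
Baiduspider keeps crawling pages on your site that are newly created or continually updated. You can also check your access logs to see whether the visits attributed to Baiduspider look normal, in case someone is impersonating Baiduspider to crawl your site heavily. If you find Baiduspider crawling your site abnormally, please report it to webmaster@baidu.com and include, where possible, the access-log entries for Baiduspider's visits so that we can investigate.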
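One way to carry out the log check described above is to count the requests whose user-agent claims to be Baiduspider, grouped by client IP. The following is a minimal sketch, assuming an access log in the common Apache/Nginx "combined" format. The function name, the regular expression and the sample log lines are made up for illustration and are not part of Baidu's guidance.

```python
import re
from collections import Counter

# Assumes the "combined" log format: the client IP is the first field and
# the user-agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def count_baiduspider_hits(lines):
    """Return a Counter mapping client IP -> requests claiming to be Baiduspider."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "baiduspider" in match.group("agent").lower():
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample lines, not real traffic.
sample_log = [
    '203.0.113.5 - - [01/Jan/2009:00:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '203.0.113.5 - - [01/Jan/2009:00:00:02 +0800] "GET /a.html HTTP/1.1" 200 734 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '198.51.100.7 - - [01/Jan/2009:00:00:03 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# List the busiest self-declared Baiduspider clients first.
for ip, count in count_baiduspider_hits(sample_log).most_common():
    print(ip, count)
```

An IP that sends far more requests than the rest is a candidate for closer inspection. The count alone does not prove impersonation, since the user-agent string can be forged.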
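I don't want Baiduspider to visit my site. What should I do?
Baiduspider obeys the Internet robots protocol. You can use a robots.txt file to bar Baiduspider from your entire site, or from only some of its files. Note: if you bar Baiduspider from your site, its pages can no longer be found in Baidu or in any search engine that uses Baidu's search service.
For how to write robots.txt, see our guide: How to write robots.txt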
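To illustrate the two cases above (a total ban and a partial ban), here is a small sketch that checks two example robots.txt files with Python's standard urllib.robotparser module. The robots.txt contents, the /private/ path and the example.com URLs are assumptions chosen for illustration; they are not taken from Baidu's guide.

```python
from urllib.robotparser import RobotFileParser

# Example 1: block Baiduspider from the whole site (other crawlers unaffected).
BLOCK_ALL = """\
User-agent: Baiduspider
Disallow: /
"""

# Example 2: block Baiduspider only from a hypothetical /private/ directory.
BLOCK_PART = """\
User-agent: Baiduspider
Disallow: /private/
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(BLOCK_ALL, "Baiduspider", "http://example.com/index.html"))
print(allowed(BLOCK_PART, "Baiduspider", "http://example.com/private/a.html"))
print(allowed(BLOCK_PART, "Baiduspider", "http://example.com/index.html"))
```

This only shows how Python's parser reads the rules. It is a convenient way to catch mistakes in a robots.txt file before deploying it, not a statement of how Baiduspider itself evaluates the file.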
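Why does my site still show up in Baidu after I added a robots.txt?
Because the search engine's index database takes time to update. Although Baiduspider has stopped visiting your pages, index entries already built in Baidu's database may take two to four weeks to be removed. Please also check that your robots.txt is configured correctly.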
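I want Baidu to index my site but not keep a snapshot (cached copy). What should I do?
Baiduspider obeys the Internet meta robots protocol. Using a meta tag in the page, you can have Baidu index the page without showing its snapshot in search results.
As with robots.txt changes, the index database takes time to update. So even after you have used a meta tag to stop Baidu from showing a page's snapshot, the change may take two to four weeks to take effect if the page is already in Baidu's index.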
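As a sketch of the meta setting described above: the usual robots meta value for suppressing cached copies is noarchive, and a tag such as <meta name="Baiduspider" content="noarchive"> is assumed here to be the form Baidu honours. This FAQ does not spell out the exact tag, so confirm it against Baidu's own documentation. The helper below only checks whether a page carries such a tag; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class NoArchiveFinder(HTMLParser):
    """Record whether a page asks crawlers not to show a cached snapshot."""

    def __init__(self):
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: either the generic "robots" name or "Baiduspider" applies.
        if attr.get("name", "").lower() in ("robots", "baiduspider"):
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noarchive" in directives:
                self.noarchive = True


def has_noarchive(html_text):
    """Return True if the HTML contains a robots/Baiduspider noarchive meta tag."""
    finder = NoArchiveFinder()
    finder.feed(html_text)
    return finder.noarchive


page = '<html><head><meta name="Baiduspider" content="noarchive"></head><body>Hi</body></html>'
print(has_noarchive(page))
```

Running a check like this over your templates is a quick way to confirm that the tag actually reaches the rendered pages before waiting the two to four weeks for the index to catch up.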
What name does Baidu's spider use in robots.txt?
"Baiduspider": a capital B followed by all lowercase letters.
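How long before Baiduspider crawls my page again?
The Baidu index is updated weekly. Pages are refreshed at different rates depending on their importance, so Baiduspider revisits and updates a given page anywhere from every few days to once a month.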
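Can Baiduspider's crawling cause bandwidth congestion?
Normal crawling by Baiduspider will not congest your site's bandwidth. If congestion occurs, someone may be impersonating Baidu's spider to crawl maliciously. If you find an agent named Baiduspider crawling your site and causing congestion, please contact us promptly at webmaster@baidu.com. Access logs for the period in question will help our analysis.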
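A commonly recommended way to spot an impersonator is a reverse DNS lookup on the client IP. The sketch below assumes that genuine Baiduspider hosts resolve to names ending in .baidu.com or .baidu.jp. That suffix list is an assumption not stated in this FAQ, so verify it before relying on it. The function names and the stand-in lookup table are illustrative.

```python
import socket

# Assumption: hostnames of genuine Baiduspider hosts end with these suffixes.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname):
    """Return True if the hostname falls under one of the trusted suffixes."""
    return hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES)


def looks_like_real_baiduspider(ip, lookup=socket.gethostbyaddr):
    """Reverse-resolve `ip` and check the resulting hostname.

    `lookup` is injectable so the check can be exercised without network access.
    """
    try:
        hostname = lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record: treat as unverified
    return is_baidu_hostname(hostname)


def fake_lookup(ip):
    """Stand-in for DNS so the example runs offline; the hostnames are invented."""
    table = {
        "203.0.113.5": "spider-203-0-113-5.crawl.baidu.com",
        "198.51.100.7": "host.example.net",
    }
    if ip not in table:
        raise socket.herror("unknown host")
    return (table[ip], [], [ip])


for ip in ("203.0.113.5", "198.51.100.7", "192.0.2.1"):
    print(ip, looks_like_real_baiduspider(ip, lookup=fake_lookup))
```

A stricter check would also resolve the returned hostname forward and confirm that it maps back to the same IP, since reverse DNS records are controlled by whoever owns the address block.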
Part 2: Google
1. What is the sandbox?
For many webmasters, the so-called "Sandbox Effect" feels less like a playground and more like quicksand.
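In the search engine results pages (SERPs) of Google, the dominant search engine, newly launched sites tend to rank lower and lower. This phenomenon is called the "sandbox effect", and it is a real headache for many webmasters. When a new site is first indexed by Google, it usually receives what many observers regard as a "new site" bonus. The site shoots to the top of the results, but only briefly; after that, its ranking keeps falling.
After a few days of glory near the top for its most important keywords, the site is buried at the bottom of Google's results as if it did not exist. Even a site with a high Google PageRank (PR), plenty of strong, topically relevant inbound links and rich content can still suffer the frustrating sandbox effect.
While a page is buried deep in Google's sandbox, it may rank highly for the same keyword on Yahoo and MSN. The sandbox effect therefore appears to be unique to Google.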
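You need to think about how to get out of the sandbox. Once the suppressing factor is lifted, the work you did during the sandbox period can bring your site back into the search results.
The sandbox effect is described as a search ranking damping filter that Google applies to sites that received the "fresh site bonus" in the first two to four months after launch. The bonus is a very high ranking for a short time, given because Google favors fresh content. During the sandbox period a new site should keep improving every aspect of its SEO; once the period is over, it will generally rank well.
Once the originally fresh content becomes slightly stale, the sandbox filter kicks in. That is the sandbox effect. A site spends about 90 days in the sandbox on average and rarely stays as long as four months.
Most sites experience the damping-down effect to a degree that varies with the type of keyword. The sandbox filter applies to all sites, regardless of how many inbound links they have; even highly relevant links make no difference. Content-rich sites sink into the quicksand too. The sandbox effect must be part of Google's algorithm, because it has recently become so widespread.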
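What is the purpose of the sandbox effect?
Many observers believe the sandbox filter is meant to deter unscrupulous webmasters from practices that violate Google's webmaster guidelines. Google is trying to undermine tactics such as using spam sites to build early traffic, or buying expired domains to inherit their existing PageRank as a head start.
Unless links are kept for the long term, short-term link renting and placement do a new site little good. It is also possible that Google withholds part of a site's PageRank for the first few months after indexing it. PageRank that is discounted by this dampening effect lowers the value of an expired domain's inbound links.
From this one can infer that Google may be trying to curb the widespread purchase of expired domains. For example, the Google Toolbar might show PR7 while the algorithm gives the site a PR of zero for ranking purposes. Of course, such a situation could also still result in a fairly high Google ranking.
Spam sites are another likely target. If an out-and-out spammer cannot rank well in the first few months after launch, they may shut the spam site down. Fighting spammers is a long-term goal for Google.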
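However, whatever good intentions Google claims, the filter also hits sites that have broken no rules. Many webmasters, especially those unfamiliar with SEO as a whole, are confused by what is happening. For example, they rank well on Yahoo and MSN but do not know why they are missing from Google's listings. Many also wrongly believe they have accidentally triggered a Google penalty.
Another practice Google targets is buying links to obtain initial PageRank. People at Google may feel that a site should acquire links naturally over time and that purchased links do not reflect natural growth. Yet sites with only a handful of natural links have suffered the same sandbox dampening effect.
It is also entirely possible that a site has not been sandboxed but its links are being monitored. The algorithm may consider the age of the links, the sites they come from, the range of links belonging to the same Internet service provider, and the overall diversity of the link profile.
If your site lands in the sandbox, don't panic. Google has not penalized your site, and you know it has been indexed. Rather, this is a normal, if confusing and annoying, part of Google's algorithm. A site that ranked very high thanks to the "fresh site bonus" is especially likely to feel the sting.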
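2. What is the Google Dance?
The Google Dance is the large-scale monthly update of the Google search engine's database.
During the update, new pages are added, dead pages are removed, indexed sites are crawled in depth, and the algorithm may be adjusted. Google's search results show sharp ranking fluctuations, and the backlink counts of indexed sites are refreshed. PageRank, which is updated once a quarter, is also refreshed during a Google Dance. A Dance usually lasts a few days; afterwards, search results and backlink counts settle down until the next cycle.
The Google Dance is Google's periodic refresh of its index, which looks rather like dancing. During the Dance the stored index is rebuilt and rankings change dramatically: some sites vanish from Google overnight while others jump to first place. The Dance usually begins in the last week of the month, with new results visible in the first days of the next month, roughly once every 36 days or 10 times a year.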
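Background to the Google Dance
The Florida storm and the Austin storm
In early November 2003 Google began a drastic update of its ranking algorithm. Like a hurricane, it made countless sites vanish from the search results overnight, or drop from the top 10 to beyond page 100, costing many of them large numbers of customers just before the Christmas shopping season. The update is said to have centered on Google's data center in Florida, hence the name "Florida storm" (known in English as the Florida update).
Not long after, in January 2004, Google made another huge algorithm update. It is said to have started from Google's data center in Austin, Texas, and so was dubbed the "Austin storm". It is regarded as an aftershock of the Florida storm.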
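Why the Google Dance matters
The Google Dance is part of Google's effort to refine its algorithm and fight spam. Its arrival prompted many SEO practitioners to rethink how to optimize sites with legitimate techniques. As long as your site does not cheat or use dubious techniques, you have nothing to fear from the Google Dance. So if one day your ranking suddenly drops a long way, or your site cannot be found on Google at all, don't worry too much: Google may simply be dancing.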
