Ahrefs crawlers

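At Ahrefs, we operate two main web crawlers, AhrefsBot and AhrefsSiteAudit, to support our suite of tools and services. The goal of our crawling is to help website owners improve their online presence while minimizing the load on their servers and keeping crawling behavior safe and transparent.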

Contents

Our bots

Verification and IP lists

Benefits for site owners

Policies and commitments

Controlling bots behavior

Our bots

AhrefsBot

User-agent string: Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://technewsbd.online/robot/)

Robots.txt

User-agent token in robots.txt:
AhrefsBot
Obeys robots.txt: Yes
Obeys crawl delay: Yes

Purpose: Powers the database for the marketing intelligence platform Ahrefs and for Yep, an independent, privacy-focused search engine.

AhrefsSiteAudit

Desktop user-agent string: Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://technewsbd.online/robot/site-audit)

Mobile user-agent string: Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.128 Mobile Safari/537.36 (compatible; AhrefsSiteAudit/6.1; +http://technewsbd.online/robot/site-audit)

Robots.txt

User-agent token in robots.txt:
AhrefsSiteAudit
Obeys robots.txt: Yes by default (website owners can request to disobey robots.txt on their sites)
Obeys crawl delay: Yes by default (website owners can request to disobey crawl delay on their sites)

Purpose: Powers Ahrefs' Site Audit tool. Ahrefs users can use Site Audit to analyze websites and find technical SEO and on-page SEO issues.
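As a rough illustration of how the product tokens in the user-agent strings above could be recognized in server logs, here is a minimal Python sketch. The function name `identify_ahrefs_crawler` is hypothetical, and the regular expression only assumes the `Token/version` shape seen in the strings listed here. A user-agent header can be spoofed, so a match is a hint rather than proof of origin.

```python
import re
from typing import Optional, Tuple

# Matches the product tokens documented above, e.g. "AhrefsBot/7.0".
_AHREFS_TOKEN = re.compile(r"\b(AhrefsBot|AhrefsSiteAudit)/(\d+(?:\.\d+)*)")


def identify_ahrefs_crawler(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, version) if the header names an Ahrefs crawler, else None.

    Hypothetical helper: it inspects the header text only and does not
    verify that the request really came from Ahrefs.
    """
    match = _AHREFS_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because the mobile AhrefsSiteAudit string also contains Chrome and Safari tokens, searching for the Ahrefs token anywhere in the header is more reliable than checking how the string starts.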

Cloudflare verified

Both AhrefsBot and AhrefsSiteAudit are recognized as trusted "good" bots by Cloudflare, a well-known web security and performance company.

IndexNow.org

IndexNow partner

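Yep, the search engine developed by Ahrefs, is one of the official partners of the IndexNow protocol, alongside other major search engines. Site owners can notify us the moment their content is updated, which supports more timely and accurate indexing.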

Verification and IP lists

IP addresses

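We crawl from publicly published IP ranges. You can fetch our IP addresses as IP ranges or as individual IPs. For information on how to whitelist our IP addresses, see the help article.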
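One way to use the published ranges is to test whether a client address falls inside any of them. The sketch below uses Python's standard `ipaddress` module. The ranges in `EXAMPLE_RANGES` are placeholders taken from the reserved documentation blocks, not Ahrefs addresses, and `ip_in_ranges` is a hypothetical helper name.

```python
import ipaddress
from typing import Iterable

# Placeholder CIDR blocks reserved for documentation (RFC 5737).
# They are NOT Ahrefs ranges; substitute the ranges Ahrefs publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip: str, ranges: Iterable[str]) -> bool:
    """Return True if `ip` belongs to at least one CIDR block in `ranges`."""
    address = ipaddress.ip_address(ip)
    # Membership tests across IP versions (IPv4 vs IPv6) simply return False.
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)
```

For example, `ip_in_ranges("192.0.2.15", EXAMPLE_RANGES)` is true for the placeholder data, while an address outside both blocks yields false.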

Reverse DNS

The reverse DNS suffix of the hostnames for these IP addresses is always ahrefs.com or ahrefs.net.
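The reverse DNS rule above can be checked with Python's standard `socket` module, as in the following sketch. The helper names are hypothetical. The forward-confirmation step (resolving the hostname back and comparing it with the original address) is an assumption added here as a common hardening practice; it is not something this page describes.

```python
import ipaddress
import socket

# Suffixes stated above; the leading dot keeps "notahrefs.com" from matching.
ALLOWED_SUFFIXES = (".ahrefs.com", ".ahrefs.net")


def hostname_is_ahrefs(hostname: str) -> bool:
    """Return True if the hostname ends with one of the documented suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(ALLOWED_SUFFIXES)


def verify_by_reverse_dns(ip: str) -> bool:
    """Reverse-resolve `ip`, check the suffix, then forward-confirm (assumed step)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname_is_ahrefs(hostname):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    forward = {ipaddress.ip_address(info[4][0]) for info in infos}
    return ipaddress.ip_address(ip) in forward
```

`verify_by_reverse_dns` performs live DNS lookups, so results depend on the resolver in use; `hostname_is_ahrefs` is a pure string check and can be applied to hostnames already present in logs.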

Website status

You can check how your website appears to our crawlers and whether it is allowed to be crawled.

Benefits for site owners

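AhrefsBot indexes fresh, accurate information about websites and their content, and analyzes how they link to one another. This data is highly valuable and can be used in several ways: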

AhrefsBot powers Yep, an independent, privacy-focused search engine. Being included in Yep's index helps site owners reach a new audience.
AhrefsBot feeds data into the Ahrefs toolset. Website owners can create a free Ahrefs webmaster account and verify domain ownership to unlock site analytics, including access to in-depth backlink data, website performance metrics, and content change monitoring. Ahrefs also offers a suite of free SEO tools that anyone can use without creating an account.
AhrefsSiteAudit powers our Site Audit tool. Site Audit checks websites for technical and on-page issues such as broken links, slow performance, security misconfigurations, and SEO pitfalls. By crawling and rendering pages, we help identify improvements that can boost visibility, loading speed, and overall user experience. Ahrefs also provides the option to run Site Audit for free on verified websites, helping site owners discover and fix technical issues at no charge.

Policies and commitments

Obeying robots.txt

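Both crawlers strictly follow the disallow and allow rules in robots.txt, as well as the crawl-delay directive. Only verified website owners can authorize the AhrefsSiteAudit crawler to ignore robots.txt rules on their own sites, so that it can check sections that are normally disallowed for issues.

When requesting HTML pages, we strictly adhere to the crawl-delay setting and do not exceed the specified rate limit. This cannot be guaranteed when rendering JavaScript, however. When our crawler renders a page, it may request multiple assets (such as images, scripts, and stylesheets) at the same time, so server logs may show requests arriving more frequently than the crawl-delay setting allows. This behavior mimics a real user's visit, since modern web pages typically need to load several assets at once to render and function properly.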

Caching assets

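While crawling, we cache frequently requested files (such as images, CSS, and JS) to reduce repeat fetches, which lowers bandwidth consumption and server load.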

Load management

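If we encounter non-200 status codes, especially 4xx or 5xx errors, we automatically slow down crawling for that site, minimizing the pressure on websites that may be experiencing outages or high server load.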

Transparent practices

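We understand that hosting providers, CDNs, and CMS platforms may want to manage how bots interact with their customers' websites. Our public IP addresses and user-agent strings let you or your service providers quickly verify legitimate Ahrefs traffic. We are committed to keeping our crawling transparent in order to build trust and foster collaboration. If you have any questions, email [email protected] and we will be happy to help.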

Controlling bots behavior

We provide clear, user-friendly options for controlling our bots:

Via robots.txt

To change how frequently AhrefsBot or AhrefsSiteAudit visits your site, specify the minimum acceptable delay between two consecutive requests in your robots.txt file:

User-agent: AhrefsBot
Crawl-Delay: [value]

(Where the Crawl-Delay value is a time in seconds.)

If you want to stop AhrefsBot or AhrefsSiteAudit from visiting your site or a section of it, use the Disallow directive:

User-agent: AhrefsBot
Disallow: /path-to-disallow/

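Please note that AhrefsBot may need some time to pick up the changes in your robots.txt file. This will happen before the next scheduled crawl. Verified website owners can allow the AhrefsSiteAudit crawler to disregard robots.txt rules on their sites, so that it can check sections that are normally disallowed for issues.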

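Also, if your robots.txt contains errors, our bots will not be able to recognize your directives and will continue crawling your website as before. For more information about robots.txt, visit www.robotstxt.org.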
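Before publishing a robots.txt change, it can help to sanity-check how the directives above parse. The sketch below uses Python's standard `urllib.robotparser`. The file contents, the delay of 10 seconds, the `/private/` path, and the example.com URLs are illustrative assumptions. Python's parser is not Ahrefs' parser, so the result only suggests how a typical parser reads the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; the delay value and path are made up for this example.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-Delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser what a crawler identifying as "AhrefsBot" may fetch.
allowed_public = parser.can_fetch("AhrefsBot", "https://example.com/blog/")
allowed_private = parser.can_fetch("AhrefsBot", "https://example.com/private/page")
delay_seconds = parser.crawl_delay("AhrefsBot")

print(allowed_public, allowed_private, delay_seconds)
```

Each directive sits on its own line under the `User-agent` line; fusing them onto one line is the kind of error that leaves the rules unrecognized.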

Returning non-200 status codes to reduce crawl speed

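You can temporarily reduce AhrefsBot's crawl speed. This is useful during outages or infrastructure changes, when the load on your servers needs to be reduced. To do so, return 4xx or 5xx HTTP status codes for the duration of the outage or maintenance window. Our bot will detect these errors and back off automatically.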
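A minimal sketch of this approach as a Python WSGI application follows. It answers requests whose user agent contains an Ahrefs token with a 503 while a maintenance flag is set. The `MAINTENANCE_MODE` flag and the choice of 503 over other 4xx or 5xx codes are assumptions made for illustration; a real deployment would more likely configure this in the web server or CDN.

```python
# Hypothetical switch; in practice this might come from config or an env var.
MAINTENANCE_MODE = True

# Product tokens taken from the user-agent strings documented on this page.
AHREFS_TOKENS = ("AhrefsBot", "AhrefsSiteAudit")


def app(environ, start_response):
    """WSGI app: return 503 to Ahrefs crawlers during maintenance, else 200."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    is_ahrefs = any(token in user_agent for token in AHREFS_TOKENS)

    if MAINTENANCE_MODE and is_ahrefs:
        status, body = "503 Service Unavailable", b"Temporarily unavailable\n"
    else:
        status, body = "200 OK", b"OK\n"

    start_response(status, [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The app can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` for a quick test. Remember to switch the flag off after maintenance so crawling can return to normal.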

Adjusting speed settings in Site Audit

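The AhrefsSiteAudit bot avoids overloading website servers by limiting crawling to a maximum of 30 URLs per minute. If you are a website owner and want to be notified of site issues faster, you can raise the crawl speed for your own sites. To do so, you need to verify ownership in the Site Audit tool.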
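To see what the default cap means in practice: 30 URLs per minute is one URL every 2 seconds on average, which puts a lower bound on how long a full audit takes. The short sketch below works through that arithmetic. The function name and the 10,000-URL site size are hypothetical, and real crawl time also depends on server response times and other factors.

```python
import math

DEFAULT_URLS_PER_MINUTE = 30  # default cap stated above


def min_crawl_minutes(url_count: int, urls_per_minute: int = DEFAULT_URLS_PER_MINUTE) -> int:
    """Lower bound, in whole minutes, to crawl `url_count` URLs at the given cap."""
    return math.ceil(url_count / urls_per_minute)


# A hypothetical 10,000-URL site needs at least 334 minutes (about 5.6 hours).
print(min_crawl_minutes(10_000))
```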

Contact us

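If you have any concerns about how frequently we crawl, or if you see suspicious traffic that you would like to confirm, contact us at [email protected]. We will help clarify and resolve any issues.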