SiteCrawler - crawl collection

The crawl collection corresponds to one analysis that is available in SiteCrawler. It is a non-timestamped collection.

This page lists a subset of the fields available in the crawl collection.

Identifier

{
  "collections": ["crawl.YYYYMMDD"],
  ...
}
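As a minimal sketch of how the identifier is formed, the Python helper below builds it from a date. It assumes that `YYYYMMDD` is the date of the analysis. The function names `crawl_collection_id` and `field_slug` are hypothetical and are not part of any Botify library. Only the `collections` key is shown, since the rest of the payload is elided above.

```python
from datetime import date


def crawl_collection_id(analysis_date: date) -> str:
    """Return the collection identifier, e.g. crawl.20210411."""
    # Assumption: YYYYMMDD is the date of the analysis.
    return f"crawl.{analysis_date:%Y%m%d}"


def field_slug(analysis_date: date, field_path: str) -> str:
    """Prefix a field path (e.g. "depth") with its collection identifier."""
    return f"{crawl_collection_id(analysis_date)}.{field_path}"


collection = crawl_collection_id(date(2021, 4, 11))
payload_fragment = {"collections": [collection]}  # other keys omitted here

print(collection)
print(field_slug(date(2021, 4, 11), "indexable.is_indexable"))
```

The same prefix appears in every field slug of the collection, so building it in one place keeps the slugs consistent when switching to another analysis.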

Indexability fields

Name | Slug | Type
--- | --- | ---
Is Indexable | crawl.20210411.indexable.is_indexable | Boolean
Non-Indexable Reason is Non-Self Canonical Tag | crawl.20210411.indexable.reason.canonical | Boolean
Non-Indexable Reason is Noindex Status | crawl.20210411.indexable.reason.noindex | Boolean
Non-Indexable Reason is Non-200 HTTP Status Code | crawl.20210411.indexable.reason.http_code | Boolean
Non-Indexable Reason is Bad Content-Type | crawl.20210411.indexable.reason.content_type | Boolean
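
The indexability fields listed above can be combined to explain why a URL is not indexable. The sketch below assumes each result is available as a flat dictionary keyed by field slug. That record shape, and the names `REASONS` and `non_indexable_reasons`, are hypothetical and chosen for illustration only.

```python
from typing import Dict, List

PREFIX = "crawl.20210411.indexable"

# Reason slugs from the list above, mapped to a short label.
REASONS: Dict[str, str] = {
    f"{PREFIX}.reason.canonical": "non-self canonical tag",
    f"{PREFIX}.reason.noindex": "noindex status",
    f"{PREFIX}.reason.http_code": "non-200 HTTP status code",
    f"{PREFIX}.reason.content_type": "bad content type",
}


def non_indexable_reasons(record: Dict[str, bool]) -> List[str]:
    """Return the label of every reason flag set to True in the record."""
    if record.get(f"{PREFIX}.is_indexable"):
        return []  # indexable URLs carry no reason
    return [label for slug, label in REASONS.items() if record.get(slug)]


# Hypothetical record: a flat dict keyed by field slug.
example = {
    f"{PREFIX}.is_indexable": False,
    f"{PREFIX}.reason.canonical": False,
    f"{PREFIX}.reason.noindex": True,
    f"{PREFIX}.reason.http_code": True,
    f"{PREFIX}.reason.content_type": False,
}

print(non_indexable_reasons(example))
```

Several reason flags can be true at once in this sketch, which is why the function returns a list rather than a single value.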

Crawl fields

Name | Slug | Type
--- | --- | ---
Depth | crawl.20210411.depth | Integer
HTTP Status Code | crawl.20210411.http_code | Integer
Content Type | crawl.20210411.content_type | String
Content Byte Size | crawl.20210411.byte_size | Integer
Delay First Byte Received | crawl.20210411.delay_first_byte | Integer
Delay Total | crawl.20210411.delay_last_byte | Integer
Date Crawled | crawl.20210411.date_crawled | Datetime
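
When these fields arrive as text, for example from a file export, the types listed above suggest how to convert them. The sketch below is illustrative only: the row shape, the `coerce_row` helper, and the assumption that Datetime values are ISO 8601 strings are not confirmed by this page.

```python
from datetime import datetime

PREFIX = "crawl.20210411"

# Field path -> converter, following the types listed above.
CONVERTERS = {
    "depth": int,
    "http_code": int,
    "content_type": str,
    "byte_size": int,
    "delay_first_byte": int,
    "delay_last_byte": int,
    # Assumption: Datetime values are ISO 8601 strings.
    "date_crawled": datetime.fromisoformat,
}


def coerce_row(raw: dict) -> dict:
    """Convert raw string values to the documented types, keyed by full slug."""
    typed = {}
    for path, convert in CONVERTERS.items():
        slug = f"{PREFIX}.{path}"
        if raw.get(slug) is not None:
            typed[slug] = convert(raw[slug])
    return typed


# Hypothetical row in which every value is still a string.
row = {
    "crawl.20210411.depth": "2",
    "crawl.20210411.http_code": "200",
    "crawl.20210411.content_type": "text/html",
    "crawl.20210411.date_crawled": "2021-04-11T08:15:00",
}

typed = coerce_row(row)
print(typed["crawl.20210411.depth"] + 1)
print(typed["crawl.20210411.date_crawled"].year)
```

Fields missing from the row are skipped rather than filled with defaults, so a partial selection of fields converts without errors.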

Many more fields are available and can be explored in Botify.

