SiteCrawler - crawl collection

The crawl collection corresponds to one analysis that is available in SiteCrawler. It is a non-timestamped collection.

This page lists a subset of the fields available in the crawl collection. The slugs below use the collection crawl.20210411 as an example.

Identifier

{
  "collections": ["crawl.YYYYMMDD"],
  ...
}
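As a minimal sketch, the identifier and the field slugs that follow from it could be built from a date as shown below. The helper names `crawl_collection_id` and `field_slug` are hypothetical and not part of any Botify library; the code only assumes that `YYYYMMDD` stands for the date of the analysis.

```python
from datetime import date


def crawl_collection_id(crawl_date: date) -> str:
    """Build a collection identifier of the form crawl.YYYYMMDD (hypothetical helper)."""
    return f"crawl.{crawl_date:%Y%m%d}"


def field_slug(collection_id: str, field_path: str) -> str:
    """Prefix a field path such as 'depth' with the collection identifier (hypothetical helper)."""
    return f"{collection_id}.{field_path}"


collection = crawl_collection_id(date(2021, 4, 11))
depth_slug = field_slug(collection, "depth")
```

Here `collection` is `crawl.20210411` and `depth_slug` is `crawl.20210411.depth`, which matches the slug pattern used in the tables on this page.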

Indexability fields

| Name | Slug | Type |
|------|------|------|
| Is Indexable | crawl.20210411.indexable.is_indexable | Boolean |
| Non-Indexable Reason is Non-Self Canonical Tag | crawl.20210411.indexable.reason.canonical | Boolean |
| Non-Indexable Reason is Noindex Status | crawl.20210411.indexable.reason.noindex | Boolean |
| Non-Indexable Reason is Non-200 HTTP Status Code | crawl.20210411.indexable.reason.http_code | Boolean |
| Non-Indexable Reason is Bad Content-Type | crawl.20210411.indexable.reason.content_type | Boolean |
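
The sketch below shows one way to read the indexability reason fields for a single URL. It assumes, purely for illustration, that a result row arrives as a flat dictionary keyed by field slug; the row shape and the function name `non_indexable_reasons` are hypothetical and not taken from Botify's documentation.

```python
from typing import Dict, List

# Suffixes of the four reason fields listed in the table above.
REASON_SUFFIXES = ("canonical", "noindex", "http_code", "content_type")


def non_indexable_reasons(row: Dict[str, object], collection_id: str) -> List[str]:
    """Return the reason suffixes whose Boolean field is True in the row.

    Assumption: `row` is a flat dict keyed by full field slug.
    Missing fields are treated as "reason not set".
    """
    prefix = f"{collection_id}.indexable.reason."
    return [suffix for suffix in REASON_SUFFIXES if row.get(prefix + suffix) is True]


# Hypothetical sample row for a URL that is both noindex and non-200.
sample_row = {
    "crawl.20210411.indexable.is_indexable": False,
    "crawl.20210411.indexable.reason.canonical": False,
    "crawl.20210411.indexable.reason.noindex": True,
    "crawl.20210411.indexable.reason.http_code": True,
    "crawl.20210411.indexable.reason.content_type": False,
}
reasons = non_indexable_reasons(sample_row, "crawl.20210411")
```

Keeping each reason as its own Boolean means several reasons can be set on the same URL, so the helper returns a list rather than a single value.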

Crawl fields

| Name | Slug | Type |
|------|------|------|
| Depth | crawl.20210411.depth | Integer |
| HTTP Status Code | crawl.20210411.http_code | Integer |
| Content Type | crawl.20210411.content_type | String |
| Content Byte Size | crawl.20210411.byte_size | Integer |
| Delay First Byte Received | crawl.20210411.delay_first_byte | Integer |
| Delay Total | crawl.20210411.delay_last_byte | Integer |
| Date Crawled | crawl.20210411.date_crawled | Datetime |
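
To make the types in the table concrete, the following sketch maps the crawl fields onto a typed Python record. It rests on several assumptions that the table does not confirm: that a row is a flat dictionary keyed by slug, and that the Datetime value is serialized as an ISO 8601 string. `CrawlRecord` and `parse_crawl_record` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class CrawlRecord:
    depth: int
    http_code: int
    content_type: str
    byte_size: int
    delay_first_byte: int
    delay_last_byte: int  # listed as "Delay Total" in the table
    date_crawled: datetime


def parse_crawl_record(row: Dict[str, object], collection_id: str) -> CrawlRecord:
    """Convert a flat row keyed by slug into a CrawlRecord (assumed row shape)."""

    def get(name: str):
        return row[f"{collection_id}.{name}"]

    return CrawlRecord(
        depth=int(get("depth")),
        http_code=int(get("http_code")),
        content_type=str(get("content_type")),
        byte_size=int(get("byte_size")),
        delay_first_byte=int(get("delay_first_byte")),
        delay_last_byte=int(get("delay_last_byte")),
        # Assumption: the Datetime field is an ISO 8601 string.
        date_crawled=datetime.fromisoformat(str(get("date_crawled"))),
    )


# Hypothetical sample values, for illustration only.
sample_row = {
    "crawl.20210411.depth": 2,
    "crawl.20210411.http_code": 200,
    "crawl.20210411.content_type": "text/html",
    "crawl.20210411.byte_size": 48213,
    "crawl.20210411.delay_first_byte": 120,
    "crawl.20210411.delay_last_byte": 340,
    "crawl.20210411.date_crawled": "2021-04-11T08:30:00",
}
record = parse_crawl_record(sample_row, "crawl.20210411")
```

Passing the collection identifier as a parameter keeps the parser reusable across analyses, since only the `crawl.YYYYMMDD` prefix changes from one crawl to the next.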

Many more fields are available and can be explored in Botify.