Crawl Collection


The Crawl collection corresponds to one analysis available in SiteCrawler. It is a non-timestamped collection that contains only dimensions.

A subset of the fields available in the Crawl collection is listed here.


  "collections": ["crawl.YYYYMMDD"],

Indexability Fields

Is Indexable | crawl.20210411.indexable.is_indexable | Boolean
Non-Indexable Reason is Non-Self Canonical Tag | crawl.20210411.indexable.reason.canonical | Boolean
Non-Indexable Reason is Noindex Status | crawl.20210411.indexable.reason.noindex | Boolean
Non-Indexable Reason is Non-200 HTTP Status Code | crawl.20210411.indexable.reason.http_code | Boolean
Non-Indexable Reason is Bad Content-Type | crawl.20210411.indexable.reason.content_type | Boolean
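These Boolean fields can also be used to filter a query, for example to count pages that are non-indexable because of a noindex status. The sketch below builds such a query body in Python; the "filters" shape (field/predicate/value) and the use of count_urls_crawl as the metric are assumptions based on common BQL examples, not taken from this page:

```python
import json

# Sketch of a query counting crawled URLs that are non-indexable
# because of a noindex status. The "filters" shape (field/predicate/
# value) is an assumption; verify it against the BQL reference.
query = {
    "collections": ["crawl.20210411"],
    "query": {
        "dimensions": [],
        "metrics": ["crawl.20210411.count_urls_crawl"],
        "filters": {
            "field": "crawl.20210411.indexable.reason.noindex",
            "predicate": "eq",
            "value": True,
        },
    },
}

print(json.dumps(query, indent=2))
```

Building the body as a dict and serializing it with json.dumps avoids hand-writing JSON and catches structural mistakes before the request is sent.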

Crawl Fields

HTTP Status Code | crawl.20210411.http_code | Integer
Content Type | crawl.20210411.content_type | String
Content Byte Size | crawl.20210411.byte_size | Integer
Delay First Byte Received | crawl.20210411.delay_first_byte | Integer
Delay Total | crawl.20210411.delay_last_byte | Integer
Date Crawled | crawl.20210411.date_crawled | Datetime

Many more fields are available and can be explored in Botify.


All fields in the Crawl collection are Dimensions, except count_urls_crawl, which outputs the number of crawled URLs. This means the dimensions ["segments.pagetype.value", "crawl.20210102.depth"] will return data for each combination of segment and depth:

home | 1  
products | 2  
products | 3

To get an aggregated value (e.g., the average depth for each segment) instead of all combinations, apply an Aggregation function to the depth so it becomes a metric:

 "dimensions": ["segments.pagetype.value"], 
 "metrics": [
    "function": "avg", 
    "args": ["crawl.20210102.depth"]


home | 1  
products | 2.23
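Sent over HTTP, the aggregated query might look like the sketch below. The endpoint path and the Token authorization header are assumptions, not taken from this page; check the Botify API documentation for the exact URL and auth scheme before using it.

```python
import json
import urllib.request

# Hypothetical endpoint and token -- replace with your project's values.
URL = "https://api.botify.com/v1/projects/<org>/<project>/query"
TOKEN = "<api-token>"

# The aggregated query from above: depth becomes a metric via "avg",
# so the result has one row per segment instead of one per combination.
payload = {
    "collections": ["crawl.20210102"],
    "query": {
        "dimensions": ["segments.pagetype.value"],
        "metrics": [{"function": "avg", "args": ["crawl.20210102.depth"]}],
    },
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Token {TOKEN}",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # uncomment with real credentials
```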

Collections Explorer

Get the full list of fields using the Collections Explorer.