Crawl Collection

Overview

The Crawl collection corresponds to one analysis available in SiteCrawler. It is a non-timestamped collection and contains dimensions almost exclusively (see Dimensions below for the single exception).

A subset of the fields available in the Crawl collection is listed below.

Identifier

{
  "collections": ["crawl.YYYYMMDD"],
  ...
}

Replace YYYYMMDD with the date of the analysis; the examples in this section use an analysis dated 2021-04-11, hence the crawl.20210411 prefix in the field slugs below.
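
A query against this collection references the identifier and prefixes every field with it. A minimal sketch, assuming an analysis dated 2021-04-11; the surrounding "query" wrapper is an assumption about the full request shape, which is not covered in this section:

{
  "collections": ["crawl.20210411"],
  "query": {
    "dimensions": ["crawl.20210411.http_code"]
  }
}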

Indexability Fields

Name | Slug | Type
Is Indexable | crawl.20210411.indexable.is_indexable | Boolean
Non-Indexable Reason is Non-Self Canonical Tag | crawl.20210411.indexable.reason.canonical | Boolean
Non-Indexable Reason is Noindex Status | crawl.20210411.indexable.reason.noindex | Boolean
Non-Indexable Reason is Non-200 HTTP Status Code | crawl.20210411.indexable.reason.http_code | Boolean
Non-Indexable Reason is Bad Content-Type | crawl.20210411.indexable.reason.content_type | Boolean
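
These flags can serve as dimensions to break crawled URLs down by indexability status. A sketch in the same fragment style as the examples below, assuming the count_urls_crawl metric (introduced under Dimensions) takes the same collection prefix as the other fields:

{
  "dimensions": ["crawl.20210411.indexable.is_indexable"],
  "metrics": ["crawl.20210411.count_urls_crawl"]
}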

Crawl Fields

Name | Slug | Type
Depth | crawl.20210411.depth | Integer
HTTP Status Code | crawl.20210411.http_code | Integer
Content Type | crawl.20210411.content_type | String
Content Byte Size | crawl.20210411.byte_size | Integer
Delay First Byte Received | crawl.20210411.delay_first_byte | Integer
Delay Total | crawl.20210411.delay_last_byte | Integer
Date Crawled | crawl.20210411.date_crawled | Datetime
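
These fields combine naturally with the aggregation functions shown below. For instance, a sketch computing the average total download delay per depth level (fragment style, request envelope omitted):

{
  "dimensions": ["crawl.20210411.depth"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210411.delay_last_byte"]
    }
  ]
}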

Many more fields are available and can be explored in Botify.

Dimensions

All fields in the Crawl collection are Dimensions, except count_urls_crawl, which outputs the number of crawled URLs. This means that the dimensions ["segments.pagetype.value", "crawl.20210102.depth"] will return one row of data for each combination of segment and depth, as in the sketch below.
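
A sketch of such a query, in the same fragment style as the aggregation example below (no metric is requested here):

{
  "dimensions": ["segments.pagetype.value", "crawl.20210102.depth"]
}

The output contains one row per segment/depth combination: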

home | 1  
products | 2  
products | 3
etc...

To get an aggregated value (e.g., the average depth for each segment) instead of all combinations, apply an Aggregation function to the depth so it becomes a metric:

{
  "dimensions": ["segments.pagetype.value"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210102.depth"]
    }
  ]
}

Results:

home | 1  
products | 2.23
etc...
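
Several metrics can be requested at once. A sketch extending the query above with a URL count, assuming count_urls_crawl is addressed with the collection prefix as in the earlier examples:

{
  "dimensions": ["segments.pagetype.value"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210102.depth"]
    },
    "crawl.20210102.count_urls_crawl"
  ]
}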

Collections Explorer

Get the full list of fields using the Collections Explorer.