Crawl Collection

Overview

The Crawl collection corresponds to one analysis available in SiteCrawler. It is a non-timestamped collection and contains dimensions almost exclusively (see Dimensions below for the single exception).

A subset of the fields available in the Crawl collection is listed below.

Identifier

{
  "collections": ["crawl.YYYYMMDD"],
  ...
}

Replace YYYYMMDD with the date of the analysis; the examples in this section use an analysis dated 2021-04-11, hence the crawl.20210411 prefix in the field slugs below.
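
A query against this collection references the identifier and prefixes every field with it. A minimal sketch, assuming an analysis dated 2021-04-11; the surrounding "query" wrapper is an assumption about the full request shape, which is not covered in this section:

{
  "collections": ["crawl.20210411"],
  "query": {
    "dimensions": ["crawl.20210411.http_code"]
  }
}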

Indexability Fields

Name | Slug | Type
Is Indexable | crawl.20210411.indexable.is_indexable | Boolean
Non-Indexable Reason is Non-Self Canonical Tag | crawl.20210411.indexable.reason.canonical | Boolean
Non-Indexable Reason is Noindex Status | crawl.20210411.indexable.reason.noindex | Boolean
Non-Indexable Reason is Non-200 HTTP Status Code | crawl.20210411.indexable.reason.http_code | Boolean
Non-Indexable Reason is Bad Content-Type | crawl.20210411.indexable.reason.content_type | Boolean
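
These flags can serve as dimensions to break crawled URLs down by indexability status. A sketch in the same fragment style as the examples below, assuming the count_urls_crawl metric (introduced under Dimensions) takes the same collection prefix as the other fields:

{
  "dimensions": ["crawl.20210411.indexable.is_indexable"],
  "metrics": ["crawl.20210411.count_urls_crawl"]
}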

Crawl Fields

Name | Slug | Type
Depth | crawl.20210411.depth | Integer
HTTP Status Code | crawl.20210411.http_code | Integer
Content Type | crawl.20210411.content_type | String
Content Byte Size | crawl.20210411.byte_size | Integer
Delay First Byte Received | crawl.20210411.delay_first_byte | Integer
Delay Total | crawl.20210411.delay_last_byte | Integer
Date Crawled | crawl.20210411.date_crawled | Datetime
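
These fields combine naturally with the aggregation functions shown below. For instance, a sketch computing the average total download delay per depth level (fragment style, request envelope omitted):

{
  "dimensions": ["crawl.20210411.depth"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210411.delay_last_byte"]
    }
  ]
}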

Many more fields are available and can be explored in Botify.

Dimensions

All fields in the Crawl collection are Dimensions, except count_urls_crawl, which outputs the number of crawled URLs. This means that the dimensions ["segments.pagetype.value", "crawl.20210102.depth"] will return one row of data for each combination of segment and depth, as in the sketch below.
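
A sketch of such a query, in the same fragment style as the aggregation example below (no metric is requested here):

{
  "dimensions": ["segments.pagetype.value", "crawl.20210102.depth"]
}

The output contains one row per segment/depth combination: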

home | 1  
products | 2  
products | 3
etc...

To get an aggregated value (e.g., the average depth for each segment) instead of all combinations, apply an Aggregation function to the depth so it becomes a metric:

{
  "dimensions": ["segments.pagetype.value"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210102.depth"]
    }
  ]
}

Results:

home | 1  
products | 2.23
etc...
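
Several metrics can be requested at once. A sketch extending the query above with a URL count, assuming count_urls_crawl is addressed with the collection prefix as in the earlier examples:

{
  "dimensions": ["segments.pagetype.value"],
  "metrics": [
    {
      "function": "avg",
      "args": ["crawl.20210102.depth"]
    },
    "crawl.20210102.count_urls_crawl"
  ]
}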

Collections Explorer

Get the full list of fields using the Collections Explorer.