Dimensions

Introduction

Dimensions are the fields on which you are aggregating your metrics.

The result of a BQL query will contain one data row for each combination of dimensions.

Examples:

  • What is the average loading time on my entire website?
    "dimensions": []
    When aggregating all available data into one value, you don't specify any dimensions.

  • What is the number of pages for each segment?
    "dimensions": ["segments.pagetype.value"]
    If you have a segment named pagetype, you will retrieve the number of URLs for each segment of your website.

  • How many URLs do I have in each segment and for each depth?
    "dimensions": ["segments.pagetype.value", "crawl.20210102.depth"]
    You will receive data for each combination of segment and depth.

Generally, the more dimensions you specify, the more rows are returned.

Technically, dimensions can be seen as the GROUP BY clause of a SQL statement.
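To make the GROUP BY analogy concrete, here is a minimal Python sketch that counts records per combination of dimensions. It runs on made-up in-memory data and does not call BQL; the sample records, the plain keys `pagetype` and `depth`, and the helper `count_urls` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical sample data: one record per URL, using plain keys rather than
# real BQL field names.
urls = [
    {"pagetype": "product", "depth": 1},
    {"pagetype": "product", "depth": 2},
    {"pagetype": "product", "depth": 2},
    {"pagetype": "article", "depth": 1},
]


def count_urls(records, dimensions):
    """Count records per distinct combination of the given dimension keys."""
    counts = Counter(tuple(r[d] for d in dimensions) for r in records)
    return dict(counts)


# No dimensions: everything is aggregated into a single row.
no_dims = count_urls(urls, [])
# One dimension: one row per page type.
by_pagetype = count_urls(urls, ["pagetype"])
# Two dimensions: one row per (page type, depth) combination.
by_pagetype_and_depth = count_urls(urls, ["pagetype", "depth"])

print(no_dims)                # {(): 4}
print(by_pagetype)            # {('product',): 3, ('article',): 1}
print(by_pagetype_and_depth)  # {('product', 1): 1, ('product', 2): 2, ('article', 1): 1}
```

Adding the second dimension raises the row count from two to three here, which is the effect described above.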

We will cover the following topics in this section:

  • How field prefixes work in BQL
  • Global fields
  • Functions for dimensions
  • Range dimensions

Field prefixes

Each field represents a type of data available in Botify's unified data model. Each field has a slug, which is a unique identifier for this field.
All fields, not just dimensions, are prefixed according to three rules.

  1. Timestamped collections

Timestamped collection field prefix: collection_id.period_N.field_slug.
Composed of three parts:

  • collection_id, the identifier from the timestamped collection (e.g. search_console)
  • N, the 0-based index of the period (e.g. period_0 for the first period that is defined)
  • field_slug, the field identifier

Example: search_console.period_0.avg_position

  2. Non-timestamped collections

Non-timestamped collection field prefix: collection_id.field_slug.
Composed of two parts:

  • collection_id, the identifier from the non-timestamped collection (e.g. crawl.20210102)
  • field_slug, the field identifier

Example: crawl.20210102.depth

  3. Global fields

Global fields don't have prefixes: field_slug.
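The three prefix rules can be sketched as small string-building helpers. The function names below are hypothetical and not part of BQL or any Botify client library; only the resulting field formats come from the rules above, and `some_global_slug` is a placeholder rather than a real slug.

```python
def timestamped_field(collection_id, period_index, field_slug):
    """Rule 1: collection_id.period_N.field_slug (N is a 0-based period index)."""
    return f"{collection_id}.period_{period_index}.{field_slug}"


def collection_field(collection_id, field_slug):
    """Rule 2: collection_id.field_slug for non-timestamped collections."""
    return f"{collection_id}.{field_slug}"


def global_field(field_slug):
    """Rule 3: global fields carry no prefix."""
    return field_slug


avg_position = timestamped_field("search_console", 0, "avg_position")
depth = collection_field("crawl.20210102", "depth")
# "some_global_slug" is a placeholder, not a documented field slug.
unprefixed = global_field("some_global_slug")

print(avg_position)  # search_console.period_0.avg_position
print(depth)         # crawl.20210102.depth
print(unprefixed)    # some_global_slug
```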

Global fields

A global field is a field that has the same meaning across collections.

A field that is not global is tied to a specific collection. A global field, however, can be defined on multiple collections and displayed as a single field, because it represents the same data.

The most obvious example is the URL. If Botify crawled https://example.com/a (crawl collection) and found that https://example.com/a received some clicks in the Google Search Console (search console collection), then we are referring to the same URL.

Among the main global fields:

  • URL and URL related fields (protocol, host, path, query string, ...)
  • Segmentation
  • Keyword
  • Country
  • Device
  • Date (a special case, because the date field is aliased by its period: period_N.date)

Function dimensions

BQL defines many functions that can be applied to any field, and therefore to dimensions as well.
The following example shows the function syntax by grouping on the HTTP code family:

{
  ...
  "query": {
    "dimensions": [
      {
        "function": "http_code_family",
        "args": [
          "crawl.20210102.http_code"
        ]
      }
    ],
    ...
  }
}

This query groups on the HTTP code families (2xx, 3xx, ...) instead of on the distinct HTTP codes.
More concisely, a function call is defined as follows:

{
  "function": "FUNCTION_NAME",
  "args": [
    LIST_OF_ARGUMENTS
  ]
}

A function has a name and takes arguments. Each argument can itself be another function call, so function calls can be nested.
You'll find a complete list of available functions in the Functions section.
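As a minimal sketch, a function call can be assembled programmatically as a plain dictionary with the `function` and `args` keys shown above. The helper `bql_function` is hypothetical, and `OUTER_FUNCTION` is a placeholder used only to illustrate nesting, not the name of an actual BQL function.

```python
import json


def bql_function(name, *args):
    """Build a function call object: a name plus a list of arguments."""
    return {"function": name, "args": list(args)}


# The HTTP code family example from this section.
http_family = bql_function("http_code_family", "crawl.20210102.http_code")

# Nesting: an argument may itself be a function call.
# "OUTER_FUNCTION" is a placeholder, not a real BQL function name.
nested = bql_function("OUTER_FUNCTION", http_family)

print(json.dumps(nested, indent=2))
```

Because each call is an ordinary dictionary, nesting requires no special handling: the inner call is simply passed as one of the outer call's arguments.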

Range dimensions

Fields that can be used as dimensions can be divided into two categories: discrete and continuous values.

Discrete values can be queried by specifying the field name as a dimension. For example, the depth field from the crawl collection has a finite number of different values. The same goes for the URL itself, the segmentation, etc.

{
  ...,
  "query": {
    "dimensions": [
      "crawl.20210102.depth"
    ],
    ...
  }
}

Continuous values, often metrics that we want to use as dimensions, have a very large number of different values. Querying a field with continuous values as a distinct dimension is forbidden. Instead, you define ranges to query such fields as dimensions. The following example uses the average position from the search console collection, a floating-point number representing the position on the Search Engine Result Page (SERP).

{
  ...,
  "query": {
    "dimensions": [
      {
        "function": "ranges",
        "args": [
          "search_console.period_0.avg_position",
          [
            {
              "to": 2
            },
            {
              "from": 2,
              "to": 5
            },
            {
              "from": 5,
              "to": 8
            },
            {
              "from": 8
            }
          ]
        ]
      }
    ],
    ...
  }
}

This example uses a function named ranges that takes two arguments: the field and an array of range boundaries.
Each range boundary must define from, to, or both.

  • from represents the lower boundary, which is included
  • to represents the upper boundary, which is excluded

When one boundary is missing, it is an open end.
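The boundary semantics can be checked locally with a short sketch. This is not how BQL evaluates ranges internally; it only mirrors the rules stated above (from is included, to is excluded, a missing side is open). The helper names `in_range` and `find_range` are hypothetical.

```python
def in_range(value, boundary):
    """Return True if value falls within a boundary with 'from' and/or 'to'."""
    if "from" not in boundary and "to" not in boundary:
        raise ValueError("a range boundary must define 'from', 'to', or both")
    lower_ok = "from" not in boundary or value >= boundary["from"]  # included
    upper_ok = "to" not in boundary or value < boundary["to"]       # excluded
    return lower_ok and upper_ok


def find_range(value, boundaries):
    """Return the first boundary that contains value, or None if none does."""
    for boundary in boundaries:
        if in_range(value, boundary):
            return boundary
    return None


# The boundaries from the avg_position example above.
boundaries = [
    {"to": 2},
    {"from": 2, "to": 5},
    {"from": 5, "to": 8},
    {"from": 8},
]

print(find_range(1.9, boundaries))  # {'to': 2}
print(find_range(2, boundaries))    # {'from': 2, 'to': 5}
print(find_range(8, boundaries))    # {'from': 8}
```

Because the upper boundary is excluded and the lower one is included, adjacent ranges such as 2 to 5 and 5 to 8 do not overlap: a value of exactly 5 lands only in the second.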

As with any function, the arguments can themselves be function calls. In the example above, the field passed to ranges could be a value computed by other BQL functions.

