Functions

Available functions

Functions can be used with any field or function, as long as the expected types matches, which means that any args element can also be another object defining a function.

Arithmetic functions

Add

Can contain any number of arguments and will be added together.

{
  "function": "add",
  "args": ["field_slug", 3, 5]
}

Subtract

Can contain any number of arguments and will be subtracted from each other.

{
  "function": "sub",
  "args": ["field_slug", 4]
}

Multiply

Can contain any number of arguments and will multiply all arguments.

{
  "function": "mul",
  "args": ["field_slug", 3]
}

Divide

Must contain two arguments that will be divided.

{
  "function": "div",
  "args": ["field_slug", 2]
}

Date functions

Date

Take one datetime argument as input and truncates the time part to only return the date.

{
  "function": "date",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-09-22".

Year

Takes one date or datetime as argument and returns the year as an integer.

{
  "function": "year",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be 2020.

Month

Takes one date or datetime as argument and returns the month as an integer.

{
  "function": "month",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be 9.

Year-month

Takes one date or datetime as argument and returns a year-month string.

{
  "function": "year_month",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-09".

First day of week

Takes one date or datetime as argument and returns a date string of the first of day of the corresponding week.
Can be used as dimension to aggregate data by week.

{
  "function": "first_day_of_week",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-09-21".

First day of month

Takes one date or datetime as argument and returns a date string of the first of day of the corresponding month.
Can be used as dimension to aggregate data by month.

{
  "function": "first_day_of_month",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-09-01".

Year and Week Number starting Monday

Takes one date or datetime as argument and returns a year-week_number string, The week number is follows the ISO week number definition.

{
  "function": "year_week_number_starting_monday",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-39".

Year and Week Number starting Sunday

Takes one date or datetime as argument and returns a year-week_number string, The week number is follows the ISO week number definition but transposed to start on Sunday.

{
  "function": "year_week_number_starting_sunday",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-39".

Truncate date at hour

Takes one date or datetime as argument and returns a datetime string where we truncate until the hours.
Useful to aggregate metrics by a datetime dimension hourly.

{
  "function": "truncate_date_at_hour",
  "args": ["date_crawled"]
}

If date_crawled is "2020-09-22 08:07:00Z", the result of this function will be "2020-09-22 08:00:00Z".

Aggregation functions

To use aggregation functions, see Metrics how to use them in a query.

Sum

Takes one argument and will add all aggregated entries.

{
  "function": "sum",
  "args": ["field_slug"]
}

Average

Takes one argument and will average all aggregated entries.

{
  "function": "avg",
  "args": ["field_slug"]
}

Weighted Average

Takes two arguments and will do a weighted average all aggregated entries, depending on the corresponding weight field.

{
  "function": "weighted_avg",
  "args": ["field_slug", "weight_field_slug"]
}

Min

Takes one argument and will return the smallest value among the aggregated entries.

{
  "function": "min",
  "args": ["field_slug"]
}

Max

Takes one argument and will return the smallest value among the aggregated entries.

{
  "function": "max",
  "args": ["field_slug"]
}

Count

Takes one argument and returns the number of aggregated entries.

{
  "function": "count",
  "args": ["field_slug"]
}

Count Distinct

Takes one argument and returns the number of different aggregated entries.

{
  "function": "count_distinct",
  "args": ["field_slug"]
}

Approximated Count Distinct

Takes one argument and returns the approximate number of different aggregated entries.
If the query takes some time, using an approximate can speed up the query.

{
  "function": "count_distinct_approx",
  "args": ["field_slug"]
}

Count True

Takes one argument and returns the number of true aggregated entries.

{
  "function": "count_true",
  "args": ["field_slug"]
}

Count False

Takes one argument and returns the number of false aggregated entries.

{
  "function": "count_false",
  "args": ["field_slug"]
}

Count Null

Takes one argument and returns the number of null aggregated entries.

{
  "function": "count_null",
  "args": ["field_slug"]
}

Count Equals

Takes two arguments and returns the number of aggregated entries where both values are equal.

{
  "function": "count_eq",
  "args": ["field_slug_1", "field_slug_2"]
}

Count Greater than

Takes two arguments and returns the number of aggregated entries where the first argument is greater than the second.

{
  "function": "count_gt",
  "args": ["field_slug_1", "field_slug_2"]
}

Count Greater than or equals

Takes two arguments and returns the number of aggregated entries where the first argument is greater than or equal to the second.

{
  "function": "count_gte",
  "args": ["field_slug_1", "field_slug_2"]
}

Count Lower than

Takes two arguments and returns the number of aggregated entries where the first argument is lower than the second.

{
  "function": "count_lt",
  "args": ["field_slug_1", "field_slug_2"]
}

Count Lower than or equals

Takes two arguments and returns the number of aggregated entries where the first argument is lower than or equal to the second.

{
  "function": "count_lte",
  "args": ["field_slug_1", "field_slug_2"]
}

Conditional functions

If

Takes three arguments as input. The first argument must be a boolean, and can either be a field or a function like eq that will return a boolean.
If the first argument is true, it returns the second argument. If the first argument is false, it returns the third argument.

{
  "function": "if",
  "args": [true, 1, 2]
}

Will always return 1. A more realistic example, if the number of crawls from Google equals the number of visits from Google, we return true, else we return null.

{
  "function": "if",
  "args": [
    {
      "function": "eq",
      "args": [
        "crawl.20210102.search_engines.google.crawls.count",
        "crawl.20210102.visits.organic.google.nb"
      ]
    },
    true,
    null
  ]
}

If not exists

Takes two arguments as input. Return the first argument if it is not null, else it returns the second one.

{
  "function": "if_not_exists",
  "args": ["search_console.period_0.count_clicks", 0]
}

When joining collections, we might not have any clicks for a crawled URL, because it never appeared on a SERP. It can therefore be null, but we don't want to handle null and just replace it with 0.

Equal

Takes two arguments as input and returns true if they are equal.

{
  "function": "eq",
  "args": ["field_slug_1", "field_slug_2"]
}

Not equal

Takes two arguments as input and returns true if they are different.

{
  "function": "ne",
  "args": ["field_slug_1", "field_slug_2"]
}

Lower than

Takes two arguments as input and returns true if the first one is lower than the second.

{
  "function": "lt",
  "args": ["field_slug_1", "field_slug_2"]
}

Greater than

Takes two arguments as input and returns true if the first one is greater than the second.

{
  "function": "gt",
  "args": ["field_slug_1", "field_slug_2"]
}

Lower than or equal

Takes two arguments as input and returns true if the first one is lower than or equal to the second.

{
  "function": "lte",
  "args": ["field_slug_1", "field_slug_2"]
}

Greater than or equal

Takes two arguments as input and returns true if the first one is greater than or equal to the second.

{
  "function": "gte",
  "args": ["field_slug_1", "field_slug_2"]
}

Exists

Takes one argument as input and returns true if the argument is not null.

{
  "function": "exists",
  "args": ["field_slug"]
}

Not exists

Takes one argument as input and returns true if the argument is null.

{
  "function": "not_exists",
  "args": ["field_slug"]
}

Not

Takes one argument as input and returns it's negated result.

{
  "function": "not",
  "args": ["field_slug"]
}

And

Takes any number of arguments as input and returns true if all arguments are true.

{
  "function": "and",
  "args": ["field_slug_1", "field_slug_2"]
}

Or

Takes any number of arguments as input and returns true if at least one argument is true.

{
  "function": "or",
  "args": ["field_slug_1", "field_slug_2"]
}

Literal

Takes one argument as input and handle the input as a constant.
Can be used to hard-code strings instead of interpreting them as field slugs.

{
  "function": "literal",
  "args": ["constant_value"]
}

Will return "constant_value" for each entry.

List functions

These functions can only be applied of list type fields.

First

Takes one argument and returns the first element of the list as result.

{
  "function": "first",
  "args": ["field_slug"]
}

Sort

Takes one argument and returns the list with sorted elements.

{
  "function": "sort",
  "args": ["field_slug"]
}

Array to String

Takes two arguments and returns a string composed of each element of the list, with is the first argument, with a separator which is the second argument.

{
  "function": "array_to_string",
  "args": ["field_slug", "separator"]
}

HTTP code functions

HTTP code family

Takes one argument as input and returns the family of the HTTP code.
Families are:

  • 2xx for HTTP codes between 200 and 299, and 304
  • exx for HTTP codes below 0
  • Yxx for all other HTTP codes, where Y is the first digit of the HTTP code
{
  "function": "http_code_family",
  "args": ["http_code"]
}

If the HTTP code is 429, this function returns 4xx.

HTTP code quality

Takes one argument as input and returns the quality of the HTTP code.
Quality values are:

  • good for HTTP codes between 200 and 299, and 304
  • bad for all others
{
  "function": "http_code_quality",
  "args": ["http_code"]
}

If the HTTP code is 429, this function returns bad.

Ranges function

The ranges function is explained in details in Dimensions.
It is called with two arguments. The first one if the field with continuous values that we want to aggregate on. The second argument is a list of objects containing one, or both, of the from and to keys.

{
  "function": "ranges",
  "args": [
    "search_console.period_0.avg_position",
    [
      {
        "to": 2
      },
      {
        "from": 2,
        "to": 5
      },
      {
        "from": 5,
        "to": 8
      },
      {
        "from": 8
      }
    ]
  ]
}

String functions

Concatenation

Takes any number of string arguments and concatenates them together.

{
  "function": "concat",
  "args": [
    "url",
    {
      "function": "literal",
      "args": ["&botify"]
    }
  ]
}

Will concatenate the URL with a &botify suffix. (the example could be improved to handle if URL path doesn't contain any query string to use a ? instead. Leaving that as an exercise to the reader).

Matches content

Takes two string arguments as input and returns true if the first argument is contained by the second argument one, when both arguments are normalized.
Normlization means ignoring any non-ascii characters and spaces in the strings.
Used to verify if a keyword is contained the title of the URL for instance:

{
  "function": "matches_content",
  "args": [
    "keyword_meta.normalized_keyword_words_string",
    "crawl.20210102.metadata.title.normalized_content"
  ]
}

Values list functions

Values list are mostly used for RealKeywords Keyword Groups feature, in order group together multiple keywords into one list.
Once you know your values list identifier, you will be able to query it through some functions.

Match lists

Takes as input one object with two keys: field and lists.

{
  "function": "match_lists",
  "args": [
    {
      "field": "keyword",
		  "lists": [
			  "list1-identifier1",
				"list2-identifier2",
				"list3-identifier3"
			]
    }
  ]
}

And it will return the value of the item it matches in one of the lists.
Lists are evaluated in the order they are passed to the function.

Special functions

By dimension

Allows breaking down a metric on another dimension than the ones specified in the BQL query.
Takes as input the metric and the dimension on which you want to break it down.
Used for sparklines in the Botify Application to break down the data by date, additionally to the selected dimensions:

{
  "function": "by_dimension",
  "args": [
    "search_console.period_0.count_clicks",
    "period_0.date"
  ]
}

To JSON string

Takes one argument as input and transforms it to a JSON string.
Useful to get rid of list fields and fit them into a single row.

{
  "function": "to_json_string",
  "args": ["query_string_keys"]
}