Collections

Overview

A collection is a source of data in Botify. Each collection exposes a set of fields that can be used as metrics, dimensions, and/or filters.

Current Collections

The following collections are currently available in Botify. New collections are added regularly as more data is brought into Botify. Refer to Your Available Collections below to find the collections your project can access.

Collection Name                  Description
conversion                       Conversion data
conversion.dip                   Google Analytics conversion data (data integration platform)
crawl.YYYYMMDD                   Crawl data, where YYYYMMDD is the crawl slug
paid_search.ga4.dip              GA4 paid search data (data integration platform)
QueryMaskML.YYYYMMDD             Currently contains only the field is_landing
search_console                   Google Search Console data
search_console_by_property       Google Search Console data by website
searchenginesorphans._YYYYMMDD_  Search engine orphan URLs, where YYYYMMDD is the crawl slug
semrush_domain_organic           Semrush keyword metrics
sitemaps                         Sitemap data
trended_crawls                   Trended crawl data
visits.adobe                     Adobe Analytics visit data
visits.atinternet                Piano Analytics visit data
visits.atinternet_airbyte        Piano Analytics visit data by Airbyte
visits.dip                       GA4 Analytics visit data (data integration platform)
visits.ganalytics                Google Analytics visit data
visits.ganalytics_premium        Google Analytics 360 visit data
web_vitals.field_data            Core Web Vitals field data
web_vitals.field_data_by_origin  Core Web Vitals field data origin summary

Note: There are many visit providers and collections, but each project can only access one visit collection, depending on the selected provider.
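As an illustration of the note above, the following minimal Python sketch looks for the single visits.* collection in a list of collection ids. It assumes the ids have already been retrieved (for example with the collections request described under Query); the helper name find_visit_collection and the sample ids are hypothetical.

```python
def find_visit_collection(collection_ids):
    """Return the project's visits.* collection id, or None if there is none."""
    visit_ids = [cid for cid in collection_ids if cid.startswith("visits.")]
    if len(visit_ids) > 1:
        # Per the note above, a project is expected to expose one visit collection.
        raise ValueError(f"expected at most one visit collection, got {visit_ids}")
    return visit_ids[0] if visit_ids else None


# Hypothetical list of ids for a project configured with Adobe Analytics.
ids = ["crawl.20210302", "search_console", "visits.adobe"]
print(find_visit_collection(ids))  # visits.adobe
```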

Your Available Collections

Configuring a project can make additional collections available. For instance, if you set a CrUX API key in Data Station, the web_vitals.field_data collection becomes available on your project.

There are two methods to find the collections that are available on your project:

Collections Explorer

The Collections Explorer spreadsheet template allows you to retrieve your project's collections, metrics, and dimensions in one location. You must have access to Google Sheets to use this spreadsheet.

  1. Access and make a copy of the Collections Explorer spreadsheet. Do not alter the spreadsheet, including any hidden cells or sheets.
  2. Navigate to your Botify account and copy your API token to the system clipboard.
  3. In the spreadsheet, paste your API token in cell B1. The projects that belong to your user account will populate the cell in the next row after a few seconds.

Warning: To protect your API token, don't share your cloned spreadsheet with anyone.

  4. Expand the dropdown list in cell B3 and select the desired project. The project collections that belong to your user account will populate the cell in the next row after a few seconds.
  5. Expand the dropdown list in cell B4 and select the desired collection.

The metrics and dimensions for the selected collection populate the rows below. To change to another project or collection, clear both cells B3 and B4 before making another selection.

Note: Validation in the Collections Explorer will prevent you from entering invalid information. If you encounter an error, clear the cell contents before trying again. If you encounter an error when cloning the spreadsheet or adding your API token, it should resolve after you select a project.

Query

Using the information in Getting started, construct the following query:

curl --location --request GET 'https://api.botify.com/v1/projects/<USERNAME>/<PROJECT_SLUG>/collections' \
--header 'Authorization: Token <API_TOKEN>'

This request returns a list of collections:

[
    {
        "id": "global",
        "name": "URL Scheme and Segmentation",
        "date": "2021-03-05",
        "timestamped": false
    },
    {
        "id": "crawl.20210302",
        "name": "2021 Mar. 2nd",
        "date": "2021-03-02",
        "timestamped": false
    },
    {
        "id": "crawl.20210223",
        "name": "2021 Feb. 23rd",
        "date": "2021-02-23",
        "timestamped": false
    },
    {
        "id": "search_console",
        "name": "Search Console",
        "date": "2021-03-05",
        "timestamped": true,
        "date_start": "2018-03-17",
        "date_end": "2021-03-02"
    }
]
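The same request can also be issued from a script. The following is a minimal Python sketch, using only the standard library, that mirrors the curl command above. The function names build_collections_request and list_collections are illustrative and not part of any Botify client library; the username, project slug, and API token are values you supply.

```python
import json
import urllib.request

API_BASE = "https://api.botify.com/v1"  # base URL taken from the curl example above


def build_collections_request(username, project_slug, api_token):
    """Build the GET request for a project's collections (illustrative helper)."""
    url = f"{API_BASE}/projects/{username}/{project_slug}/collections"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_token}"},
        method="GET",
    )


def list_collections(username, project_slug, api_token):
    """Send the request and decode the JSON list of collections."""
    request = build_collections_request(username, project_slug, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling list_collections with your own username, project slug, and API token should yield a list of dictionaries shaped like the sample response above, so each entry's "id" and "timestamped" values can be read directly.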

Timestamped collections

Each collection in the response above includes a timestamped key. A timestamped collection contains continuous data over a period. For example, the search_console collection is updated daily with the latest data from Google Search Console. In contrast, a non-timestamped collection is a snapshot of data at a single point in time. For example, a crawl collection is a snapshot of a website at the time of the crawl.

The Periods page provides more details on targeting specific date intervals.

Note: Timestamped collections require at least one period.
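To make the distinction concrete, the sketch below splits a collections listing by its timestamped key. The input is an abbreviated copy of the sample response above; the helper name split_by_timestamped is illustrative.

```python
def split_by_timestamped(collections):
    """Return (timestamped, snapshot) lists of collection ids."""
    timestamped = [c["id"] for c in collections if c.get("timestamped")]
    snapshots = [c["id"] for c in collections if not c.get("timestamped")]
    return timestamped, snapshots


# Abbreviated copy of the sample response shown earlier on this page.
sample = [
    {"id": "global", "date": "2021-03-05", "timestamped": False},
    {"id": "crawl.20210302", "date": "2021-03-02", "timestamped": False},
    {"id": "crawl.20210223", "date": "2021-02-23", "timestamped": False},
    {
        "id": "search_console",
        "date": "2021-03-05",
        "timestamped": True,
        "date_start": "2018-03-17",
        "date_end": "2021-03-02",
    },
]

timestamped_ids, snapshot_ids = split_by_timestamped(sample)
print(timestamped_ids)  # ['search_console']
print(snapshot_ids)     # ['global', 'crawl.20210302', 'crawl.20210223']
```

Only the ids in the first list would need a period when queried.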

Examples

Refer to the following pages for examples of non-timestamped collections (crawl) and timestamped collections (search_console):

To select collections for a BQL query, list them under the collections key. The following example prepares a query that joins SiteCrawler data (crawl.20210102) with RealKeywords data (search_console):

{
  "collections": [
    "crawl.20210102",
    "search_console"
  ],
  ...
}
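When the query should target the most recent crawl rather than a hard-coded slug, the collections list can be assembled from the ids returned by the collections request. The sketch below assumes that crawl collection ids follow the crawl.YYYYMMDD pattern listed under Current Collections and that the greatest slug identifies the latest crawl. latest_crawl_id and build_collections are hypothetical helpers, and the remaining keys of the query (elided as ... above) are intentionally not shown.

```python
import re

CRAWL_ID = re.compile(r"^crawl\.(\d{8})$")  # crawl.YYYYMMDD, per the list of collections


def latest_crawl_id(collection_ids):
    """Return the crawl collection id with the greatest YYYYMMDD slug, or None."""
    crawl_ids = [cid for cid in collection_ids if CRAWL_ID.match(cid)]
    # Zero-padded YYYYMMDD slugs sort chronologically as plain strings.
    return max(crawl_ids, default=None)


def build_collections(collection_ids, extra=("search_console",)):
    """Build the "collections" part of a query: latest crawl plus extra ids."""
    crawl_id = latest_crawl_id(collection_ids)
    if crawl_id is None:
        raise ValueError("no crawl.YYYYMMDD collection found")
    missing = [cid for cid in extra if cid not in collection_ids]
    if missing:
        raise ValueError(f"collections not available on this project: {missing}")
    return {"collections": [crawl_id, *extra]}


available = ["global", "crawl.20210302", "crawl.20210223", "search_console"]
print(build_collections(available))
# {'collections': ['crawl.20210302', 'search_console']}
```

Checking the extra ids against the available list surfaces a missing collection before the query is sent, which is easier to diagnose than an API error.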