Backends and connectors

Connectors

One of the strengths of Botify's data exports is the range of backends that let you deliver the data exactly where you want it. Select the backend (connector) with the connector key, and pass its options and configuration through the extra_config key:

{
  "connector": "<STRING>",
  "extra_config": {...}
}
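
As an illustration, the two keys above can be assembled with a small helper. This is a minimal sketch: build_export_target is a hypothetical local function, not part of any Botify client, and it always emits extra_config (empty when there are no options) because this page does not say whether the key may be omitted.

```python
import json


def build_export_target(connector, extra_config=None):
    """Return the connector/extra_config part of an export job payload.

    Hypothetical helper: only the two key names come from the documentation.
    """
    if not isinstance(connector, str) or not connector:
        raise ValueError("connector must be a non-empty string")
    # Copy the options so later changes by the caller do not leak into the payload.
    return {"connector": connector, "extra_config": dict(extra_config or {})}


if __name__ == "__main__":
    print(json.dumps(build_export_target("direct_download"), indent=2))
```

The resulting dictionary is meant to be merged into the rest of the export job definition, which is outside the scope of this section.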

Available backends and their configuration

Direct Download

"connector": "direct_download"

This backend is always available for export. Botify will store the export and provide a link to download the file.

It's a great backend for testing your exports and their accuracy before automating them.

One limitation is that this backend caps the export size. By default, the limit is one million data rows, but it can be raised depending on your plan. Contact your CSM to discuss this limit.

There are no options available on the direct download backend. All formatters can be used with this backend.
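
The row limit described above can drive the choice of backend on the client side. The sketch below assumes that you know the expected row count beforehand and that the limit is inclusive; neither point is stated on this page. choose_connector and DEFAULT_ROW_LIMIT are hypothetical names.

```python
# Default direct download limit quoted in the documentation (can be raised per plan).
DEFAULT_ROW_LIMIT = 1_000_000


def choose_connector(expected_rows, bucket_connector_id=None, row_limit=DEFAULT_ROW_LIMIT):
    """Pick "direct_download" when the export fits, else a bucket connector id.

    Assumption: an export of exactly row_limit rows is still accepted.
    """
    if expected_rows <= row_limit:
        return "direct_download"
    if bucket_connector_id is None:
        raise ValueError("export exceeds the direct download limit and no bucket connector was given")
    return bucket_connector_id


if __name__ == "__main__":
    print(choose_connector(250_000))
    print(choose_connector(3_000_000, bucket_connector_id="<UUID>"))
```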

S3

"connector": "<UUID>"

This backend lets us push the data directly to your AWS S3 bucket.

Available options:

{
  "extra_config": {
    "filename": "<STRING>",
    "filetype": "zip",
    "subdirectory": "<STRING>",
    "push_helpers": BOOLEAN
  }
}
  • filename: The file name. Defaults to data.EXT.gz, where EXT depends on the chosen formatter. Can contain context variables.
  • filetype: The compressed file format. Set it to "zip" to get a .zip file; omit it to get the default .gz format.
  • subdirectory: The directory in which to store the file. Empty by default. Can contain context variables.
  • push_helpers: Also create helper files on the backend when the formatter enables them. See Formatters.

All formatters can be used with this backend.

There is no API to create a connector yet. To set one up, please contact your CSM. Botify needs specific permissions on the bucket in order to export to it.

Google Cloud Storage

"connector": "<UUID>"

This backend lets us push the data directly to your Google Cloud Storage bucket.

Available options:

{
  "extra_config": {
    "filename": "<STRING>",
    "filetype": "zip",
    "subdirectory": "<STRING>",
    "push_helpers": BOOLEAN
  }
}
  • filename: The file name. Defaults to data.EXT.gz, where EXT depends on the chosen formatter. Can contain context variables.
  • filetype: The compressed file format. Set it to "zip" to get a .zip file; omit it to get the default .gz format.
  • subdirectory: The directory in which to store the file. Empty by default. Can contain context variables.
  • push_helpers: Also create helper files on the backend when the formatter enables them. See Formatters.

All formatters can be used with this backend.

There is no API to create a connector yet. To set one up, please contact your CSM. Botify needs specific permissions on the bucket in order to export to it.
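
The S3 and Google Cloud Storage backends take the same options, so one helper can build extra_config for both. This is a sketch under assumptions: build_bucket_extra_config is a hypothetical name, and every option left unset is simply omitted so that the documented defaults apply.

```python
import json


def build_bucket_extra_config(filename=None, subdirectory=None, zip_archive=False, push_helpers=None):
    """Build extra_config for a bucket backend, leaving out unset options."""
    config = {}
    if filename is not None:
        config["filename"] = filename
    if zip_archive:
        # filetype is only needed for .zip; .gz is the default format.
        config["filetype"] = "zip"
    if subdirectory is not None:
        config["subdirectory"] = subdirectory
    if push_helpers is not None:
        config["push_helpers"] = bool(push_helpers)
    return config


if __name__ == "__main__":
    payload = {
        "connector": "<UUID>",  # placeholder, use the id of your own connector
        "extra_config": build_bucket_extra_config(
            filename="botify-export.csv.zip",  # illustrative file name
            subdirectory="exports",
            zip_archive=True,
        ),
    }
    print(json.dumps(payload, indent=2))
```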

List your available backends

GET https://api.botify.com/v1/connectors/USERNAME
This endpoint lists your available backends. It uses the same authentication method as the rest of the API.
Example response:

{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": "12345678-ABCD-1234-ABCD-1234ABCD5678EF00",
            "type": "s3",
            "name": "s3://some.botify.bucket.export"
        },
        {
            "id": "direct_download",
            "type": "direct_download",
            "name": "Direct download"
        }
    ]
}

You can then use any of the returned id values as the connector in your export job.
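
Once the response is decoded, picking a connector id is a simple filter on the results list. The sketch below works on the example response shown above instead of calling the API, so no authentication details are assumed. find_connector_ids is a hypothetical helper, and it does not follow the next link of a paginated response.

```python
import json

# The example response from the documentation, kept as raw JSON text.
SAMPLE_RESPONSE = json.loads("""
{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {"id": "12345678-ABCD-1234-ABCD-1234ABCD5678EF00",
         "type": "s3",
         "name": "s3://some.botify.bucket.export"},
        {"id": "direct_download",
         "type": "direct_download",
         "name": "Direct download"}
    ]
}
""")


def find_connector_ids(payload, connector_type):
    """Return the ids of all listed connectors whose type matches."""
    return [
        item["id"]
        for item in payload.get("results", [])
        if item.get("type") == connector_type
    ]


if __name__ == "__main__":
    print(find_connector_ids(SAMPLE_RESPONSE, "s3"))
```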

Context variables

Context variables are dynamic values that adapt to the context of the export. They can be used in filenames and directory paths.

For example, when exporting to a backend that supports custom filenames and subdirectories, you could use the following:

{
  "extra_config": {
    "subdirectory": "$crawl.20210102.year-$crawl.20210102.month_2digits",
    "filename": "botify-$crawl.20210102.day_2digits.csv.gz"
  }
}

This creates the export 2021-01/botify-02.csv.gz.
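
To preview the path a template would produce, the substitution can be emulated locally. This is only an approximation: the actual expansion happens on Botify's side and its exact matching rules are not described here. render_context_variables is a hypothetical helper that replaces each $name with its value, trying longer names first so that month_2digits is not mistaken for month.

```python
import re


def render_context_variables(template, values):
    """Replace every $name in template with values[name] (longest name first)."""
    if not values:
        return template
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"\$(" + "|".join(re.escape(name) for name in names) + r")")
    return pattern.sub(lambda match: str(values[match.group(1)]), template)


if __name__ == "__main__":
    # Values for a crawl on January 2, 2021, written out by hand.
    values = {
        "crawl.20210102.year": "2021",
        "crawl.20210102.month": "1",
        "crawl.20210102.month_2digits": "01",
        "crawl.20210102.day": "2",
        "crawl.20210102.day_2digits": "02",
    }
    subdirectory = render_context_variables(
        "$crawl.20210102.year-$crawl.20210102.month_2digits", values
    )
    filename = render_context_variables(
        "botify-$crawl.20210102.day_2digits.csv.gz", values
    )
    print(f"{subdirectory}/{filename}")  # prints 2021-01/botify-02.csv.gz
```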

The table below uses the project botify-team/botify-blog, which has a crawl dated January 2, 2021. Here, crawl_collection_slug corresponds to crawl.20210102. For more information, see Collections and periods.

| Context variable | Description | Example |
| --- | --- | --- |
| user | The username associated with the project | botify-team |
| project | The project slug | botify-blog |
| crawl_collection_slug | The analysis slug | 20210102 |
| crawl_collection_slug.date | The analysis date | 2021-01-02 |
| crawl_collection_slug.day | The analysis day | 2 |
| crawl_collection_slug.day_2digits | The analysis day as 2 digits | 02 |
| crawl_collection_slug.month | The analysis month | 1 |
| crawl_collection_slug.month_2digits | The analysis month as 2 digits | 01 |
| crawl_collection_slug.year | The analysis year | 2021 |
| crawl_collection_slug.week_number | The analysis ISO week number | 53 |
| crawl_collection_slug.date_next_week | The analysis date one week after | 2021-01-09 |
| crawl_collection_slug.day_next_week | The analysis day one week after | 9 |
| crawl_collection_slug.day_2digits_next_week | The analysis day one week after, as 2 digits | 09 |
| crawl_collection_slug.month_next_week | The analysis month one week after | 1 |
| crawl_collection_slug.month_2digits_next_week | The analysis month one week after, as 2 digits | 01 |
| crawl_collection_slug.year_next_week | The analysis year one week after | 2021 |
| crawl_collection_slug.week_number_next | The analysis ISO week number one week after | 1 |
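
The date-based examples above can be reproduced from the crawl date with the Python standard library, which is a handy way to check what a variable will expand to. This is a sketch, not Botify's implementation: crawl_date_variables is a hypothetical helper that returns only the suffixes (the part after crawl_collection_slug.), and it assumes that "one week after" means exactly seven days later.

```python
from datetime import date, timedelta


def crawl_date_variables(crawl_date):
    """Return the date-derived context variable suffixes for a crawl date."""
    next_week = crawl_date + timedelta(days=7)
    return {
        "date": crawl_date.isoformat(),
        "day": str(crawl_date.day),
        "day_2digits": f"{crawl_date.day:02d}",
        "month": str(crawl_date.month),
        "month_2digits": f"{crawl_date.month:02d}",
        "year": str(crawl_date.year),
        # isocalendar() returns (ISO year, ISO week, ISO weekday).
        "week_number": str(crawl_date.isocalendar()[1]),
        "date_next_week": next_week.isoformat(),
        "day_next_week": str(next_week.day),
        "day_2digits_next_week": f"{next_week.day:02d}",
        "month_next_week": str(next_week.month),
        "month_2digits_next_week": f"{next_week.month:02d}",
        "year_next_week": str(next_week.year),
        "week_number_next": str(next_week.isocalendar()[1]),
    }


if __name__ == "__main__":
    for name, value in crawl_date_variables(date(2021, 1, 2)).items():
        print(f"{name}: {value}")
```

Note that January 2, 2021 falls in ISO week 53 of 2020, which is why week_number is 53 while year is 2021.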
