Backends and connectors
Connectors
One of the strengths of Botify's data exports is the range of backends that let you get the data exactly where you want it.
The backend/connector is defined through the connector key, and its options and configuration are passed through the extra_config key:
{
"connector": "<STRING>",
"extra_config": {...}
}
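As a minimal sketch of how these two keys fit together, the Python helper below builds that fragment as a dictionary. The function name export_target is hypothetical, and leaving out extra_config when no options are set is an assumption rather than documented behavior.

```python
import json


def export_target(connector, extra_config=None):
    """Return the connector part of an export job payload as a dict."""
    if not isinstance(connector, str) or not connector:
        raise ValueError("connector must be a non-empty string")
    target = {"connector": connector}
    # Assumption: extra_config can be left out when there is nothing to set.
    if extra_config:
        target["extra_config"] = dict(extra_config)
    return target


if __name__ == "__main__":
    # Direct download takes no options, so only the connector key is emitted.
    print(json.dumps(export_target("direct_download"), indent=2))
```

The resulting dictionary is meant to be merged into the rest of the export job payload, which is not covered here.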
Available backends and their configuration
Direct Download
"connector": "direct_download"
This backend is always available for an export.
The idea is that Botify will store the export for you, and provide you with a link to download the file.
It's a great backend for testing your exports and their accuracy before automating them.
One limitation of this backend is an upper bound on export size. By default, the limit is one million data rows, but it can be raised depending on your plan. Don't hesitate to contact your CSM about this limitation.
No options are available for the direct download backend.
All formatters can be used with this backend.
S3
"connector": "<UUID>"
This backend lets you push the data directly to your AWS S3 bucket.
Available options:
{
"extra_config": {
"filename": "<STRING>",
"subdirectory": "<STRING>",
"push_helpers": BOOLEAN
}
}
filename: the filename. By default: data.EXT.gz, EXT depending on the chosen formatter. Can contain context variables.
subdirectory: the directory in which we want to store the file. By default empty. Can contain context variables.
push_helpers: also create helper files on the backend when the formatter enables them. See Formatters.
All formatters can be used with this backend.
We don't expose an API to create a connector yet. To create this connector, please contact your CSM. Specific permissions are needed on the bucket for us to be able to export to it.
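To illustrate how the options and their defaults combine, here is a hedged Python sketch. Both function names are hypothetical, and joining subdirectory and filename with a single slash is an assumption based on the context variables example on this page, not a documented rule.

```python
def bucket_extra_config(filename=None, subdirectory=None, push_helpers=None):
    """Build the extra_config fragment, keeping only the options that were set."""
    options = {
        "filename": filename,
        "subdirectory": subdirectory,
        "push_helpers": push_helpers,
    }
    return {"extra_config": {k: v for k, v in options.items() if v is not None}}


def object_path(extra_config, ext):
    """Guess where the file lands in the bucket, applying the documented defaults."""
    # Default filename is data.EXT.gz, EXT depending on the formatter.
    filename = extra_config.get("filename") or f"data.{ext}.gz"
    # Default subdirectory is empty; the slash join is an assumption.
    subdirectory = (extra_config.get("subdirectory") or "").strip("/")
    return f"{subdirectory}/{filename}" if subdirectory else filename


if __name__ == "__main__":
    config = bucket_extra_config(subdirectory="exports", push_helpers=True)
    print(config)
    print(object_path(config["extra_config"], "csv"))
```

Context variables are left unresolved by this sketch; it only shows which keys are sent and which defaults apply when a key is omitted.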
Google Cloud Storage
"connector": "<UUID>"
This backend lets you push the data directly to your Google Cloud Storage bucket.
Available options:
{
"extra_config": {
"filename": "<STRING>",
"subdirectory": "<STRING>",
"push_helpers": BOOLEAN
}
}
filename: the filename. By default: data.EXT.gz, EXT depending on the chosen formatter. Can contain context variables.
subdirectory: the directory in which we want to store the file. By default empty. Can contain context variables.
push_helpers: also create helper files on the backend when the formatter enables them. See Formatters.
All formatters can be used with this backend.
We don't expose an API to create a connector yet. To create this connector, please contact your CSM. Specific permissions are needed on the bucket for us to be able to export to it.
List your available backends
GET https://api.botify.com/v1/connectors/USERNAME
Using the same authentication method as usual, this endpoint will list your available backends.
Example response:
{
"count": 2,
"next": null,
"previous": null,
"results": [
{
"id": "12345678-ABCD-1234-ABCD-1234ABCD5678EF00",
"type": "s3",
"name": "s3://some.botify.bucket.export"
},
{
"id": "direct_download",
"type": "direct_download",
"name": "Direct download"
}
]
}
You can then specify the id as the connector in your export job.
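The lookup can be sketched in Python as follows. The helper names are hypothetical, and the Authorization: Token header is an assumption about the "usual" authentication method, so adjust it to whatever your other API calls use. The sample payload is the example response shown above.

```python
import urllib.request

API_ROOT = "https://api.botify.com/v1"


def connectors_request(username, token):
    """Build (without sending) the GET request that lists a user's connectors."""
    return urllib.request.Request(
        f"{API_ROOT}/connectors/{username}",
        # Assumption: token-based auth header, as for other API calls.
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


def connector_ids(payload, connector_type):
    """Return the ids of all listed connectors of the given type."""
    return [
        item["id"]
        for item in payload.get("results", [])
        if item.get("type") == connector_type
    ]


SAMPLE = {
    "count": 2,
    "next": None,
    "previous": None,
    "results": [
        {
            "id": "12345678-ABCD-1234-ABCD-1234ABCD5678EF00",
            "type": "s3",
            "name": "s3://some.botify.bucket.export",
        },
        {"id": "direct_download", "type": "direct_download", "name": "Direct download"},
    ],
}

if __name__ == "__main__":
    print(connectors_request("USERNAME", "YOUR_TOKEN").full_url)
    print(connector_ids(SAMPLE, "s3"))
```

To actually send the request, pass it to urllib.request.urlopen and decode the JSON body. The response carries next and previous fields, so a longer list may need to be fetched page by page.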
Context variables
These are dynamic variables that can be used in filenames and directory paths, and that adapt to the context of the export.
For example, when exporting to a backend that supports custom filenames and subdirectories, one could use:
{
"extra_config": {
"subdirectory": "$crawl.20210102.year-$crawl.20210102.month_2digits",
"filename": "botify-$crawl.20210102.day_2digits.csv.gz"
}
}
This would create the export 2021-01/botify-02.csv.gz.
We will take as an example the project botify-team/botify-blog, which has a crawl on January 2, 2021.
The crawl_collection_slug would correspond to crawl.20210102. For more information, see Collections and periods.
Context variable | Description | Example |
---|---|---|
user | The username associated with the project. | botify-team |
project | The project slug. | botify-blog |
crawl_collection_slug | The analysis slug | 20210102 |
crawl_collection_slug.date | The analysis date | 2021-01-02 |
crawl_collection_slug.day | The analysis day | 2 |
crawl_collection_slug.day_2digits | The analysis day on 2 digits | 02 |
crawl_collection_slug.month | The analysis month. | 1 |
crawl_collection_slug.month_2digits | The analysis month on 2 digits. | 01 |
crawl_collection_slug.year | The analysis year. | 2021 |
crawl_collection_slug.week_number | The analysis ISO week number. | 53 |
crawl_collection_slug.date_next_week | The analysis date one week after. | 2021-01-09 |
crawl_collection_slug.day_next_week | The analysis day one week after. | 9 |
crawl_collection_slug.day_2digits_next_week | The analysis day one week after on 2 digits. | 09 |
crawl_collection_slug.month_next_week | The analysis month one week after. | 1 |
crawl_collection_slug.month_2digits_next_week | The analysis month one week after on 2 digits. | 01 |
crawl_collection_slug.year_next_week | The analysis year one week after. | 2021 |
crawl_collection_slug.week_number_next | The analysis ISO week number one week after. | 1 |
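The table can be reproduced locally with a short Python sketch, which may help when checking a filename template before running an export. This is an approximation under stated assumptions, not Botify's implementation: the function names are hypothetical, the slug is assumed to be a YYYYMMDD date, and variables are assumed to be written as $ followed by the variable name, as in the example above.

```python
import datetime
import re


def crawl_context(user, project, slug):
    """Compute the context variables for a crawl slug such as "20210102"."""
    day = datetime.datetime.strptime(slug, "%Y%m%d").date()
    later = day + datetime.timedelta(days=7)  # same weekday, one week after
    p = f"crawl.{slug}"
    return {
        "user": user,
        "project": project,
        p: slug,
        f"{p}.date": day.isoformat(),
        f"{p}.day": str(day.day),
        f"{p}.day_2digits": f"{day.day:02d}",
        f"{p}.month": str(day.month),
        f"{p}.month_2digits": f"{day.month:02d}",
        f"{p}.year": str(day.year),
        f"{p}.week_number": str(day.isocalendar()[1]),  # ISO week number
        f"{p}.date_next_week": later.isoformat(),
        f"{p}.day_next_week": str(later.day),
        f"{p}.day_2digits_next_week": f"{later.day:02d}",
        f"{p}.month_next_week": str(later.month),
        f"{p}.month_2digits_next_week": f"{later.month:02d}",
        f"{p}.year_next_week": str(later.year),
        f"{p}.week_number_next": str(later.isocalendar()[1]),
    }


def render(template, context):
    """Replace every $variable in template, trying longer names first."""
    names = sorted(context, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape("$" + name) for name in names))
    return pattern.sub(lambda match: context[match.group(0)[1:]], template)


if __name__ == "__main__":
    ctx = crawl_context("botify-team", "botify-blog", "20210102")
    print(render("$crawl.20210102.year-$crawl.20210102.month_2digits", ctx))
    print(render("botify-$crawl.20210102.day_2digits.csv.gz", ctx))
```

Longer variable names are tried first so that, for instance, day_2digits is not cut short by a match on day.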