Export my RealKeywords data

In this section, we will see how to export one million rows of raw data from your RealKeywords integration for January 2021.

1. Get your configuration

We will need 3 pieces of information to run the export. All can be gathered by following the guide in Getting started.

  • the username and project_slug, which identify the project we are targeting
  • your API Token, which is used to authenticate you

In the rest of the tutorial, we will consider these values:

  • username: botify-team
  • project_slug: botify-blog
  • API Token: 123abc

The period of dates we will use in the snippets below is from January 1st 2021 to January 31st 2021. If you want to export data for another period of time, you can change the periods key and select a period for which data is available on your project.
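If you build the periods value programmatically, a small helper can compute the first and last day of a month. This is a minimal sketch; the month_period name is hypothetical and not part of the Botify API, and it only produces the ["YYYY-MM-DD", "YYYY-MM-DD"] pair used in the snippets below.

```python
import calendar
from datetime import date


def month_period(year: int, month: int) -> list[str]:
    """Return [first_day, last_day] of a month as ISO date strings."""
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, month)[1]
    return [
        date(year, month, 1).isoformat(),
        date(year, month, last_day).isoformat(),
    ]


# Value for the "periods" key covering January 2021
periods = [month_period(2021, 1)]
print(periods)  # [['2021-01-01', '2021-01-31']]
```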

2. The BQL query

This section presents the BQL query that we will run to fetch keyword data.
For each combination of URL, keyword, device, and country, and for each day of data, this query fetches:

  • the number of clicks
  • the number of impressions
  • the average position
  • the Clickthrough rate (CTR)
  • the number of missed clicks
  • whether this combination of dimensions is new on this day or not
{
  "collections": [
    "search_console"
  ],
  "periods": [
    ["2021-01-01", "2021-01-31"]
  ],
  "query": {
    "dimensions": [
      "url",
      "keyword",
      "device",
      "country",
      "search_console.period_0.date"
    ],
    "metrics": [
      "search_console.period_0.count_clicks",
      "search_console.period_0.count_impressions",
      "search_console.period_0.avg_position",
      "search_console.period_0.ctr",
      "search_console.period_0.count_missed_clicks",
      "search_console.period_0.is_new"
    ],
    "sort": [
      {
        "index": 0,
        "type": "metrics",
        "order": "desc"
      }
    ]
  }
}

This query should give you a good overview of your keyword data by returning the first million combinations, sorted by number of clicks in descending order.
Feel free to remove dimensions to aggregate your data over only a subset of them.
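To experiment with different aggregations, the query can be generated from a list of dimensions. The sketch below is illustrative only: build_query and the two constants are hypothetical names, and the field names are copied from the query above. It assumes the API accepts any subset of the listed dimensions.

```python
import json

# Field names copied from the BQL query in this section
ALL_DIMENSIONS = [
    "url",
    "keyword",
    "device",
    "country",
    "search_console.period_0.date",
]
METRICS = [
    "search_console.period_0.count_clicks",
    "search_console.period_0.count_impressions",
    "search_console.period_0.avg_position",
    "search_console.period_0.ctr",
    "search_console.period_0.count_missed_clicks",
    "search_console.period_0.is_new",
]


def build_query(dimensions: list[str], period: list[str]) -> dict:
    """Build the BQL query for a subset of dimensions and one period."""
    unknown = [d for d in dimensions if d not in ALL_DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    return {
        "collections": ["search_console"],
        "periods": [period],
        "query": {
            "dimensions": dimensions,
            "metrics": METRICS,
            # Sort on the first metric (clicks), highest first
            "sort": [{"index": 0, "type": "metrics", "order": "desc"}],
        },
    }


# Aggregate by keyword and country only
query = build_query(["keyword", "country"], ["2021-01-01", "2021-01-31"])
print(json.dumps(query, indent=2))
```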

3. Execute the API call

To launch the export, you need to send an HTTP request to our servers.
If you use an HTTP tool, you should be able to import the cURL command below into it.

🚧

Use your own configuration

Before running the command, replace the example values with your own:

  • in --header 'Authorization: Token 123abc', replace 123abc with your API token
  • in "username": "botify-team", replace botify-team with the project's username
  • in "project": "botify-blog", replace botify-blog with your project slug
curl --location --request POST 'https://api.botify.com/v1/jobs' \
--header 'Authorization: Token 123abc' \
--header 'Content-Type: application/json' \
--data-raw '{
  "job_type": "export",
  "payload": {
    "username": "botify-team",
    "project": "botify-blog",
    "connector": "direct_download",
    "formatter": "csv",
    "formatter_config": {
        "print_header": true
    },
    "export_size": 5000,
    "query": {
      "collections": ["search_console"],
      "periods": [
          ["2021-01-01", "2021-01-31"]
      ],
      "query": {
        "dimensions": [
          "url",
          "keyword",
          "device",
          "country",
          "search_console.period_0.date"
        ],
        "metrics": [
            "search_console.period_0.count_clicks",
            "search_console.period_0.count_impressions",
            "search_console.period_0.avg_position",
            "search_console.period_0.ctr",
            "search_console.period_0.count_missed_clicks",
            "search_console.period_0.is_new"
        ],
        "sort": [
            {
                "index": 0,
                "type": "metrics",
                "order": "desc"
            }
        ]
      }
    }
  }
}'
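If you prefer a script to cURL, the same request can be sketched in Python with the standard library. The endpoint, headers, and payload fields mirror the cURL command above; the function names build_export_request and create_export_job are hypothetical, and the query is shortened here for brevity.

```python
import json
import urllib.request

API_URL = "https://api.botify.com/v1/jobs"


def build_export_request(token, username, project_slug, bql_query, export_size=5000):
    """Prepare (but do not send) the POST request that creates an export job."""
    body = {
        "job_type": "export",
        "payload": {
            "username": username,
            "project": project_slug,
            "connector": "direct_download",
            "formatter": "csv",
            "formatter_config": {"print_header": True},
            "export_size": export_size,
            "query": bql_query,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_export_job(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Shortened query for illustration; use the full BQL query from step 2
bql_query = {
    "collections": ["search_console"],
    "periods": [["2021-01-01", "2021-01-31"]],
    "query": {
        "dimensions": ["keyword"],
        "metrics": ["search_console.period_0.count_clicks"],
    },
}
request = build_export_request("123abc", "botify-team", "botify-blog", bql_query)
print(request.get_method(), request.full_url)
# To actually launch the export: job = create_export_job(request)
```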

If the export was launched correctly, you should get a response like:

{
    "job_id": 99999,
    "job_type": "export",
    "job_url": "/v1/jobs/99999",
    "job_status": "CREATED",
    "payload": {...},
    "results": null,
    "date_created": "2021-03-15T16:45:48.110189Z",
    "user": "botify-team",
    "metadata": null
}

In the actual response, the payload field contains the full payload you submitted.
If the job_status is CREATED, the job was created successfully 🎉

The information you will need here is the job_id: 99999.
We will use it to fetch the job's status.

4. Fetch the job status

Now that the job is in the pipeline, we will fetch its status until it is done.
For more details, see Export job reference.

We will send a GET request using the job_id from the previous response.

curl --location --request GET 'https://api.botify.com/v1/jobs/99999' \
--header 'Authorization: Token 123abc'

This returns something like:

{
    "job_id": 99999,
    "job_type": "export",
    "job_url": "/v1/jobs/99999",
    "job_status": "DONE",
    "results": {
        "nb_lines": 956,
        "download_url": "https://d121xa69ioyktv.cloudfront.net/collection_exports/a/b/c/abcdefghik987654321/botify-2021-03-15.csv.gz"
    },
    "date_created": "2021-03-15T16:45:48.110189Z",
    "payload": {...},
    "user": "botify-team",
    "metadata": null
}

If the job_status is PROCESSING, wait a bit and run the same request until the status switches to DONE.
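The wait-and-retry step can be automated with a small polling loop. This is a sketch under stated assumptions: fetch_job and wait_for_job are hypothetical names, the polling interval and attempt limit are arbitrary, and any status other than CREATED, PROCESSING, or DONE is treated as unexpected because this guide does not list other values.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_job(job_id: int, token: str) -> dict:
    """GET the job description, as in the cURL command above."""
    request = urllib.request.Request(
        f"https://api.botify.com/v1/jobs/{job_id}",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until job_status is DONE, then return the job."""
    for _ in range(max_attempts):
        job = fetch_status()
        status = job["job_status"]
        if status == "DONE":
            return job
        # Assumption: only CREATED and PROCESSING mean "still running"
        if status not in ("CREATED", "PROCESSING"):
            raise RuntimeError(f"unexpected job_status: {status}")
        sleep(interval_s)
    raise TimeoutError("job did not reach DONE in time")


# Usage (performs real HTTP calls):
# job = wait_for_job(lambda: fetch_job(99999, "123abc"))
# print(job["results"]["download_url"])
```

Passing the fetch function as an argument keeps the loop independent of the HTTP client, which also makes it easy to try out with canned responses.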

5. Fetch the results

Once the job is done, the results object will have a download_url field. The URL links directly to your exported SEO data. Download it by accessing the given link.
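The download can also be scripted. The sketch below streams the file behind download_url to disk using the standard library; download_export is a hypothetical helper name, and the job argument is assumed to be the decoded JSON response from the previous step.

```python
import shutil
import urllib.request
from pathlib import Path


def download_export(job: dict, destination: Path) -> Path:
    """Stream the file behind results.download_url to `destination`."""
    results = job.get("results")
    if not results or "download_url" not in results:
        raise ValueError("job has no download_url yet; is job_status DONE?")
    with urllib.request.urlopen(results["download_url"]) as response, \
            open(destination, "wb") as out:
        # Copy in chunks so large exports are never fully held in memory
        shutil.copyfileobj(response, out)
    return destination


# Usage, with `job` being the DONE response from step 4:
# download_export(job, Path("botify-export.csv.gz"))
```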

6. Extract the result

Once the file is downloaded, you may notice that its name ends with .csv.gz: the data is gzip-compressed. Software on your operating system should be able to extract the CSV file.
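The extraction can also be done in a script. This sketch uses Python's gzip and csv modules; decompress and preview are hypothetical helper names, and it assumes the file is UTF-8 encoded, comma-separated, and starts with a header row (as requested by print_header in the export payload).

```python
import csv
import gzip
import shutil
from itertools import islice


def decompress(gz_path: str, csv_path: str) -> None:
    """Write the uncompressed CSV next to the downloaded archive."""
    with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def preview(gz_path: str, limit: int = 5) -> list[dict]:
    """Read the first `limit` rows directly from the compressed file."""
    # Assumption: UTF-8 text with a header row on the first line
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        return list(islice(reader, limit))


# Usage:
# decompress("botify-2021-03-15.csv.gz", "botify-2021-03-15.csv")
# for row in preview("botify-2021-03-15.csv.gz"):
#     print(row)
```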

For more data export options, see Export your data and its subsections. For example, a connector can deliver this kind of export directly to your storage system.