The Moz Links API: An Introduction

What exactly IS an API? They're those things that you copy and paste long strange codes into Screaming Frog for links data on a Site Crawl, right?

I'm here to tell you there's so much more to them than that – if you're willing to take just a few little steps. But first, some basics.

What’s an API?

API stands for "application programming interface", and it's really just a way of… using a thing. Everything has an API. The web itself is a giant API that takes URLs as input and returns pages.
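
To make that concrete, here's a minimal sketch, using only Python's built-in urllib module (which comes up again below), of treating the web itself as an API: a URL goes in, a page comes out. The example URL is purely an illustration.

from urllib.request import urlopen

# The web's "API": give it a URL, get back a page.
with urlopen("https://example.com") as page:
    html = page.read().decode("utf-8")

print(html[:80])  # the first 80 characters of the returned page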

But specialized data services like the Moz Links API have their own set of rules. These rules vary from service to service and can be a major stumbling block for people taking the next step.

When Screaming Frog gives you the extra links columns in a crawl, it's using the Moz Links API, but you can have this capability anywhere. For example, all that tedious manual work you do in spreadsheet environments can be automated, from data-pull to formatting to emailing a report.

If you take this next step, you can be more efficient than your competitors, designing and delivering your own SEO services instead of relying upon, paying for, and being limited by the next proprietary product integration.

GET vs. POST

Most APIs you'll encounter use the same data transport mechanism as the web. That means there's a URL involved, just like with a website. Don't get scared! It's easier than you think. In many ways, using an API is just like using a website.

As with loading web pages, the request may be carried in one of two places: the URL itself, or the body of the request. The URL is called the "endpoint" and the often invisibly submitted extra part of the request is called the "payload" or "data". When the data is in the URL, it's called a "query string" and indicates that the "GET" method is being used. You see this all the time when you search:

https://www.google.com/search?q=moz+links+api <-- GET method

When the data of the request is hidden, it's called a "POST" request. You see this when you submit a form on the web and the submitted data does not show up in the URL. When you hit the back button after such a POST, browsers usually warn you against double-submits. The reason the POST method is often used is that you can fit a lot more in the request with POST than with GET. URLs would get very long otherwise. The Moz Links API uses the POST method.
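
To see the difference in code, here's a hedged sketch using the requests library (introduced just below). The httpbin.org URLs are a public echo service used purely for illustration; they're not part of the Moz API.

import requests

# GET: the data rides along in the URL as a query string.
r1 = requests.get("https://httpbin.org/get", params={"q": "moz links api"})
print(r1.url)  # https://httpbin.org/get?q=moz+links+api

# POST: the same data travels hidden in the request body instead.
r2 = requests.post("https://httpbin.org/post", data={"q": "moz links api"})
print(r2.url)  # https://httpbin.org/post -- no query string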

Making requests

A web browser is what traditionally makes requests of websites for web pages. The browser is a type of software known as a client. Clients are what make requests of services. More than just browsers can make requests. The ability to make client web requests is often built into programming languages like Python, or can be broken out as a standalone tool. The most popular tools for making requests outside a browser are curl and wget.

We're talking Python here. Python has a built-in library called urllib, but it's designed to handle so many different types of requests that it's a bit of a pain to use. There are other libraries that are more specialized for making requests of APIs. The most popular one for Python is called requests. It's so popular that it's used in almost every Python API tutorial you'll find on the web. So I'll use it too. This is what "hitting" the Moz Links API looks like:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Provided that everything was set up correctly (more on that soon), this will produce the following output:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

This is JSON data. It's contained within the response object that was returned from the API. It's not on the drive or in a file. It's in memory. So long as it's in memory, you can do stuff with it (often just saving it to a file).
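
For example, here's one way you might save that in-memory data to disk. This is a sketch that assumes the response object from the request above; the filename is arbitrary:

import json

# Assume `response` is the object returned by requests.post() above.
data = response.json()  # parse the response body into a Python dict

# Persist the in-memory data as a JSON file on the drive.
with open("moz_links_data.json", "w") as f:
    json.dump(data, f, indent=2)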

If you wanted to grab a piece of data from within such a response, you could refer to it like this:

response['results'][0]['external_pages']

This says: "Give me the first item in the results list, and then give me the external_pages value from that item." The result would be 7162.

NOTE: If you're actually following along and executing code, the above line won't work on its own. There's a certain amount of setup we'll do shortly, including installing the requests library and setting up a few variables. But this is the basic idea.
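
If you'd like to try the indexing itself right now, you can mimic the shape of the response with a hand-built dict. The values here are just stand-ins copied from the output shown above:

# A stand-in dict with the same shape as the API response above.
response = {
    "next_token": "JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==",
    "results": [
        {
            "anchor_text": "moz",
            "external_pages": 7162,
            "external_root_domains": 2026,
        }
    ],
}

print(response["results"][0]["external_pages"])  # 7162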

JSON

JSON stands for JavaScript Object Notation. It's a way of representing data that's easy for humans to read and write, and easy for computers to read and write too. It's a very common data format for APIs, and it has somewhat taken over the world since the older ways were too difficult for most people to use. Some people might call this part of the "RESTful" API movement, but the much more difficult XML format is also considered "RESTful", and everyone seems to have their own interpretation. Consequently, I find it best to just focus on JSON and how it gets in and out of Python.

Python dictionaries

I lied to you. I said that the data structure you saw above was JSON. Technically it's really a Python dictionary, or dict datatype object. It's a special kind of object in Python that's designed to hold key/value pairs. The keys are strings and the values can be any type of object. The keys are like the column names in a spreadsheet. The values are like the cells in the spreadsheet. In this way, you can think of a Python dict as a JSON object. For example, here's creating a dict in Python:

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

And here is the equivalent in JavaScript:

var my_json = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

Pretty much the same thing, right? Look closely. Key-names and string values get double-quotes. Numbers don't. These rules apply consistently between JSON and Python dicts. So as you might imagine, it's easy for JSON data to flow in and out of Python. This is a great gift that has made modern API-work highly accessible to beginners through a tool that has revolutionized the field of data science and is making inroads into marketing: Jupyter Notebooks.

Flattening data

But beware! As data flows between systems, it's not uncommon for it to subtly change. For example, the JSON data above might be converted to a string. Strings might look exactly like JSON, but they're not. They're just a bunch of characters. Sometimes you'll hear this called "serializing", or "flattening". It's a subtle point, but worth understanding, as it will help with one of the biggest obstacles with the Moz Links API (and most JSON APIs).

Objects have APIs

Actual JSON or dict objects have their own little APIs for accessing the data inside them. The ability to use these JSON and dict APIs goes away when the data is flattened into a string, but the data will travel between systems more easily, and when it arrives at the other end, it will be "deserialized" and the API will come back on the other system.
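
Here's a quick sketch of that loss and recovery:

import json

my_dict = {"name": "Mike", "age": 52}
print(my_dict.keys())        # the dict API works: dict_keys(['name', 'age'])

flat = json.dumps(my_dict)   # flatten (serialize) into a plain string
# flat.keys() would raise AttributeError: a string has no dict API

restored = json.loads(flat)  # deserialize on "the other end"
print(restored["age"])       # 52 -- the dict API is back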

Data flowing between systems

This is the concept of portable, interoperable data. Back when it was called Electronic Data Interchange (or EDI), it was a very big deal. Then along came the web, then XML, then JSON, and now it's just a normal part of doing business.

If you're in Python and you want to convert a dict to a flattened JSON string, you do the following:

import json

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

json_string = json.dumps(my_dict)

…which would produce the following output:

'{"name": "Mike", "age": 52, "city": "New York"}'

This looks almost the same as the original dict, but if you look closely you can see that single-quotes are used around the entire thing. Another obvious difference is that you can line-wrap real structured data for readability without any ill effect. You can't do that so easily with strings. That's why it's presented all on one line in the above snippet.

Such stringifying processes are done when passing data between different systems because they are not always compatible. Normal text strings, on the other hand, are compatible with almost everything and can be passed around on web-requests with ease. Such flattened strings of JSON data are frequently referred to as the request.

Anatomy of a request

Again, here's the example request we made above:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Now that you understand what the variable name json_string is telling you about its contents, you shouldn't be surprised to see that this is how we populate that variable:

data_dict = {
    "target": "moz.com/blog",
    "scope": "page",
    "limit": 1
}

json_string = json.dumps(data_dict)

…and the contents of json_string look like this:

'{"target": "moz.com/blog", "scope": "page", "limit": 1}'

This was one of my key discoveries in learning the Moz Links API. It's something it has in common with countless other APIs out there, but it trips me up every time because it's so much more convenient to work with structured dicts than flattened strings. However, most APIs expect the data to be a string for portability between systems, so we have to convert it at the last moment before the actual API-call occurs.

Pythonic loads and dumps

Now you may be wondering what a dump is doing in the middle of that code. The json.dumps() function is called a "dumper" because it takes a Python object and dumps it into a string. The json.loads() function is called a "loader" because it takes a string and loads it into a Python object.

What appear to be singular and plural options are actually string and file options. If your data is in a file object, you use json.load() and json.dump(). If your data is a string, you use json.loads() and json.dumps(). The s stands for string; leaving the s off means the function works with files.
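
Here's a small sketch of all four functions side by side, using a throwaway file name:

import json

my_dict = {"name": "Mike", "age": 52}

# String versions (note the s):
s = json.dumps(my_dict)         # dict -> JSON string
back = json.loads(s)            # JSON string -> dict

# File versions (no s) work on open file objects:
with open("example.json", "w") as f:
    json.dump(my_dict, f)       # dict -> JSON written to a file
with open("example.json") as f:
    from_file = json.load(f)    # JSON read from a file -> dict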

Don't let anyone tell you Python is perfect. It's just that its rough edges are not excessively objectionable.

Assignment vs. equality

For those of you completely new to Python or programming in general, what we're doing when we hit the API is called an assignment. The result of requests.post() is being assigned to the variable named response.

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

We're using the = sign to assign the value of the right side of the equation to the variable on the left side of the equation. The variable response is now a reference to the object that was returned from the API. Assignment is different from equality. The == sign is used for equality.

# This is assignment:
a = 1  # a is now equal to 1

# This is equality:
a == 1  # True, but relies on the above line having been executed

The POST methodology

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

The requests library has a function called post() that takes 3 arguments here. The first argument is the URL of the endpoint. The second argument is the data to send to the endpoint. The third argument is the authentication information to send to the endpoint.

Keyword parameters and their arguments

You may notice that some of the arguments to the post() function have names. Names are set equal to values using the = sign. Here's how Python functions get defined. The first argument is positional, both because it comes first and because there's no keyword. Keyworded arguments come after position-dependent arguments. Trust me, it all makes sense after a while. We all start to think like Guido van Rossum.

def arbitrary_function(argument1, name="default"):
    # do stuff

The name in the above example is called a "keyword" and the values that come in on those spots are called "arguments". Arguments are bound to the parameter names right in the function definition, so you can refer to either argument1 or name anywhere inside this function. If you'd like to learn more about the rules of Python functions, you can read about them here.
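
Here's a quick hypothetical illustration of defining and calling such a function both ways:

def greet(greeting, name="World"):  # name is a keyword parameter with a default
    return greeting + ", " + name + "!"

print(greet("Hello"))              # positional only: "Hello, World!"
print(greet("Hello", name="Moz"))  # keyword argument: "Hello, Moz!"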

Setting up the request

Okay, so let's get you everything necessary for that success assured moment. We've been showing the basic request:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

…but we haven't shown everything that goes into it. Let's do that now. If you're following along and don't have the requests library installed, you can install it with the following command from the same terminal environment from which you run Python:

pip install requests

Oftentimes Jupyter will have the requests library installed already, but in case it doesn't, you can install it with the following command from inside a Notebook cell:

!pip install requests

And now we can put it all together. There are only a few things here that are new. The most important is how we're taking 2 different variables and combining them into a single variable called AUTH_TUPLE. You will have to get your own ACCESSID and SECRETKEY from the Moz.com website.

The API expects these two values to be passed as a Python data structure called a tuple. A tuple is a list of values that don't change. I find it interesting that requests.post() expects flattened strings for the data parameter, but expects a tuple for the auth parameter. I suppose it makes sense, but these are the subtle things to understand when working with APIs.
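
Here's what that looks like in isolation, with placeholder credentials:

# A tuple is created with parentheses and cannot be changed afterward.
auth_tuple = ("your-access-id", "your-secret-key")

print(auth_tuple[0])      # reading works: your-access-id
# auth_tuple[0] = "new"   # would raise TypeError: tuples are immutable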

Here's the full code:

import json
from pprint import pprint

import requests

# Set Constants
ACCESSID = "mozscape-1234567890"  # Replace with your access ID
SECRETKEY = "1234567890abcdef1234567890abcdef"  # Replace with your secret key
AUTH_TUPLE = (ACCESSID, SECRETKEY)

# Set Variables
endpoint = "https://lsapi.seomoz.com/v2/anchor_text"
data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}
json_string = json.dumps(data_dict)

# Make the Request
response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE)

# Print the Response
pprint(response.json())

…which outputs:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

Using all upper case for the AUTH_TUPLE variable is a convention many use in Python to indicate that the variable is a constant. It's not a requirement, but it's a good idea to follow conventions when you can.

You may notice that I didn't use all uppercase for the endpoint variable. That's because the anchor_text endpoint is not a constant. There are a number of different endpoints that can take its place depending on what sort of lookup we want to do. The choices are listed below (with a sketch of swapping between them after the list):

  1. anchor_text

  2. final_redirect

  3. global_top_pages

  4. global_top_root_domains

  5. index_metadata

  6. link_intersect

  7. link_status

  8. linking_root_domains

  9. links

  10. top_pages

  11. url_metrics

  12. usage_data
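
Since only the endpoint URL changes between lookups, swapping one for another is a one-line edit. Here's a hedged sketch: the base URL matches the example above, but the helper function is my own convenience, and each endpoint expects its own payload fields, which you should check against the Moz API documentation.

BASE_URL = "https://lsapi.seomoz.com/v2/"

def make_endpoint(name):
    """Build the full endpoint URL for any of the 12 lookups."""
    return BASE_URL + name

# Swap "anchor_text" for any other endpoint name from the list above:
endpoint = make_endpoint("url_metrics")
# Note: data_dict fields differ per endpoint -- consult the Moz docs.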

And that leads into the Jupyter Notebook that I prepared on this topic, located here on Github. With this Notebook you can extend the example I gave here to any of the 12 available endpoints to create a variety of useful deliverables, which will be the subject of articles to follow.