GATE Cloud on-line API

GATE Cloud provides an "on-line" API to process individual documents and return the annotated results in real time. As with all GATE Cloud APIs, you will need to generate an API key on the website and use it for HTTP basic authentication of all requests.

The online service accepts the data to be annotated as the body of a standard HTTP POST request, and all configuration is done through HTTP protocol headers and URL query parameters. It can accept input in any of the formats supported by GATE Embedded, and returns the annotated result as GATE XML, FastInfoset or JSON.

The endpoint URL for a particular pipeline can be found on the pipeline's description page linked from the GATE Cloud shop; to process a document you must make a POST request to this URL. (For example, the English Named Entity Recognizer has the endpoint https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer.) The POST body should be the data to be annotated, and the following headers are supported:

Request HTTP headers

URL query parameters

The following parameters can be passed to the service using the URL query string:

  https://..../endpoint?key=value&key=value
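As a rough illustration of how these pieces fit together, the following Python sketch assembles (but does not send) a POST request using only the standard library. It rests on several assumptions not stated above: that the API key consists of a key ID and a password used as the basic authentication username and password, that the input format is declared with a Content-Type header, and that "key"/"value" stand in for real parameter names. The function name build_request is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_request(endpoint, key_id, key_password, data,
                  content_type="text/plain",
                  accept="application/gate+json",
                  params=None):
    """Assemble a POST request for a GATE Cloud online endpoint.

    Assumption: key_id/key_password are the two halves of the API key,
    sent as the HTTP basic authentication username and password.
    """
    url = endpoint
    if params:
        # Append ?key=value&key=value to the endpoint URL.
        url += "?" + urllib.parse.urlencode(params)

    credentials = f"{key_id}:{key_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    headers = {
        "Authorization": "Basic " + token,
        # Assumption: the input MIME type travels in Content-Type.
        "Content-Type": content_type,
        # The Accept header selects the response format.
        "Accept": accept,
    }
    return urllib.request.Request(url, data=data, headers=headers,
                                  method="POST")
```

The request could then be sent with urllib.request.urlopen(request) and the response body read from the returned object. Building the request separately from sending it makes the headers easy to inspect before any network traffic occurs.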

Response format

The GATE Cloud Online Processing service supports several output formats, selected by the Accept header of the request.

GATE JSON format

GATE JSON is a JSON format based on the format used by Twitter to represent entities in Tweet data. For each document, it consists of a JSON object with two properties, "text" containing the text of the document and "entities" containing the annotations as follows:

{
  "text":"The text of the document",
  "entities":{
    "SampleAnnotationType1":[
      {
        "indices":[0,3],
        "feature1":"value1",
        "feature2":"value2"
      }
    ],
    "SampleAnnotationType2":[
      {
        "indices":[12,15],
        "feature3":"value3"
      }
    ]
  }
}

The "entities" value is a map from annotation type to an array of the annotation objects of that type. Each object gives the annotation's position within the text as "indices":[start,end] (zero-based character offsets, start inclusive, end exclusive). The annotation's features are represented as the other JSON properties of the object.
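A minimal Python sketch of reading this structure is shown here. It pairs each annotation with the span of text it covers and separates the features from the "indices" property. The function name annotations_with_text is hypothetical, and the sketch assumes the offsets count characters in the same way Python string indexing does, which may not hold for text containing characters outside the Basic Multilingual Plane.

```python
import json


def annotations_with_text(gate_json):
    """Return (type, start, end, covered_text, features) tuples."""
    doc = json.loads(gate_json)
    text = doc["text"]
    results = []
    for ann_type, annotations in doc.get("entities", {}).items():
        for ann in annotations:
            start, end = ann["indices"]
            # Everything except "indices" is an annotation feature.
            features = {k: v for k, v in ann.items() if k != "indices"}
            # start is inclusive and end exclusive, like a Python slice.
            results.append((ann_type, start, end, text[start:end], features))
    return results
```

Because the end offset is exclusive, the covered text is a plain slice text[start:end] with no adjustment.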

If the original document was Twitter JSON (i.e. it was sent with text/x-json-twitter MIME type), then the output JSON will attempt to preserve as far as possible the JSON structure of the original Tweet. If the original Tweet contained "entities" then the output annotations will be merged with those from the original JSON.

To request GATE JSON output, send an Accept header of application/gate+json (or application/json).

GATE Standoff XML

This is the XML-based format used by GATE Developer and described in the GATE user guide.

To request GATE XML output, send an Accept header of application/gate+xml (or application/xml). The API can also return the same XML data using the binary FastInfoset serialization mechanism, which can be requested with an Accept header of application/fastinfoset.

By default, the GATE XML response formats encode the full GATE document, including the text, the document features, and the selected annotations. The annotation offsets index into the text as given in the response, which may differ from the data sent in the request: for HTML and XML requests GATE strips out the markup tags, leaving just the plain text, and for binary formats such as PDF there is no real concept of "character offsets" in the original data. If you are sending plain text to be annotated, however, you can omit the text and receive only the annotations in the response by adding a parameter to the Accept header:

  Accept: application/gate+xml; includeText=no

  Accept: application/fastinfoset; includeText=no

In this case the returned XML will still have the <GateDocument> root element, but this will contain only the <AnnotationSet> elements from the GATE XML, without the <GateDocumentFeatures> or the <TextWithNodes>.
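The following Python sketch shows one way such an annotations-only response might be read with the standard library. Only the <GateDocument> and <AnnotationSet> element names come from the description above; the <Annotation>, <Feature>, <Name> and <Value> elements and the Name, Id, Type, StartNode and EndNode attributes are assumptions based on the GATE XML format described in the GATE user guide, and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element and attribute names are assumptions.
SAMPLE = """<GateDocument>
<AnnotationSet Name="Entities">
<Annotation Id="1" Type="Person" StartNode="0" EndNode="4">
<Feature>
<Name className="java.lang.String">gender</Name>
<Value className="java.lang.String">male</Value>
</Feature>
</Annotation>
</AnnotationSet>
</GateDocument>"""


def read_annotations(xml_text):
    """Collect annotations from every <AnnotationSet> in the response."""
    root = ET.fromstring(xml_text)
    annotations = []
    for ann_set in root.iter("AnnotationSet"):
        set_name = ann_set.get("Name")  # None if the attribute is absent
        for ann in ann_set.findall("Annotation"):
            features = {
                feat.findtext("Name"): feat.findtext("Value")
                for feat in ann.findall("Feature")
            }
            annotations.append({
                "set": set_name,
                "type": ann.get("Type"),
                "start": int(ann.get("StartNode")),
                "end": int(ann.get("EndNode")),
                "features": features,
            })
    return annotations
```

With includeText=no the offsets refer to the plain text that was sent in the request, so the caller can slice its own copy of that text to recover the annotated spans.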

Error messages

When a problem prevents the correct execution of the request, an error response is returned with an HTTP status code specifying the type of the error, as described below, together with a human-readable error message.

Status Code   Description
40x           Problems with the user input
50x           Errors during the execution of the request
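A client can use the status code to decide whether a failed request is worth retrying. The short Python sketch below maps a status code to one of the two error classes in the table. It assumes that "40x" and "50x" stand for the whole 400-499 and 500-599 ranges, and the function name and returned labels are hypothetical.

```python
def classify_status(status):
    """Map an HTTP status code to a coarse outcome label."""
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 500:
        # Problem with the user input: fix the request before resending.
        return "input-error"
    if 500 <= status < 600:
        # Error during execution of the request on the server side.
        return "execution-error"
    return "other"
```

An input error generally means the same request will fail again unchanged, whereas an execution error may be transient.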