Apollo data federation for GraphQL schema-first

Data federation is the creation of a virtual database that aggregates data from distributed sources, giving them a common data model. It is an approach to data integration that provides a single source of data for front-end applications. It also gives backend developers flexibility in design and service isolation.

To get the most out of GraphQL, your organization should expose a single data graph that provides a unified interface for querying any combination of your backing data sources. However, it can be challenging to represent an enterprise-scale data graph with a single, monolithic GraphQL server.

To remedy this, you can divide your graph’s implementation across multiple composable services with Apollo Federation. Unlike other distributed GraphQL architectures (such as schema stitching), Apollo Federation uses a declarative programming model that enables each subgraph to implement only the part of your composed supergraph that it’s responsible for.

An Apollo Federation architecture consists of:

  • A collection of subgraphs (usually represented by different back-end services) that each define a distinct GraphQL schema

  • A gateway that composes the subgraphs into a federated data graph and executes queries across multiple subgraphs

                    +--------------------+
                    |  Federated schema  |
                    |  (Apollo Gateway)  |
                    +--------------------+
                              |
                 +------------+-----------+
                 |                        |
        +--------+---------+     +--------+---------+
        |  Library schema  |     |  Orders schema   |
        |    (Stargate)    |     |  (Node.js Apollo |
        |   Book/Reader    |     |   server)        |
        +------------------+     +------------------+

GraphQL schema

To achieve data federation, schemas need to be created and annotated to indicate how ownership is distributed. Let’s look at an example with three core entities:

Book: In a library, books are stored. The Book type uses a title and ISBN to uniquely identify each Book object, while also storing author information.

Reader: Readers read books and write reviews. Each Reader is uniquely identified by a name and user_id, while also storing birthdate, email addresses, street addresses, and reviews that a read has written. Each review consists of the book title, comment, rating, and review date.

Order: When a reader checks out books from the library, a record of the checkout_id, reader, and the books checked out are stored, uniquely identified by the checkout_id.

These three domains could be owned by three separate engineering teams responsible for their own data sources, business logic, and corresponding microservices. In an unfederated implementation, we would have to have this simple schema and the associated resolvers owned and implemented by a single team.

Stargate GraphQL schema

In this example, Book and Reader schema are supplied from a Stargate instance as the first data source, and the schema is described in the GraphQL schema-first section. Stargate also automatically adds the following:

include::partial$data_fed_directive.adoc

Javascript Apollo server schema

For Order, a simple Javascript, orders.js will be used to define the Order schema, supply some resolvers, insert some data, and start an Apollo server as the second data source. Let’s break down the script into four parts.

First, set up the script to require apollo server and apollo federation. Set the port for the server to 4001.

  • Orders setup

const { ApolloServer, gql } = require("apollo-server");
const { buildFederatedSchema } = require("@apollo/federation");

const port = 4001;

Next, we need to load the schema for this server. The schema consists of the object type Order, extensions of the object types Book and Reader gathered from the Stargate instance, queries and mutations required to insert data and make queries.

  • Orders schema

const typeDefs = gql`

  # need to declare types used by cql
  scalar Uuid
  scalar Date

  type Order @key(fields: "checkout_id reader books") {
    checkout_id: Int!
    reader: Reader
    books: [Book]
  }

  # Stub of the book entity from Stargate:
  # Add an extension for Order to find a checkout for a book
  extend type Book @key(fields: "title isbn") {
    title: String! @external
    isbn: String @external
    #author: [String]
    orders: [Order]
  }

  # Stub of the reader entity from Stargate:
  # Add an extension for Order to find a checkout for a reader
  extend type Reader @key(fields: "name user_id") {
    name: String! @external
    user_id: Uuid! @external
    orders: [Order]
  }

  extend type Query {
    order(checkout_id: Int!): Order
    orders: [Order]
  }
`;

In order to fetch field values from the objects that belong to another service, an extension of that object must be included in the Apollo server. Two directives are used in this example, @key and @external that pertain to data federation. The @key directive defines a combination of fields that uniquely identify and are used to fetch an object or interface. Only the primary key fields can be used in @key. The @external directive marks a field as owned by another service.

In Stargate schema, you can use just @key without specifying the fields, because the primary key is inferred.

In this example, for the Book object type, the fields title and isbn are defined in the @key directive as the fields that are required to uniquely identify a specific book (the primary key) and fetch the requested information from the object. The @external directive further marks the same fields are owned by the Stargate service, not the Apollo server. The Reader object type has a similar extension defined.

Now let’s define the resolvers that this script will use. A resolver is a function that’s responsible for populating the data for a single field in your schema. Stargate resolves objects based on the schema supplied, but the Javascript server requires resolver definition.

  • Orders resolvers

const resolvers = {

  Book: {
    orders(book) {
      return orders.filter(({ books }) => books.includes(book.title));
    }
  },
  Order: {
    book(order) {
      return order.book.map(title => ({ __typename: "Book", title }));
    }
  },
  Query: {
    order(_, args) {
      return orders.find(order => order.checkout_id == args.checkout_id);
    },
    orders() {
      return orders;
    }
  }
};

Finally, a server is started, building federated schema, listening on port 4001. Data is provided on the orders that are inserted into the service.

  • Orders server

server.listen({ port }).then(({ url }) => {
  console.log(`πŸš€ Orders service ready at ${url}`);
});

const orders = [
  { checkout_id: 1, reader: {name: "Herman Melville", user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810"} , books: [ {title: "Moby Dick", isbn: "978-0140861723"}, {title: "Pride and Prejudice", isbn: "" } ] },
  { checkout_id: 2, reader: {name: "Jane Doe", user_id: "f02e2894-db48-4347-8360-34f28f958590"}, books: [ {title: "Native Son", isbn: "978-0061148507"}] }
];

The full script:

  • Orders.js

# tag::setup[]
const { ApolloServer, gql } = require("apollo-server");
const { buildFederatedSchema } = require("@apollo/federation");

const port = 4001;
# end::setup[]

# tag::schema[]
const typeDefs = gql`

  # need to declare types used by cql
  scalar Uuid
  scalar Date

  type Order @key(fields: "checkout_id reader books") {
    checkout_id: Int!
    reader: Reader
    books: [Book]
  }

  # Stub of the book entity from Stargate:
  # Add an extension for Order to find a checkout for a book
  extend type Book @key(fields: "title isbn") {
    title: String! @external
    isbn: String @external
    #author: [String]
    orders: [Order]
  }

  # Stub of the reader entity from Stargate:
  # Add an extension for Order to find a checkout for a reader
  extend type Reader @key(fields: "name user_id") {
    name: String! @external
    user_id: Uuid! @external
    orders: [Order]
  }

  extend type Query {
    order(checkout_id: Int!): Order
    orders: [Order]
  }
`;
# end::schema[]

# tag::resolvers[]
const resolvers = {

  Book: {
    orders(book) {
      return orders.filter(({ books }) => books.includes(book.title));
    }
  },
  Order: {
    book(order) {
      return order.book.map(title => ({ __typename: "Book", title }));
    }
  },
  Query: {
    order(_, args) {
      return orders.find(order => order.checkout_id == args.checkout_id);
    },
    orders() {
      return orders;
    }
  }
};
# end::resolvers[]
const server = new ApolloServer({
  schema: buildFederatedSchema([
    {
      typeDefs,
      resolvers
    }
  ])
});

# tag::server[]
server.listen({ port }).then(({ url }) => {
  console.log(`πŸš€ Orders service ready at ${url}`);
});

const orders = [
  { checkout_id: 1, reader: {name: "Herman Melville", user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810"} , books: [ {title: "Moby Dick", isbn: "978-0140861723"}, {title: "Pride and Prejudice", isbn: "" } ] },
  { checkout_id: 2, reader: {name: "Jane Doe", user_id: "f02e2894-db48-4347-8360-34f28f958590"}, books: [ {title: "Native Son", isbn: "978-0061148507"}] }
];
# end::server[]

Installing Apollo gateway

Let’s look at a simple Javascript that runs the Apollo gateway: Let’s break this script down as well, into three parts.

  • Gateway1

  • Gateway2

  • Gateway3

const { ApolloServer } = require("apollo-server");
const { ApolloGateway, RemoteGraphQLDataSource } = require("@apollo/gateway");

// The Stargate token that Apollo Gateway will use when it fetches the schema
// definitions.
// Note that this will only be used for internal queries; for user queries, the
// client must provide their own 'x-cassandra-token' HTTP header, and the
// gateway will forward it to Stargate.
const stargateIntrospectionToken = 'd5e7e1fb-399b-4f0b-964b-296ee97d59d3';

class StargateGraphQLDataSource extends RemoteGraphQLDataSource {
  willSendRequest({ request, context }) {
    const token = context.stargateToken
    if (token != null) {
      request.http.headers.set('x-cassandra-token', token);
    }
  }
}
const gateway = new ApolloGateway({

  serviceList: [

    // Stargate:
    { name: "library", url: "http://127.0.0.1:8080/graphql/library"},

    // External service (mock):
    { name: "orders", url: "http://localhost:4001/graphql" }
  ],

  introspectionHeaders: {
    'x-cassandra-token': stargateIntrospectionToken
  },

  buildService({name, url}) {
    if (name == "library") {
      return new StargateGraphQLDataSource({url});
    } else {
      return new RemoteGraphQLDataSource({url});
    }
  },

  // Experimental: Enabling this enables the query plan view in Playground.
  __exposeQueryPlanExperimental: true,
});
(async () => {
  const server = new ApolloServer({
    gateway,

    // Apollo Graph Manager (previously known as Apollo Engine)
    // When enabled and an `ENGINE_API_KEY` is set in the environment,
    // provides metrics, schema management and trace reporting.
    engine: false,

    // Subscriptions are unsupported but planned for a future Gateway version.
    subscriptions: false,

    context: ({ req, res }) => {
      return {
        stargateToken: req.headers['x-cassandra-token']
      };
    }
  });

  server.listen().then(({ url }) => {
    console.log(`πŸš€ Gateway ready at ${url}`);
  });
})();

In Gateway1, an Apollo server and Apollo gateway are defined as required. In order for the gateway to access the Stargate data source, a token is required. That token is specified as stargateIntrospectionToken.

In Gateway2, a serviceList that defines the data sources lists the Stargate instance and the Javascript Apollo server that are described above. It builds the services based on logic tied to the name of the data source, library. Note that if more data sources are specified, you’ll need to change the logic that specifies which service is used for queries and mutations.

An experimental flag is set to allow GraphQL Playground to display the query plan for the federated service.

In Gateway3, a new Apollo server, designated as a gateway is started. There are some settings currently set to false; the code gives an explanation. If the fetched data is from the Stargate instance, the token must be passed in the header of the request as x-cassandra-token.

The full script:

  • gateway.js

# tag::gateway1[]
const { ApolloServer } = require("apollo-server");
const { ApolloGateway, RemoteGraphQLDataSource } = require("@apollo/gateway");

// The Stargate token that Apollo Gateway will use when it fetches the schema
// definitions.
// Note that this will only be used for internal queries; for user queries, the
// client must provide their own 'x-cassandra-token' HTTP header, and the
// gateway will forward it to Stargate.
const stargateIntrospectionToken = 'd5e7e1fb-399b-4f0b-964b-296ee97d59d3';

class StargateGraphQLDataSource extends RemoteGraphQLDataSource {
  willSendRequest({ request, context }) {
    const token = context.stargateToken
    if (token != null) {
      request.http.headers.set('x-cassandra-token', token);
    }
  }
}
# end::gateway1[]

# tag::gateway2[]
const gateway = new ApolloGateway({

  serviceList: [

    // Stargate:
    { name: "library", url: "http://127.0.0.1:8080/graphql/library"},

    // External service (mock):
    { name: "orders", url: "http://localhost:4001/graphql" }
  ],

  introspectionHeaders: {
    'x-cassandra-token': stargateIntrospectionToken
  },

  buildService({name, url}) {
    if (name == "library") {
      return new StargateGraphQLDataSource({url});
    } else {
      return new RemoteGraphQLDataSource({url});
    }
  },

  // Experimental: Enabling this enables the query plan view in Playground.
  __exposeQueryPlanExperimental: true,
});
# end::gateway2[]

# tag::gateway3[]
(async () => {
  const server = new ApolloServer({
    gateway,

    // Apollo Graph Manager (previously known as Apollo Engine)
    // When enabled and an `ENGINE_API_KEY` is set in the environment,
    // provides metrics, schema management and trace reporting.
    engine: false,

    // Subscriptions are unsupported but planned for a future Gateway version.
    subscriptions: false,

    context: ({ req, res }) => {
      return {
        stargateToken: req.headers['x-cassandra-token']
      };
    }
  });

  server.listen().then(({ url }) => {
    console.log(`πŸš€ Gateway ready at ${url}`);
  });
})();
# end::gateway3[]

Federated queries

Simple example

Now that we have two subgraphs supplied from two different data sources, and a gateway running, we can explore how data from all data sources can be returned in a query.

For example, if I want to discover all the books that were checked out at the same time, and get all the book data, I can use this query in GraphQL Playground, pointed to the URL where the gateway is running, or localhost:4000 in this case:

  • Get a checkout order by ID

  • Result

query getOrder {
  order(checkout_id: 1) {
    checkout_id
    reader {
      name
      user_id
      email
      address {
        city
      }
    }
    books {
      title
      isbn
      author
    }
  }
}
{
  "data": {
    "order": {
      "checkout_id": 1,
      "reader": {
        "name": "Herman Melville",
        "user_id": "e0ec47e1-2b46-41ad-961c-70e6de629810",
        "email": [
          "herman.melville@gmail.com",
          "hermy@mobydick.org"
        ],
        "address": [
          {
            "city": "Boston"
          }
        ]
      },
      "books": [
        {
          "title": "Moby Dick",
          "isbn": "978-0140861723",
          "author": null
        },
        {
          "title": "Pride and Prejudice",
          "isbn": "",
          "author": null
        }
      ]
    }
  }
}

While the resulting return seems unremarkable, think about the result for a moment. The Apollo server returning an order did supply the reader name and user_id, but the email and address information is fetched from the Stargate instance. Likewise, the Stargate instance is also supplying the author information for the books returned in the order query. Thus, the federated supergraph of Order, Book, and Reader, along with the resolvers in the Apollo server, are fetching data from two different servers! That is a very valuable feature, especially for application developers who need to fetch and use data from data sources that they do not control.