Apollo Client caching and cursor-based pagination

Story time

Recently, in a project I'm working on, we started reworking some parts of the app and decided to use Apollo for communicating with our GraphQL API. Initially everything went pretty smoothly, but as we introduced more and more features to match the initial state, we stumbled upon several cases we weren't sure we were solving properly.

The part of the UI we wanted to redo is based on a cursor-based, paginated listing of some resource, which fetches another batch of the list when a load more button is clicked. To achieve that, we get part of the listing and a cursor that, when passed to the subsequent request, enables us to fetch the next chunk of the list. Moreover, users can perform some operations on the listed resources, like editing or deleting them, as well as display a separate details view, which lets them perform some more actions on selected items.

We wanted to dive deeper and understand how we could use Apollo and its cache to perform these operations smarter and keep the data in sync without additional refetches. During the journey, I found a handful of interesting things that I decided to sum up in this article, pointing to the sources I learned them from. Along the way, I prepared a simplified example in the form of a Todo List, which enabled me to verify all the use cases (I hope) we'll need to tackle in the app. Here's the link to the finished example on GitHub.

Cache normalization

Apollo Cache is in fact slightly smarter than it may seem and does some work under the hood, beyond simply caching responses for operations called with given arguments and returning them later in other places. Knowing the structure of the data model and the connections between the nodes of the graph, Apollo can structure the data in a more convenient way and let us leverage Apollo Client's Cache API to perform more complicated operations against the cached data blob.

If we fetch data with nested entities, or a list of them, Apollo can identify the nested nodes, extract them as separate entities in the cache and store references to them instead of duplicating the nested data. This process is called data normalization. It allows us to deduplicate the data and store it independently while preserving its dependencies, and it's commonly used in databases. In Apollo's case, the extracted objects are stored in a common, flattened list, and each of them is identified by a combination of __typename, describing the type of the entity, and keyFields, uniquely identifying the item (id is the default keyField, but it can be configured as a list of fields based on our needs). Thanks to the possibility of uniquely identifying items, we can fetch and cache some part of the data graph locally, as well as conveniently keep track of changes to particular entities across numerous queries.
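
For illustration, here's a minimal sketch of such a configuration - note that the composite key and the listId field are made up for this example, since id alone is the default:

import { InMemoryCache } from "@apollo/client";

// With the default `id` keyField, a normalized Todo lands in the cache as
//   "Todo:1": { __typename: "Todo", id: "1", name: "...", completed: false }
// and queries hold references like { __ref: "Todo:1" } instead of copies.
const cache = new InMemoryCache({
  typePolicies: {
    Todo: {
      // hypothetical composite key - `listId` is a made-up field for illustration
      keyFields: ["listId", "id"],
    },
  },
});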

For more details on the topic, I can highly recommend Demystifying Cache Normalization written by Khalil Stemmler.

Moreover, to get to know how the data is stored internally, I highly recommend familiarizing yourself with Apollo Client Devtools. Here's an example of how it looks for the proof of concept described later. We can view the single entities as well as the nested references to them in queries.

[Image: apollo-devtools.png - Apollo Client Devtools view of the normalized cache, with single entities and references to them in queries]

Updating the cache or refetching the data?

Most of the time, when we have a list of 20 items, fetching the whole list once again when we know we changed one field in one item... may be overkill, to put it mildly. Hence, by updating the cache locally we can nicely reduce the number of network calls, which in turn may reduce the number of spinners we're throwing in the faces of our users, leading to better UX.

However, sometimes we may need to fetch additional fields, for example when displaying details for a given item, or we just want to make sure we're serving the freshest data. These are cases in which fetching the missing data is needed or refetching is totally valid.
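
For instance, an explicit refetch is just a call to the refetch function returned by the query hook. Here's a minimal sketch - the TodoDetails query, its fields and the hook name are my assumptions:

import { gql, useQuery } from "@apollo/client";

// hypothetical details query - the name and fields are assumptions
const TODO_DETAILS = gql`
  query TodoDetails($id: ID!) {
    todo(id: $id) {
      id
      name
      completed
    }
  }
`;

export function useTodoDetails(id: string) {
  // refetch() re-runs the query against the network to serve the freshest data
  const { data, refetch } = useQuery(TODO_DETAILS, { variables: { id } });
  return { todo: data?.todo, refresh: refetch };
}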

Another factor worth taking into consideration is fetch policies, which define how the data is accessed: read from the cache, make a request, or a mix of both.

It's worth knowing that cache-first is the default behavior: it tries to retrieve the data from the cache first and makes a request only in case of a cache miss. Another option is the cache-and-network policy, which initially shows the data from the cache, even if it's stale, while making a request under the hood, and replaces the data provided to the components once the fresh response arrives.

There are various business cases for which different solutions may be applicable. What's more interesting for developers, most of the heavy lifting is performed on the Apollo Client side, so I'd recommend considering whether the various approaches could bring value to our users. To browse the other fetch policies, please take a look at the docs.
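
As a sketch, choosing a policy is just an option passed to the query hook - here with the TodosBatch query from the next section (the input type name is an assumption):

import { gql, useQuery } from "@apollo/client";

const TODOS_BATCH = gql`
  query TodosBatch($input: TodosBatchInput!) {
    todosBatch(input: $input) {
      todos {
        id
        name
        completed
      }
      cursor
    }
  }
`;

export function useTodos() {
  // show cached (possibly stale) todos immediately,
  // while refreshing them from the network in the background
  return useQuery(TODOS_BATCH, {
    variables: { input: { limit: 5 } },
    fetchPolicy: "cache-and-network",
  });
}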

Pagination and merging results

Remember that I wrote initially about a paginated list of resources? If we kept querying the Todos, passing different pagination parameters to the query (in my case limit and cursor), each subsequent request for a new chunk of the list would be treated as a separate entry in the cache.

For example - for the two queries listed below

query TodosBatch {
  todosBatch(input: { limit: 5 }) {
    todos {
      id
      name
      completed
    }
    cursor
  }
}

query TodosBatch {
  todosBatch(
    input: { limit: 5, cursor: "89e99cba-4d8b-4c3d-8b69-ee4c792f40be" }
  ) {
    todos {
      id
      name
      completed
    }
    cursor
  }
}

We'll have two entries in our cache

[Image: g2.png - the cache containing two separate todosBatch entries, one per set of query arguments]

However, Apollo exposes a dedicated API for pagination that, along with the field policies configuration, can be used to customize how the data from subsequent calls is merged into a single cache entry, which is then returned by the React hook.

What's more, Apollo gives us several predefined utilities that can be used as a field policy - for example offsetLimitPagination for offset-based pagination - which makes things much more convenient. The whole API, the suggested approaches with their pros and cons, and the available helpers exposed by Apollo are described in the docs.
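
For instance, wiring up the offset-based helper is just a matter of assigning it as the field policy - a sketch, where feed is a hypothetical query field paginated with offset/limit arguments:

import { InMemoryCache } from "@apollo/client";
import { offsetLimitPagination } from "@apollo/client/utilities";

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        // merges subsequent offset/limit chunks into a single cached list
        feed: offsetLimitPagination(),
      },
    },
  },
});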

In my case - cursor-based pagination with a custom cursor generated by the backend - there is no pre-packed policy, but the Apollo docs on cursor-based pagination provide enough information and examples to build one on your own.

To build our own field policy for a given query, we can define several properties:

  • keyArgs - allows us to customize the cache keys and hence cache results separately based on the query arguments. It could be useful if we worked with filters or categories and wanted to store the results independently for queries with different parameters.
  • merge - a custom merge function defining how the data from subsequent requests is merged into the existing cache entry. What's more interesting, we can store the data internally in a different form. In my case, I have an infinite-scroll-like view, and to avoid duplicates I store the data internally as a map, similarly to what's described in the docs - this makes sure there are no duplicates when I store an item in the cache after creation and it's later returned by the backend.
  • read - defines how the data should be read by the client. For example, if we'd like custom pagination on the frontend side, we could rely on the data stored in the cache and change the page size according to user preferences. It can also be used to transform the map built by the custom merge function into a list that will be consumed by our components.

In my case the custom field policy configured in the cache looks as follows.

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        todosBatch: {
          keyArgs: false,
          merge(
            existing:
              | {
                  cursor: string | null;
                  todos: Record<string, Reference>;
                }
              | undefined,
            incoming: {
              cursor: string | null;
              todos: Reference[];
            },
            { readField }
          ) {
            const todos = { ...existing?.todos };
            // merge the incoming Todos into a map with previously fetched todos
            // using the id field as a key for the map
            incoming.todos.forEach((todo) => {
              const id = readField<string>("id", todo);
              if (id) {
                todos[id] = todo;
              }
            });
            return {
              cursor: incoming.cursor,
              todos,
            };
          },
          // transform the map of todos into an array
          read(existing: {
            cursor: string | null;
            todos: Record<string, Reference>;
          }) {
            if (existing) {
              return {
                cursor: existing.cursor,
                todos: Object.values(existing.todos),
              };
            }
          },
        },
      },
    },
  },
});
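
And here's roughly how a hook could consume this policy on the component side - a sketch, assuming a useTodosBatchQuery hook generated by GraphQL Code Generator like the other hooks shown later:

import { useTodosBatchQuery } from "~/graphql/generatedTypes";

export function usePaginatedTodos() {
  const { data, fetchMore } = useTodosBatchQuery({
    variables: { input: { limit: 5 } },
  });

  // thanks to the merge function above, the next chunk lands in the
  // same cache entry instead of creating a new one
  const loadMore = () =>
    fetchMore({
      variables: {
        input: { limit: 5, cursor: data?.todosBatch.cursor },
      },
    });

  return { todos: data?.todosBatch.todos ?? [], loadMore };
}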

Updating the local cache after mutations

Having in mind the discussion about updating the cache locally, let's check out how we can implement it for our paginated, custom-merged list of Todos.

Updating single Todo

Here's the code for updating a single Todo in the local cache after a mutation, extracted into a dedicated hook.

import { useUpdateTodoMutation } from "~/graphql/generatedTypes";

export function useUpdateTodo() {
  const [updateTodo] = useUpdateTodoMutation();

  return { updateTodo };
}

Yes, indeed, we don't need to do anything to update the local cache. But there is a catch - the mutation performing the update needs to return an object containing the proper __typename, the object identifier and the updated fields, which allows Apollo Client to seamlessly find the corresponding node in the normalized local cache and merge the incoming result with the stored data. The same thing works if the operation returns multiple items - Apollo Client will normalize the response and merge it with the data stored in the cache.
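
In other words, a mutation document along these lines is enough - the mutation and input names below are assumptions; the key part is the selection set returning the identifier and the changed fields (Apollo Client adds __typename to selections automatically):

import { gql } from "@apollo/client";

// hypothetical mutation - what matters is returning `id` plus the updated
// fields, so Apollo can locate and merge the normalized Todo in the cache
const UPDATE_TODO = gql`
  mutation UpdateTodo($id: ID!, $input: UpdateTodoInput!) {
    updateTodo(id: $id, input: $input) {
      id
      name
      completed
    }
  }
`;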

The useUpdateTodoMutation hook (as well as the other hooks coming from ~/graphql/generatedTypes) is generated by GraphQL Code Generator.

Creating and deleting

When creating and deleting items, we need to write our own update functions for appending or filtering out results according to our needs. The previously mentioned article about cache normalization - apollographql.com/blog/apollo-client/cachin.. - describes how cache.readQuery and cache.writeQuery can be used to update the cache. They work fine, however since Apollo Client v3 there is an additional cache.modify method that updates the data in one go.
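
For contrast, here's a minimal sketch of the readQuery/writeQuery approach - the query document, the variables and the prependTodo helper are my assumptions, not code from the example app:

import { gql, ApolloCache } from "@apollo/client";

const TODOS_BATCH = gql`
  query TodosBatch($input: TodosBatchInput!) {
    todosBatch(input: $input) {
      todos {
        id
        name
        completed
      }
      cursor
    }
  }
`;

type Todo = { __typename: "Todo"; id: string; name: string; completed: boolean };
type TodosBatchData = { todosBatch: { cursor: string | null; todos: Todo[] } };

function prependTodo(cache: ApolloCache<unknown>, newTodo: Todo) {
  const variables = { input: { limit: 5 } };
  // read the currently cached result for these variables...
  const existing = cache.readQuery<TodosBatchData>({ query: TODOS_BATCH, variables });
  if (existing) {
    // ...and write it back with the new todo prepended
    cache.writeQuery<TodosBatchData>({
      query: TODOS_BATCH,
      variables,
      data: {
        todosBatch: {
          ...existing.todosBatch,
          todos: [newTodo, ...existing.todosBatch.todos],
        },
      },
    });
  }
}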

Below are snippets of the hooks used to encapsulate the logic for both operations - creation and deletion - with appropriate commentary.

import { useCreateTodoMutation } from "~/graphql/generatedTypes";

export function useCreateTodo() {
  const [createTodo] = useCreateTodoMutation({
    update(cache, { data }) {
      if (data?.createTodo) {
        cache.modify({
          fields: {
            todosBatch: (existing, { toReference }) => ({
              cursor: existing.cursor,
              todos: {
                [data.createTodo.id]: toReference(data.createTodo),
                ...existing.todos,
              },
            }),
          },
        });
      }
    },
  });

  return { createTodo };
}

import { useDeleteTodoMutation } from "~/graphql/generatedTypes";

export function useDeleteTodo() {
  const [deleteTodo] = useDeleteTodoMutation({
    update(cache, { data }, { variables }) {
      if (data?.deleteTodo && variables?.id) {
        // remove the reference to the item from queries in cache
        cache.modify({
          fields: {
            // remove the reference to the removed Todo from the list
            todosBatch: (existing) => {
              const { [variables.id]: _, ...remainingTodos } = existing.todos;
              return {
                cursor: existing.cursor,
                todos: remainingTodos,
              };
            },
            todo: (existing, { DELETE, readField }) => {
              // if single Todo with given id was selected to fetch its details - mark it as removed
              if (readField("id", existing) === variables.id) {
                return DELETE;
              }
              return existing;
            },
          },
        });

        // garbage collector - remove all objects that are not referenced from any query - in that case - removed Todo
        cache.gc();
      }
    },
  });

  return { deleteTodo };
}
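
As a side note, cache.evict combined with cache.identify can drop the normalized object directly. Here's a sketch of an alternative update function - the hook name is made up, and fields still referencing the Todo (like our todos map) need their own updates, so this complements rather than replaces the version above:

import { useDeleteTodoMutation } from "~/graphql/generatedTypes";

export function useDeleteTodoByEviction() {
  const [deleteTodo] = useDeleteTodoMutation({
    update(cache, { data }, { variables }) {
      if (data?.deleteTodo && variables?.id) {
        // resolve the normalized cache id, e.g. "Todo:<uuid>"
        const cacheId = cache.identify({ __typename: "Todo", id: variables.id });
        if (cacheId) {
          // evict the normalized object and garbage-collect what's unreachable
          cache.evict({ id: cacheId });
          cache.gc();
        }
      }
    },
  });

  return { deleteTodo };
}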

If you'd like to dive into more examples of updating local cache, I can highly recommend Understanding Caching in Apollo Client 3 talk by Laura Beatris along with her example.

Summary

That's all for today! I hope I was able to briefly show you my findings and point out some interesting resources related to Apollo Cache, working with paginated data, and updating Apollo Client's cache locally without additional refetches. Here's the link to the finished example on GitHub. All in all, I'm pretty impressed by the idea and implementation of the Cache API, the flexibility it gives us, and how convenient it is to use from the DX perspective. Moreover, I have to admit that the documentation, as well as the linked blog post and talk, are really neat and were great guidance while working on the proof of concept.

Lastly, I want to leave some words of caution: the presented example shows some capabilities of the tool, but it's our job to identify whether they're applicable to our use cases and won't over-engineer the feature we want to ship. I hope I'll have some time to mess around with these building blocks and verify whether they'll benefit us in our real problem, taking all the constraints into consideration.

That's all from my side. I'm wondering about your experience with Apollo - did you encounter similar problems, and how did you solve them? Maybe you know some other, better approaches that I could use in my proof of concept? I'll be more than happy to hear your opinion!
