WSSCode Blog

Pathom 3 is coming

September 17, 2020

Hello everyone!

Welcome to this new blog of mine, and let’s start with some big news.

Pathom 3 is on the way, and it’s a rewrite! In this article I’ll highlight the motivations for a Pathom rewrite, and what you can expect from the project development in the following months.

Goodbye parsers and readers, Connect at center

What is the primary abstraction of Pathom?

A lot of Pathom discussion is around resolvers, and that can make you think they are the main driver, but that’s not the case.

Pathom was built on top of the Om.next idea of parsers. I wrote at length about using that method to write parsers in a previous article.

For a recap: the original design was based on scanning the AST, filling the response, one attribute at a time.

The connect idea emerged after and was included as an extension to the system.

Here is a bit of how connect evolved:

  • pc/reader - The “proof of concept” reader.

I created this reader to try the connect idea; it was used in production only for a short amount of time.

This reader is very eager; it processes resolvers as it sees them. There is no planning, so it can get stuck in cycles and make unnecessary calls along paths that lead to dead ends.

  • pc/reader2 - per attribute planning.

This reader will walk the index to figure out which resolvers it needs to call to reach one given attribute, considering which data is available now.

This is the planning part. This reader can plan for a single attribute; the generated plan is a sequence of resolvers to call.

When parts of the plan fail, the planner tries to calculate a new plan until there are no more possible paths.

Because it’s per attribute, this still fits well with the parser design. However, this also requires some mutable state to cache the resolvers’ results between multiple attributes.
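To make the per-attribute planning idea more concrete, here is a minimal sketch in plain Clojure. It is not the reader2 implementation: the index shape and the names `resolvers` and `plan-for` are hypothetical, and it skips the re-planning that happens when part of a plan fails. It only shows how walking an index from one target attribute can produce a sequence of resolvers to call.

```clojure
;; Hypothetical index: resolver name -> declared input and output attributes.
(def resolvers
  {:user-by-id {:input #{:user/id}    :output #{:user/email :user/name}}
   :gravatar   {:input #{:user/email} :output #{:user/avatar-url}}})

(defn plan-for
  "Returns a vector of resolver names to call, in order, so that `attr`
  becomes available starting from the `available` attribute set.
  Returns nil when no path exists. `visiting` guards against cycles."
  ([index available attr] (plan-for index available attr #{}))
  ([index available attr visiting]
   (cond
     ;; nothing to do, the data is already there
     (contains? available attr) []
     ;; we are already trying to reach this attribute: cycle, give up
     (contains? visiting attr) nil
     :else
     (some (fn [[rname {:keys [input output]}]]
             (when (contains? output attr)
               ;; plan each input first, then call this resolver
               (let [sub-plans (map #(plan-for index available % (conj visiting attr))
                                    input)]
                 (when (every? some? sub-plans)
                   (conj (into [] (comp cat (distinct)) sub-plans) rname)))))
           index))))

(plan-for resolvers #{:user/id} :user/avatar-url)
;; => [:user-by-id :gravatar]
```

Note that the plan covers a single attribute; asking for a second attribute means running `plan-for` again, which is why results need to be cached between attributes.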

  • pc/parallel-reader - parallel support, still per attribute planning

Uses the same planning algorithm as reader2. By calculating the “provision” of a plan (which attributes will be available after the plan is complete), Pathom can spawn parallel processes to handle different query segments at the same time.

This reader’s internals are quite complicated due to the synchronization step that needs to happen when new data is available.

Given it still uses the parser abstraction, the code that generates the final result is unnecessarily complicated to match the designs.

Also, due to the coordination overhead, this reader tends to be more resource-consuming. It only improves the parsing speed when the user has big queries that can take advantage of parallelism.

  • pc/reader3 - multi-attribute planning, the graph plan

This is the most recent (and still under development) reader. Instead of doing a plan calculation per attribute, this new reader uses a planner that considers all the attributes simultaneously and generates an execution graph of all possibilities ahead of time.

These are some of the advantages of this approach:

  • More predictable runs
  • Much more efficient when dealing with dynamic resolvers. Dynamic resolvers are resolvers that have variable input/output, like GraphQL and Datomic integrations.
  • Enables the Maximal Graph vision that we will discuss later in this article.
  • The plan already exposes all parallelism opportunities with the special AND and OR nodes, making the parallel implementation much more straightforward than the previous one.

When we consider the way this reader works and try to fit it into the parser abstraction, a lot of unnecessary work ends up happening.

note

Later in the article, I’ll refer to this as Pathom 3, but note this is also available in Pathom 2.

In Pathom 3, this will be the only available planner/runner.

After pc/reader3, it became clear that the parser abstraction is getting in the way more than it’s helping these days.

And that’s what’s happening in Pathom 3: there are no more readers or parsers, and connect is now the driver of operations.

You can expect new plugin extensions to cover situations that were previously handled by readers.

In Pathom 3, the new planner plays a more significant role in the process; everything related to query scanning should happen at planning time now. For example, to decide which fields have partial data (due to sub-query) and need processing, the planner will provide this information, so the runner is simplified.

Sweetened defresolver

The new defresolver has an API compatible with the previous one, but it is also getting some “toppings” to make it sweeter:

Topping 1: Implicit inputs

Pathom now can infer the input part of a resolver from the user destructuring expression in the arglist. For example:

(defresolver full-name [env {:acme.person/keys [first-name last-name]}]
  {::pco/input  [:acme.person/first-name :acme.person/last-name]
   ::pco/output [:acme.person/full-name]}
  {:acme.person/full-name (str first-name " " last-name)})

That’s a lot of data in that meta. Now, with implicit inputs, you can remove the input declaration and let it be inferred from the destructuring:

(defresolver full-name [env {:acme.person/keys [first-name last-name]}]
  {::pco/output [:acme.person/full-name]}
  {:acme.person/full-name (str first-name " " last-name)})

If you provide ::pco/input, it will use it instead of inferring.

Another thing you may have noticed is that inputs are now vectors (instead of sets like in Pathom 2); this is to make everything consistent, so now input, output, and params all use the EQL format.
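As a rough illustration of how input inference from destructuring could work, here is a small sketch in plain Clojure. The function name `destructure->input` is hypothetical and this is not Pathom’s code; a real macro has more cases to handle (nested destructuring, for example). It reads a map destructuring form as data and returns the keywords that form pulls out.

```clojure
(defn destructure->input
  "Given a map destructuring form (as data), returns a vector with the
  keywords it reads. Handles `:some.ns/keys [...]`, `:keys [...]` and
  `sym :keyword` entries; other entries (like :as and :or) are ignored."
  [form]
  (into []
        (mapcat
          (fn [[k v]]
            (cond
              ;; {:acme.person/keys [first-name]} -> :acme.person/first-name
              (and (keyword? k) (= "keys" (name k)))
              (map #(keyword (or (namespace %) (namespace k)) (name %)) v)

              ;; {first-name :acme.person/first-name}
              (and (symbol? k) (keyword? v))
              [v]

              :else [])))
        form))

(destructure->input '{:acme.person/keys [first-name last-name]})
;; => [:acme.person/first-name :acme.person/last-name]
```

The result is a vector of keywords, which matches the EQL-style input format described above.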

Topping 2: Single attribute resolver

Another common pattern in Pathom is to create a resolver that outputs a single property (like our previous example). To avoid the map repetition, you can use a keyword after the meta map:

(defresolver full-name [env {:acme.person/keys [first-name last-name]}]
  :acme.person/full-name (str first-name " " last-name))

note

The meta map is optional; in this example, we removed it since we don’t need it anymore, but you can add it before the :acme.person/full-name to add more meta to the resolver.

Topping 3: Implicit arguments

The arglist on defresolver is also different. Since our full-name example doesn’t need env, we can omit it:

(defresolver full-name [{:acme.person/keys [first-name last-name]}]
  :acme.person/full-name (str first-name " " last-name))

important

In case you use a single argument, it is always the input. To use env, you must use two arguments.

Also, if you don’t need the input either (global resolvers), you can simply write:

(defresolver pi [] :math/pi 3.14)

Resolvers get a record type

In Pathom 2, the resolvers were plain Clojure maps. I changed that to a type, which gives them the following new properties.

Resolvers are callable

In Pathom 2, to call a resolver, you had to retrieve ::pc/resolve from it and then call that. Now the resolver type implements IFn, which means you can call the resolver as a normal function.

Which means this is now a thing:

(defresolver full-name [{:acme.person/keys [first-name last-name]}]
  :acme.person/full-name (str first-name " " last-name))

(full-name 
  {:acme.person/first-name "Wilker"
   :acme.person/last-name "Lucio"})
; => {:acme.person/full-name "Wilker Lucio"}
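For readers curious about the mechanism, this is a minimal sketch of how a Clojure record can be made callable by implementing `clojure.lang.IFn`. The record name `Resolver` and its fields `config` and `resolve-fn` are illustrative only, not the names used inside Pathom 3.

```clojure
;; A record that implements IFn, so instances can be called like functions.
(defrecord Resolver [config resolve-fn]
  clojure.lang.IFn
  ;; one argument: treat it as the input and pass an empty env
  (invoke [_ input] (resolve-fn {} input))
  ;; two arguments: env and input
  (invoke [_ env input] (resolve-fn env input)))

(def full-name
  (->Resolver
    {:output [:acme.person/full-name]}
    (fn [_env {:acme.person/keys [first-name last-name]}]
      {:acme.person/full-name (str first-name " " last-name)})))

;; call it directly, no need to dig the function out of a map
(full-name {:acme.person/first-name "Wilker"
            :acme.person/last-name  "Lucio"})
;; => {:acme.person/full-name "Wilker Lucio"}

;; the configuration is still available as data
(:config full-name)
;; => {:output [:acme.person/full-name]}
```

The single-argument arity mirrors the rule from the arglist section: one argument is always the input, and env needs the two-argument form.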

Stop trying to encode the resolve fn

Another issue with ::pc/resolve shows up when the user needs to send the index across the wire: having an fn as part of the data is the most common cause of errors I see during transit encoding.

The new type can implement the transit protocols to encode itself properly without leaking the resolve part.

Performance

Reduce base footprint

To test the minimum cost of doing a minimal relevant operation with Pathom, I made a setup that runs just one resolver. This resolver is very lightweight, so the time measured in this benchmark is the overhead that Pathom’s internals add.

The Pathom 2 overhead seems pretty bad, but consider that most current users use Pathom to wrap some external source (services, databases, etc.), and these resolvers tend to be bulky, which makes the Pathom overhead negligible next to them.

But with the new performance, Pathom starts to be a good option for more internal lightweight processes, like creating compatibility layers, aliases, and other general processing.

The gains come from removing the reader abstraction and simplified constructs.

You can find the code for this benchmark in this gist.

Faster load times

In Pathom 2, the connect namespace got too big and contained large macros that take a long time to compile. Users pay this price even when they are not using any async features.

Pathom 3 considers this problem and has more distributed and smaller namespaces. Also, the async parts (yet to be done) will be in separate namespaces.

EQL Process

In Pathom 2, EQL was the only way to use the resolvers. In Pathom 3, it’s one of the options, along with Smart Maps (more on it below).

This is what it looks like to process EQL requests in Pathom 3:

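; namespace aliases assumed by this example
(require '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.entity-tree :as p.ent]
         '[com.wsscode.pathom3.interface.eql :as p.eql])

; define a resolver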
(pco/defresolver full-name [{::keys [first-name last-name]}]
  ::full-name (str first-name " " last-name))

; generate the indexes
(def indexes (pci/register full-name))

(def ada {::first-name "Ada" ::last-name "Lovelace"})

; process request
(p.eql/process (p.ent/with-entity indexes ada) [::full-name ::first-name])
; => {::full-name "Ada Lovelace" ::first-name "Ada"}

Without the parser abstraction, Pathom uses the environment directly, which is the common pattern between the interfaces.

Smart Maps

A smart map is a custom map type that uses resolvers to realize data as the program asks for the attributes.

Here is an example of what a Smart Map usage looks like:

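; namespace aliases assumed by this example
(require '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.interface.smart-map :as psm])

; define a resolver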
(pco/defresolver full-name [{::keys [first-name last-name]}]
  ::full-name (str first-name " " last-name))

; generate the indexes
(def indexes (pci/register full-name))

; create a smart map
(def person (psm/smart-map indexes {::first-name "Ada" ::last-name "Lovelace"}))

; use as a regular map
(::first-name person) ; => "Ada"
(keys person) ; => [::first-name ::last-name]

; ask new information
(::full-name person) ; => "Ada Lovelace"

; information is cached after read
(keys person) ; => [::first-name ::last-name ::full-name]

For more examples, check the Smart Map tests.
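If you want a feel for the “realize on demand, then cache” behavior without Pathom at hand, the sketch below approximates it in plain Clojure. It is only an approximation under simplifying assumptions: a real Smart Map is a custom map type backed by the planner, while here `smart-get` is a hypothetical function over an atom, with a hypothetical index keyed by output attribute and no protection against dependency cycles.

```clojure
;; Hypothetical index: output attribute -> inputs needed and the fn to run.
(def resolvers
  {:person/full-name
   {:input      [:person/first-name :person/last-name]
    :resolve-fn (fn [{:person/keys [first-name last-name]}]
                  {:person/full-name (str first-name " " last-name)})}})

(defn smart-get
  "Reads `k` from the entity held in the atom `cache`. When the key is
  missing, realizes the inputs of the resolver for `k` (recursively),
  calls it, and merges the result into the cache."
  [index cache k]
  (if (contains? @cache k)
    (get @cache k)
    (when-let [{:keys [input resolve-fn]} (get index k)]
      (let [in  (into {} (map (fn [i] [i (smart-get index cache i)])) input)
            out (resolve-fn in)]
        (swap! cache merge out)
        (get out k)))))

(def person (atom {:person/first-name "Ada" :person/last-name "Lovelace"}))

(smart-get resolvers person :person/full-name) ; => "Ada Lovelace"

;; the computed attribute is now part of the entity
(contains? @person :person/full-name) ; => true
```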

I’m excited to see what usages you will come up with for Smart Maps!

Dynamic Resolvers in Pathom 2

Dynamic resolvers are resolvers that have dynamic input and output. Pathom 2 has basic support for dynamic resolvers.

But implementing a dynamic resolver in Pathom 2 requires a deep understanding of the internals, and the resulting algorithm adds considerable overhead.

One common challenge is that dynamic resolvers require multi-attribute planning to be efficient.

Consider the following EQL query that uses an integrated GraphQL resolver from GitHub.

[{:github.user/viewer
  [:github.user/login
   :github.user/name]}]

Remember that reader2 and parallel-reader use a planner that only plans for one attribute at a time. If Pathom naively used that, the previous query would make two separate requests, one for the login and another for the name, and that is not what it should do:

# :github.user/login request
query {
  viewer {
    login
  }
}

# :github.user/name request
query {
  viewer {
    name
  }
}

Instead, it should group the max number of properties it can and make a single request to the GraphQL service, as:

query {
  viewer {
    login
    name
  }
}

To do that, Pathom needs to consider multiple attributes of the same request, and the GraphQL resolver implementation does that; it looks for the parent query and finds all relevant attributes.
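The grouping step can be sketched in a few lines of plain Clojure. This is a simplified illustration, not the actual GraphQL integration: the helper names `graphql-attrs` and `attrs->graphql` are hypothetical, field names are taken straight from the keyword names, and only flat (non-nested) attributes are handled.

```clojure
(require '[clojure.string :as str])

(defn graphql-attrs
  "Picks from the parent `query` the attributes whose namespace starts
  with `prefix`, i.e. the ones the GraphQL service can answer directly."
  [prefix query]
  (filterv #(and (keyword? %)
                 (some-> (namespace %) (str/starts-with? prefix)))
           query))

(defn attrs->graphql
  "Builds a single GraphQL query string that asks `root` for every
  attribute in `attrs`."
  [root attrs]
  (str "query { " (name root) " { "
       (str/join " " (map name attrs))
       " } }"))

(->> [:github.user/login :github.user/name]
     (graphql-attrs "github.")
     (attrs->graphql :github.user/viewer))
;; => "query { viewer { login name } }"
```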

But just looking at the query is not enough. That’s because the query may have attributes that are not directly from the GraphQL server, but have a dependency on it. For example, consider this setup:

; resolver that depends on a GraphQL attribute
(pc/defresolver github-first-name [_ {:keys [github.user/name]}]
  {::pc/input  #{:github.user/name}
   ::pc/output [::first-name]}
  {::first-name (first (str/split name #" "))})

; query
[{:github.user/viewer
  [:github.user/login
   ::first-name]}]

Note that in this example, the optimal approach still queries GitHub once, asking for the login and name. But to realize that, Pathom has to plan for every attribute once, just to find out about this dependency.

In Pathom 2, all of this is done in user code. To aggravate the problem, if you have multiple resolvers like these, each one will repeat this planning operation for every attribute, which gets slow.
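Here is a small sketch of that dependency discovery in plain Clojure. The names `local-deps`, `github-attr?` and `required-github-attrs` are hypothetical, the index shape is made up for the example, and `:app/first-name` stands in for the `::first-name` attribute so the snippet does not depend on the current namespace. It walks each requested attribute back through its dependencies and collects the GitHub attributes that need to go in the single request.

```clojure
(require '[clojure.string :as str])

;; Hypothetical index of local resolvers: output attribute -> required inputs.
(def local-deps
  {:app/first-name #{:github.user/name}})

(defn github-attr?
  "True when the attribute namespace starts with \"github.\"."
  [attr]
  (str/starts-with? (str (namespace attr)) "github."))

(defn required-github-attrs
  "Walks the dependencies of every attribute in `query` and returns the
  set of GitHub attributes that must be fetched to satisfy all of them."
  [deps query]
  (loop [pending (vec query)
         seen    #{}
         found   #{}]
    (if-let [attr (peek pending)]
      (cond
        ;; already handled, skip it
        (seen attr)
        (recur (pop pending) seen found)

        ;; served by GitHub directly, collect it
        (github-attr? attr)
        (recur (pop pending) (conj seen attr) (conj found attr))

        ;; local attribute, queue its inputs for inspection
        :else
        (recur (into (pop pending) (get deps attr)) (conj seen attr) found))
      found)))

(required-github-attrs local-deps [:github.user/login :app/first-name])
;; => #{:github.user/login :github.user/name}
```

Doing this once for the whole query is cheap; the cost described above comes from every dynamic resolver repeating a walk like this for every attribute.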

Dynamic Resolvers in Pathom 3

Pathom 3 adds official support for dynamic resolvers.

Now that the new planner already considers many attributes simultaneously, a lot of the work of figuring out what to call from dynamic resolvers is done by planner code instead of user code.

This allows all the planning to happen in a single pass of the query.

What writing these resolvers looks like is still under development; a current dynamic resolver that already leverages this planning is the Datomic integration.

You may have watched my last talk at the Conj, where I talk about the Maximal Graph idea of connecting distributed Pathom parsers.

This new planner is a start toward making that vision happen. Pathom currently has some features that allow for it, but this part of the library needs more development.

OpenTracing

In Pathom 3, I’ll embrace the OpenTracing model instead of building a custom trace stack.

No code related to this has been written yet. If you have input on this, please ping me!

Tooling

The tools will need some porting. The query runner can stay mostly the same, but the tracer is probably getting a new version that’s more aligned with the new planner and how Pathom 3 runs.

Migration Path

The good news here is that resolvers have a similar interface in Pathom 2 and Pathom 3, so the conversion should be easy to automate for most cases.

The tricky part is for users who make extensive use of readers and plugins. Plugins are likely to have different entry points in Pathom 3, and how to port them is uncertain at this point.

I would love to hear some user feedback once plugins start to get shaped in Pathom 3 to think together on ways to minimize the burden of porting.

Current state

You can find Pathom 3 source code at https://github.com/wilkerlucio/pathom3.

This is a work in progress; don’t use it yet.

If you would like to collaborate in the development or play a bit with it, you can use it as a deps dependency.

Most essential features from Pathom 2 that still need to be implemented in Pathom 3:

  • Ident queries
  • Error handling
  • Mutations
  • Placeholders
  • Batching
  • Async / Parallel processing

That’s what I have for today; I’m excited to keep working on this. If you have questions or wanna chat about any of this, this is a great time to bring new ideas to Pathom; you can find me at #pathom on Clojurians.

If you would like to learn more about Pathom in a daily journal, check my Roam wsscode.


Follow closer

If you would like to know more details about my projects, check my open Roam database, where you can see development details almost daily.

Support my work

I'm currently an independent developer and I spend quite a lot of my personal time doing open-source work. If my work is valuable for you or your company, please consider supporting it through Patreon; this way you can help me have more time available to keep doing this work. Thanks!

Current supporters

And here I would like to thank my current supporters:

Albrecht Schmidt
Alister Lee
Austin Finlinson
Daemian Mack
Jochen Bedersdorfer
Kendall Buchanan
Mark Wardle
Michael Glaesemann
Oleg, Iar, Anton
West