WSSCode Blog

Pathom Updates 9, Planner Revamp.

April 18, 2021

Hello everyone, it’s time for the 9th edition of Pathom updates!

First, a recap on what’s been going on with Pathom over the last month. At the time of my last post, I was working on dynamic resolvers and expecting to release a beta soon.

But during this process, I kept seeing cases here and there in which Pathom had all the necessary resolvers but was still unable to fulfill the request. I had seen a few of these before and had been fixing them one case at a time, adding patches to the planner code (most of the bugs I found were related to the Pathom 3 planner getting the graph wrong).

After some time, that code was getting more and more complex, and at the same time, I didn’t feel very confident that all the problems were solved. I guess I felt that way because the fixes I had been making were getting more and more specific. That is a smell that something there is not right.

Given this concern, I decided to pause the dynamic resolvers work and investigate. The cases that trigger bugs usually involve a lot of attribute re-use inside the graph: Pathom computes the plan for some attribute, and later something else also depends on that attribute. What the planner did was try to optimize the graph as it went (making the smallest graph possible).

This approach makes the planner fast since it does a single pass. That was not a constraint of the design, but it felt like a nice property to have while I was working on it.

Generative tests

To explore the problem, I started writing some generative tests. I’ve wanted to do this for a long time. These generators are a very interesting topic, and I plan to post about them in more detail. For now, I can tell you the basic idea.

First, the goal of these generators is to give me a structure like this:

{:resolvers [...]
 :query     [:a :b]
 :expected  {:a "a" :b "b"}}

This data contains the resolvers we will use on the test, a query to run, and some expected output. Checking a case is then just a matter of registering those resolvers, running the query, and comparing the result against the expected value.
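Something along these lines would do the check (a rough sketch, not the actual test harness; it assumes the :resolvers entry already holds real pco/resolver instances):

(require '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.interface.eql :as p.eql])

;; Sketch: register the generated resolvers, run the query and compare
;; the result with the expectation from the generated case.
(defn check-case [{:keys [resolvers query expected]}]
  (let [env (pci/register resolvers)]
    (= expected (p.eql/process env query))))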

Now, how do we generate those cases? The strategy I made up is to generate a “solution” for an attribute. That solution may be a single resolver call or a resolver that has dependencies. In the case of a dependency, the process is recursive and can go as deep as we ask it to (there are knobs to configure how deep it can go).

Everything I described so far relates to generating the first attribute in the query. For the remaining attributes, there is also a chance of making a resolver that depends on an attribute generated previously, and this is what creates the attribute re-use situations I’m looking for.
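To make the recursive “solution” idea more concrete, here is a rough sketch of such a generator using clojure.test.check. This is not the actual Pathom test code; the names and the depth knob are only illustrative:

(require '[clojure.test.check.generators :as gen])

;; Sketch: generate resolver descriptions able to produce `attr`.
;; At depth 0 the solution is a single resolver with no input; otherwise the
;; resolver gets 1-2 dependencies, each solved recursively one level shallower.
(defn gen-solution [attr depth]
  (if (zero? depth)
    (gen/return {:resolvers [{:output [attr] :input []}]})
    (gen/let [deps     (gen/vector gen/keyword 1 2)
              dep-sols (apply gen/tuple (map #(gen-solution % (dec depth)) deps))]
      {:resolvers (into [{:output [attr] :input (vec deps)}]
                        (mapcat :resolvers)
                        dep-sols)})))

;; Sample one case for the attribute :a, going at most 2 levels deep
(gen/generate (gen-solution :a 2))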

Wonder what those cases look like?

Let’s start with a small one:

tip

The graph is interactive. You can use your mouse to view node details, rearrange nodes, pan and zoom.

[Interactive graph]

Simple enough. Let’s see something spicier:

[Interactive graph]

Now it’s time to see the kind of case that only a generator would come up with:

note

This may take some time to start.

[Interactive graph]

Results

These graphs should be enough to uncover any nasty re-use problem that may be hiding in the planner code. Did it find problems? Yes, more than I expected…

To start, the following example has the constraints:

{::knob-max-resolver-depth     2
 ::knob-max-deps               1
 ::knob-max-request-attributes 4}

I get a failing case like this:

; query
[:a :b :c :d]

; expected result
{:a "a" :b "b" :c "c" :d "d"}

; actual result
{}
[Interactive graph]

The Completed graph state is blank, but you can visit the steps that happened before.

tip

If you are not familiar with the Pathom 3 planner, you can learn about it at the documentation page.

In this example, we can see that there are two node clusters, and they never get connected.

They should have been connected in the step Start branch node 5 and 10 using run-and.

Another example, using the default settings:

{::knob-max-resolver-depth     10
 ::knob-max-deps               5
 ::knob-max-request-attributes 8
 ::knob-max-resolver-outputs   10
 ::knob-max-edge-options       5}
; query
[:a :b]

; expected result
{:a "a" :b "b"}

; actual result
{:a "a"}
[Interactive graph]

As you can see, again, some nodes end up not connected. In this case, I can pinpoint the problem starting at the step Created new branch node for 2 and 4. At that step, it should have merged with the AND node 7, but instead it created the node 11.

If I increase some of those numbers, I can consistently find broken cases. The point here is that this is not working fine, especially considering that these generators are only touching the most basic parts of the feature set.

This is a good example of how generative tests can uncover things that are hard to imagine when dealing with complex algorithms. It reminds me of this article I was reading recently:

Teaches to be humble: https://www.pixelated-noise.com/blog/2020/09/10/what-spec-is/#org2814ae4

Deoptimize

One thing I noticed about these tests is that the algorithm always fails while doing some “node merging” operation, and all of these operations are related to optimizations.

Considering that (and after chatting with Tony Kay), I decided to try removing all this node collapsing logic and make the algorithm expand the graph possibilities in the simplest way possible, even if it generates duplicated items. Time to do some rewriting.

Pain is temporary

After I simplified the code and made it do the “graph expansion” as dumb as possible, all tests started to pass!

Even better, I was able to evolve the generative tests further, including things like OR cases and a few failure modes, and the tests still pass consistently!

This is what happened to the planner file after the changes:

Planner modifications

note

You can find the complete changes at this pull request.

This made the relevant part of the planner code much simpler to understand. Looking at it now, having this unoptimized build-up gives a few advantages:

  • The planner can now see the full picture of why each node exists and in which context, which will make it easier to optimize the nodes later, since it has the complete information
  • The unoptimized graph keeps all the attribute information relevant for sub-query processing (useful for dynamic resolvers)
  • Computing this version is faster, given the operations are simpler now; before, depending on the node case, it had to do graph traversals, which could consume extra time

It’s not time to say it’s perfect, of course, but I feel much more confident about its ability to get a correct result 🎉.

note

The generated graphs I presented to you earlier are all using the unoptimized algorithm, given that the previous algorithm would fail to compute these examples.

Tradeoffs

To give you a sense of what an unoptimized graph is like, let’s compare the plans for the same request: the previous optimized one and the new one.

This is the query:

(p.eql/process env {:listing/json-path "some-file.json"}
  [:geo.point/lat
   :geo.point/lon
   :listing/rental-price-total-month
   :listing/usable-area
   :listing/address-string])

Using the previous optimized algorithm, we end up with this graph:

[Interactive graph]

This graph is tight. It contains only the minimum paths that will fulfill the request.

Now let’s check the result on the new unoptimized graph for the same request:

[Interactive graph]

Not so tight anymore, but more predictable. You can see the root AND has 5 connections, one path for each of the attributes required at the root.

In the first two OR clusters, you can see they have the exact same resolvers (which makes sense: they are asking for latitude and longitude, which are available through the same paths in this index).

Scan around and see which other duplications you can find.

One other thing to notice is that, even though a node may have many duplicates around, only one of them will be green. By the time Pathom reaches the node a second time, it will see that the data required is already available and will skip that resolver invocation.

I believe the extra overhead for these checks is negligible for most cases.
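The check itself is conceptually simple; something along these lines (a simplified illustration, not Pathom’s actual internals):

;; Simplified illustration: before running a node, check whether everything it
;; would produce is already present on the entity being built; if so, skip it.
(defn node-already-satisfied? [entity node-outputs]
  (every? #(contains? entity %) node-outputs))

(node-already-satisfied? {:geo.point/lat 10.0 :geo.point/lon 20.0}
                         [:geo.point/lat :geo.point/lon])
;; => true, so a duplicate node producing lat/lon would be skipped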

So, why did the optimizations exist in the first place?

One of the main reasons is to handle dynamic resolvers. As we saw in the examples, a repeated static resolver adds some processing overhead, but just a little: given that every execution of a static resolver is the same, a second visit to it can be skipped.

The situation is different in the case of a dynamic resolver, because dynamic resolvers have dynamic inputs and outputs. Failing to “compact” them will cause multiple calls to the resolver, which can quickly become inefficient.

Imagine, for example, a dynamic GraphQL integration: using the unoptimized graph, Pathom would make a separate GraphQL request for each attribute. The whole purpose of dynamic resolvers is to do this “lateral batching” and make the fewest calls possible to each dynamic resolver.
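To picture what that “lateral batching” means, here is a toy sketch in plain Clojure (not Pathom’s real plan format): merge the plan nodes that hit the same dynamic resolver into a single call carrying the combined sub-query.

;; Toy example: three plan nodes, all served by the same dynamic resolver.
(def unoptimized-plan
  [{:resolver 'graphql :query [:user/name]}
   {:resolver 'graphql :query [:user/email]}
   {:resolver 'graphql :query [:user/avatar]}])

;; Lateral batching: group nodes by dynamic resolver and merge their queries,
;; so each dynamic resolver is called once with everything it needs.
(defn compact-dynamic-calls [plan]
  (->> plan
       (group-by :resolver)
       (map (fn [[resolver nodes]]
              {:resolver resolver
               :query    (into [] (mapcat :query) nodes)}))))

(compact-dynamic-calls unoptimized-plan)
;; => ({:resolver graphql, :query [:user/name :user/email :user/avatar]})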

Error handling is another reason. To detect errors, Pathom has to do some navigation around the graph (to check, for example, if an error happened at a parent part of the process). In the case of the unoptimized graph, it has to do this many more times.

Another subtle issue with the unoptimized plan is that, for the same cluster, Pathom may choose different paths. You can see this in that last example as well: on the first OR cluster at the top, it chooses the top path (with just two nodes), but on the second one, it chooses the second path (the one starting with an AND node).

The code for the unoptimized planning is merged on master, and given that Pathom 3 currently doesn’t have support for dynamic resolvers, it shouldn’t have a big impact on current applications.

Optimizations v2

Instead of optimizing at every step like it did before, the next strategy is to optimize afterward. I believe the algorithms will be easier to work with this way, given they start with the full information (instead of the “merged” one). Also, separating the algorithm into steps like this makes it easier to understand.

That’s what I have for today, people. If you have any issues with the new planner, please open an issue.


Follow closer

If you’d like to know more details about my projects, check my open Roam database, where you can see development details almost daily.

Support my work

I'm currently an independent developer, and I spend quite a lot of my personal time doing open-source work. If my work is valuable to you or your company, please consider supporting my work through Patreon; this way, you can help me have more time available to keep doing this work. Thanks!

Current supporters

And here I’d like to give thanks to my current supporters:

Albrecht Schmidt
Alister Lee
Austin Finlinson
Daemian Mack
Jochen Bedersdorfer
Kendall Buchanan
Mark Wardle
Michael Glaesemann
Oleg, Iar, Anton
West