Pathom Updates 7, Pathom 3 goes Async

January 20, 2021

Welcome to one more edition of Pathom updates!

Recursive queries

To start, I'd like to talk about recursive queries.

This important feature was missing in Pathom 3 until recently, but not anymore!

You can learn more about this feature on its documentation page.

Hacker News scraper tutorial

This is a recent addition to the tutorials on the documentation site.

In this tutorial, I model the Hacker News data with Pathom. For the implementation, I used a scraping strategy, extracting data from the HTML.

This tutorial is medium-sized and touches many aspects of Pathom. If you like to learn through building (which is one of the most effective ways IMO), check it out!

Async support

Pathom now has a new runner implementation that allows resolvers to use async processes.

In the async runner, when a resolver or mutation returns a future-like thing, Pathom will wait for that future to realize before moving on.

For the underlying implementation, Pathom uses Promesa. Promesa is fast and builds on good native primitives: on the JVM it uses CompletableFuture, and in JavaScript it uses Promises.
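The sketch below shows, in plain Clojure on the JVM, what it means to wait for a future before moving on. It uses java.util.concurrent.CompletableFuture directly instead of Promesa, and it is not Pathom's runner. The resolver shape (a function from the entity map to a map of new attributes) is an assumption made for illustration, and so are the names ->future, run-resolvers and example-resolvers.

```clojure
(ns example.async-runner-sketch
  ;; Hypothetical sketch, not Pathom's runner: plain CompletableFuture interop.
  (:import (java.util.concurrent CompletableFuture)
           (java.util.function Function Supplier)))

(defn ->future
  "Returns x when it is already a CompletableFuture, otherwise wraps it in an
  already-completed one, so sync and async results can be handled the same way."
  [x]
  (if (instance? CompletableFuture x)
    x
    (CompletableFuture/completedFuture x)))

(defn run-resolvers
  "Runs resolvers in order. Each receives the entity built so far and returns a
  map of new attributes, either directly or inside a future. The next resolver
  only runs after the previous result is realized."
  [entity resolvers]
  (reduce
   (fn [^CompletableFuture acc resolver]
     (.thenCompose acc
                   (reify Function
                     (apply [_ ent]
                       ;; wait for this resolver's output, then merge it in
                       (.thenApply ^CompletableFuture (->future (resolver ent))
                                   (reify Function
                                     (apply [_ out] (merge ent out))))))))
   (CompletableFuture/completedFuture entity)
   resolvers))

(def example-resolvers
  [;; synchronous resolver: returns a plain map
   (fn [{user-id :user/id}]
     {:user/name (str "user-" user-id)})
   ;; asynchronous resolver: returns a CompletableFuture
   (fn [{user-name :user/name}]
     (CompletableFuture/supplyAsync
      (reify Supplier
        (get [_] {:user/greeting (str "Hello, " user-name)}))))])

@(run-resolvers {:user/id 1} example-resolvers)
;; => {:user/id 1, :user/name "user-1", :user/greeting "Hello, user-1"}
```

Sync results become already-completed futures, so both kinds of resolver go through a single code path.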

I’m quite happy with its performance (see the benchmarks later in this article).

It’s also extensible. I documented how to extend it to support core.async channels instead of futures; you can find this in the async documentation.

Benchmarks

In Pathom 2, I used core.async as the primary building block for the async support.

When I started the Pathom 3 async support, I did the same. The measurements weren’t good: because of the overhead, processing the same sync items with the new async runner was considerably slower.

So I decided to take a second pass at it and try something else: Promesa.

With Promesa, I got performance very close to the serial runner!

Here are the benchmark results; they also include the Pathom 2 runners:

note

All these tests ran on the JVM, using Criterium to measure the executions. The Variance column is each runner's mean divided by the fastest mean in the same benchmark.


Runner	Mean	Variance
Pathom 3 Serial Cached Plan	0.009ms	1.000x
Pathom 3 Serial	0.042ms	4.982x
Pathom 2 Serial	0.057ms	6.715x
Pathom 2 Async	0.108ms	12.714x
Pathom 2 Parallel	0.145ms	17.048x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	0.013ms	1.000x
Pathom 3 Serial	0.028ms	2.047x
Pathom 2 Serial	0.574ms	42.605x
Pathom 2 Async	1.086ms	80.521x
Pathom 2 Parallel	1.682ms	124.760x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	19.404ms	1.340x
Pathom 3 Serial	14.479ms	1.000x
Pathom 2 Serial	111.461ms	7.698x
Pathom 2 Async	228.621ms	15.790x
Pathom 2 Parallel	123.812ms	8.551x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	19.699ms	1.025x
Pathom 3 Serial	19.219ms	1.000x
Pathom 2 Serial	114.969ms	5.982x
Pathom 2 Async	236.170ms	12.289x
Pathom 2 Parallel	69.532ms	3.618x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	18.412ms	1.000x
Pathom 3 Serial	19.476ms	1.058x
Pathom 2 Serial	149.716ms	8.131x
Pathom 2 Async	281.580ms	15.293x
Pathom 2 Parallel	124.575ms	6.766x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	17.396ms	1.000x
Pathom 3 Serial	18.194ms	1.046x
Pathom 2 Serial	139.458ms	8.017x
Pathom 2 Async	280.384ms	16.117x
Pathom 2 Parallel	141.067ms	8.109x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	21.772ms	1.000x
Pathom 3 Serial	22.292ms	1.024x
Pathom 2 Serial	140.549ms	6.456x
Pathom 2 Async	308.395ms	14.165x
Pathom 2 Parallel	97.979ms	4.500x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	208.209ms	1.000x
Pathom 3 Serial	211.243ms	1.015x
Pathom 2 Serial	220.532ms	1.059x
Pathom 2 Async	240.124ms	1.153x
Pathom 2 Parallel	214.609ms	1.031x

Runner	Mean	Variance
Pathom 3 Serial Cached Plan	29.681ms	1.000x
Pathom 3 Serial	31.858ms	1.073x
Pathom 2 Serial	300.165ms	10.113x
Pathom 2 Async	327.642ms	11.039x
Pathom 2 Parallel	87.656ms	2.953x

I don’t think this is a signal that core.async is slow. The way I’m using core.async probably has a big impact. Since core.async doesn’t have built-in error propagation, I have to create my own constructs. This means I have to check for errors at each channel read, which adds overhead.

I think core.async just isn’t appropriate for the use case at hand. A future-based mechanism ended up suiting this situation better.
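Plain CompletableFuture interop makes the difference in error handling easy to see. CompletableFuture is the JVM primitive Promesa builds on, but this is not Promesa or Pathom code, and then, recover and step are hypothetical helpers. When one stage throws, every later stage is skipped and the error reaches a single handler at the end, with no check after each step:

```clojure
(ns example.future-errors-sketch
  ;; Hypothetical helpers over CompletableFuture; not Promesa or Pathom APIs.
  (:import (java.util.concurrent CompletableFuture CompletionException)
           (java.util.function Function)))

(defn then
  "Chains f onto fut; f only runs if fut completed successfully."
  [^CompletableFuture fut f]
  (.thenApply fut (reify Function (apply [_ x] (f x)))))

(defn recover
  "Handles a failure from any earlier stage in one place."
  [^CompletableFuture fut f]
  (.exceptionally fut (reify Function (apply [_ ex] (f ex)))))

(def steps-called (atom []))

(defn step
  "Wraps f so we can see which stages actually ran."
  [label f]
  (fn [x] (swap! steps-called conj label) (f x)))

(def result
  (-> (CompletableFuture/completedFuture 1)
      (then (step :inc inc))
      (then (step :boom (fn [_] (throw (ex-info "resolver failed" {})))))
      (then (step :never inc))   ; skipped: an earlier stage failed
      (recover (fn [^Throwable ex]
                 ;; errors from earlier stages usually arrive wrapped in CompletionException
                 (let [^Throwable cause (if (instance? CompletionException ex) (.getCause ex) ex)]
                   {:error (.getMessage cause)})))))

@result        ;; => {:error "resolver failed"}
@steps-called  ;; => [:inc :boom]
```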

Parallel support (not available yet)

Parallel support isn’t available yet, but it will probably come as an extension to this same async runner.

Making it do some blind parallelism is easy. For example, during the collection process I could trigger all the items at the same time.
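Here is a sketch of that blind approach in plain Clojure. process-items-blindly is a hypothetical name, not Pathom code. Every item gets its own future immediately, and nothing limits how many run at once.

```clojure
(ns example.blind-parallel-sketch)

(defn process-items-blindly
  "Hypothetical sketch: starts a future for every item right away, then waits
  for all of them. Nothing limits how many items run at the same time."
  [f items]
  (let [running (mapv (fn [item] (future (f item))) items)] ; mapv is eager: all futures start here
    (mapv deref running)))                                   ; results come back in input order

(process-items-blindly inc [1 2 3])
;; => [2 3 4]
```

Clojure's future runs on an unbounded thread pool, so a sequence with thousands of items can start thousands of threads at once. That is the resource-management problem a non-naive implementation has to solve.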

A non-naive implementation of parallel support also requires resource management. I want to allow users to configure things like:

How many items should run in parallel for a given sequence?
How many “operations” can a single request run in parallel?
Which thread pools should be used for parallel processing?
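A fixed-size pool is one common way to handle both the per-sequence limit and the thread-pool configuration from the list above. The sketch below uses plain JVM interop and the hypothetical name process-items-bounded. It illustrates the idea only; it is not how Pathom implements (or will implement) parallelism. It caps how many items of a sequence are in flight at once while keeping results in input order:

```clojure
(ns example.bounded-parallel-sketch
  (:import (java.util.concurrent ExecutorService Executors)))

(defn process-items-bounded
  "Hypothetical sketch: runs (f item) for every item on a fixed-size pool, so at
  most max-parallel items are in flight at any time. Results keep input order."
  [max-parallel f items]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int max-parallel))]
    (try
      (let [tasks   (mapv (fn [item] (fn [] (f item))) items) ; Clojure fns are Callables
            futures (.invokeAll pool tasks)]                  ; blocks until every task is done
        (mapv deref futures))                                 ; same order as tasks
      (finally
        (.shutdown pool)))))

;; Usage: track how many calls overlap to check the bound.
(def in-flight (atom 0))
(def max-seen (atom 0))

(defn slow-inc [x]
  (let [now (swap! in-flight inc)]
    (swap! max-seen max now)
    (Thread/sleep 20)
    (swap! in-flight dec)
    (inc x)))

(process-items-bounded 2 slow-inc (range 6))
;; => [1 2 3 4 5 6], and @max-seen never goes above 2
```

A per-request limit would need something shared across all the sequences of a request, such as a single pool or a semaphore per request. This sketch leaves that part out.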

In Pathom 2, parallel processing carried a lot of overhead. Due to structural changes in Pathom 3, this is likely to be different this time.

Another big difference is that in Pathom 2 the runner had to recalculate paths multiple times when things went wrong. The new planner already knows every possible path ahead of time. This up-front work by the planner will make the new parallel runner much simpler.

Once those are there, the same current code can run in parallel!

Porting repl-tooling

Once I got the basics of async working, I wanted to use it in some real applications.

Lucky for me, there is repl-tooling, which is used by the Chlorine editor.

Mauricio Szabo used Pathom to compute editor-related information, and there are some interesting, complex dependencies in this process. At the same time, the code is small, which makes it a great candidate for the porting experiment.

note

This is a good example of using Pathom outside the API realm. The task of Pathom is to handle data realization via declarative attribute relationships. This is a property you can leverage in any domain you are working with!

Porting the code was easy, since repl-tooling isn’t using any plugins or fancy things. In the process, I simplified the code by using the implicit inputs feature.

The other change was at the interface edge: replacing the parser usage with the new EQL async processor.

Then I ran the tests. Almost all of them failed.

The exercise was cool. With some debugging, I found some issues in the planner and the runner. Over the weekend those got fixed, so if you are using Pathom 3, be sure to upgrade to get the fixes.

Some problems were easy to fix. One was in the code that filters the output according to the EQL query, which lost record types when records were present in the data. Another was lists getting reversed.

tip

One of the bugs was a consequence of a bad assumption. I assumed the following code would always output the same collection in the end:

(into (empty coll) coll)

It turns out this is not true. For lists, the output has the items in reverse order, because into adds items with conj, and conj on a list adds to the front.
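The REPL-style example below shows the pitfall and one way around it. rebuild-like is a hypothetical helper written for this example, not the actual fix in Pathom:

```clojure
(ns example.same-coll-type)

;; The bad assumption: into uses conj, and conj on a list adds to the front.
(into (empty [1 2 3]) [1 2 3])    ;; => [1 2 3]
(into (empty '(1 2 3)) '(1 2 3))  ;; => (3 2 1)

(defn rebuild-like
  "Hypothetical helper: builds a collection of the same kind as coll holding xs,
  keeping the original order for lists and other seqs."
  [coll xs]
  (if (seq? coll)
    (apply list xs)           ; lists and seqs: rebuild front to back
    (into (empty coll) xs)))  ; vectors, sets, maps: conj keeps the expected order

(rebuild-like '(1 2 3) (map inc '(1 2 3)))  ;; => (2 3 4)
(rebuild-like [1 2 3] (map inc [1 2 3]))    ;; => [2 3 4]
```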

Most of the time, I show small graphs here that I use for testing. This time I have the opportunity to show you one from a real application, and this is what it looks like:

[Figure: the Pathom 3 graph from repl-tooling]

The Pathom 3 algorithm is a new thing. For that reason, I still expect to find some bugs like these. With a growing number of tests, I hope the issues will become less frequent.

Next step: tooling

Next I’ll work on tooling!

I plan to extend the current Pathom Viz app to also support Pathom 3, so the same app works with both versions.

I think I can reuse the query editor. The index is almost the same, which should make porting easy.

As for the tracer, I’m not sure yet. Pathom 3’s process is very different from Pathom 2’s, and I need to experiment to see whether I can reuse it or whether it needs to be something new. I do believe that a complete view of the query, like the tracer provides, is essential for debugging queries at a glance.

There is also the new graph visualization that I’ve shown in some posts here. This view is likely to be integrated into the timeline view to inspect the graph per execution entity.

These are the new challenges. If you’d like to discuss any of these things, reach out in #pathom on Clojurians Slack.

That’s it for today, see you!

Follow closer

If you’d like to know more details about my projects, check my open Roam database, where you can see development details almost daily.

Support my work

I'm currently an independent developer and I spend quite a lot of my personal time doing open-source work. If my work is valuable for you or your company, please consider supporting it through Patreon. This way you can help me have more time available to keep doing this work. Thanks!

Current supporters

And here I'd like to thank my current supporters:

Adam Feldman

Albrecht Schmidt

Alister Lee

Austin Finlinson

Daemian Mack

Jochen Bedersdorfer

Kendall Buchanan

Mark Wardle

Michael Glaesemann

Oleg, Iar, Anton

Paulo Feodrippe

West