WSSCode Blog

Pathom Updates 7, Pathom 3 goes Async

January 20, 2021

Welcome to one more edition of Pathom updates!

Recursive queries

To start, I’d like to talk about recursive queries.

This important feature was missing from Pathom 3 until recently, but not anymore!

You can learn more about this feature on this documentation page.

Hacker News scraper tutorial

This is a recent addition to the tutorials on the documentation site.

In this tutorial, I model the Hacker News data with Pathom. For the implementation, I used a scraping strategy, extracting data from the HTML.

This tutorial is medium-sized and touches many aspects of Pathom. If you like to learn through building (which is one of the most effective ways IMO), check it out!

Async support

Pathom now has a new runner implementation that lets resolvers use async processes.

In the async runner, when a resolver or mutation returns a future-like thing, Pathom will wait for that future to realize before moving on.

For the underlying implementation, Pathom uses Promesa. Promesa is fast and uses good native primitives under the hood: on the JVM it uses CompletableFuture, and in JS it uses Promises.
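As a rough sketch of what this looks like in practice, the resolver below returns a Promesa promise instead of a plain map, and the async EQL interface returns a promise for the final result. The resolver name `slow-greeting` and the `:demo/greeting` attribute are made up for illustration, and the snippet assumes Pathom 3 and Promesa are on the classpath:

```clojure
(ns demo.async-resolver
  (:require [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.interface.async.eql :as p.a.eql]
            [promesa.core :as p]))

;; Hypothetical resolver: simulates a slow async source by returning
;; a promise that resolves to the output map after 100ms.
(pco/defresolver slow-greeting []
  {::pco/output [:demo/greeting]}
  (p/delay 100 {:demo/greeting "hello"}))

(def env (pci/register slow-greeting))

;; The async processor returns a promise. On the JVM it can be
;; dereferenced to block until the result is ready.
(def result @(p.a.eql/process env [:demo/greeting]))
```

In ClojureScript there is no blocking deref, so the returned promise would be chained with something like `p/then` instead.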

I’m quite happy with its performance (benchmarks below in this article).

It’s also extensible. I documented how to extend it to support core.async channels instead of futures; you can find this in the async documentation.

Benchmarks

In Pathom 2, I used core.async as the primary building block for the async support.

When I started the Pathom 3 async support, I did the same. After measuring the performance, it wasn’t that good: processing the same sync items using the new async runner was considerably slower.

So I decided to take a second shot at it and try something else, which was Promesa.

With Promesa, I got performance very close to the serial runner!

Here are the benchmark results, which also include the Pathom 2 runners:

note

All these tests ran on the JVM, using Criterium to measure the executions.

You can find the title for each benchmark below the bars.

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 0.009ms | 0.000x |
| Pathom 3 | 0.042ms | 3.982x |
| Pathom 3 Async Promesa | 0.046ms | 4.430x |
| Pathom 3 Core Async | 0.100ms | 10.735x |
| Pathom 2 Serial | 0.057ms | 5.715x |
| Pathom 2 Async | 0.108ms | 11.714x |
| Pathom 2 Parallel | 0.145ms | 16.048x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 0.013ms | 0.000x |
| Pathom 3 | 0.028ms | 1.047x |
| Pathom 3 Async Promesa | 0.038ms | 1.783x |
| Pathom 3 Core Async | 0.194ms | 13.419x |
| Pathom 2 Serial | 0.574ms | 41.605x |
| Pathom 2 Async | 1.086ms | 79.521x |
| Pathom 2 Parallel | 1.682ms | 123.760x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 19.404ms | 0.340x |
| Pathom 3 | 14.479ms | 0.000x |
| Pathom 3 Async Promesa | 20.759ms | 0.434x |
| Pathom 3 Core Async | 157.938ms | 9.908x |
| Pathom 2 Serial | 111.461ms | 6.698x |
| Pathom 2 Async | 228.621ms | 14.790x |
| Pathom 2 Parallel | 123.812ms | 7.551x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 19.699ms | 0.025x |
| Pathom 3 | 19.219ms | 0.000x |
| Pathom 3 Async Promesa | 25.647ms | 0.335x |
| Pathom 3 Core Async | 146.294ms | 6.612x |
| Pathom 2 Serial | 114.969ms | 4.982x |
| Pathom 2 Async | 236.170ms | 11.289x |
| Pathom 2 Parallel | 69.532ms | 2.618x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 18.412ms | 0.000x |
| Pathom 3 | 19.476ms | 0.058x |
| Pathom 3 Async Promesa | 26.598ms | 0.445x |
| Pathom 3 Core Async | 206.131ms | 10.196x |
| Pathom 2 Serial | 149.716ms | 7.131x |
| Pathom 2 Async | 281.580ms | 14.293x |
| Pathom 2 Parallel | 124.575ms | 5.766x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 17.396ms | 0.000x |
| Pathom 3 | 18.194ms | 0.046x |
| Pathom 3 Async Promesa | 25.284ms | 0.453x |
| Pathom 3 Core Async | 211.923ms | 11.182x |
| Pathom 2 Serial | 139.458ms | 7.017x |
| Pathom 2 Async | 280.384ms | 15.117x |
| Pathom 2 Parallel | 141.067ms | 7.109x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 21.772ms | 0.000x |
| Pathom 3 | 22.292ms | 0.024x |
| Pathom 3 Async Promesa | 30.473ms | 0.400x |
| Pathom 3 Core Async | 173.799ms | 6.983x |
| Pathom 2 Serial | 140.549ms | 5.456x |
| Pathom 2 Async | 308.395ms | 13.165x |
| Pathom 2 Parallel | 97.979ms | 3.500x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 208.209ms | 0.000x |
| Pathom 3 | 211.243ms | 0.015x |
| Pathom 3 Async Promesa | 210.187ms | 0.010x |
| Pathom 3 Core Async | 227.414ms | 0.092x |
| Pathom 2 Serial | 220.532ms | 0.059x |
| Pathom 2 Async | 240.124ms | 0.153x |
| Pathom 2 Parallel | 214.609ms | 0.031x |

| Runner | Mean | Variance |
| --- | --- | --- |
| Pathom 3 Cached Plan | 29.681ms | 0.000x |
| Pathom 3 | 31.858ms | 0.073x |
| Pathom 3 Async Promesa | 31.788ms | 0.071x |
| Pathom 3 Core Async | 44.584ms | 0.502x |
| Pathom 2 Serial | 300.165ms | 9.113x |
| Pathom 2 Async | 327.642ms | 10.039x |
| Pathom 2 Parallel | 87.656ms | 1.953x |
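A note on reading the Variance column: judging from the numbers, it appears to be the overhead of each runner relative to the fastest runner in the same benchmark, which is why the fastest row always shows 0.000x. This is an inference from the data, not something stated by the benchmark tool. The helper names below are hypothetical, and the sample values come from the Pathom 2 Serial row of the last benchmark:

```clojure
(defn relative-overhead
  "Hypothetical helper: overhead of `mean-ms` relative to `fastest-ms`.
  0.0 means equal to the fastest, 1.0 means it took twice as long."
  [mean-ms fastest-ms]
  (/ (- mean-ms fastest-ms) fastest-ms))

(defn round3
  "Rounds a number to three decimal places."
  [x]
  (/ (Math/round (* 1000.0 x)) 1000.0))

;; Pathom 2 Serial (300.165ms) against the fastest runner in that
;; benchmark, Pathom 3 Cached Plan (29.681ms):
(round3 (relative-overhead 300.165 29.681))
;; => 9.113
```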

I don’t think this is a sign that core.async is slow. The way I’m using core.async probably has a big impact. Since core.async doesn’t have a built-in error propagation method, I have to create my own constructs. This means I have to check for errors at each channel read, adding overhead.

I think core.async is just not appropriate for the usage at hand. A future-based mechanism ended up suiting this situation better.
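To illustrate the kind of construct this refers to, here is a minimal sketch of a common community pattern for carrying errors over channels: exceptions are put on the channel as values, and every read checks for them. This is not necessarily the code Pathom uses. The names `throw-if-error`, `go-catch` and `<?` are chosen for illustration, and the snippet assumes core.async is on the classpath:

```clojure
(ns demo.async-errors
  (:require [clojure.core.async :refer [go <! <!!]]))

(defn throw-if-error
  "Rethrows `x` when it is an exception, otherwise returns it unchanged."
  [x]
  (if (instance? Throwable x)
    (throw x)
    x))

(defmacro go-catch
  "Like `go`, but an exception thrown in the body becomes the channel value."
  [& body]
  `(go (try ~@body (catch Throwable e# e#))))

(defmacro <?
  "Like `<!`, but rethrows when the value read is an exception.
  Must be used inside a go block, just like `<!`."
  [ch]
  `(throw-if-error (<! ~ch)))

;; A step that fails: the exception travels as a channel value.
(def failing-step (go-catch (throw (ex-info "boom" {}))))

;; The read rethrows, the outer go-catch captures it again,
;; and the caller finally sees the original exception.
(def result
  (<!! (go-catch
         (inc (<? failing-step)))))

(ex-message result)
;; => "boom"
```

Every `<?` adds an `instance?` check on top of the channel read, which is the per-read overhead mentioned above. A promise carries its error state natively, so no such wrapper is needed.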

Parallel support (not available yet)

Parallel support isn’t available yet, but it will probably come as an extension to this same async runner.

Making it do some blind parallelism is easy. For example, when processing a collection, I could trigger all the items at the same time.

A non-naive implementation of parallel support also requires resource management. I want to allow users to configure things like:

  • How many items should run in parallel for a given sequence?
  • How many “operations” can a single request do in parallel?
  • Which thread pools should parallel processing use?
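To make the first and last questions above concrete, here is a minimal sketch of bounded parallelism over a sequence using a fixed-size JVM thread pool. It is only an illustration of the resource-management idea, not Pathom’s design: `process-bounded` and its parameters are hypothetical names.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn process-bounded
  "Hypothetical helper: calls `(f item)` for every item, running at most
  `max-parallel` calls at the same time. Results keep the input order."
  [max-parallel f items]
  (let [^ExecutorService pool (Executors/newFixedThreadPool max-parallel)]
    (try
      (let [;; Clojure functions implement Callable, so a vector of
            ;; zero-argument functions can be handed to the pool directly.
            tasks   (mapv (fn [item] (fn [] (f item))) items)
            ;; invokeAll blocks until every task has completed.
            futures (.invokeAll pool tasks)]
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

(process-bounded 2 #(* % %) [1 2 3 4])
;; => [1 4 9 16]
```

The blind version would simply start every item at once. The pool size is what turns that into something a user can tune.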

In Pathom 2, parallel processing carried a lot of overhead. Due to structural changes in Pathom 3, this is likely to be different this time.

Another big difference is that in Pathom 2 the runner had to recalculate paths multiple times when things went wrong. The new planner already knows every possible path ahead of time. The new parallel runner implementation will be much simpler because of this pre-work from the planner.

Once those pieces are in place, the same code you have today can run in parallel!

Porting repl-tooling

Once I got the basics of async working, I wanted to use it in some real applications.

Luckily for me, there is repl-tooling, used by the Chlorine editor.

Mauricio Szabo used Pathom to compute editor-related information, and there are some interesting, complex dependencies in this process. At the same time, the code is small, which makes it a great candidate for the porting experiment.

note

This is a good example of using Pathom outside the API realm. The task of Pathom is to handle data realization via declarative attribute relationships. This is a property you can leverage in any domain you are working with!

Porting the code was easy. Repl-tooling isn’t using any plugins or fancy things. In the process, I simplified the code by using the implicit inputs feature.

The other change was on the interface edge, to replace the parser usage with the new EQL async processor.
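For readers who have not seen these two pieces, here is a small sketch. It is not taken from repl-tooling: the resolver and attribute names are invented for illustration, and it assumes Pathom 3 is on the classpath. With implicit inputs, the resolver input is inferred from the argument destructuring, so no options map is needed:

```clojure
(ns demo.implicit-inputs
  (:require [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.interface.async.eql :as p.a.eql]))

;; Hypothetical resolver. The input [:first-name :last-name] is inferred
;; from the destructuring, and the output [:full-name] from the map
;; literal in the body, so ::pco/input and ::pco/output are omitted.
(pco/defresolver full-name [{:keys [first-name last-name]}]
  {:full-name (str first-name " " last-name)})

(def env (pci/register full-name))

;; The async EQL processor takes the place of the Pathom 2 parser call.
;; It returns a promise, dereferenced here to block on the JVM.
(def result
  @(p.a.eql/process env
                    {:first-name "Ada" :last-name "Lovelace"}
                    [:full-name]))

(:full-name result)
;; => "Ada Lovelace"
```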

Then I ran the tests. Almost all of them failed.

The exercise was cool. With some debugging, I found some issues in the planner and the runner. Over the weekend, those got fixed, so if you are using Pathom 3, be sure to upgrade to get those fixes.

Some problems were easy to fix, like the code that filters the output based on the EQL, which was losing the record types when they were present in the data, or a problem with lists getting reversed.

tip

One of the bugs was a consequence of a bad assumption. I assumed the following code would always output the same collection in the end:

(into (empty coll) coll)

It turns out this is not true. In the case of lists, the output has the items in reverse order.
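The reason is that `into` adds items with `conj`, and `conj` adds to the front of a list but to the end of a vector. The snippet below shows the difference, followed by one possible order-preserving variant. The names `rebuild` and `rebuild-in-order` are made up here, and the workaround is only a sketch, not necessarily the fix that went into Pathom:

```clojure
(defn rebuild
  "The pattern from the tip: pours `coll` into an empty collection
  of the same type."
  [coll]
  (into (empty coll) coll))

(rebuild [1 2 3])  ;; => [1 2 3]
(rebuild '(1 2 3)) ;; => (3 2 1)

(defn rebuild-in-order
  "Hypothetical workaround: reverses seqs and lists first, so that
  conj-ing each item onto the front restores the original order."
  [coll]
  (into (empty coll) (cond-> coll (seq? coll) reverse)))

(rebuild-in-order [1 2 3])  ;; => [1 2 3]
(rebuild-in-order '(1 2 3)) ;; => (1 2 3)
```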

Most of the time, I show small graphs here that I use for testing. This time I have the opportunity to show you one from a real application, and this is what it looks like:

Chlorine Graph

The Pathom 3 algorithm is a new thing. For that reason, I still expect to find some bugs like this. With a growing number of tests, I hope the issues will become less frequent.

Next step: tooling

Next I’ll work on tooling!

I plan to extend the current Pathom Viz app to also support Pathom 3. This will make the same app work with both versions.

I think I can reuse the query editor: the index is almost the same, which should make the porting easy.

The tracer, I’m not sure yet. The process of Pathom 3 is too different from Pathom 2, and I need to do some experimentation to see if I can reuse it, or if that needs to be something new. I do believe a complete view of the query like the tracer is essential to enable query debugging at a glance.

There is also the new graph visualization that I’ve shown in some posts here. This view is likely to be integrated into the timeline view to inspect the graph per execution entity.

These are the new challenges. If you’d like to discuss any of these things, reach out at #pathom on Clojurians Slack.

That’s it for today, see you!


Follow closer

If you’d like to know more details about my projects, check my open Roam database, where you can see development details almost daily.

Support my work

I'm currently an independent developer and I spend quite a lot of my personal time doing open-source work. If my work is valuable for you or your company, please consider supporting my work through Patreon. This way you can help me have more available time to keep doing this work. Thanks!

Current supporters

And here I’d like to thank my current supporters:

Albrecht Schmidt
Austin Finlinson
Daemian Mack
Kendall Buchanan
Mark Wardle
Michael Glaesemann
Oleg, Iar, Anton
West