Pathom Updates 7, Pathom 3 goes Async
January 20, 2021
Welcome to one more edition of Pathom updates!
Recursive queries
To start, I’d like to talk about recursive queries.
This important feature was missing from Pathom 3 until recently, but not anymore!
You can learn more about this feature on its documentation page.
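As a rough illustration, assuming the standard EQL recursion syntax, a recursive query is a join whose sub-query is either the symbol `...` (keep recursing while there is data) or a number (a maximum depth). The attribute names below are made up for the example and are not taken from the Pathom docs.

```clojure
;; Hypothetical attributes (:file/name, :file/children), for illustration only.

;; Unbounded recursion: keep following :file/children as long as there is data.
(def unbounded-query
  '[:file/name {:file/children ...}])

;; Bounded recursion: follow :file/children at most 2 levels deep.
(def bounded-query
  [:file/name {:file/children 2}])
```

Both queries are plain data, so they can be built and inspected like any other Clojure vector.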
Hacker News scraper tutorial
This is a recent addition to the tutorials on the documentation site.
In this tutorial, I model the Hacker News data with Pathom. For the implementation, I used a scraping strategy, extracting the data from the HTML.
This tutorial is medium-sized and touches many aspects of Pathom. If you like to learn through building (which is one of the most effective ways IMO), check it out!
Async support
Pathom now has a new runner implementation that lets resolvers use async processes.
In the async runner, when a resolver or mutation returns a future-like thing, Pathom will wait for that future to realize before moving on.
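A minimal sketch of what this can look like, assuming Pathom 3 and Promesa are on the classpath and that the namespaces below match the ones in the Pathom 3 async documentation. The resolver name and the `:demo/greeting` attribute are hypothetical.

```clojure
(ns demo.async
  (:require
    [com.wsscode.pathom3.connect.indexes :as pci]
    [com.wsscode.pathom3.connect.operation :as pco]
    [com.wsscode.pathom3.interface.async.eql :as p.a.eql]
    [promesa.core :as p]))

;; Hypothetical resolver: simulates slow I/O by returning a promise
;; that resolves to the output map after 100ms.
(pco/defresolver slow-greeting []
  {::pco/output [:demo/greeting]}
  (p/delay 100 {:demo/greeting "hello"}))

(def env (pci/register slow-greeting))

;; The async processor returns a promise of the result, not the result itself.
(def result (p.a.eql/process env [:demo/greeting]))

(comment
  ;; On the JVM the promise can be dereferenced to block for the value.
  @result)
```

The output is declared explicitly here because the resolver body returns a promise rather than a map literal, so there is nothing for Pathom to infer the output from.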
For the underlying implementation, Pathom is using Promesa. Promesa is fast and uses good native primitives under the hood: on the JVM it uses CompletableFuture, and in JS it uses Promises.
I’m quite happy with its performance (benchmarks below in this article).
It’s also extensible. I documented how to extend it to support core.async channels instead of futures; you can find this in the async documentation.
Benchmarks
In Pathom 2, I used core.async as the primary building block for the async support.
When I started the Pathom 3 async support, I did the same. When I measured the performance, it wasn’t that good: processing the same sync items with the new async runner was considerably slower.
So I decided to take a second pass at it and try something else, which was Promesa.
With Promesa, I got performance very close to the serial runner!
Here are the benchmark results, which also include the Pathom 2 runners:
note
All those tests were done in the JVM, using Criterium to measure the executions.
You can find the title for each benchmark below the bars.
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 0.009ms | 1.000x |
Pathom 3 Serial | 0.042ms | 4.982x |
Pathom 2 Serial | 0.057ms | 6.715x |
Pathom 2 Async | 0.108ms | 12.714x |
Pathom 2 Parallel | 0.145ms | 17.048x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 0.013ms | 1.000x |
Pathom 3 Serial | 0.028ms | 2.047x |
Pathom 2 Serial | 0.574ms | 42.605x |
Pathom 2 Async | 1.086ms | 80.521x |
Pathom 2 Parallel | 1.682ms | 124.760x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 19.404ms | 1.340x |
Pathom 3 Serial | 14.479ms | 1.000x |
Pathom 2 Serial | 111.461ms | 7.698x |
Pathom 2 Async | 228.621ms | 15.790x |
Pathom 2 Parallel | 123.812ms | 8.551x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 19.699ms | 1.025x |
Pathom 3 Serial | 19.219ms | 1.000x |
Pathom 2 Serial | 114.969ms | 5.982x |
Pathom 2 Async | 236.170ms | 12.289x |
Pathom 2 Parallel | 69.532ms | 3.618x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 18.412ms | 1.000x |
Pathom 3 Serial | 19.476ms | 1.058x |
Pathom 2 Serial | 149.716ms | 8.131x |
Pathom 2 Async | 281.580ms | 15.293x |
Pathom 2 Parallel | 124.575ms | 6.766x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 17.396ms | 1.000x |
Pathom 3 Serial | 18.194ms | 1.046x |
Pathom 2 Serial | 139.458ms | 8.017x |
Pathom 2 Async | 280.384ms | 16.117x |
Pathom 2 Parallel | 141.067ms | 8.109x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 21.772ms | 1.000x |
Pathom 3 Serial | 22.292ms | 1.024x |
Pathom 2 Serial | 140.549ms | 6.456x |
Pathom 2 Async | 308.395ms | 14.165x |
Pathom 2 Parallel | 97.979ms | 4.500x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 208.209ms | 1.000x |
Pathom 3 Serial | 211.243ms | 1.015x |
Pathom 2 Serial | 220.532ms | 1.059x |
Pathom 2 Async | 240.124ms | 1.153x |
Pathom 2 Parallel | 214.609ms | 1.031x |
Runner | Mean | Relative to fastest |
---|---|---|
Pathom 3 Serial Cached Plan | 29.681ms | 1.000x |
Pathom 3 Serial | 31.858ms | 1.073x |
Pathom 2 Serial | 300.165ms | 10.113x |
Pathom 2 Async | 327.642ms | 11.039x |
Pathom 2 Parallel | 87.656ms | 2.953x |
I don’t think this is a sign that core.async is slow. The way I’m using core.async probably has a big impact. Since core.async doesn’t have a built-in error propagation method, I have to create my own constructs. This means I have to check for errors at each channel read, adding overhead.
I think core.async is just not appropriate for the usage at hand. A future-based mechanism ended up suiting this situation better.
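To make the error propagation point concrete, here is a small sketch using the JVM's CompletableFuture directly (the primitive Promesa builds on, as mentioned earlier). It is not Pathom's internal code; the helper names `async-step` and `then` are made up for the example. A failure in the first step skips the later steps and surfaces once, at the end of the chain, with no per-step error check.

```clojure
(import '(java.util.concurrent CompletableFuture ExecutionException)
        '(java.util.function Function Supplier))

(defn async-step
  "Runs f on another thread, returning a CompletableFuture of its result."
  [f]
  (CompletableFuture/supplyAsync (reify Supplier (get [_] (f)))))

(defn then
  "Chains f onto fut; f only runs if fut completes without an error."
  [^CompletableFuture fut f]
  (.thenApply fut (reify Function (apply [_ x] (f x)))))

;; The first step fails, so the chained steps never run.
(def failing-chain
  (-> (async-step #(throw (ex-info "boom" {:step 1})))
      (then inc)
      (then str)))

;; The original error is available at the single place where we wait.
(def outcome
  (try
    @failing-chain
    (catch ExecutionException e
      (ex-message (ex-cause e)))))

;; A chain without errors just flows through.
(def ok-outcome @(-> (async-step (fn [] 1)) (then inc) (then str)))
```

With channels, each read in the equivalent pipeline would need its own check for an error value, which is the overhead described above.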
Parallel support (not available yet)
Parallel support isn’t available yet, but it will probably come as an extension to this same async runner.
Making it do some blind parallelism is easy. For example, when processing a collection, I could trigger all the items at the same time.
A non-naive implementation of parallel support also requires resource management. I want to allow users to configure things like:
- How many items should run in parallel for a given sequence?
- How many “operations” can a single request do in parallel?
- Which thread pools should be used for parallel processing?
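The difference between blind and bounded parallelism can be sketched with plain JVM thread pools. This is only an illustration of the idea, not Pathom code or a planned API; `square`, `run-all`, and `with-pool` are hypothetical names.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn square
  "Stand-in for the per-item work (hypothetical)."
  [x]
  (Thread/sleep 10)
  (* x x))

(defn run-all
  "Submits one task per item to pool and returns the results in input order."
  [^ExecutorService pool f coll]
  (let [tasks (mapv (fn [x] (fn [] (f x))) coll)]
    (mapv (fn [^Future fut] (.get fut)) (.invokeAll pool tasks))))

(defn with-pool
  "Runs the items on pool, then shuts the pool down."
  [^ExecutorService pool f coll]
  (try
    (run-all pool f coll)
    (finally (.shutdown pool))))

;; Blind parallelism: a cached pool grows as needed, so every item can start at once.
(def blind (with-pool (Executors/newCachedThreadPool) square (range 8)))

;; Bounded parallelism: at most 2 items run at the same time.
(def bounded (with-pool (Executors/newFixedThreadPool 2) square (range 8)))
```

Both calls produce the same results. Only the resource usage differs, which is why the limits are something a user would want to configure.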
In Pathom 2, the parallel process required a lot of overhead. Due to structural changes in Pathom 3, this is likely to be different this time.
Another big difference is that in Pathom 2 the runner had to recalculate paths multiple times when things went wrong. The new planner already knows every possible path ahead of time. The new runner implementation for parallel will be much simpler because of this pre-work from the planner.
Once those pieces are in place, the same code you have today can run in parallel!
Porting repl-tooling
Once I got the basics of async working, I wanted to use it in some real applications.
Luckily for me, there is repl-tooling, which is used by the Chlorine editor.
Mauricio Szabo used Pathom to compute editor-related information, and there are some interesting, complex dependencies in this process. At the same time, the code is small, which makes it a great candidate for the porting experiment.
note
This is a good example of using Pathom outside the API realm. The task of Pathom is to handle data realization via declarative attribute relationships. This is a property you can leverage in any domain you are working with!
Porting the code was easy. Repl-tooling isn’t using any plugins or fancy things. In the process, I simplified the code by using the implicit inputs feature.
The other change was on the interface edge, to replace the parser usage with the new EQL async processor.
Then I ran the tests. Almost all of them failed.
The exercise was cool. With some debugging, I found some issues with the planner and the runner. Over the weekend, those got fixed, so if you are using Pathom 3, be sure to upgrade to get those fixes.
Some problems were easy to fix, like the code that filters the output based on the EQL, which was losing the record types when they were present in the data, or a problem with lists getting reversed.
tip
One of the bugs was a consequence of a bad assumption. I assumed the following code would always output the same collection in the end:
(into (empty coll) coll)
It turns out this is not true. In the case of lists, the output has the items in reverse order, because adding to a list puts each new item at the front.
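The behavior is easy to check at the REPL. The `rebuild-ordered` variant below is just one possible order-preserving alternative, written for this example; it is an assumption, not necessarily the fix that went into Pathom.

```clojure
(defn rebuild
  "The assumption from the text: rebuild a collection of the same type."
  [coll]
  (into (empty coll) coll))

(rebuild [1 2 3])   ;; => [1 2 3]
(rebuild '(1 2 3))  ;; => (3 2 1), because conj on a list adds at the front

(defn rebuild-ordered
  "Like rebuild, but keeps the item order for lists and other seqs."
  [coll]
  (if (seq? coll)
    (apply list coll)
    (into (empty coll) coll)))

(rebuild-ordered '(1 2 3)) ;; => (1 2 3)
(rebuild-ordered [1 2 3])  ;; => [1 2 3]
```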
Most of the time, I show some small graphs here that I use for testing. This time I have the opportunity to show you one from a real application, and this is what it looks like:
The Pathom 3 algorithm is a new thing. For that reason, I still expect to find some bugs like this. With a growing number of tests, I hope the issues will become less frequent.
Next step: tooling
Next I’ll work on tooling!
I plan to extend the current Pathom Viz app to also support Pathom 3. This will make the same app work with both versions.
I think I can reuse the query editor. The index is almost the same, which should make the porting easy.
As for the tracer, I’m not sure yet. The process of Pathom 3 is too different from Pathom 2, and I need to do some experimentation to see if I can reuse it, or if that needs to be something new. I do believe a complete view of the query, like the tracer, is essential to enable query debugging at a glance.
There is also the new graph visualization that I’ve shown in some posts here. This view is likely to be integrated into the timeline view to inspect the graph per execution entity.
These are the new challenges. If you’d like to discuss any of these things, reach out at #pathom on Clojurians Slack.
That’s it for today, see you!
Follow closer
If you’d like to know more details about my projects, check my open Roam database, where you can see development details almost daily.
Support my work
I'm currently an independent developer and I spend quite a lot of my personal time doing open-source work. If my work is valuable for you or your company, please consider supporting my work through Patreon. This way you can help me have more available time to keep doing this work. Thanks!
Current supporters
And here I’d like to give thanks to my current supporters: