Hacker News | kmicinski's comments

This is very cool to hear. Please get in touch with me; I would love to learn more. By the way, I am recruiting participants for an upcoming seminar in which I am soliciting industrial participation: https://kmicinski.com/minnowbrook-26. Please reach out if this is relevant to anyone at your company.

I am the author of this paper, and I do not agree with Dr. Smaragdakis' comments. As far as I can tell, the root of his concern is that the paper did not target Soufflé Datalog, a specific Datalog language in which his group writes. The criticism is totally fair in a sense, but I do not agree with you that these are "pretty big caveats" in our paper, for the reasons I address in my rebuttal to his comment. I will say, however, that his very engaging comments have pushed us to do significant follow-on work, which has now pushed our engines to scale to the kind of code he writes in Datalog, yielding very exciting results, and I am hoping that he will be satisfied when he sees it :-)

I will also mention that our group has follow-on work from this (I cannot share it widely for reviewing reasons, but a preprint is available if you would like to search) which significantly addresses Yannis' concerns. In the engine cited here, we scale to small programs (tens of lines): our engine does not support large, tricky queries, for interesting asymptotic reasons (which are shared by other Datalog engines based upon binary joins, not unique to ours). Our new engines port a significantly more complex class of join algorithms to the GPU, and we have used these new algorithms (and our novel GPU-based implementation) to run 500-1000-line Datalog programs that beat all existing state-of-the-art program analysis engines by 20-50x.

In sum, I strongly disagree with the "pretty big caveats" remark. Dr. Smaragdakis' comments are quite firm in nature, and I very much respect them. But I encourage you to check out my rebuttal and also (regarding scaling to larger subsets of Datalog and "real" programs) our recent follow-on work.

If you would like proof, please email me, we are happy to help you evaluate for yourself. My email is always open: kkmicins@syr.edu.


> In an introduction course there is no need to aim for maximum performance if parsing a 10k lines program takes less than a second.

I'll do you one better. The main compiler in my course uses only six characters to parse every single project: `(read)`.
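For readers unfamiliar with the joke: in a Lisp, the surface syntax is already s-expressions, so `read` is the whole parser. Here is a hypothetical Python sketch of what a Lisp-style reader does (illustrative only, not the course's actual compiler):

```python
# Minimal sketch of a Lisp-style reader: tokenize parentheses and
# atoms, then fold the token stream into nested lists.
def read(src):
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = parse(i)
                items.append(item)
            return items, i + 1  # skip the closing ")"
        return tokens[i], i + 1  # an atom

    expr, _ = parse(0)
    return expr

print(read("(define (f x) (+ x 1))"))
# → ['define', ['f', 'x'], ['+', 'x', '1']]
```

Because the grammar is just "atoms and nesting," there is nothing else for a parser to do, which is the point of the quip.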


Respectfully, I think what you mean is that there are no books which give you the experience of hacking on LLVM for several years.


> - I had worse performance with Soufflé than Ascent for my program for some query-planning reason that I couldn't figure out. I don't really know why; see https://github.com/souffle-lang/souffle/discussions/2557

I think the basic issue is that ADTs are simply not indexed--so to the degree that you write a query that would necessitate an index on a subtree of an ADT, you will face asymptotic blowup: the way ADTs work forces a scan-then-test across all ADT values associated with that top-level tag. The issue is discussed in Section 5.2 of this paper: https://arxiv.org/pdf/2411.14330
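To make the asymptotics concrete, here is an illustrative Python sketch (plain Python, not Soufflé's internals): without an index on the subterm you are querying, each probe degenerates to a scan-then-test over every fact with that top-level tag, while a hash index makes each probe O(1) amortized.

```python
from collections import defaultdict

# Facts tagged "node", keyed (for our query) on the second field of
# the payload. The names here are hypothetical.
facts = [("node", (i, i % 100)) for i in range(10_000)]

def probe_scan(key):
    # Unindexed: O(|facts|) per probe -- scan every "node" fact,
    # then test the subterm against the key.
    return [f for tag, f in facts if tag == "node" and f[1] == key]

# Indexed: build a hash index on the subterm once...
index = defaultdict(list)
for tag, f in facts:
    index[f[1]].append(f)

def probe_indexed(key):
    # ...so each probe is O(1) amortized plus output size.
    return index[key]

assert probe_scan(42) == probe_indexed(42)
```

In a join that probes once per tuple of another relation, the difference between the two is a whole extra factor in the running time, which is the blowup described above.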


Ah, yes, but I think Ascent also doesn't index ADTs. In this case, based on some other information, it seems like Soufflé _can_ plan the queries better if it has profiling data. It seems like Ascent just happened to pick a better query plan in my case without the profiling data.

Thanks for the link to the paper!


It's true that Ascent does not index ADTs either, but there are some tricks you can use when you control the container type to get similar performance, e.g., storing a pre-computed hash. I believe Arash, the main author of Ascent, was exploiting this trick for Rc<...> members and seeing good performance gains. It is a bit nuanced; you're right that Ascent doesn't pervasively index ADTs out of the box.
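As a rough Python analogue of the precomputed-hash trick (illustrative only; Ascent's actual Rust implementation for Rc<...> members differs): an immutable term caches its hash at construction time, so joins that hash the same term repeatedly never re-traverse the subtree.

```python
class Term:
    """Immutable ADT node that precomputes its hash once, so repeated
    hashing during joins costs O(1) instead of re-walking the subtree."""
    __slots__ = ("tag", "args", "_hash")

    def __init__(self, tag, *args):
        self.tag, self.args = tag, args
        self._hash = hash((tag, args))  # computed once at construction

    def __hash__(self):
        return self._hash  # O(1); no re-traversal of the subtree

    def __eq__(self, other):
        return (self.tag, self.args) == (other.tag, other.args)

t1 = Term("add", Term("lit", 1), Term("lit", 2))
t2 = Term("add", Term("lit", 1), Term("lit", 2))
assert t1 == t2 and hash(t1) == hash(t2)
```

This does not give you an index on subterms, but it makes hash-based joins over whole terms cheap, which is often where the time goes.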


I appreciate your thoughtful criticism of the post; to my eyes, everything you are saying is correct.


It seems potentially interesting to explore what would be required to store durable continuations. This feels very related to incrementalization and provenance: materializing a continuation to disk (or whatever storage backend) would seem to require dependence tracking to do anything smarter than snapshotting the whole program state. I am just spitballing, though; I am not sure if anyone has actually tried this.


I agree with you--that's a topic I will definitely cover in my blog, too. You make a good point: I know some folks who worked at big financial orgs, writing hundreds of thousands of lines of code, who never wrote general-recursive functions (they only used simple recursors like foldl).
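As a toy illustration of the point (my example, not from the discussion above): many explicitly recursive functions can be rewritten with a fixed recursor like a left fold, which in Python is `functools.reduce`.

```python
from functools import reduce

# General recursion: the recursion pattern is written out by hand.
def sum_squares_rec(xs):
    if not xs:
        return 0
    return xs[0] * xs[0] + sum_squares_rec(xs[1:])

# Same computation as a left fold (foldl): the recursion scheme is
# fixed by reduce; only the combining function varies.
def sum_squares_fold(xs):
    return reduce(lambda acc, x: acc + x * x, xs, 0)

assert sum_squares_rec([1, 2, 3]) == sum_squares_fold([1, 2, 3]) == 14
```

Restricting yourself to such recursors makes termination and cost obvious from the shape of the code, which is presumably why those shops could ban general recursion outright.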


For materialization-heavy workloads (program analysis, etc.), we often find that optimized binary join plans (e.g., profile-optimized or hand-optimized) beat worst-case optimal plans, because they scale better (less locking) without needing a trie-based representation. Within the space of worst-case optimal plans there are still lots of choices, but even a bad worst-case optimal plan can often beat a bad (randomly-chosen) binary plan. And of course (the whole point of this exercise), there are some queries where every binary plan explodes and you do need WCOJ. There's also some work on making more traditional binary joins robust (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf), among other interesting work (https://arxiv.org/html/2502.15181v1). Effectively parallelizing WCOJs is still an open problem as far as I am aware (at least, this is what folks working on it tell me), but several folks are working on some exciting potential directions for tackling it.
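The classic query where every binary plan explodes is the triangle query. Here is a toy Python illustration (not any engine's implementation): on a "hub" instance, a binary plan materializes a quadratic intermediate even though the output is empty, while a generic-join-style evaluation intersects one variable at a time and never builds that intermediate.

```python
from collections import defaultdict

# Triangle query: Q(a,b,c) :- R(a,b), S(b,c), T(c,a).
# Hub instance: every edge touches node 0, so R join S blows up.
n = 200
E = {(0, i) for i in range(1, n)} | {(i, 0) for i in range(1, n)}
R = S = T = E

# Binary plan: materialize (R join S) on b, then filter against T.
RS = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]
out_binary = [(a, b, c) for (a, b, c) in RS if (c, a) in T]

# Generic-join style: for each binding of (a, b), intersect the
# candidate c's from S and T; no quadratic intermediate is built.
S_by_b = defaultdict(set)
for (b, c) in S:
    S_by_b[b].add(c)
T_by_a = defaultdict(set)
for (c, a) in T:
    T_by_a[a].add(c)

out_wcoj = [(a, b, c) for (a, b) in R for c in S_by_b[b] & T_by_a[a]]

# The intermediate holds ~n^2 tuples; both plans agree the output is empty.
assert len(RS) == (n - 1) ** 2 + (n - 1)
assert out_binary == [] and out_wcoj == []
```

The trie-based representations and locking issues mentioned above are exactly what it takes to make the intersection step fast and parallel in a real engine; this sketch only shows the asymptotic shape of the problem.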


No offense, but I wouldn't take Datalog 2.0's small attendance as evidence of Datalog's decline, even if I agree with that high-level point. Datalog 2.0 is a satellite workshop of LPNMR, a relatively-unknown European conference that was randomly held in Dallas. I attended Datalog 2.0 myself, and the event did feel relatively sparse. I also had a paper (not my primary work; the first author is the real wizard, of course :-) at the workshop. I saw relatively few folks in that space even attending the event--with the notable exception of some European folks (e.g., those introducing the Nemo solver).

All of this is to say, I think Datalog 2.0's sparse attendance this year may be more indicative of its being a satellite workshop of an already less prestigious conference (itself not even the main event! That was ICLP!) than of a lack of excitement about Datalog implementation.

For what it's worth, none of what I'm saying is meant to rebut your high-level point that there is little novelty left in implementing raw Datalog engines. Of course I agree; the research space has moved far beyond that (arguably it did a while ago) and into more exotic problems involving things like streaming (HydroFlow), choice (Dusa), and things that get closer to the general chase (e.g., Egglog's chase engine). I don't think anyone disagrees that vanilla Datalog is boring; it's just that monotonic, chain-forward saturation (Horn clauses!) is a rich baseline with a well-understood engineering landscape (especially in the high-performance space) on which to build out more interesting theories (semirings, Z-sets, etc.).


The USA is a hostile environment for Europeans to visit. Avoiding it is a common theme all over western and northern Europe.

If you were expecting European attendance, that would explain why it's different from the past.

It's the same in science circles, tourism and even business travel.


As a European, I'm definitely avoiding the US. I was asked whether I wanted to visit our US branch and I declined - which was simply accepted and probably expected in advance. Almost everyone I know does the same.


I completely agree with you, but this event was held in October 2024--Trump's horrendous behavior wasn't yet a factor. There are other reasons folks internationally probably don't want to travel to Texas aside from Trump--but in this specific case I think it may have more to do with the fact that LPNMR is a traditionally European conference randomly held in the US.

