Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composition Feature #44

Open
JoranHonig opened this issue Jun 8, 2024 · 11 comments
Open

Composition Feature #44

JoranHonig opened this issue Jun 8, 2024 · 11 comments

Comments

@JoranHonig
Copy link

I've browsed a bit through the available docs, but couldn't find whether this is possible.

So this is a question / maybe feature request.

Is there some way of composing multiple sets of predicates? Ideally you'd be able to multiple blocks of interdependent ascent code spread over multiple files.

@regexident
Copy link
Contributor

I too would like to be able to share Datalog code between ascent programs!

Right now there is a strong tension between keeping one's code modular, clean and by that testable, and the lack of support for code-sharing.

But given ascent's nature of being a compile-time proc-macro that generates a very specific Datalog runtime tailored to the very specific program that was passed to ascent! { … } I can't see how ascent could possibly be composed from multiple files, since the proc-macro only sees the scope that gets passed to it. Other than by use of more macros that is.

That said I was hoping to be able to work around this limitation by the use of Rust macros along the lines of this, which would allow one to share a "library" of common datalog snippets between multiple ascent!{ … } blocks:

macro_rules! node_relation {
    ($node:ty) => {
        relation node($node);
    }
}

macro_rules! edge_relation {
    ($node:ty) => {
        relation edge($node, $node);
    }
}

macro_rules! reachable_relation {
    ($node:ty) => {
        relation reachable($node, $node);

        // The transitive closure can be written using a so-called linear rule:
        reachable(x, y) <-- edge(x, y);
        reachable(x, z) <-- reachable(x, y), edge(y, z);
    }
}

ascent! {
    node_relation!(); // Error: "undefined macro"
    edge_relation!(); // Error: "undefined macro"
    reachable_relation!(); // Error: "undefined macro"

    relation closure_of_a(Node);

    closure_of_a(y) <-- reachable(Node("A"), y);
};

… but unfortunately this throws an "undefined macro" error from the ascent_macro's rule_expand_macro_invocations(…) function.

If ascent allowed the use of Rust macros at the rule-level, then the above should work, which admittedly isn't optimal, but at least it works.

@StarGazerM
Copy link

Rust macro are outside-in. If you call a macro inside ascent!, rust expander will first expand ascent! and then insider, if your macro code is to generate ascent code, since its expansion is after ascent compilation, it really can't work in the way you want.

If rust has some preprocessor, this will be simpler. If not, you will have to implement like writing another procedure macro to generate ascent code, instead of using normal macro.

@regexident
Copy link
Contributor

Rust macro are outside-in. If you call a macro inside ascent!, rust expander will first expand ascent! and then insider, if your macro code is to generate ascent code, since its expansion is after ascent compilation, it really can't work in the way you want.

Shoot, you're right of course. Well, that feels like a nail in the coffin for reusability in ascent then, I suppose, given that ascent generates a bespoke runtime from the provided Datalog syntax and given that proc-macros can only access their own token-trees.

@StarGazerM
Copy link

I think some ad-hoc solution but shouldn't been approved to ascent current implementation will be allow user to manually stratify a datalog program, and then you may be able to compose stratum, (in implementation will be something a macro take rules and add impl for compiled ascent structs) . But it still can't make you get something like define a TC and reuse in other datalog program.

@kmicinski
Copy link
Collaborator

Being able to get reuse from Datalog programs in general is a challenging open problem in language design. The issue is not superficial, and is rather deep. For example, you need to do whole-program complication to be able to calculate optimal indices, along with more deep optimizations that are typical of modern Datalogs.

I think this is good long-term motivation for a future version of Ascent, but it also plagues other Datalogs as well. There are some folks who are looking into module systems for Datalog reuse, for example Erdweg et al. have recently published some work on module systems for Datalog (https://dl.acm.org/doi/abs/10.1145/3689484.3690737). Their system, IncA, even has an Ascent backend.

@s-arash
Copy link
Owner

s-arash commented Nov 1, 2024

Given the limitations of Rust macros mentioned by @StarGazerM, and the fact that Datalog composability is challenging in general as mentioned by @kmicinski , the best thing that I think can be done is to have a include!() "macro" in Ascent, so code "modules" would be separate files containing Ascent relations/rules, that then can be include!()ed into actual Ascent programs.

It would look something like this:

// reachable.ascent(?)

relation reachable(Node, Node);

reachable(x, y) <-- edge(x, y);
reachable(x, z) <-- reachable(x, y), edge(y, z);
// main.rs

// ...
ascent! {
  include!("reachable.ascent");
  relation edge(Node, Node);
 // ...
}

@JoranHonig @regexident would something like this be useful and address your use cases?

@regexident
Copy link
Contributor

regexident commented Nov 1, 2024

@s-arash that would be a much welcome improvement for sure! 🙏🏻

This sort of thing sounds like a perfect use case for syn's fold() visitor.


A couple of observations:

No cyclic includes

Any includes in ascent would have to be acyclic.

Acyclic graph vs. tree of includes?

Or would either have to implemented some sort of include! de-duplication, or simply forbid redundant includes.

Update: We should probably use the DAG approach from the get go, see #44 (comment) for why.

One could start with a tree initially and look into expanding ascent to support a DAG of includes later, if tree-only includes are found to be impractical in real-world scenarios.

At most one struct Program per program

In order to avoid conflicts only one of the snippets (i.e. ascent files or inline macros) that form an ascent program, would be allowed to contain a struct Program …; then, I would assume?

In other words: an ascent snippet (be it from a file, or an inline macro) that contains a struct Program …; must not include ascent files that contain struct Program …; of their own (neither directly, nor transitively).

So when writing ascent files one would have to decide: an ascent file can either describe a self-contained ascent program and be runnable as is (which may require a struct Program …;), or be re-usable, in which case one would have to decide whether or not to include a struct Program …; in it, based on the individual use case.


I would be happy to help make this happen, too.

@regexident
Copy link
Contributor

To give some additional motivational context for such a composition feature:

I have a bunch of graph queries/filters/transformations implemented in ascent that all work on the same graph type and thus have to repeat the same EDB schema (i.e. relation node(…), relation edge(…), …) over and over again, which is both, annoying and a maintenance burden.

Full example code

Imagine a dozen of files like the following, each sharing the same "schema" head and common logic rules,
followed by additional program-specific logic further below, which might look something like this,
but would usually of course be significantly more complex:

ascent::ascent! {
    pub struct Program<Node, Edge>
    where
        Node: Clone + Eq + Hash,
        Edge: Clone + Eq + Hash;

    // Shared facts:

    relation node(
        Node // node
    );

    relation edge(
        Node, // source node
        Edge, // edge
        Node // target node
    );

    // Shared logic:

    relation reachable(
        Node, // source
        Node // target
    );

    reachable(x, y) <-- edge(x, _, y);
    reachable(x, z) <-- reachable(x, y), edge(y, _, z);

    relation neighbor(
        Node, // source
        Node // target
    );
   
    neighbor(x, y) <-- edge(x, _, y);

    // Program-specific logic:

    // ...
};

With an include!() feature in ascent as described by @s-arash above this could be cleaned up significantly:

The "shared" schema could be consolidated into a single file:

src/graph.ascent:

pub struct Program<Node, Edge>
where
    Node: Clone + Eq + Hash,
    Edge: Clone + Eq + Hash;

// Facts:

relation node(
    Node // node
);

relation edge(
    Node, // source node
    Edge, // edge
    Node // target node
);

And each "shared" logic relation would get its own .ascent file (in order to avoid program bloat), like so:

src/reachable.ascent:

relation reachable(
    Node, // source
    Node // target
);

reachable(x, y) <-- edge(x, _, y);
reachable(x, z) <-- reachable(x, y), edge(y, _, z);

src/neighbor.ascent:

relation neighbor(
    Node, // source
    Node // target
);

neighbor(x, y) <-- edge(x, _, y);

… which could then be included from each of the query/filter/transformation programs, like so:

// ...

ascent::ascent! {
    include!("graph.ascent");
    include!("reachable.ascent");
    include!("neighbor.ascent");

    // Program-specific logic:

    // ...
};

// ...

Or like this, if graph.ascent contained no struct Program…; of its own:

// ...

ascent::ascent! {
    pub struct Program<Node, Edge>
    where
        Node: Clone + Eq + Hash,
        Edge: Clone + Eq + Hash;

    include!("graph.ascent");
    include!("reachable.ascent");
    include!("neighbor.ascent");

    // Program-specific logic:

    // ...
};

// ...

@regexident
Copy link
Contributor

Now that I've actually played through such a scenario I think I would prefer to be able to form a DAG from such ascent snippets. 🤔

The motivation for supporting DAG-includes is this:

A file like this would have an implicit dependency on relations defined in graph.schema.ascent:

// reachable.ascent

relation reachable(
    Node, // source
    Node // target
);

reachable(x, y) <-- edge(x, _, y);
reachable(x, z) <-- reachable(x, y), edge(y, _, z);

As such from a maintenance point of view it would be desirable to avoid such implicit dependencies:

// reachable.ascent

include!("graph.ascent"); // 👈🏻

relation reachable(
    Node, // source
    Node // target
);

reachable(x, y) <-- edge(x, _, y);
reachable(x, z) <-- reachable(x, y), edge(y, _, z);

But since multiple files within a program's include graph would include graph.ascent one would need to de-duplicate includes, either based on the included file's path or content.

@StarGazerM
Copy link

Hi guys, one of my concerns is include will mess up rust analyzer and vscode code linter. I purpose for rules, maybe something more bi-direction, include rules work like add a impl function to the place where its included. Maybe I can explore myself and PR, I know it sounds vague.

@s-arash
Copy link
Owner

s-arash commented Nov 6, 2024

I did a bit of research. Unfortunately, there is currently no stable API for getting the file path of input tokens to macros, which is crucial to implementing include!. The API that exists is nightly-only (https://doc.rust-lang.org/stable/proc_macro/struct.SourceFile.html).

Related issues:
rust-lang/rust#54725
rust-lang/rfcs#3200

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants