Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plain run functions for basic python script #432

Open
relaxin101 opened this issue Feb 6, 2025 · 5 comments
Open

Plain run functions for basic python script #432

relaxin101 opened this issue Feb 6, 2025 · 5 comments

Comments

@relaxin101
Copy link

relaxin101 commented Feb 6, 2025

Hello, I'm trying to write some scripts where I wanted to try using logica because the whole WITH AS syntax of plain sql got a bit heavy

While looking through the tutorials, I couldn't find a plain function to take a connection and a program and just run it in a python script, so I created my own:

def logica_psql(rules:str, target:str, conn: psycopg2.extensions.connection):
    """
        rules: The logica program
        target: The target predicate to be selected
        conn: A psycopg2 connection
        return: a cursor containing the results
    """
    program = universe.LogicaProgram(parse.ParseFile(rules, import_root=None)['rule'])

    sql = program.FormattedPredicateSql(target)

    return psql_logica.PostgresExecute(sql, conn)


def logica_sqlite3(rules: str, target: str, conn: sqlite3.Connection):
    """
        rules: The logica program
        target: The target predicate to be selected
        conn: A sqlit3 connection
        return: a cursor containing the results
    """
    program = universe.LogicaProgram(parse.ParseFile(rules, import_root=None)['rule'])

    sql = program.FormattedPredicateSql(target)
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor

Now I want to make sure that
a) I didn't miss some builtin functionality that handles this. If not, does it make sense to add these kinds of standard functions?
b) I didn't make some major mistake in the logic here, especially for the postgres function, since I got lost with how postgres is handled in the library

Thanks for any help :D

@EvgSkv
Copy link
Owner

EvgSkv commented Feb 6, 2025

Hello @relaxin101 ! 😁
Did you try RunPredicateFromString?
That one would deremine engine from the program, but also accepts connection.

Your functions look good. RunPredicateFromString catches errors and produces a nice message in case say syntax errors. But if you prefer to debug your code in a separate context these functions are perfect.

@relaxin101
Copy link
Author

relaxin101 commented Feb 6, 2025

Hello @relaxin101 ! 😁
Did you try RunPredicateFromString?
That one would deremine engine from the program, but also accepts connection.

Your functions look good. RunPredicateFromString catches errors and produces a nice message in case say syntax errors. But if you prefer to debug your code in a separate context these functions are perfect.

Thanks for the quick answer: I checked it out and I think I'd prefer a cursor instead of a dataframe but I knew I missed something... thanks for the pointer!
Two questions though:
-) This sqlite parsing (essentially inserting newlines, no?) seems kinda weird, shouldn't that be handled by the sql compiler and not by a run function? Will my code break by not handling that?
-) What purpose does the psql_execute function serve? It isn't used in the RunQueryPandas function as far as i could tell

Final note: Do you want/need help with some code cleanup? e.g. parser_py.parse.parseFile() taking the actual string as an input was quite confusing initially

@EvgSkv
Copy link
Owner

EvgSkv commented Feb 11, 2025

(sorry for delay, saw the question only today)

This sqlite parsing (essentially inserting newlines, no?) seems kinda weird, shouldn't that be handled by the sql compiler and not by a run function? Will my code break by not handling that?

Do you refer to this parsing? That one is indeed hack-y. The reason it's done is that most SQL dialects allow you to run multiple statements and return you result of the last one. However SQLite is the only (among supported ones) being so strict that it requires you to send statements one by one.
Logica compiler can break the result by statements, and it's used when running workflows in CoLab. But this function is for running a single long statement and it specialcases SQLite, as other engines can run the statement just fine.
On a philosophycal note: Readable intuitive code is important to me, but I have come to conclusioin that sharp edges are unavoidable. And in this case in particular I am not certain that there is a right way of handling the difference between engines, and especially that it's easy to figure out what the right way is. So I am in peace with this weirdness of SQLite handling :-)

And this made me realize: your SQLite running function has a limitation - it won't run for statements that write to DB, for the reason described above. Example: CoLab.

What purpose does the psql_execute function serve? It isn't used in the RunQueryPandas function as far as i could tell

Again I'm not sure to which function you refer :-) This one?
Looks like it creates connection, while RunQueryPandas does not.

Do you want/need help with some code cleanup? e.g. parser_py.parse.parseFile() taking the actual string as an input was quite confusing initially

Thanks for asking! Code cleanup is not a high priority at this point. Small comments here and there may help, but no refactoring will probably not be worth it at this point. I'd rather invest in taking Logica as a tool and applying it to domain specific problems that could make communities excited about the opportunities that it opens.

@relaxin101
Copy link
Author

relaxin101 commented Feb 13, 2025

No worries, I missed your answer as well after all 😅

ad SQLite parsing: Yes, that's what I'm referring to, thanks for the clarification!

ad PSQL execution: Yes, also the function I was referencing, but I was more confused by this segment. What purpose does this logical table/record/idk have? It seems postgres-specific and I can't wrap my head around what purpose it's supposed to serve

Thanks for asking! Code cleanup is not a high priority at this point. Small comments here and there may help, but no refactoring will probably not be worth it at this point. I'd rather invest in taking Logica as a tool and applying it to domain specific problems that could make communities excited about the opportunities that it opens.

Ok, thank you for clarifying. The reason I'm asking is that I'm considering to incorporate logica into my Bachelor's thesis and was wondering how stable the project structure would be since I think I'll kinda have to hook into some internals. But if changes are going to remain minor for the time being, that should be fine. In case vector similarity search is something you want to incorporate in the near future, please let me know, that'd be particularly interesting to me :)

@EvgSkv
Copy link
Owner

EvgSkv commented Feb 19, 2025

Yes, also the function I was referencing, but I was more confused by this segment. What purpose does this logical table/record/idk have? It seems postgres-specific and I can't wrap my head around what purpose it's supposed to serve

Alright, let's quote that piece of code here:

    for t in types:
      if t != 'logicarecord893574736':  # Empty record.
        psycopg2.extras.register_composite(t, cursor, globally=True)

We are registring the types that are used in the program in the pyscopg2. This is done so that when the data is retrieved into a dataframe, then composite types in the output will be python dictionaries. Otherwise as far as I recall they are not easy to read from Python.

The reason I'm asking is that I'm considering to incorporate logica into my Bachelor's thesis and was wondering how stable the project structure would be since I think I'll kinda have to hook into some internals. But if changes are going to remain minor for the time being, that should be fine. In case vector similarity search is something you want to incorporate in the near future, please let me know, that'd be particularly interesting to me :)

Yeah, I think it's safe to say that no major changes are coming in near months. It could also be practical for the purpses of Bachelor's thesis to work with a snapshot of the Logica code.

Is your bachelor's about the vector similarity search? Is it the RAG of LLMs area, or something else? I was thinking in this direction, but didn't have any specific plans yet. In any case, I'd be very curious to read your Bachelors thesis!

Let me know if you have further questions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants