Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List available functions (SHOW FUNCTIONS) #12144

Open
findepi opened this issue Aug 24, 2024 · 8 comments · May be fixed by #12266
Open

List available functions (SHOW FUNCTIONS) #12144

findepi opened this issue Aug 24, 2024 · 8 comments · May be fixed by #12266
Labels
enhancement New feature or request

Comments

@findepi
Copy link
Member

findepi commented Aug 24, 2024

Is your feature request related to a problem or challenge?

I as a user would want to see a list of available functions.

Describe the solution you'd like

SHOW FUNCTIONS [ LIKE <pattern> ];

Describe alternatives you've considered

No response

Additional context

No response

@findepi findepi added the enhancement New feature or request label Aug 24, 2024
@findepi findepi linked a pull request Aug 30, 2024 that will close this issue
@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 1, 2024

I think we should support table function similar to https://duckdb.org/docs/sql/functions/overview.html

We could not only get the function name but also function_type, return_type, parameters ... and so on

select * from datafusion_functions()

@findepi
Copy link
Member Author

findepi commented Sep 1, 2024

We could not only get the function name but also function_type, return_type, parameters ... and so on

absolutely, more info than just function name is a good thing to have

I think we should support table function similar to https://duckdb.org/docs/sql/functions/overview.html

The SQL standard way of presenting function information seems to be via information_schema.routines view. This is similar to how tables' information is queryable. But then we still have SHOW TABLES on top of information_schema.tables for easier access.

We can certainly present this information as a table function (instead of, or additionally to).
What would be the main advantage of doing so?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 2, 2024

We can certainly present this information as a table function (instead of, or additionally to).
What would be the main advantage of doing so?

We could easily apply any query on the table function like

SELECT DISTINCT ON(function_name)
    function_name,
    function_type,
    return_type,
    parameters,
    parameter_types,
    description
FROM duckdb_functions()
WHERE function_type = 'scalar'
  AND function_name LIKE 'b%'
ORDER BY function_name;

and we can get a subset of information with select

Unlike the syntax here, it is completely different so we need additional syntax support. It is also not easy to select the subset of information

https://docs.snowflake.com/en/sql-reference/sql/show-functions

SHOW FUNCTIONS [ LIKE '<pattern>' ]
               [ IN
                    {
                      ACCOUNT                       |

                      CLASS <class_name>            |

                      DATABASE                      |
                      DATABASE <database_name>      |

                      SCHEMA                        |
                      SCHEMA <schema_name>          |
                      <schema_name>
                    }
               ]

@findepi
Copy link
Member Author

findepi commented Sep 3, 2024

We could easily apply any query on the table function like

This is a good point.
Would we get similar benefits if we implemented information_schema.routines view (which SQL spec happens to define)?

In any case, writing a full query to information_schema.* or to a table function, is a few key strokes. That's probably why we have SHOW TABLES statement even though the information is also available via information_schema.tabes and is more queryable there (one can select desired columns or apply filters).

Summing up we have discussed these implementation options

  1. information_schema.routines (as defined by the SQL spec)
  2. SHOW FUNCTIONS (similar to SHOW TABLES and similar (but not identical) to some query engines like Snowflake, Trino/Presto or MySQL)
  3. datafusion_functions() table function (similar to DuckDB's duckdb_functions())

Which one or which ones do we want to implement?
How do arrive at this decision to make sure it's not questioned later by some project maintainer(s) that I failed to involve in the discussion?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 3, 2024

I like the idea of datafusion_functions() as a table. I also prefer existing system (Postgres, DuckDB is my preference) than spec.

How do arrive at this decision to make sure it's not questioned later by some project maintainer(s) that I failed to involve in the discussion?

We could wait for a few days for more opionion. Don't worry about being questioned later on. Datafusion is a continuously evolving and breaking change friendly project, if there comes out a better idea we could review it again

@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

It is my opinion we should follow the same pattern as implemented for SHOW TABLES as it seems to work well and allows for easier integration into third-party tools (like BI tools) that may query such tables for information.

Specifically, I think that means:

  1. implement the information_schema schema (potentially with some additional custom columns for DataFusion like help_text), etc
  2. Implement SHOW FUNCTIONS as a query on top of the information schema
  3. (maybe this is wishful thinking) automatically create function documentation by scraping the code (as we do for configuration values)

@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

We could add a datafusion_functions table function in datafusion-cli or other downstream implementation perhaps

@alamb
Copy link
Contributor

alamb commented Sep 11, 2024

(maybe this is wishful thinking) automatically create function documentation by scraping the code (as we do for configuration values)

I filed #12432 to track this idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants