Skip to content

Commit

Permalink
Add lexer to handle query parsing for hints
Browse files Browse the repository at this point in the history
This commit adds a lexer to pg_hint_plan to properly parse the hints
from query strings.  This implementation is based on PostgreSQL's
psqlscan.l, which is largely simplified and made pluggable for the sake
of this module as it mostly cares about detecting comments and hints in
queries.

This has the advantage of removing all the custom code that pg_hint_plan
has invented to parse the query strings, cleaning up some confusing
historical behaviors in the module:
- A qual value could be considered as a valid hint, like:
SELECT * FROM tab WHERE col = '/* IndexScan (tab) */';
This is now parsed as a value, as it should.
- The parsing of the comments was sometimes incorrect, failing to ignore
comments that would make hints invalid.  For example:
SELECT /* IndexScan (tab /* internal_comment */ ) */ FROM tab;
Any comments internal to the hints are now automatically discarded.

Attempting to use nested comments generates INFO messages each time
something fishy is found.  The contents of the hints behave mostly the
same way as before this commit, with their custom parsing.  An exception
is with nested comments, which are now simply ignored in the middle of
hints, hints actually used if these are valid.  This makes the
manipulation of the StringInfo saving the hint strings more
straight-forward.  A couple of regression tests still need to be
adjusted a bit, this is left as work for later.

It could make sense to use a second layer with yyac to build the binary
tree representation of the hints from their strings, but this is left as
future work.  A couple of regression tests need to have their output
adjusted as now doublons of recursive comments can be fully detected at
their correct position because yyac is much better at this job than the
previous custom parser.

I have benchmarked this change with what would be a kind of worse-case
scenario, comparing HEAD and this new query parser:
$ pgbench -f select.sql -n -c 1 -T TIME_IN_SECS
$ cat select.sql:
SELECT /*+SeqScan(a)*/ 1;
PostgreSQL had pg_stat_statements and pg_hint_plan loaded in
shared_preload_libraries, with syslogger doing a minimal amount of
activity:
shared_preload_libraries = 'pg_stat_statements,pg_hint_plan'
fsync = off
log_statement = 'none'
log_min_messages = fatal

CFLAGS used -O2, without --enable-cassert, and I did not see
a difference between the patch and HEAD in terms of yyac, for both TPS
and in terms of perf profiles.

The memory footprint is stable across the same backend, with hints from
comments still copied to TopMemoryContext (no change here).  A
difference is that yyac does *not* run in TopMemoryContext, only the
hint string is pstrdup()'d in it, *if any*.

No backpatch of this feature is planned in v16 and older stable
branches, so this will be a v17~ only thing.

Per pull request #138.
  • Loading branch information
michaelpq committed Oct 20, 2023
1 parent b1f2aab commit bfb4544
Show file tree
Hide file tree
Showing 11 changed files with 1,438 additions and 96 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@ regression.*
/tmp_check/
/RPMS/

# Generated files
/query_scan.c

# Documentation artifacts
docs/_build
docs/locale/**/*.mo
7 changes: 6 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,12 @@
# Copyright (c) 2012-2023, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
#

MODULES = pg_hint_plan
MODULE_big = pg_hint_plan
OBJS = \
$(WIN32RES) \
pg_hint_plan.o \
query_scan.o

HINTPLANVER = 1.7.0

REGRESS = init base_plan pg_hint_plan ut-init ut-A ut-S ut-J ut-L ut-G ut-R \
Expand Down
29 changes: 18 additions & 11 deletions expected/pg_hint_plan.out
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.val = t2.val;

/*+ Test (t1 t2) */
EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.id = t2.id;
INFO: pg_hint_plan: hint syntax error at or near "Test (t1 t2) "
INFO: pg_hint_plan: hint syntax error at or near " Test (t1 t2) "
DETAIL: Unrecognized hint keyword "Test".
QUERY PLAN
--------------------------------------
Expand Down Expand Up @@ -153,16 +153,23 @@ EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.id = t2.id;

/*+Set(enable_indexscan off) /* nest comment */ */
EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.id = t2.id;
INFO: pg_hint_plan: hint syntax error at or near "/* nest comment */ */
EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.id = t2.id;"
INFO: pg_hint_plan: hint syntax error at or near "/*"
DETAIL: Nested block comments are not supported.
QUERY PLAN
--------------------------------------
Merge Join
Merge Cond: (t1.id = t2.id)
-> Index Scan using t1_pkey on t1
-> Index Scan using t2_pkey on t2
(4 rows)
LOG: pg_hint_plan:
used hint:
Set(enable_indexscan off)
not used hint:
duplication hint:
error hint:

QUERY PLAN
------------------------------
Hash Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
(5 rows)

/*+Set(enable_indexscan off)*/
EXPLAIN (COSTS false) SELECT * FROM t1, t2 WHERE t1.id = t2.id;
Expand Down Expand Up @@ -9048,7 +9055,7 @@ Rows(t1 t2 /99)
\o results/pg_hint_plan.tmpout
/*+ Rows(t1 t2 -99999) */
EXPLAIN SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id);
WARNING: Force estimate to be at least one row, to avoid possible divide-by-zero when interpolating costs : Rows(t1 t2 -99999)
WARNING: Force estimate to be at least one row, to avoid possible divide-by-zero when interpolating costs : Rows(t1 t2 -99999)
LOG: pg_hint_plan:
used hint:
Rows(t1 t2 -99999)
Expand Down
92 changes: 70 additions & 22 deletions expected/ut-A.out
Original file line number Diff line number Diff line change
Expand Up @@ -115,18 +115,32 @@ error hint:

-- No. A-5-2-3
EXPLAIN (COSTS false) SELECT c1 AS "c1"/*+SeqScan(t1)*/ FROM s1.t1 WHERE t1.c1 = 1;
QUERY PLAN
-----------------------------------
Index Only Scan using t1_i1 on t1
Index Cond: (c1 = 1)
LOG: pg_hint_plan:
used hint:
SeqScan(t1)
not used hint:
duplication hint:
error hint:

QUERY PLAN
--------------------
Seq Scan on t1
Filter: (c1 = 1)
(2 rows)

-- No. A-5-2-4
EXPLAIN (COSTS false) SELECT * /*+SeqScan(t1)*/ FROM s1.t1 WHERE t1.c1 = 1;
QUERY PLAN
------------------------------
Index Scan using t1_i1 on t1
Index Cond: (c1 = 1)
LOG: pg_hint_plan:
used hint:
SeqScan(t1)
not used hint:
duplication hint:
error hint:

QUERY PLAN
--------------------
Seq Scan on t1
Filter: (c1 = 1)
(2 rows)

----
Expand Down Expand Up @@ -1399,9 +1413,17 @@ error hint:
-- No. A-9-2-11
/*+SeqScan(/**/)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "/**/" WHERE "/**/".c1 = 1;
INFO: pg_hint_plan: hint syntax error at or near "/**/)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "/**/" WHERE "/**/".c1 = 1;"
INFO: pg_hint_plan: hint syntax error at or near "/**/"
DETAIL: Nested block comments are not supported.
INFO: pg_hint_plan: hint syntax error at or near ""
DETAIL: SeqScan hint requires a relation.
LOG: pg_hint_plan:
used hint:
not used hint:
duplication hint:
error hint:
SeqScan()

QUERY PLAN
-------------------------------------
Index Scan using t1_i1 on t1 "/**/"
Expand All @@ -1410,9 +1432,21 @@ DETAIL: Nested block comments are not supported.

/*+SeqScan(/**//**//**/)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "/**//**//**/" WHERE "/**//**//**/".c1 = 1;
INFO: pg_hint_plan: hint syntax error at or near "/**//**//**/)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "/**//**//**/" WHERE "/**//**//**/".c1 = 1;"
INFO: pg_hint_plan: hint syntax error at or near "/**//**//**/"
DETAIL: Nested block comments are not supported.
INFO: pg_hint_plan: hint syntax error at or near "/**//**/"
DETAIL: Nested block comments are not supported.
INFO: pg_hint_plan: hint syntax error at or near "/**/"
DETAIL: Nested block comments are not supported.
INFO: pg_hint_plan: hint syntax error at or near ""
DETAIL: SeqScan hint requires a relation.
LOG: pg_hint_plan:
used hint:
not used hint:
duplication hint:
error hint:
SeqScan()

QUERY PLAN
---------------------------------------------
Index Scan using t1_i1 on t1 "/**//**//**/"
Expand All @@ -1426,11 +1460,16 @@ Set/**/あ")*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "tT()""
Set/**/あ" WHERE "tT()""
Set/**/あ".c1 = 1;
INFO: pg_hint_plan: hint syntax error at or near "/**/あ")*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 "tT()""
Set/**/あ" WHERE "tT()""
Set/**/あ".c1 = 1;"
INFO: pg_hint_plan: hint syntax error at or near "/**/"
DETAIL: Nested block comments are not supported.
LOG: pg_hint_plan:
used hint:
not used hint:
SeqScan("tT()""
Setあ")
duplication hint:
error hint:

QUERY PLAN
------------------------------------------
Index Scan using t1_i1 on t1 "tT()""
Expand Down Expand Up @@ -1602,13 +1641,22 @@ error hint:
-- No. A-7-4-7
/*+Set(enable_indexscan off)Set(enable_tidscan /* value */off)Set(enable_bitmapscan off)SeqScan(t1)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 WHERE t1.c1 = 1;
INFO: pg_hint_plan: hint syntax error at or near "/* value */off)Set(enable_bitmapscan off)SeqScan(t1)*/
EXPLAIN (COSTS false) SELECT * FROM s1.t1 WHERE t1.c1 = 1;"
INFO: pg_hint_plan: hint syntax error at or near "/*"
DETAIL: Nested block comments are not supported.
QUERY PLAN
------------------------------
Index Scan using t1_i1 on t1
Index Cond: (c1 = 1)
LOG: pg_hint_plan:
used hint:
SeqScan(t1)
Set(enable_bitmapscan off)
Set(enable_indexscan off)
Set(enable_tidscan off)
not used hint:
duplication hint:
error hint:

QUERY PLAN
--------------------
Seq Scan on t1
Filter: (c1 = 1)
(2 rows)

----
Expand Down
2 changes: 1 addition & 1 deletion expected/ut-J.out
Original file line number Diff line number Diff line change
Expand Up @@ -3771,7 +3771,7 @@ INFO: pg_hint_plan: hint syntax error at or near "HashJoin(*VALUES* t3 t2)Merge
DETAIL: Relation name "*VALUES*" is ambiguous.
INFO: pg_hint_plan: hint syntax error at or near "MergeJoin(*VALUES* t3 t2 t1)"
DETAIL: Relation name "*VALUES*" is ambiguous.
INFO: pg_hint_plan: hint syntax error at or near "Leading(*VALUES* t3 t2 t1) NestLoop(t4 t3)HashJoin(*VALUES* t3 t2)MergeJoin(*VALUES* t3 t2 t1)"
INFO: pg_hint_plan: hint syntax error at or near " Leading(*VALUES* t3 t2 t1) NestLoop(t4 t3)HashJoin(*VALUES* t3 t2)MergeJoin(*VALUES* t3 t2 t1)"
DETAIL: Relation name "*VALUES*" is ambiguous.
LOG: pg_hint_plan:
used hint:
Expand Down
2 changes: 1 addition & 1 deletion expected/ut-L.out
Original file line number Diff line number Diff line change
Expand Up @@ -3591,7 +3591,7 @@ error hint:

/*+ Leading(*VALUES* t3 t2 t1) */
EXPLAIN (COSTS false) SELECT * FROM s1.t1, s1.t2, (VALUES(1,1,1,'1'), (2,2,2,'2')) AS t3 (c1, c2, c3, c4), (VALUES(1,1,1,'1'), (2,2,2,'2')) AS t4 (c1, c2, c3, c4) WHERE t1.c1 = t2.c1 AND t1.c1 = t3.c1 AND t1.c1 = t4.c1;
INFO: pg_hint_plan: hint syntax error at or near "Leading(*VALUES* t3 t2 t1) "
INFO: pg_hint_plan: hint syntax error at or near " Leading(*VALUES* t3 t2 t1) "
DETAIL: Relation name "*VALUES*" is ambiguous.
LOG: pg_hint_plan:
used hint:
Expand Down
2 changes: 1 addition & 1 deletion expected/ut-R.out
Original file line number Diff line number Diff line change
Expand Up @@ -4237,7 +4237,7 @@ INFO: pg_hint_plan: hint syntax error at or near "Rows(*VALUES* t3 t2 #2)Rows(*
DETAIL: Relation name "*VALUES*" is ambiguous.
INFO: pg_hint_plan: hint syntax error at or near "Rows(*VALUES* t3 t2 t1 #2)"
DETAIL: Relation name "*VALUES*" is ambiguous.
INFO: pg_hint_plan: hint syntax error at or near "Leading(*VALUES* t3 t2 t1) Rows(t4 t3 #2)Rows(*VALUES* t3 t2 #2)Rows(*VALUES* t3 t2 t1 #2)"
INFO: pg_hint_plan: hint syntax error at or near " Leading(*VALUES* t3 t2 t1) Rows(t4 t3 #2)Rows(*VALUES* t3 t2 #2)Rows(*VALUES* t3 t2 t1 #2)"
DETAIL: Relation name "*VALUES*" is ambiguous.
LOG: pg_hint_plan:
used hint:
Expand Down
83 changes: 24 additions & 59 deletions pg_hint_plan.c
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@
/* partially copied from pg_stat_statements */
#include "normalize_query.h"

/* Owner scanner */
#include "query_scan.h"

/* PostgreSQL */
#include "access/htup_details.h"

Expand Down Expand Up @@ -1945,76 +1948,40 @@ get_hints_from_table(const char *client_query, const char *client_application)

/*
* Get hints from the head block comment in client-supplied query string.
*
* The extracted hint is palloc()'d in TopMemoryContext.
*/
static const char *
get_hints_from_comment(const char *p)
{
const char *hint_head;
char *head;
char *tail;
int len;
QueryScanState sstate;
StringInfo query_buf;
char *result = NULL;

if (p == NULL)
return NULL;

/* extract query head comment. */
hint_head = strstr(p, HINT_START);
if (hint_head == NULL)
return NULL;

for (;p < hint_head; p++)
{
/*
* Allow these characters precedes hint comment:
* - digits
* - alphabets which are in ASCII range
* - space, tabs and new-lines
* - underscores, for identifier
* - commas, for SELECT clause, EXPLAIN and PREPARE
* - parentheses, for EXPLAIN and PREPARE
* - squared brackets, for arrays (like arguments of PREPARE
* statements).
*
* Note that we don't use isalpha() nor isalnum() in ctype.h here to
* avoid behavior which depends on locale setting.
*/
if (!(*p >= '0' && *p <= '9') &&
!(*p >= 'A' && *p <= 'Z') &&
!(*p >= 'a' && *p <= 'z') &&
!isspace(*p) &&
*p != '_' &&
*p != ',' &&
*p != '(' && *p != ')' &&
*p != '[' && *p != ']')
return NULL;
}

head = (char *)hint_head;
p = head + strlen(HINT_START);
skip_space(p);
sstate = query_scan_create();
query_buf = makeStringInfo();

/* find hint end keyword. */
if ((tail = strstr(p, HINT_END)) == NULL)
query_scan_setup(sstate, p, strlen(p), 0,
standard_conforming_strings,
pg_hint_plan_parse_message_level);
for (;;)
{
hint_ereport(head, ("Unterminated block comment."));
return NULL;
}

/* We don't support nested block comments. */
if ((head = strstr(p, BLOCK_COMMENT_START)) != NULL && head < tail)
{
hint_ereport(head, ("Nested block comments are not supported."));
return NULL;
QueryScanResult sr = query_scan(sstate, query_buf);
if (sr == QUERY_SCAN_EOL)
break;
}

/* Make a copy of hint. */
len = tail - p;
head = palloc(len + 1);
memcpy(head, p, len);
head[len] = '\0';
p = head;
query_scan_finish(sstate);

return p;
/* Get a copy of the hint extracted, if any */
if (query_buf->len > 0)
result = MemoryContextStrdup(TopMemoryContext, query_buf->data);
pfree(query_buf->data);
pfree(query_buf);
return result;
}

/*
Expand Down Expand Up @@ -2992,9 +2959,7 @@ get_current_hint_string(Query *query, const char *query_str,
}

/* get hints from the comment */
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
current_hint_str = get_hints_from_comment(query_str);
MemoryContextSwitchTo(oldcontext);

if (debug_level > 1)
{
Expand Down
40 changes: 40 additions & 0 deletions query_scan.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
/*-------------------------------------------------------------------------
*
* query_scan.h
* lexical scanner for SQL commands
*
* This lexer can be used to extra hints from query contents, taking into
* account what the backend would consider as values, for example.
*
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* query_scan.h
*
*-------------------------------------------------------------------------
*/
#ifndef QUERY_SCAN_H
#define QUERY_SCAN_H

#include "lib/stringinfo.h"

/* Abstract type for lexer's internal state */
typedef struct QueryScanStateData *QueryScanState;

/* Termination states for query_scan() */
typedef enum
{
QUERY_SCAN_INCOMPLETE, /* end of line, SQL statement incomplete */
QUERY_SCAN_EOL /* end of line, SQL possibly complete */
} QueryScanResult;

extern QueryScanState query_scan_create(void);
extern void query_scan_setup(QueryScanState state,
const char *line, int line_len,
int encoding, bool std_strings,
int elevel);
extern void query_scan_finish(QueryScanState state);
extern QueryScanResult query_scan(QueryScanState state,
StringInfo query_buf);

#endif /* QUERY_SCAN_H */
Loading

0 comments on commit bfb4544

Please sign in to comment.