Generated programs are missing spaces #15

gvanrossum · 2021-05-21T22:12:13Z

I must be missing something simple. I run this program:

from hypothesis import given
import hypothesmith as hs

@given(hs.from_grammar("eval_input"))
@settings(max_examples=50)
def test(expr):
    print(repr(expr))

test()

This prints examples like these:

'A'
'A\n'
'A,A,'
'A,A'
'A,'
'A,A,\n'
'A\n'
'AorA,A'
'AorA,'
'A,\n'
'lambda:A,\n'

I am pretty sure it meant A or A, not AorA. (I saw more similar example in other runs and variations of the program.)

It also occasionally prints a traceback and this error:

hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check fo
und 50 filtered examples but only 6 good ones. This will make your tests much slower, and also will probably dis
tort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by 
a low max_leaves parameter in recursive() calls

This is Python 3.9.2 on Windows.

hypothesis          6.13.0
hypothesmith        0.1.8

I figure I'm doing something wrong or not understanding something?

The text was updated successfully, but these errors were encountered:

Zac-HD · 2021-05-22T04:18:13Z

You're not missing anything! Hypothesmith is still a proof-of-concept, and while it's already in use (and finding bugs) for Black and LibCST there's plenty of low-hanging fruit to improve it. To quote the README:

This is definitely pre-alpha, but if you want to play with it feel free! You can even keep the shiny pieces when - not if - it breaks.

hs.from_grammar() is particularly limited - I've just taken an EBNF approximation of the Python 3.7 (iirc) grammar. hypothesis.extra.lark can 'invert' grammars by walking the tree of production rule and emitting the string that would be parsed; hs.from_grammar() adds a simple filter to reject any substring which should be compile()able but raises a SyntaxError. This approach was my proof of concept from the 2019 sprints, after talking to @ambv about fuzzing Black.

hs.from_node() is a much better approach, based on https://github.com/Instagram/LibCST/. It's still only at the "useful proof of concept" phase though, since I haven't wanted to invest more than a few days in it without knowing whether it would be used. It also inherits all the limits of LibCST, which e.g. doesn't yet support the relaxed decorator syntax for Python 3.9.

Both approaches only target syntactic validity - the code can be parsed and compiled, but is otherwise utter nonsense.

Producing semantically-valid code, where you can safely execute it with few or no uncaught exceptions, is "just" a matter of a few weeks of engineering time to express all the constraints (and swarm-testing optimisations etc). On the upside, that would allow CSmith-style differential testing of Python implementations or performance optimisations. Possibly a good topic from an internship?

gvanrossum · 2021-05-22T04:45:10Z

Zac, maybe you missed my point? I am merely suggesting that you are not emitting spaces around NAME tokens, so that when you generate "A or A" it comes out as "AorA" -- which parses as a single identifier.

I haven't actually looked at your code, but it shouldn't be too hard to fix this, should it? The simplest thing would be to just add a space after each token.

Anyway, from your description it appears that from_node() and from_grammar() use completely different mechanisms to generate code? I missed that in the docs. :-(

gvanrossum · 2021-05-22T04:51:25Z

An idea for following the true grammar: maybe we can augment the PEG parser generator we use for Python 3.9 and newer to spit out the grammar definition in a way that's consumable by hypothesmith (or maybe lark?).

Zac-HD · 2021-05-22T05:00:21Z

I am merely suggesting that you are not emitting spaces around NAME tokens, so that when you generate "A or A" it comes out as "AorA" -- which parses as a single identifier.

This is a known problem, but fixing it has not yet had the best cost/benefit tradeoff - I could solve it either by improving the grammar I'm using or by adding HypothesisWorks/hypothesis#2437. On the other hand spaces around NAME is only one way that this class of problem manifests, and hs.from_node() is a general solution.

An idea for following the true grammar: maybe we can augment the PEG parser generator we use for Python 3.9 and newer to spit out the grammar definition in a way that's consumable by hypothesmith

This would be excellent.

Ideally this would also allow hypothesmith to generate strings valid for 3.9 vs 3.10 vs 3.11 independent of the version that it's running under. Supporting experimental grammar changes would also be nice, I imagine 😁

gvanrossum · 2021-05-22T05:03:56Z

Okay, then you need to clearly document that from_grammar() is crap. :-)

gvanrossum · 2021-05-22T05:10:14Z

Also, generating strings from the grammar means that you can never detect bugs in the grammar itself (which is complex enough that it probably has bugs). And the PEG grammar we use doesn't drill down to the character level -- tokenization is still a separate pass, so turning a list of tokens into a string still requires a strategy for inserting whitespace that's smarter than "reject programs that don't parse" (since that would reject most programs if you squoosh all tokens together).

Zac-HD · 2021-05-23T12:31:22Z

It's already documented as pre-alpha! As a proof of concept, by my usual standards from_node() is crap too (it has some weird biases among node types, filters too harshly in places, lacks swarm testing, etc). This is all fixable, it's just that so far working on Hypothesis itself has been more valuable to the community.

turning a list of tokens into a string still requires a strategy for inserting whitespace that's smarter than "reject programs that don't parse" (since that would reject most programs if you squoosh all tokens together).

Yep, and the FailedHealthCheck you noticed is Hypothesis complaining that it is indeed rejecting most programs.

Let me try a couple of stupid-but-fast ideas for deterministically adding spaces... [edit] nope, only works in <3% of cases, at least with the stupid approach. And smarter approaches look a like building on a different foundation, either LibCST (which tracks required whitespace locally) or custom logic on top of the PEG grammar.

gvanrossum · 2021-05-23T15:26:04Z

I’d appreciate that. To me, there still is a huge difference between the “obvious” bug of not inserting spaces, and the subtler issues of bias.

pawamoy · 2022-01-30T18:28:04Z

Hi @gvanrossum, @Zac-HD

An idea for following the true grammar: maybe we can augment the PEG parser generator we use for Python 3.9 and newer to spit out the grammar definition in a way that's consumable by hypothesmith

Any news on this 🙂 ? I'm trying to test my AST visitor on generated Python code, but I'm hitting hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. with @given(hypothesmith.from_node(node=libcst.Module)).

@Zac-HD do you still have plans for this library? Could I be of any help?

Zac-HD · 2022-01-30T21:50:43Z

Since I've taken leave from my PhD to work at Anthropic, I'm not actively working on this at the moment - but I'd be delighted to provide design advice and code review if you're interested in contributing, and that would be really helpful!

The obvious place to start is in https://github.com/Zac-HD/hypothesmith/blob/master/src/hypothesmith/cst.py: libcst provides something like 150 node types, and we can make from_node() much more efficient (and the examples more diverse) by registering a strategy that knows how to construct valid instances of each. Check out the REGISTERED list for that - in most cases, you can just pass infer for each argument where the type is sufficient to specify a valid sub-node, and a more specific strategy like nonempty lists of $subnode where the type isn't enough. There are also some special cases like Try, where you can explicitly register them.

If you're interested, feel free to ask any other questions or just jump in with a tiny (one-node-type) first PR! As a general rule, we want to start by making "leaf" nodes more efficient; because an efficient "branch" or "trunk" node can lead to much more rejection sampling overall otherwise.

pawamoy · 2022-01-31T20:55:17Z

After reading your comment a second time and carefully looking at the REGISTERED list and how it's use in the loop below, it all makes sense 🙂

Since I already played with each type of node ast provides, I believe I'll be able to add strategies for other libcst nodes, and, as you suggested, start with just one in a first PR 😄

pawamoy · 2022-02-03T21:51:11Z

Pinging a recent maintainer of libcst: @zsol, do you have any advice on how to tackle this? To try and summarize, the goal is to create Hypothesis strategies that build libcst nodes. Maybe there are important things to know? Don't hesitate to ping someone else 😄

zsol · 2022-02-03T22:02:29Z

I've never personally written a hypothesis strategy so not sure I'll be of much help, but if you feel like something is missing in libcst itself that would help, I'm all for reviewing PRs to that end.

Generally CST nodes are supposed to be "easy" to construct by hand, and there's a method to validate syntactical correctness (each node having their own rules, naturally) when types can't fully express the syntactical structure. I expect that generating trees that adhere to the validation rules will help make the strategy more efficient (by e.g. not even considering nodes with imbalanced parenthesis around them)

pawamoy · 2022-02-03T22:12:26Z

Thanks for the quick reply. I see the _validate method is called in __post_init__. Is that always and automatically called when building a node? I also see the validate_types_shallow and validate_types_deep methods. I suppose we should ideally call the shallow one on each node while building the tree from leaves to the root? Any advice on how to quickly identify the "leaf nodes"? Will we ever need to build base nodes (like BaseElement or BaseExpression), or should they be avoided?

pawamoy · 2022-02-03T22:52:07Z

Nevermind, I've taken the full list of nodes and am ordering them by categories (binary ops, unary ops, boolean ops, etc.). This is quite straightforward, I should be able to answer these questions by myself 🙂

Zac-HD · 2022-02-03T23:18:05Z

🎉
base nodes are resolved as a union of their concrete subclasses, so it's fine to infer from them but you should avoid registering a strategy directly for them
there may in fact be trees representing invalid code, which we can still construct without validation errors (see e.g. Invalid code generated by removing parens from Call node Instagram/LibCST#531). At runtime these are discarded by the .filter(compilable) check, but it might be nice for LibCST if we ran some tests without this in order to report bugs upstream.
Remember, PR for a single node to start! 😉

pawamoy · 2022-02-04T17:16:17Z

Just to save it:

List of nodes by category (wip)

    # bases
    [libcst.BaseAssignTargetExpression,],
    [libcst.BaseAugOp,],
    [libcst.BaseBinaryOp,],
    [libcst.BaseBooleanOp,],
    [libcst.BaseComp,],
    [libcst.BaseCompOp,],
    [libcst.BaseCompoundStatement,],
    [libcst.BaseDelTargetExpression,],
    [libcst.BaseDict,],
    [libcst.BaseDictElement,],
    [libcst.BaseElement,],
    [libcst.BaseExpression,],
    [libcst.BaseFormattedStringContent,],
    [libcst.BaseList,],
    [libcst.BaseNumber,],
    [libcst.BaseParenthesizableWhitespace,],
    [libcst.BaseSet,],
    [libcst.BaseSimpleComp,],
    [libcst.BaseSlice,],
    [libcst.BaseSmallStatement,],
    [libcst.BaseStatement,],
    [libcst.BaseString,],
    [libcst.BaseSuite,],
    [libcst.BaseUnaryOp,],

    # binary ops
    [libcst.BinaryOperation,],
    [libcst.Add,],
    [libcst.AddAssign,],
    [libcst.Subtract,],
    [libcst.SubtractAssign,],
    [libcst.Multiply,],
    [libcst.MultiplyAssign,],
    [libcst.Divide,],
    [libcst.DivideAssign,],
    [libcst.FloorDivide,],
    [libcst.FloorDivideAssign,],
    [libcst.Power,],
    [libcst.PowerAssign,],
    [libcst.Modulo,],
    [libcst.ModuloAssign,],
    [libcst.MatrixMultiply,],
    [libcst.MatrixMultiplyAssign,],
    [libcst.BitAnd,],
    [libcst.BitAndAssign,],
    [libcst.BitOr,],
    [libcst.BitOrAssign,],
    [libcst.BitXor,],
    [libcst.BitXorAssign,],
    [libcst.LeftShift,],
    [libcst.LeftShiftAssign,],
    [libcst.RightShift,],
    [libcst.RightShiftAssign,],

    # unary ops
    [libcst.UnaryOperation,],
    [libcst.BitInvert,],
    [libcst.Minus,],
    [libcst.Plus,],

    # comparisons
    [libcst.Equal,],
    [libcst.NotEqual, st.just("!=")],
    [libcst.GreaterThan,],
    [libcst.GreaterThanEqual,],
    [libcst.LessThan,],
    [libcst.LessThanEqual,],

    # boolean ops
    [libcst.BooleanOperation,],
    [libcst.And,],
    [libcst.Or,],
    [libcst.Not,],

    # identity
    [libcst.Is,],
    [libcst.IsNot, infer, nonempty_whitespace, infer],

    # membership
    [libcst.In,],
    [libcst.NotIn, infer, nonempty_whitespace, infer],

    # built-in types
    [libcst.Float,],
    [libcst.Integer,],
    [libcst.Imaginary,],
    [libcst.SimpleString,],

    # strings
    [libcst.ConcatenatedString,],
    [libcst.FormattedString,],
    [libcst.FormattedStringExpression,],
    [libcst.FormattedStringText,],

    # built-in structures
    [libcst.Dict,],
    [libcst.List,],
    [libcst.Set, nonempty_seq(libcst.Element, libcst.StarredElement)],
    [libcst.Tuple,],

    # non-whitespace tokens
    [libcst.Comma,],
    [libcst.Colon,],
    [libcst.Dot,],
    [libcst.LeftCurlyBrace,],
    [libcst.LeftParen,],
    [libcst.LeftSquareBracket,],
    [libcst.RightCurlyBrace,],
    [libcst.RightParen,],
    [libcst.RightSquareBracket,],
    [libcst.ParenthesizedWhitespace,],
    [libcst.Semicolon,],

    # whitespace tokens
    [libcst.EmptyLine, infer, infer, infer],
    [libcst.Newline,],
    [libcst.SimpleWhitespace,],
    [libcst.TrailingWhitespace, infer, infer],

    # expressions
    [libcst.Call,],
    [libcst.Lambda,],
    [libcst.Expr,],
    [libcst.Slice,],

    # keywords
    [libcst.Del,],
    [libcst.Assert,],
    [libcst.Break,],
    [libcst.Continue,],
    [libcst.Ellipsis,],
    [libcst.Pass,],
    [libcst.Raise,],
    [libcst.Return,],
    [libcst.Yield,],

    # conditions
    [libcst.If,],
    [libcst.Else,],
    [libcst.IfExp,],

    # loops
    [libcst.For,],
    [libcst.While,],

    # match statements
    [libcst.Match,],
    [libcst.MatchAs,],
    [libcst.MatchCase,],
    [libcst.MatchClass,],
    [libcst.MatchKeywordElement,],
    [libcst.MatchList,],
    [libcst.MatchMapping,],
    [libcst.MatchMappingElement,],
    [libcst.MatchOr,],
    [libcst.MatchOrElement,],
    [libcst.MatchPattern,],
    [libcst.MatchSequence,],
    [libcst.MatchSequenceElement,],
    [libcst.MatchSingleton,],
    [libcst.MatchStar,],
    [libcst.MatchTuple,],
    [libcst.MatchValue,],

    # generators/comprehensions
    [libcst.GeneratorExp,],
    [libcst.CompFor,],
    [libcst.CompIf,],
    [libcst.DictComp,],
    [libcst.ListComp,],
    [libcst.SetComp,],

    # exceptions
    [libcst.Try,],
    [libcst.ExceptStarHandler,],
    [libcst.Finally,],

    # imports
    [libcst.From,],
    [libcst.ImportAlias,],
    [libcst.ImportStar,],

    # functions/parameters
    [libcst.FunctionDef,],
    [libcst.Param,],
    [libcst.Parameters,],
    [libcst.ParamSlash,],
    [libcst.ParamStar,],

    # elements (?)
    [libcst.Element,],
    [libcst.DictElement,],
    [libcst.StarredElement,],
    [libcst.SubscriptElement,],

    # comments
    [libcst.Comment,],

    # ??
    [libcst.Annotation,],
    [libcst.Arg,],
    [libcst.AssignEqual,],
    [libcst.AssignTarget,],
    [libcst.AugAssign,],
    [libcst.ClassDef,],
    [libcst.ComparisonTarget,],
    [libcst.Index,],
    [libcst.Module],
    [libcst.Name,],
    [libcst.NameItem,],
    [libcst.SimpleStatementLine,],
    [libcst.SimpleStatementSuite,],
    [libcst.StarredDictElement,],
    [libcst.TryStar,],
    [libcst.WithItem,],

I believe I can start by adding the built-in types nodes (like integer)! edit: Ah, no, integers and floats are already registered 😊
Well, lets go with a tuple then, it should be just the same as sets.

15r10nk mentioned this issue May 11, 2023

can pysource-codegen be used to generate code with hypothesmith #29

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generated programs are missing spaces #15

Generated programs are missing spaces #15

gvanrossum commented May 21, 2021

Zac-HD commented May 22, 2021

gvanrossum commented May 22, 2021

gvanrossum commented May 22, 2021

Zac-HD commented May 22, 2021

gvanrossum commented May 22, 2021

gvanrossum commented May 22, 2021 via email

Zac-HD commented May 23, 2021 •

edited

Loading

gvanrossum commented May 23, 2021

pawamoy commented Jan 30, 2022

Zac-HD commented Jan 30, 2022

pawamoy commented Jan 31, 2022

pawamoy commented Feb 3, 2022 •

edited

Loading

zsol commented Feb 3, 2022

pawamoy commented Feb 3, 2022 •

edited

Loading

pawamoy commented Feb 3, 2022 •

edited

Loading

Zac-HD commented Feb 3, 2022

pawamoy commented Feb 4, 2022 •

edited

Loading

Generated programs are missing spaces #15

Generated programs are missing spaces #15

Comments

gvanrossum commented May 21, 2021

Zac-HD commented May 22, 2021

gvanrossum commented May 22, 2021

gvanrossum commented May 22, 2021

Zac-HD commented May 22, 2021

gvanrossum commented May 22, 2021

gvanrossum commented May 22, 2021 via email

Zac-HD commented May 23, 2021 • edited Loading

gvanrossum commented May 23, 2021

pawamoy commented Jan 30, 2022

Zac-HD commented Jan 30, 2022

pawamoy commented Jan 31, 2022

pawamoy commented Feb 3, 2022 • edited Loading

zsol commented Feb 3, 2022

pawamoy commented Feb 3, 2022 • edited Loading

pawamoy commented Feb 3, 2022 • edited Loading

Zac-HD commented Feb 3, 2022

pawamoy commented Feb 4, 2022 • edited Loading

Zac-HD commented May 23, 2021 •

edited

Loading

pawamoy commented Feb 3, 2022 •

edited

Loading

pawamoy commented Feb 3, 2022 •

edited

Loading

pawamoy commented Feb 3, 2022 •

edited

Loading

pawamoy commented Feb 4, 2022 •

edited

Loading