Pygmalion depends on a number of packages, all of which can be installed using the following command:
$ make req
Note that we have strict requirements for Python: this has been tested only on Python 3.6.5. It may work on later versions, but it will not work on earlier versions because we rely on constructs introduced in 3.6.
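Before running make req, the interpreter can be checked with a small guard. This is a minimal sketch that is not part of Pygmalion; check_python_version is a hypothetical name, and its thresholds simply restate the version requirements given above.

```python
import sys

# Documented requirements: tested on 3.6.5, relies on constructs from 3.6.
MINIMUM_VERSION = (3, 6)
TESTED_VERSION = (3, 6, 5)


def check_python_version(version_info=sys.version_info):
    """Raise for unsupported versions; return a warning string for untested ones."""
    current = tuple(version_info[:3])
    if current[:2] < MINIMUM_VERSION:
        raise RuntimeError("Python %d.%d.%d is too old; 3.6 or later is required" % current)
    if current != TESTED_VERSION:
        return "Python %d.%d.%d is untested; 3.6.5 is the tested version" % current
    return ""  # exactly the tested version


if __name__ == "__main__":
    message = check_python_version()
    if message:
        print(message, file=sys.stderr)
```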
We have the following subjects under the ./subjects directory:
- hello.py
- array.py
- microjson.py
- urljava.py
- mathexpr.py
- expr.py
- number.py
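For example, to run the complete evaluation of hello.py, or to produce only its human-readable grammar:
$ make xeval.hello
$ make xbnf.hello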
We have the following stages:
- chain: Generates the initial inputs using PyChains.
- trace: Runs the generated inputs through a Python frame tracer. This is the only Python-specific part. Because this stage installs its own trace function with settrace, debugging is not available while it runs; hence the next stage is kept separate.
- track: Evaluates the dumped frames to ascertain scopes and causality rules. In essence, we retrieve the input stack information from the dumped frames.
- mine: Mines the parse tree from the stack frames. These are still input-specific, which is why they are parse trees rather than a grammar.
- infer: Generates the context-free grammar by merging the parse trees. From this point on, we can no longer distinguish separate inputs.
- refine: Tries to produce a human-readable grammar.
- fuzz: Generates outputs from the inferred and refined grammar.
- eval: Uses the generated outputs to find how many are valid and how much coverage is obtained.
- bnf: Not a stage for grammar evaluation, but it can be used to generate human-readable grammars from the refined grammar (depends on refine).
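The trace stage rests on Python's sys.settrace hook. The following minimal sketch records the local variables at every executed line of a traced function; it illustrates the technique only and is not Pygmalion's tracer (trace_locals and parse_digits are hypothetical names). It also hints at why debugging is unavailable during tracing: pdb relies on the same hook, and only one global trace function is active per thread.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) under sys.settrace and record (function, line, locals)."""
    events = []

    def tracer(frame, event, arg):
        # On each new line, snapshot the frame's local variables.
        if event == "line":
            events.append((frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Restore whatever tracer (for example a debugger's) was active before.
        sys.settrace(previous)
    return result, events


def parse_digits(text):
    """A tiny hypothetical subject: parse a decimal number character by character."""
    value = 0
    for ch in text:
        value = value * 10 + int(ch)
    return value


if __name__ == "__main__":
    result, events = trace_locals(parse_digits, "42")
    for name, line, local_vars in events:
        print(name, line, local_vars)
```

Note how the snapshots show value growing as each input character is consumed; relating such variable values back to the input is what the later track and mine stages work from.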
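The core idea of the infer stage, merging per-input parse trees into one context-free grammar, can be sketched as a union of productions. The tree representation (a (nonterminal, children) pair whose children are terminal strings or subtrees) and the function collect_rules are assumptions for illustration; Pygmalion's actual merging and refinement may generalize further.

```python
from collections import defaultdict


def collect_rules(trees):
    """Merge parse trees into a grammar: "<nonterminal>" -> sorted alternatives.

    Each alternative is a tuple of symbols; nonterminals are written "<name>".
    """
    grammar = defaultdict(set)

    def visit(tree):
        name, children = tree
        alternative = []
        for child in children:
            if isinstance(child, str):
                alternative.append(child)  # terminal symbol
            else:
                alternative.append("<%s>" % child[0])  # reference to a nonterminal
                visit(child)
        grammar["<%s>" % name].add(tuple(alternative))

    for tree in trees:
        visit(tree)
    # After merging, productions no longer record which input they came from.
    return {key: sorted(alts) for key, alts in grammar.items()}


if __name__ == "__main__":
    tree_a = ("expr", [("num", ["1"]), "+", ("expr", [("num", ["2"])])])  # "1+2"
    tree_b = ("expr", [("num", ["3"])])  # "3"
    for key, alternatives in collect_rules([tree_a, tree_b]).items():
        print(key, "::=", " | ".join(" ".join(alt) for alt in alternatives))
```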
Each stage can be invoked as make x<stage>.<subject>. For example, for a complete evaluation of microjson.py, the following command would be used:
$ make xeval.microjson
On the other hand, if only the human-readable grammar is necessary, the following command is used:
$ make xbnf.microjson
The following command generates the initial inputs for hello.py using PyChains:
$ make xchain.hello
The result is placed in .pickled/hello.py.chain and can be converted to readable ASCII with:
$ ./bin/showpickle.py .pickled/hello.py.chain
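For a rough idea of what such a conversion involves, here is a minimal sketch that loads a pickle file and pretty-prints it as plain ASCII. It assumes the file is an ordinary pickle and is not the actual bin/showpickle.py; show_pickle is a hypothetical name.

```python
import pickle
import pprint
import sys


def show_pickle(path):
    """Load a pickle file and return a pretty-printed, ASCII-only rendering."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    text = pprint.pformat(data)
    # Escape anything outside ASCII so the output is plain ASCII text.
    return text.encode("ascii", "backslashreplace").decode("ascii")


if __name__ == "__main__":
    print(show_pickle(sys.argv[1]))
```

Two caveats apply to any such loader: unpickling executes code chosen by whoever wrote the file, so only load trusted pickles, and pickles containing instances of project classes can only be loaded when those modules are importable.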
A number of environment variables are used to control Pygmalion:
- MY_RP: Indicates how to proceed when an input is accepted. Some subjects, such as urlpy.py and urljava.py, accept single-character inputs that should be extended to produce larger inputs. The default is 1.0; choose MY_RP=0.1 for urljava.py to get reasonable URLs.
- NINPUT: The number of inputs that the chain stage should produce before stopping. The default is 10.
- R: The random number seed. The default is 0.
- NOCTRL: Whether to produce characters such as \t\b\f\x012, which are not part of string.ascii_letters + string.digits + string.punctuation.
- NO_LOG (1): If set to 0, we get more informative and verbose output (which slows the program down considerably).
- python3: The Python interpreter to use.
- pip3: The pip installer command.
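The NOCTRL description contrasts a regular alphabet, string.ascii_letters + string.digits + string.punctuation, with extra characters such as \t, \b and \f. The sketch below shows how the two sets relate, assuming the extra characters are the remainder of 7-bit ASCII (an assumption; Pygmalion's actual set may differ). The helper candidate_chars and its include_extra flag are hypothetical and do not describe which NOCTRL value selects which set.

```python
import string

# The regular alphabet named in the NOCTRL description (94 characters).
REGULAR_CHARS = string.ascii_letters + string.digits + string.punctuation


def candidate_chars(include_extra):
    """Return the characters a generator could draw from (hypothetical helper).

    With include_extra=False only REGULAR_CHARS is used; with include_extra=True
    the rest of 7-bit ASCII (space, tab, \\b, \\f, \\x01, DEL, ...) is added.
    """
    if not include_extra:
        return REGULAR_CHARS
    extra = "".join(chr(code) for code in range(128) if chr(code) not in REGULAR_CHARS)
    return REGULAR_CHARS + extra


if __name__ == "__main__":
    print(len(candidate_chars(False)), len(candidate_chars(True)))
```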
The configuration can be fine-tuned further by modifying the following pygmalion.config and pychains.config variables:
- config.Track_Params (True): Whether to track function parameters.
- config.Track_Vars (True): Whether to track local variables.
- config.Track_Return (False): Whether to insert a special return variable for each function.
- config.Ignore_Lambda (True): Strip out noise from lambda expressions.
- config.Swap_Eclipsing_keys (True): What to do with the smaller variable when we find that a smaller key already contains a chunk (usually a peek) of a later variable. When enabled, we simply swap the order of these two variables in causality.
- config.Strip_Peek (True): Related to the above. If we detect a swap, then rather than swapping, simply discard the smaller (earlier) variable.
- config.Prevent_Deep_Stack_Modification (False): Only replace things at a lower height with something at a higher height. This is useful mainly for returned values that may be smaller than an earlier variable deeper in the call scope.
- config.Wide_Trigger (10): Trigger a wide search when this number of similar comparisons is done consecutively.
- config.Deep_Trigger (10): Trigger a deep search when the wide search reaches this number of unique states.
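The two trigger settings describe simple thresholds. As a rough illustration only, assuming that "similar" comparisons can be approximated by equal ones, the counters could look like the following. SearchTriggers and its methods are hypothetical and are not the code used by Pygmalion or PyChains; the defaults restate Wide_Trigger (10) and Deep_Trigger (10).

```python
class SearchTriggers:
    """Hypothetical counters mirroring config.Wide_Trigger and config.Deep_Trigger."""

    def __init__(self, wide_trigger=10, deep_trigger=10):
        self.wide_trigger = wide_trigger
        self.deep_trigger = deep_trigger
        self.last_comparison = None
        self.consecutive = 0
        self.unique_states = set()

    def record_comparison(self, comparison):
        # Count how many times in a row the same comparison has been seen;
        # a different comparison restarts the run at 1.
        if comparison == self.last_comparison:
            self.consecutive += 1
        else:
            self.last_comparison = comparison
            self.consecutive = 1

    def record_state(self, state):
        # States must be hashable; duplicates do not increase the count.
        self.unique_states.add(state)

    def wide_due(self):
        return self.consecutive >= self.wide_trigger

    def deep_due(self):
        return len(self.unique_states) >= self.deep_trigger
```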
Some example invocations:
$ make xeval.urljava MY_RP=0.1 NINPUT=100 NOUT=1000
$ make xeval.mathexpr MY_RP=0.1 NINPUT=100 NOUT=1000
$ make xeval.microjson NINPUT=100 NOUT=1000
$ make xeval.mathexpr MY_RP=0.1
$ make xeval.microjson NO_LOG=0
$ make xbnf.urljava NINPUT=100
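When scripting many runs, invocations like these can be generated from Python. This is a minimal sketch: make_command is a hypothetical helper that only builds the argument list for the make x<stage>.<subject> VAR=value form described above; the subject and stage lists restate this README. Running the result, for example with subprocess.run(args, check=True), is left to the caller.

```python
import shlex

# Subjects under ./subjects (without .py) and the stages described above.
SUBJECTS = ["hello", "array", "microjson", "urljava", "mathexpr", "expr", "number"]
STAGES = ["chain", "trace", "track", "mine", "infer", "refine", "fuzz", "eval", "bnf"]


def make_command(stage, subject, **variables):
    """Build the argument list for `make x<stage>.<subject> VAR=value ...`."""
    if stage not in STAGES:
        raise ValueError("unknown stage: %s" % stage)
    if subject not in SUBJECTS:
        raise ValueError("unknown subject: %s" % subject)
    args = ["make", "x%s.%s" % (stage, subject)]
    # Pass each setting as a make variable assignment, in a stable order.
    args += ["%s=%s" % (name, value) for name, value in sorted(variables.items())]
    return args


if __name__ == "__main__":
    args = make_command("eval", "urljava", MY_RP=0.1, NINPUT=100, NOUT=1000)
    print(" ".join(shlex.quote(arg) for arg in args))
```

Sorting the variables keeps the generated command lines reproducible, which makes it easy to compare them against the hand-written invocations above.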