Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay][Text Format] Text Printer Refactor and Debug Printing #2605

Merged
merged 74 commits into from
Mar 20, 2019

Conversation

joshpoll
Copy link
Contributor

@joshpoll joshpoll commented Feb 15, 2019

  • Replaces existing text printer with one based on Section 1 of "A prettier printer" by Philip Wadler; however, the printer can be improved by adopting a version later in the paper.
  • Adds stream-like API for the DocAtom ADT.
  • Adds debug printing "pass" that includes more traditional pretty printing style.
  • Adds FunctionType support.

How the printer works

The pretty printer design works roughly as follows. Instead of creating a string directly, you instead create a Doc that consists of a vector of DocAtoms. The purpose of Doc is to separate the text's meaning (e.g. specifying where new lines and indentations should go) from their actual layout. For example, depending on the length of a line, you may want to print

f(a, b, c, d)

or

f(a,
  b,
  c,
  d)

This is easy to do with an abstract doc model. We can encode choices in our Doc specification and then let an algorithm choose the optimal layout later.

The current DocAtoms are Text and Line. Text stores a piece of text with no line breaks and Line stores a line break and the amount of indentation directly after it. Unless you are working with the API, you should not need to think about DocAtoms except that new lines should be added to a Doc separately from text or the line and text will be treated as a single Text atom instead of a Line and a Text atom.

The main way to build a Doc is to via the stream-like interface exposed with the following Doc constructor and member functions:

explicit Doc(const std::string& str)
Doc& operator<<(const Doc& right)
Doc& operator<<(const std::string& right)
template<typename T>
Doc& operator<<(const T& right)
std::string str()

The remaining functions you need to think about are:

  • friend Doc Indent(int indent, const Doc& doc): This function returns a new Doc that is like doc except all the Lines inside it have had their indentation extended by indent.
  • Doc PrintVec(const std::vector<Doc>& vec, const Doc& sep = Doc(", ")): This intersperses the sep between every doc in vec.
  • various specific Print functions

@merrymercy
Copy link
Member

Can this finally support attributes of operators (e.g. stride, padding) whose types are tvm::Expr, tvm::Array or string? They are not relay expression.

@joshpoll
Copy link
Contributor Author

@merrymercy This PR is intentionally light on new printer functionality and focuses primarily on infrastructure. Once we get the scaffolding down it will be easier to work on the parser and printer. I'm a bit short on time right now so I can't promise that I will be able to add attribute printing, but my hope is that the Doc ADT is easy enough to use that people can extend the printer relatively easily.

@merrymercy
Copy link
Member

Thanks. Currently, I cannot use the text format because I cannot write attributes in the text format. Although I can hack it to support some constants.

@tqchen
Copy link
Member

tqchen commented Feb 23, 2019

One note: move include/relay/doc.h into src/relay/ir, just to keep it internal and minimum.

@@ -46,8 +46,7 @@ class PrettyPrinter :
Doc PrintFinal(const NodeRef& node) {
// TODO(@jmp): If these lines are combined it segfaults??
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix the todo.
doc_stack_.back() is a reference which get invalidate as Print might push_back into doc_stack_
using a linked list might also fix the problem

@joshpoll joshpoll changed the title [Relay][Text Format] Parser-Printer Roundtrip Scaffolding [Relay][Text Format] Text Printer Refactor Mar 4, 2019
@joshpoll joshpoll changed the title [Relay][Text Format] Text Printer Refactor [Relay][Text Format] Text Printer Refactor and ANF Printing Mar 7, 2019
@joshpoll joshpoll marked this pull request as ready for review March 7, 2019 23:24
@joshpoll
Copy link
Contributor Author

joshpoll commented Mar 7, 2019

This PR is close to feature complete. cc @wweic @merrymercy @MarisaKirisame @tqchen @jroesch.

@tqchen
Copy link
Member

tqchen commented Mar 8, 2019

  • Is it possible to simply write test-cases for now using the test infra without introducing hypothesis dependency in the CI. You can still use some of that to test locally and find failure case to add to the CI.
  • I have seen ;`` as separators, can we remove the reliance of ;```? to make the syntax looks a bit more like python?
  • The use %var[index] vs of %var.index seems was not discussed, and perhaps need some discussion
  • The gnf flag of printer should be removed, as the printer should support print GNF, ANF mixed code automatically.
    • We want one canonical form of the text format, and we should always print that by default, adding additional flags creates un-necessary metal burdens to the user and can also blow up the interface

python/tvm/relay/base.py Show resolved Hide resolved
src/relay/ir/doc.cc Outdated Show resolved Hide resolved
src/relay/ir/doc.cc Show resolved Hide resolved
}

// indent a new body
// TODO(jmp): indent should be an instance variable of the printer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix this? It is little work.

* \param prefix The prefix of the name
* \return The returned name.
*/
Doc GetUniqueName(std::string prefix) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &


if __name__ == "__main__":
# for _ in range(10):
# print(anf_print(exprs.example()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or remove

one = relay.const(1)
assert gnf_print(relay.Tuple([one, one])) == "v0.0.1\n(1, 1)\n"

# assert gnf_print(relay.If(relay.const(True), relay.TupleGetItem(relay.Tuple([one, one]), 0), relay.TupleGetItem(relay.Tuple([one, one, relay.const(1)]), 0))) == "v0.0.1\n%0 = True\nif (%0) {\n %1 = 1\n %2 = (%1, %1)\n %2.0\n} else {\n %1 = 1\n %2 = 1\n %3 = (%1, %1, %2)\n %3.0\n}\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or remove

@joshpoll
Copy link
Contributor Author

joshpoll commented Mar 9, 2019

@tqchen

Is it possible to simply write test-cases for now using the test infra without introducing hypothesis dependency in the CI. You can still use some of that to test locally and find failure case to add to the CI.

I will remove hypothesis for now, but it should be added back eventually.

I have seen ;`` as separators, can we remove the reliance of ;```? to make the syntax looks a bit more like python?

This is out of the scope of this PR. I think it should go in an RFC.

The use %var[index] vs of %var.index seems was not discussed, and perhaps need some discussion

This is fair. There should be a new RFC soon for features that have not been implemented in the parser yet, including tuple projection. I was maintaining the convention in the commented out parts of the parser and in the old text printer.

The gnf flag of printer should be removed, as the printer should support print GNF, ANF mixed code automatically.
We want one canonical form of the text format, and we should always print that by default, adding additional flags creates un-necessary metal burdens to the user and can also blow up the interface

With the gnf flag enabled (this is done by default in both Python and in C++), the new printer behaves as the old text printer did. The opposite of gnf printing isn't anf, since let nodes are printable with gnf enabled. Rather turning gnf off prevents the creation of temporary variables, which is the style used for anf components of gnf programs. I've received feedback from some developers that this is desirable, and it's useful to bridge the gap between ML and PL. Those who aren't interested in this format can ignore it entirely.

Also I disagree that additional flags are harmful. As long as they're optional they do not create unnecessary mental burden. Moreover, it is common for printers to have many flags:

@joshpoll joshpoll changed the title [Relay][Text Format] Text Printer Refactor and ANF Printing [Relay][Text Format] Text Printer Refactor and "PL" Printing Mar 9, 2019
@tqchen
Copy link
Member

tqchen commented Mar 9, 2019

It might be fine to support a big amount of options in a developer private API. But it is not a good idea to do so in a user-facing API.

I was a big fan of a list of thousands of arguments(hey who do not want new features)? However, the problem of having such new features means two things:

  • Users have to understand about these new options (e.g. what does gnf stands for?, imagine what will user think when reading the document of astext and find about this additional option).
  • They bring burden to maintain these features (technical debts)
  • Something can go wrong

In particular, in the case of gnf=False, the semantics of the program is no longer equivalent to the original one when there are shared path (such as ResNet). This can be very problematic, because astext is supposed to create a bidirectional format (in the case of gnf=False this is no longer true)

Note that if the program itself is ANF already, printing with the normal mode should work out of the box, so we also do not need such option as well.

So to make things simpler for the public facing API, I think we should simply disable this to remove ambiguity. This is mainly because we have the graph node syntax in the program(unlike typical FP languages). This way, the user has one less concept to learn, and we have one precise syntax for the program.

If we really want to support some form of such kind of folding in the long term(which I am not sure if it is really necessary). We need to run analysis to detect the nodes that are only referenced once and optionally use an algorithm to decide how to best fold intermediate nodes into an expression. At the same time, the parser has to be updated to support the folded expression. And if we want to switch such printing option on, the flag could be in some global configuration env(something like relay.set_printer_option), which is only used by advanced developers, so users do not have to worry about the additional option.

@tqchen
Copy link
Member

tqchen commented Mar 9, 2019

Back to the choices of the tuple projection and separator. I think we should settle down the discussion with an RFC before committing the code. It is even more dangerous to have a dynamic text format flying around and claims that we have parser round-tripping (imagine user starts to use the feature then find things break afterwards).

A safe choice would be to keep a consistent behavior with the old text printer, (no separator, [] projection), and give up round-tripping in the PR, until we fully resolved and committed to a text format.

@joshpoll joshpoll changed the title [Relay][Text Format] Text Printer Refactor and "PL" Printing [Relay][Text Format] Text Printer Refactor and Debug Printing Mar 14, 2019
@joshpoll
Copy link
Contributor Author

After discussions with @tqchen I have

  • removed roundtrip tests
  • scaled back over-aggressive inlining
  • reverted syntax changes
  • separated printing into two public-facing functions: one for the bidirectional text format and one for debugging

The PR is now feature complete. Inlining is currently less aggressive than the existing text printer; however, it is more correct. Here is the roadmap going forward before 0.6:

  1. Introduce smart inlining.
  2. Open an RFC for remaining text format design decisions including semicolons, generics, and tuple projection.
  3. Establish roundtrip tests.
  4. Optimize the printer using arenas.
  5. Use more advanced Doc models from Wadler's paper (may happen earlier depending on need/interest).

cc @wweic @merrymercy @MarisaKirisame @tqchen @jroesch

src/relay/ir/pretty_printer.cc Outdated Show resolved Hide resolved
python/tvm/relay/_parser.py Show resolved Hide resolved
// DSL function implementations

Doc& Doc::operator<<(const Doc& right) {
this->stream_.insert(this->stream_.end(), right.stream_.begin(), right.stream_.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this is correct.
Use https://en.cppreference.com/w/cpp/iterator/back_insert_iterator instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just use a for loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check against this == &right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still has to catch against this == &right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see latest commit

src/relay/ir/pretty_printer.cc Outdated Show resolved Hide resolved
@tqchen tqchen merged commit db5bfa3 into apache:master Mar 20, 2019
@tqchen
Copy link
Member

tqchen commented Mar 20, 2019

Thanks, @wweic @MarisaKirisame @joshpoll this is now merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants