diff --git a/docs/declarative_translation.rst b/docs/declarative_translation.rst
new file mode 100644
index 00000000..82189727
--- /dev/null
+++ b/docs/declarative_translation.rst
@@ -0,0 +1,94 @@
+.. include:: links.rst
+
+.. _mini-tutorial: mini-tutorial.rst
+
+.. _pegen: https://github.com/we-like-parsers/pegen
+.. _PEG parser: https://peps.python.org/pep-0617/
+
+Declarative Translation (Deprecated)
+------------------------------------
+
+Translation is one of the most common tasks in language processing.
+Analysis often summarizes the parsed input, and *walkers* are good for that.
+In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.
+
+|TatSu| provides support for template-based code generation ("translation", see below)
+in the ``tatsu.codegen`` module.
+Code generation works by defining a translation class for each class in the model specified by the grammar.
+
+Nowadays the preferred code generation strategy is to walk down the AST_ and ``print()`` the desired output,
+with the help of the ``NodeWalker`` class, and the ``IndentPrintMixin`` mixin. That's the strategy used
+by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a look at the
+`mini-tutorial`_ for an example.
+
+Basically, the code generation strategy changed from declarative with library support, to procedural,
+breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
+to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
+and ``ContextWalker``.
+
+**deprecated**
+
+|TatSu| doesn't impose a way to create translators with it, but it
+exposes the facilities it uses to generate the `Python`_ source code for
+parsers.
+
+Translation in |TatSu| was *template-based*, but instead of defining or
+using a complex templating engine (yet another language), it relies on
+the simple but powerful ``string.Formatter`` of the `Python`_ standard
+library. The templates are simple strings that, in |TatSu|'s style,
+are inlined with the code.
+
+To generate a parser, |TatSu| constructs an object model of the parsed
+grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
+objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
+implement the translation and rendering using string templates.
+Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
+are. This is an example taken from |TatSu|'s source code:
+
+.. code:: python
+
+    class Lookahead(ModelRenderer):
+        template = '''\
+                    with self._if():
+                    {exp:1::}\
+                    '''
+
+Every *attribute* of the object that doesn't start with an underscore
+(``_``) may be used as a template field, and fields can be added or
+modified by overriding the ``render_fields(fields)`` method. Fields
+themselves are *lazily rendered* before being expanded by the template,
+so a field may be an instance of a ``ModelRenderer`` descendant.
+
+The ``rendering`` module defines a ``Formatter`` enhanced to support the
+rendering of items in an *iterable* one by one. The syntax to achieve
+that is:
+
+.. code:: python
+
+    '''
+    {fieldname:ind:sep:fmt}
+    '''
+
+All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
+*colons* are not. A field specified that way will be rendered using:
+
+.. code:: python
+
+    indent(sep.join(fmt % render(v) for v in value), ind)
+
+The extended format can also be used with non-iterables, in which case
+the rendering will be:
+
+.. code:: python
+
+    indent(fmt % render(value), ind)
+
+The default multiplier for ``ind`` is ``4``, but that can be overridden
+using ``n*m`` (for example ``3*1``) in the format.
+
+**note**
+    Using a newline character (``\n``) as separator will interfere with
+    left trimming and indentation of templates. To use a newline as
+    separator, specify it as ``\\n``, and the renderer will understand
+    the intention.
+
diff --git a/docs/index.rst b/docs/index.rst
index e50f9bf5..1152e6ec 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,8 +47,8 @@ input, much like the `re`_ module does with regular expressions, or it can gener
     semantics
     models
     asjson
-    print_translation
     translation
+    declarative_translation
     left_recursion
     mini-tutorial
     traces
diff --git a/docs/print_translation.rst b/docs/print_translation.rst
deleted file mode 100644
index 2d8d5ca3..00000000
--- a/docs/print_translation.rst
+++ /dev/null
@@ -1,41 +0,0 @@
-.. include:: links.rst
-
-Print Translation
------------------
-
-
-|TatSu| doesn't impose a way to create translators, but it
-exposes the facilities it uses to generate the `Python`_ source code for
-parsers.
-
-Translation in |TatSu| is based on subclasses of ``Walker`` and on classes that
-inherit from ``IndentPrintMixin``, a strategy copied from the new PEG_ parser
-in Python_ (see `PEP 617`_).
-
-``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
-and should be used thus:
-
-.. code:: python
-
-    class MyTranslationWalker(NodeWalker, IndentPrintMixin):
-
-        def walk_SomeNode(self, node):
-            with self.indent():
-                # continue walking the tree
-
-
-The ``self.print()`` method takes note of the current level of indentation, so
-output will be indented by the ``indent`` passed to
-the ``IndentPrintConstructor``:
-
-.. code:: python
-
-    def walk_SomeNode(self, node):
-        with self.indent():
-            self.print(walk_expression(node.exp))
-
-The printed code can be retrieved using the ``printed_text()`` method. Other
-posibilities are available by assigning a text-like object to
-``self.output_stream`` in the ``__init__()`` method.
-
-.. _PEP 617: https://peps.python.org/pep-0617/
diff --git a/docs/translation.rst b/docs/translation.rst
index 192ac503..60afc1b5 100644
--- a/docs/translation.rst
+++ b/docs/translation.rst
@@ -1,94 +1,53 @@
 .. include:: links.rst
 
-.. _mini-tutorial: mini-tutorial.rst
-
-.. _pegen: https://github.com/we-like-parsers/pegen
-.. _PEG parser: https://peps.python.org/pep-0617/
-
-Declarative Translation
------------------------
+Translation
+-----------
 
 Translation is one of the most common tasks in language processing.
 Analysis often sumarizes the parsed input, and *walkers* are good for that.
-In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.
-
-|TatSu| provides support for template-based code generation ("translation", see below)
-in the ``tatsu.codegen`` module.
-Code generation works by defining a translation class for each class in the model specified by the grammar.
-
-Nowadays the preferred code generation strategy is to walk down the AST_ and `print()` the desired output,
-with the help of the ``NodWalker`` class, and the ``IndentPrintMixin`` mixin. That's the strategy used
-by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a lookt at the
-`mini-tutorial`_ for an example.
-
-Basically, the code generation strategy changed from declarative with library support, to procedural,
-breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
-to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
-and ``ContextWalker``.
 
-**deprecated**
 
-|TatSu| doesn't impose a way to create translators with it, but it
+|TatSu| doesn't impose a way to create translators, but it
 exposes the facilities it uses to generate the `Python`_ source code for
 parsers.
 
-Translation in |TatSu| was *template-based*, but instead of defining or
-using a complex templating engine (yet another language), it relies on
-the simple but powerful ``string.Formatter`` of the `Python`_ standard
-library. The templates are simple strings that, in |TatSu|'s style,
-are inlined with the code.
+Translation in |TatSu| is based on subclasses of ``Walker`` and on classes that
+inherit from ``IndentPrintMixin``, a strategy copied from the new PEG_ parser
+in Python_ (see `PEP 617`_).
 
-To generate a parser, |TatSu| constructs an object model of the parsed
-grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
-objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
-implement the translation and rendering using string templates.
-Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
-are. This is an example taken from |TatSu|'s source code:
+``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
+and should be used thus:
 
 .. code:: python
 
-    class Lookahead(ModelRenderer):
-        template = '''\
-                    with self._if():
-                    {exp:1::}\
-                    '''
+    class MyTranslationWalker(NodeWalker, IndentPrintMixin):
 
-Every *attribute* of the object that doesn't start with an underscore
-(``_``) may be used as a template field, and fields can be added or
-modified by overriding the ``render_fields(fields)`` method. Fields
-themselves are *lazily rendered* before being expanded by the template,
-so a field may be an instance of a ``ModelRenderer`` descendant.
+        def walk_SomeNode(self, node):
+            self.print('some preamble')
+            with self.indent():
+                # continue walking the tree
 
-The ``rendering`` module defines a ``Formatter`` enhanced to support the
-rendering of items in an *iterable* one by one. The syntax to achieve
-that is:
 
-.. code:: python
-
-    '''
-    {fieldname:ind:sep:fmt}
-    '''
+The ``self.print()`` method takes note of the current level of indentation, so
+output will be indented by the ``indent`` passed to
+the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method.
+The mixin keeps a stack of the indent amounts so it can go back to where it
+was after each ``with indent(amount=n):`` statement:
 
-All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
-*colons* are not. A field specified that way will be rendered using:
 
 .. code:: python
 
-    indent(sep.join(fmt % render(v) for v in value), ind)
-
-The extended format can also be used with non-iterables, in which case
-the rendering will be:
-
-.. code:: python
+    def walk_SomeNode(self, node):
+        with self.indent(amount=2):
+            self.print(walk_expression(node.exp))
 
-    indent(fmt % render(value), ind)
+The printed code can be retrieved using the ``printed_text()`` method, but other
+possibilities are available by assigning a text-like object to
+``self.output_stream`` in the ``__init__()`` method.
 
-The default multiplier for ``ind`` is ``4``, but that can be overridden
-using ``n*m`` (for example ``3*1``) in the format.
+A good example of how to do code generation with a ``NodeWalker`` is |TatSu|'s own
+code generator, which can be found in ``tatsu/ngcodegen/python.py``, or the model
+generation found in ``tatsu/ngcodegen/objectmodel.py``.
 
-**note**
-    Using a newline character (``\n``) as separator will interfere with
-    left trimming and indentation of templates. To use a newline as
-    separator, specify it as ``\\n``, and the renderer will understand
-    the intention.
+.. _PEP 617: https://peps.python.org/pep-0617/
 
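
Note on the new ``translation.rst`` text: the indent-stack behaviour it describes can be illustrated with a minimal, self-contained sketch. This is not TatSu's implementation and does not import TatSu. The class ``IndentStackPrinter`` and the helper ``render_example`` are hypothetical names, and the method names ``indent``, ``print``, and ``printed_text`` are borrowed from the documentation only to show how a stack of indent amounts lets output return to the previous column after each ``with`` block.

```python
import io
from contextlib import contextmanager


class IndentStackPrinter:
    """Hypothetical stand-in for the documented mixin (not TatSu's code)."""

    def __init__(self, indent=4):
        self.default_indent = indent        # columns added by a plain indent()
        self.output_stream = io.StringIO()  # any text-like object would do
        self._indent_stack = [0]            # last entry is the current column

    @contextmanager
    def indent(self, amount=None):
        # Push a deeper column; restore the previous one when the block exits.
        step = self.default_indent if amount is None else amount
        self._indent_stack.append(self._indent_stack[-1] + step)
        try:
            yield
        finally:
            self._indent_stack.pop()

    def print(self, text=''):
        # Prefix each line of the text with the current indentation.
        prefix = ' ' * self._indent_stack[-1]
        for line in str(text).splitlines() or ['']:
            self.output_stream.write((prefix + line).rstrip() + '\n')

    def printed_text(self):
        return self.output_stream.getvalue()


def render_example():
    # Nested indents with different amounts, then back to the outer level.
    out = IndentStackPrinter(indent=4)
    out.print('def f():')
    with out.indent():
        out.print('if x:')
        with out.indent(amount=2):
            out.print('return 1')
        out.print('return 0')
    return out.printed_text()
```

Because the stack is popped in a ``finally`` clause, the column is restored even if walking a subtree raises, which is the property the documentation relies on when it says the mixin can "go back to where it was".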