[docs] deprecate declarative translation
apalala committed Dec 9, 2023
1 parent 1d1d059 commit 58e73aa
Showing 4 changed files with 123 additions and 111 deletions.
94 changes: 94 additions & 0 deletions docs/declarative_translation.rst
@@ -0,0 +1,94 @@
.. include:: links.rst

.. _mini-tutorial: mini-tutorial.rst

.. _pegen: https://github.com/we-like-parsers/pegen
.. _PEG parser: https://peps.python.org/pep-0617/

Declarative Translation (Deprecated)
------------------------------------

Translation is one of the most common tasks in language processing.
Analysis often summarizes the parsed input, and *walkers* are good for that.
In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.

|TatSu| provides support for template-based code generation ("translation", see below)
in the ``tatsu.codegen`` module.
Code generation works by defining a translation class for each class in the model specified by the grammar.

Nowadays the preferred code generation strategy is to walk down the AST_ and ``print()`` the desired output,
with the help of the ``NodeWalker`` class and the ``IndentPrintMixin`` mixin. That's the strategy used
by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a look at the
`mini-tutorial`_ for an example.

Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
and ``ContextWalker``.
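
As a rough illustration of the procedural style described above, the sketch below
walks a small tree and prints output as it goes. It uses only the standard library:
``Num``, ``Add``, and ``MiniWalker`` are hypothetical names invented for this example,
and the ``walk_<ClassName>`` dispatch only approximates what a ``NodeWalker`` does.
It is not |TatSu|'s implementation.

.. code:: python

    import io
    from dataclasses import dataclass


    @dataclass
    class Num:
        value: int


    @dataclass
    class Add:
        left: object
        right: object


    class MiniWalker:
        """Hypothetical stand-in for a node walker (not TatSu's class)."""

        def __init__(self):
            self.out = io.StringIO()

        def walk(self, node):
            # Dispatch on the node's class name: Add -> walk_Add, Num -> walk_Num.
            method = getattr(self, f'walk_{type(node).__name__}')
            return method(node)

        def walk_Num(self, node):
            print(f'PUSH {node.value}', file=self.out)

        def walk_Add(self, node):
            # The walker must know the node's structure to navigate it.
            self.walk(node.left)
            self.walk(node.right)
            print('ADD', file=self.out)


    walker = MiniWalker()
    walker.walk(Add(Num(1), Add(Num(2), Num(3))))
    text = walker.out.getvalue()

Here ``text`` holds stack-machine style output for ``1 + (2 + 3)``, produced depth first
without any templates.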

**deprecated**

|TatSu| doesn't impose a way to create translators with it, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.

Translation in |TatSu| was *template-based*, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful ``string.Formatter`` of the `Python`_ standard
library. The templates are simple strings that, in |TatSu|'s style,
are inlined with the code.

To generate a parser, |TatSu| constructs an object model of the parsed
grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
are. This is an example taken from |TatSu|'s source code:

.. code:: python

    class Lookahead(ModelRenderer):
        template = '''\
                    with self._if():
                    {exp:1::}\
                    '''

Every *attribute* of the object that doesn't start with an underscore
(``_``) may be used as a template field, and fields can be added or
modified by overriding the ``render_fields(fields)`` method. Fields
themselves are *lazily rendered* before being expanded by the template,
so a field may be an instance of a ``ModelRenderer`` descendant.
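
The field rules above can be approximated with plain ``str.format``. The sketch below
is an assumption-laden illustration, not the ``ModelRenderer`` code: ``Renderer``,
``Literal``, and ``Assign`` are hypothetical classes, and the only behaviors modeled are
"public attributes become fields" and "nested renderers are rendered before expansion".

.. code:: python

    class Renderer:
        """Hypothetical stand-in for a template renderer (not ModelRenderer)."""

        template = ''

        def fields(self):
            # Instance attributes without a leading underscore become fields.
            return {k: v for k, v in vars(self).items() if not k.startswith('_')}

        def render(self):
            # Render nested renderers first, then expand the template.
            fields = {
                k: v.render() if isinstance(v, Renderer) else v
                for k, v in self.fields().items()
            }
            return self.template.format(**fields)


    class Literal(Renderer):
        template = '{text!r}'

        def __init__(self, text):
            self.text = text
            self._origin = 'not a field'  # ignored: starts with an underscore


    class Assign(Renderer):
        template = '{name} = {value}'

        def __init__(self, name, value):
            self.name = name
            self.value = value  # may itself be a Renderer


    rendered = Assign('x', Literal('hi')).render()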

The ``rendering`` module defines a ``Formatter`` enhanced to support the
rendering of items in an *iterable* one by one. The syntax to achieve
that is:

.. code:: python

    '''
    {fieldname:ind:sep:fmt}
    '''

All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
*colons* are not. A field specified that way will be rendered using:

.. code:: python

    indent(sep.join(fmt % render(v) for v in value), ind)

The extended format can also be used with non-iterables, in which case
the rendering will be:

.. code:: python

    indent(fmt % render(value), ind)

The default multiplier for ``ind`` is ``4``, but that can be overridden
using ``n*m`` (for example ``3*1``) in the format.
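
To make the two rendering rules concrete, here is a minimal sketch built on
``textwrap.indent`` from the standard library. The names ``render_field``, ``render``,
and the ``mult`` parameter are assumptions made for this example. The real ``Formatter``
in the ``rendering`` module parses ``ind``, ``sep``, and ``fmt`` out of the format spec,
which this sketch does not attempt.

.. code:: python

    import textwrap


    def render(value):
        # Stand-in for the library's rendering step: plain str() here.
        return str(value)


    def render_field(value, ind=0, sep='', fmt='%s', mult=4):
        """Apply the rules above; ``mult`` plays the role of the ``n*m`` multiplier."""
        if isinstance(value, (list, tuple)):
            # Iterable: render items one by one and join them with ``sep``.
            text = sep.join(fmt % render(v) for v in value)
        else:
            # Non-iterable: render the single value.
            text = fmt % render(value)
        return textwrap.indent(text, ' ' * (ind * mult))


    block = render_field(['a = 1', 'b = 2'], ind=1, sep='\n')
    single = render_field('x', ind=3, mult=1)

With the default multiplier, ``ind=1`` indents each line by four spaces, while
``ind=3`` with ``mult=1`` mirrors the ``3*1`` form and indents by three.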

**note**
Using a newline character (``\n``) as separator will interfere with
left trimming and indentation of templates. To use a newline as
separator, specify it as ``\\n``, and the renderer will understand
the intention.

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -47,8 +47,8 @@ input, much like the `re`_ module does with regular expressions, or it can gener
semantics
models
asjson
print_translation
translation
declarative_translation
left_recursion
mini-tutorial
traces
41 changes: 0 additions & 41 deletions docs/print_translation.rst

This file was deleted.

97 changes: 28 additions & 69 deletions docs/translation.rst
@@ -1,94 +1,53 @@
.. include:: links.rst

.. _mini-tutorial: mini-tutorial.rst

.. _pegen: https://github.com/we-like-parsers/pegen
.. _PEG parser: https://peps.python.org/pep-0617/

Declarative Translation
-----------------------
Translation
-----------

Translation is one of the most common tasks in language processing.
Analysis often summarizes the parsed input, and *walkers* are good for that.
In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.

|TatSu| provides support for template-based code generation ("translation", see below)
in the ``tatsu.codegen`` module.
Code generation works by defining a translation class for each class in the model specified by the grammar.

Nowadays the preferred code generation strategy is to walk down the AST_ and ``print()`` the desired output,
with the help of the ``NodeWalker`` class and the ``IndentPrintMixin`` mixin. That's the strategy used
by pegen_, the precursor to the new `PEG parser`_ in Python_. Please take a look at the
`mini-tutorial`_ for an example.

Basically, the code generation strategy changed from declarative with library support, to procedural,
breadth or depth first, using only standard Python_. The procedural code must know the AST_ structure
to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
and ``ContextWalker``.

**deprecated**

|TatSu| doesn't impose a way to create translators with it, but it
|TatSu| doesn't impose a way to create translators, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.

Translation in |TatSu| was *template-based*, but instead of defining or
using a complex templating engine (yet another language), it relies on
the simple but powerful ``string.Formatter`` of the `Python`_ standard
library. The templates are simple strings that, in |TatSu|'s style,
are inlined with the code.
Translation in |TatSu| is based on subclasses of ``Walker`` and on classes that
inherit from ``IndentPrintMixin``, a strategy copied from the new PEG_ parser
in Python_ (see `PEP 617`_).

To generate a parser, |TatSu| constructs an object model of the parsed
grammar. A ``tatsu.codegen.CodeGenerator`` instance matches model
objects to classes that descend from ``tatsu.codegen.ModelRenderer`` and
implement the translation and rendering using string templates.
Templates are left-trimmed on whitespace, like `Python`_ *doc-comments*
are. This is an example taken from |TatSu|'s source code:
``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
and should be used thus:

.. code:: python
class Lookahead(ModelRenderer):
template = '''\
with self._if():
{exp:1::}\
'''
class MyTranslationWalker(NodeWalker, IndentPrintMixin):
Every *attribute* of the object that doesn't start with an underscore
(``_``) may be used as a template field, and fields can be added or
modified by overriding the ``render_fields(fields)`` method. Fields
themselves are *lazily rendered* before being expanded by the template,
so a field may be an instance of a ``ModelRenderer`` descendant.
def walk_SomeNode(self, node):
self.print('some preamble')
with self.indent():
# continue walking the tree
The ``rendering`` module defines a ``Formatter`` enhanced to support the
rendering of items in an *iterable* one by one. The syntax to achieve
that is:
.. code:: python
'''
{fieldname:ind:sep:fmt}
'''
The ``self.print()`` method takes note of the current level of indentation, so
output will be indented by the ``indent`` passed to
the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method.
The mixin keeps a stack of the indent amounts so it can go back to where it
was after each ``with indent(amount=n):`` statement:

All of ``ind``, ``sep``, and ``fmt`` are optional, but the three
*colons* are not. A field specified that way will be rendered using:

.. code:: python
indent(sep.join(fmt % render(v) for v in value), ind)
The extended format can also be used with non-iterables, in which case
the rendering will be:

.. code:: python
def walk_SomeNode(self, node):
with self.indent(amount=2):
self.print(walk_expression(node.exp))
indent(fmt % render(value), ind)
The printed code can be retrieved using the ``printed_text()`` method, but other
possibilities are available by assigning a text-like object to
``self.output_stream`` in the ``__init__()`` method.
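
The indent stack and the ``printed_text()`` retrieval described above might be sketched
as follows. ``IndentPrinter`` is a hypothetical stand-in written for this example with
only the standard library. It mirrors the method names mentioned in the text, but the
details of |TatSu|'s ``IndentPrintMixin`` may differ.

.. code:: python

    import io
    from contextlib import contextmanager


    class IndentPrinter:
        """Hypothetical stand-in; the real IndentPrintMixin may differ in detail."""

        def __init__(self, indent=4):
            self.indent_amount = indent
            self.indent_stack = [0]  # current indentation is the top of the stack
            self.output_stream = io.StringIO()

        def printed_text(self):
            return self.output_stream.getvalue()

        def print(self, text=''):
            prefix = ' ' * self.indent_stack[-1]
            print(prefix + text, file=self.output_stream)

        @contextmanager
        def indent(self, amount=None):
            step = self.indent_amount if amount is None else amount
            self.indent_stack.append(self.indent_stack[-1] + step)
            try:
                yield
            finally:
                # Go back to the previous indentation when the block ends.
                self.indent_stack.pop()


    p = IndentPrinter()
    p.print('def f():')
    with p.indent():
        p.print('if x:')
        with p.indent(amount=2):
            p.print('return 1')
        p.print('return 0')
    result = p.printed_text()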

The default multiplier for ``ind`` is ``4``, but that can be overridden
using ``n*m`` (for example ``3*1``) in the format.
A good example of how to do code generation with a ``NodeWalker`` is |TatSu|'s own
code generator, which can be found in ``tatsu/ngcodegen/python.py``, or the model
generation found in ``tatsu/ngcodegen/objectmodel.py``.

**note**
Using a newline character (``\n``) as separator will interfere with
left trimming and indentation of templates. To use a newline as
separator, specify it as ``\\n``, and the renderer will understand
the intention.

.. _PEP 617: https://peps.python.org/pep-0617/
