Dan Debrunner edited this page Apr 11, 2014 · 4 revisions

Writing Effective SPLDOC

Why

SPLDOC for a toolkit documents the behavior of its operators, functions and types. Toolkits should be well documented using SPLDOC for these reasons:

  1. SPL developers can see how the toolkit's functionality makes them more productive. If a toolkit is hard to understand or poorly documented, it is less likely to be used, and developers end up writing duplicate functionality.
  2. SPL developers won't have to run experiments with the toolkit to see what happens, or dig into the source code to work out what the toolkit does.
  3. Reviewing the code is easier when there's a clear description of each operator, function and type. If it's not documented what an operator does, how can the code be reviewed for correctness?
  4. Other open source contributors can understand what the toolkit is meant to do and either fix issues or enhance it. If it's not well documented, there can be no complaints when it's modified to do something different, because the original intention was never clear.

Start with SPLDOC

Rather than treating documentation as an afterthought to keep a reviewer happy or to get a pull request accepted, consider starting with the SPLDOC. SPLDOC defines the API, or contract, for the toolkit, so writing it first gives all subsequent development a clear specification to follow. For example, define the data model (see below) before writing any code and then implement that data model, rather than trying to describe a data model from an implementation that may be inconsistent because nothing was specified up front.

For C++ primitive operators, create the operator and then take the time to complete the operator model before working on the C++ implementation. Define and describe the operator itself, its parameters, ports, custom literals, output functions, etc.

For Java primitive operators, define the operator's API by using the Java annotations to define and describe the operator itself, its parameters, ports, etc. With Java, the operator's API can be separated from its implementation by specifying the API in an abstract super-class and the implementation in a concrete sub-class.
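
The split between an API super-class and an implementation sub-class can be sketched with the JDK alone. Everything named here (`AbstractBatcher`, `Batcher`, the `batchSize` parameter, the `process` and `finish` methods) is hypothetical and only illustrates the idea. In a real operator the descriptions would be supplied through the Streams operator annotations; plain Javadoc stands in for them so the sketch compiles without the Streams libraries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * API half of a hypothetical operator, written and documented first.
 *
 * Contract: submits one batch for every batchSize input values.
 * Values still pending at final punctuation are submitted as one shorter batch.
 */
abstract class AbstractBatcher {
    private int batchSize = 1;

    /** Parameter batchSize: input values per submitted batch. Must be >= 1. Default is 1. */
    public void setBatchSize(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    public int getBatchSize() {
        return batchSize;
    }

    /** Handles one input value and returns the batches submitted as a result (possibly none). */
    public abstract List<List<String>> process(String value);

    /** Handles final punctuation and returns the remaining partial batch, if any. */
    public abstract List<List<String>> finish();
}

/** Implementation half, written afterwards against the documented contract. */
class Batcher extends AbstractBatcher {
    private final List<String> pending = new ArrayList<>();

    @Override
    public List<List<String>> process(String value) {
        List<List<String>> submitted = new ArrayList<>();
        pending.add(value);
        if (pending.size() >= getBatchSize()) {
            // A full batch is available: submit a copy and start a new one.
            submitted.add(new ArrayList<>(pending));
            pending.clear();
        }
        return submitted;
    }

    @Override
    public List<List<String>> finish() {
        List<List<String>> submitted = new ArrayList<>();
        if (!pending.isEmpty()) {
            // Flush whatever is left as a final, shorter batch.
            submitted.add(new ArrayList<>(pending));
            pending.clear();
        }
        return submitted;
    }
}
```

A reviewer can check `Batcher` against the contract stated on `AbstractBatcher` without reading the implementation first, which is the point of documenting the API up front.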

For SPL composite operators, create the skeleton of the composite, defining its parameters and its input and output ports, and then use SPLDOC to define the behavior.

For any function create the function with an empty body and document its behavior before implementing it.

Operators

Describe what the operator does in detail, considering edge cases as well as the common case. These items are especially important:

  • When the operator submits a tuple or punctuation mark, including what drives each individual tuple.
    • Example: JDBCSelect submits a tuple for each row returned from the SQL SELECT statement.
    • Example: Functor submits a tuple for each input tuple if evaluation of the filter condition is true.
  • What the submitted tuple contains, including the values of attributes that are not explicitly set.
    • Example: the submitted output tuple attributes are set from parameters in the HTTP POST body. Any named parameter in the POST body is used to set the tuple attribute with an identical name. Any parameter without a matching attribute is ignored. Attributes without matching parameters are set to their SPL default.
  • When integrating with an external system or embedded library, explicitly define the data model. This is a description of how tuples are converted to and from the non-Streams representation, e.g. how tuples are converted into rows in a database, or from an HTTP POST body into a tuple. This will include detailed information on type conversion.
    • This might require an explicit table showing type mapping, e.g. SQL TIME is converted to an SPL timestamp with the date portion set to the current day.
  • How the operator processes incoming tuples and punctuation. Be explicit about how each tuple is handled and what actions are performed for it.
    • Example: For each input tuple JDBCInsert inserts a row in the database table. The contents of the row are set from the tuple's attributes as follows ...
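
A data model written at this level of detail translates almost line for line into code. The sketch below restates two of the example rules above in plain Java: the POST parameter rules and the time-of-day mapping. The class and method names are hypothetical, attribute values are simplified to strings, and the defaults are passed in by the caller rather than taken from SPL.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a documented data model; all names here are hypothetical. */
class DataModelSketch {

    /**
     * Builds output attribute values from POST parameters.
     * Rule 1: a parameter sets the attribute with the identical name.
     * Rule 2: a parameter without a matching attribute is ignored.
     * Rule 3: an attribute without a matching parameter keeps its default.
     *
     * @param defaults attribute name to default value; its keys define the output schema
     * @param params   named parameters from the POST body
     */
    static Map<String, String> toTuple(Map<String, String> defaults, Map<String, String> params) {
        // Rule 3: start from the defaults so unset attributes are still present.
        Map<String, String> tuple = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Rule 2: names that are not attributes are skipped.
            if (tuple.containsKey(p.getKey())) {
                // Rule 1: matching names overwrite the default.
                tuple.put(p.getKey(), p.getValue());
            }
        }
        return tuple;
    }

    /**
     * Type-mapping rule: a time-of-day value is combined with the given date
     * (the current day in the documented rule) to form a full timestamp.
     */
    static LocalDateTime timeToTimestamp(LocalTime time, LocalDate today) {
        return LocalDateTime.of(today, time);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("id", "");
        defaults.put("name", "");
        // "color" has no matching attribute; "name" has no matching parameter.
        Map<String, String> params = Map.of("id", "42", "color", "red");
        System.out.println(toTuple(defaults, params));
        System.out.println(timeToTimestamp(LocalTime.of(9, 30), LocalDate.now()));
    }
}
```

If a rule cannot be turned into a comment like "Rule 2" without guessing, that is usually a sign the documentation is missing an edge case.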

Continued Improvement

A toolkit's documentation can almost always be improved with more detail or clarification. These are indications that the SPLDOC can be improved:

  • Questions from users on how to use an operator or function