
IBMStreams repository guidelines

Samantha Chan edited this page Apr 6, 2016 · 8 revisions

When you create a new repository in https://github.com/IBMStreams, you are creating a place where you can share source code with other developers to extend IBM InfoSphere Streams. To help others find your source code, and find their way around inside it, we suggest some guidelines here for naming and organizing your files and directories.

Note: This page is a work in progress. Comments, criticisms, additions, and alternatives are all welcome. Please contribute to the discussion.


some terminology used in these guidelines

These guidelines use a few keywords to mean similar but distinct layers of organization. To be clear, this is what we mean:

  • an IBMStreams repository is the top level of organization for your source code: the 'root' for your directories and files. All of the IBMStreams repositories are listed in its 'table of contents' at https://github.com/IBMStreams. Some repositories contain a toolkit of SPL functions and operators that Streams developers can use when composing their own applications. Other repositories contain a complete Streams application that demonstrates how to solve a problem. When other GitHub developers clone your repository, everything in it will be copied to their machine, and git will keep track of changes between their copy and yours. For example, streamsx.inet and streamsx.messaging are IBMStreams repositories.

  • a Streams toolkit is a repository containing source code for a collection of related SPL types, functions, operators, and documentation that Streams developers can use when composing their applications. You should place all of that source code in one SPL project, and place that project in the 'root' of your repository. Your toolkit may also have projects containing sample applications, test data, or other types of source code. For example, com.ibm.streamsx.inet is a toolkit within a repository.

  • a Streams demo is a repository containing source code for a complete application that demonstrates how to solve a problem with Streams. Your demo repository should contain at least one SPL project, and may have many more. It may also contain other projects containing C/C++ source code, Java source code, documentation, and data needed to complete the demo. Your demo may depend upon one or more Streams toolkits that users must obtain and install separately. For example, NetworkTrafficDemo is a demo repository.

  • an Eclipse project is a directory containing a particular type of source code that Streams Studio can compile and execute. Most projects contain SPL, C/C++, and/or Java source code. Your repository can contain many projects of different types. When Streams developers download a distribution package containing your repository into Streams Studio, they can choose which projects are unpacked into their Eclipse workspace. For example, com.ibm.streamsx.inet is an SPL project (that is, an SPL toolkit).

  • an SPL namespace is a string that uniquely identifies a set of related source code files. Streams uses namespaces to identify the collection of functions and operators in a toolkit, and the main composite of a demo application. Streams Studio uses namespaces to name the subdirectories containing source files in SPL projects. You should choose a unique namespace for the toolkit or demo in your repository so that the names of your operators or composites don't conflict with someone else's. IBMStreams follows the Java convention of hierarchical namespaces, which makes it easy to choose unique namespaces without a central registry. Within IBMStreams, namespaces (and Java packages) should begin with com.ibm.streamsx followed by further dot-separated segments. For example, com.ibm.streamsx.inet and com.ibm.streamsx.messaging are the namespaces for the Inet toolkit and Messaging toolkit, respectively.
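You may want to check a proposed namespace against this convention, for example in a build or review script. A minimal Python sketch might look like the following. The function and constant names are hypothetical. The per-segment rule (a letter or underscore, then letters, digits, or underscores) is our assumption: it approximates SPL identifiers but is not the exact SPL grammar.

```python
import re

# Prefix required by the IBMStreams namespace convention.
REQUIRED_PREFIX = "com.ibm.streamsx."

# Assumed rule for each dot-separated segment: a letter or underscore,
# then letters, digits, or underscores (an approximation, not the SPL grammar).
SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_ibmstreams_namespace(namespace: str) -> bool:
    """Return True if namespace starts with com.ibm.streamsx. and is well formed."""
    if not namespace.startswith(REQUIRED_PREFIX):
        return False
    segments = namespace.split(".")
    # "com", "ibm", "streamsx" plus at least one more segment, e.g. "inet".
    if len(segments) < 4:
        return False
    return all(SEGMENT.match(segment) for segment in segments)


print(is_ibmstreams_namespace("com.ibm.streamsx.jdbc"))  # True
print(is_ibmstreams_namespace("org.example.jdbc"))       # False
```

Sub-namespaces such as com.ibm.streamsx.jdbc.sample also pass, because the check only requires the prefix and well-formed segments after it.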


naming your repository

The name of your repository will appear in the IBMStreams 'table of contents' at https://github.com/IBMStreams. It will be used in the URL for your repository, in the path of its directories and files, and in the distribution package that Streams developers download into Streams Studio. The name will generally be selected during the discussion for adding the toolkit and its repository.

For example, a toolkit of operators for messaging is in the repository named streamsx.messaging. The repository's home page is at https://github.com/IBMStreams/streamsx.messaging.


these top-level files are required

When the IBMStreams administrators create a new repository, it will be empty except for these files at its top level, that is, in its 'root' directory:

  • The LICENSE.md file states that the contents of the repository can be distributed and used under the terms of the Apache License, version 2.0, and it contains a link to the full text of the license. You should not alter this file, and you should copy it into each SPL project you add to your repository.

  • The README.md file should describe what the repository contains, how it is organized, and what dependencies it has. This file will be displayed on the repository's home page. Initially, it will contain only some skeletal text. You should replace that with details other Streams developers will need to use the repository. You can use 'markdown' to format the file's contents for display.

  • The .gitignore file specifies files and directories that git should not track. Those files are therefore never committed to your cloned repository or uploaded to IBMStreams when you push your changes back. Initially, it will exclude temporary files and directories commonly associated with Streams development. If your work yields other temporary files that should not be uploaded to IBMStreams, you should add their names here.
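If you want to confirm that a cloned repository still has these three files at its top level, a small Python check like the one below will do. It is only a sketch. The function name is hypothetical, and it simply tests that each file exists directly inside the directory you pass in.

```python
from __future__ import annotations

from pathlib import Path

# Top-level files every new IBMStreams repository starts with.
REQUIRED_ROOT_FILES = ("LICENSE.md", "README.md", ".gitignore")


def missing_root_files(repo_root: str) -> list[str]:
    """Return the required top-level files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_ROOT_FILES if not (root / name).is_file()]
```

For a complete clone, for example one in a directory named streamsx.jdbc, missing_root_files("streamsx.jdbc") returns an empty list.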

Work on a repository usually takes place in a fork that you create, though if you are a committer you may also work directly against the IBMStreams repository. You will want to immediately 'clone' the new repository (or your fork of it), that is, make a copy of it on the machine where you will develop the toolkit or demo application. You can use the Eclipse Git tool (EGit) to do this from within Streams Studio. See the instructions for using IBMStreams and Streams Studio.

As an example, let's assume that the IBMStreams administrators have created a new repository named streamsx.jdbc. You can see the new repository's home page at https://github.com/IBMStreams/streamsx.jdbc. Note that the README.md file is prominently displayed there.

To clone the new repository into Streams Studio, first copy its 'clone URL' by clicking this button on your repository's home page:

[image: the 'clone URL' button on the repository's home page]

Then clone the repository into Streams Studio by clicking this button in the 'Git Repositories' view of Eclipse:

[image: the clone button in the 'Git Repositories' view]

After cloning the repository into Streams Studio, you will see the required files in the 'Git Repositories' view of Eclipse, like this:

[image: the required files in the 'Git Repositories' view]

This would be a good time to edit your README.md file and replace its contents with a description of what the repository will contain. If you are not familiar with Markdown, skim the basics first. If you are, review GitHub Flavored Markdown, the dialect used here.


naming directories and files in toolkit repositories

An IBMStreams repository should have at least one directory in its 'root' that is an SPL project. For toolkit repositories, we suggest that you put all of your types, functions, and operators in one SPL project, and use your namespace as the project's name.

Continuing our example, let's assume that you want to create a new operator named JDBCInsert. In that case, we suggest that you begin by creating an SPL project named com.ibm.streamsx.jdbc. After you create the com.ibm.streamsx.jdbc namespace (consistent with the repository name) and the operator, your project should contain these files and subdirectories:

  • the com.ibm.streamsx.jdbc directory itself, which will contain the SPL project for the toolkit's types, functions, operators, documentation, and sample applications.

  • the com.ibm.streamsx.jdbc/.project file, created automatically by Eclipse, which identifies the directory as an Eclipse project containing SPL source code.

  • the com.ibm.streamsx.jdbc/info.xml file, created automatically by Streams Studio, which identifies the SPL types, functions, and operators in the toolkit.

  • the com.ibm.streamsx.jdbc/LICENSE.md file, a copy of the LICENSE.md file from the 'root' directory of the repository. You should copy the license file into each project you create. It will then be imported into your users' workspaces along with the source code.

  • the com.ibm.streamsx.jdbc/com.ibm.streamsx.jdbc subdirectory, created automatically by Streams Studio when you create the namespace com.ibm.streamsx.jdbc, which will contain the source code for the toolkit's types, functions, and operators. Note that within the toolkit you are free to use any namespace under com.ibm.streamsx.jdbc. Some toolkits use multiple sub-namespaces to separate functionality.

  • the com.ibm.streamsx.jdbc/com.ibm.streamsx.jdbc/JDBCInsert subdirectory, created automatically by Streams Studio when you create the JDBCInsert C++ primitive operator, which will contain the source code for the operator.

  • the com.ibm.streamsx.jdbc/com.ibm.streamsx.jdbc/JDBCInsert/JDBCInsert.xml file, created automatically by Streams Studio when you create the JDBCInsert C++ primitive operator, which describes the operator's input and output ports, parameters, window options, and library dependencies.

  • the com.ibm.streamsx.jdbc/doc subdirectory, which we suggest you create manually for the toolkit's documentation. SPLDOC (the spl-make-doc command) can generate documentation for the toolkit from comments in the source code, much as Javadoc does for Java code.
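The layout above follows a simple pattern. The project directory and its source subdirectory are both named after the namespace, and each operator gets its own subdirectory containing an XML model named after the operator. The Python sketch below encodes that pattern so you can check a cloned repository for the key files. The function names are hypothetical, and the list covers only the paths described above, not everything Streams Studio generates.

```python
from __future__ import annotations

from pathlib import Path


def expected_toolkit_paths(namespace: str, operator: str) -> list[Path]:
    """Paths, relative to the repository root, described for a toolkit project."""
    project = Path(namespace)                # e.g. com.ibm.streamsx.jdbc
    operator_dir = project / namespace / operator
    return [
        project / ".project",                # created by Eclipse
        project / "info.xml",                # created by Streams Studio
        project / "LICENSE.md",              # copied from the repository root
        operator_dir / f"{operator}.xml",    # operator model from Streams Studio
        project / "doc",                     # created manually for documentation
    ]


def missing_toolkit_paths(repo_root: str, namespace: str, operator: str) -> list[str]:
    """Return the expected paths (as POSIX strings) that do not exist under repo_root."""
    root = Path(repo_root)
    return [
        path.as_posix()
        for path in expected_toolkit_paths(namespace, operator)
        if not (root / path).exists()
    ]
```

For the running example, missing_toolkit_paths("streamsx.jdbc", "com.ibm.streamsx.jdbc", "JDBCInsert") lists any of these paths that your clone is still missing.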

You will see these files and directories (and many more) in the 'Project Explorer' view of Eclipse, like this:

[image: the com.ibm.streamsx.jdbc project in the 'Project Explorer' view]

(Please note that our example is not complete. The SPL project will contain lots of other subdirectories and files.)

To add your new project to the repository, right-click it in the 'Project Explorer' view and select 'Team > Share Project...', like this:

[image: the 'Team > Share Project...' menu item]

Then, in the 'Configure Git Repository' dialog, select your repository in the 'Repository' field, like this:

[image: the 'Repository' field of the 'Configure Git Repository' dialog]

When you click 'Finish' in this dialog, Eclipse will move your project's files from your Eclipse workspace directory (the 'Current Location' in the dialog) to your cloned repository directory (the 'Target Location' in the dialog). You won't notice this move in the 'Project Explorer' view, but you will when you look at your directories with a Linux file explorer such as Nautilus.

Now develop the new operator. As you add more files and directories to your SPL project, they will be stored in the cloned repository's directory on your machine. Eclipse will add small 'question marks' to their icons in the 'Project Explorer' view to remind you that they have not yet been committed to your cloned repository, like this:

[image: 'question mark' icons on uncommitted files in the 'Project Explorer' view]

You can commit your new directories and files to your cloned repository by clicking 'Team > Commit...' on their pop-up menus in the 'Project Explorer' view, like this:

[image: the 'Team > Commit...' menu item]

As you develop your operator, Eclipse will remind you of uncommitted changes by adding 'greater than' symbols to their icons, like this:

[image: 'greater than' symbols on icons with uncommitted changes]

Eventually, you will want to push the changes you have committed to the cloned repository on your machine back up to IBMStreams. You can do this by clicking 'Team > Push to Upstream' on the pop-up menu in the 'Project Explorer' view, like this:

[image: the 'Team > Push to Upstream' menu item]

Later, when you are collaborating with other developers, you can pull their changes from IBMStreams down into the cloned repository on your machine by clicking 'Team > Pull from Upstream' on the pop-up menu in the 'Project Explorer' view, like this:

[image: the 'Team > Pull from Upstream' menu item]

You'll find more than you ever wanted to know about the Eclipse EGit tool in the EGit User's Guide and EGit Tutorial. For details on 'git' itself, dive into the Pro Git book.


sample applications

As you develop new functionality within a toolkit, you will undoubtedly also develop some small SPL applications for your own testing. Those test applications will be helpful to other developers who will use your operator. We encourage you to include them in the repository as sample applications.

We suggest that you create additional SPL projects for sample applications, separate from the SPL project that contains the toolkit's functions and operators. When SPL developers download the toolkit into their workspaces, Eclipse will allow them to choose which projects to import, and which to skip.

Continuing our previous example, let's assume that you have developed a sample application in an SPL project named JDBCInsertSample in the namespace com.ibm.streamsx.jdbc.sample. When you add this project to the repository, enter 'samples' in the 'Path within repository' field of the 'Configure Git Repository' dialog, like this:

[image: 'samples' entered in the 'Path within repository' field]

When you click 'Finish' on this dialog, Eclipse will move the directory containing the test project into the 'samples' directory of the repository's 'root'. As with the toolkit project, you won't notice this change in the 'Project Explorer' view, but you will see it in the 'Git Repositories' view, like this:

[image: the 'samples' directory in the 'Git Repositories' view]
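Once you add sample projects this way, the repository holds projects at two levels: the toolkit project in the 'root' and sample projects under 'samples'. Eclipse marks each project directory with a .project file, so a short Python sketch like this one can list them. The function name is hypothetical. The search depth of two levels is an assumption that matches the layout suggested here.

```python
from __future__ import annotations

from pathlib import Path


def find_eclipse_projects(repo_root: str) -> list[str]:
    """Return repository-relative paths of directories that contain a .project file.

    Only the root and one level below it (such as samples/) are searched.
    """
    root = Path(repo_root)
    markers = list(root.glob("*/.project")) + list(root.glob("*/*/.project"))
    return sorted(marker.parent.relative_to(root).as_posix() for marker in markers)
```

For the example repository, this would return com.ibm.streamsx.jdbc and samples/JDBCInsertSample.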


a word about version numbers

Each SPL project has a version number composed of three digits, which the Streams compiler uses to ensure that all of the dependencies of an application are compatible. Streams Studio displays version numbers in brackets after each project's name in the 'Project Explorer' view, like this:

[image: version numbers in brackets after project names in the 'Project Explorer' view]

You can set the version number of your toolkit by selecting 'Edit Toolkit Information...' from its pop-up menu in the 'Project Explorer' view and changing the 'Toolkit version' field, like this:

[image: the 'Toolkit version' field of the 'Edit Toolkit Information' dialog]
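As far as we know, Streams Studio records this setting in the project's info.xml file. The sketch below reads it back with Python's standard ElementTree parser. It assumes that the version appears in a version element inside an identity element, which we have not verified for every Streams release. It matches element names while ignoring XML namespaces, so it does not depend on a particular namespace URI. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any XML namespace from a tag: '{uri}version' -> 'version'."""
    return tag.rsplit("}", 1)[-1]


def toolkit_version(root: ET.Element) -> str | None:
    """Return the text of the first identity/version element, or None if absent."""
    for element in root.iter():
        if local_name(element.tag) == "identity":
            for child in element:
                if local_name(child.tag) == "version":
                    return (child.text or "").strip()
    return None
```

For the running example, you could call toolkit_version(ET.parse("com.ibm.streamsx.jdbc/info.xml").getroot()) from the repository's 'root'.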

Versioning is discussed in Issue 5.

The version number is set to 1.0.0 automatically when you create a new SPL project. We suggest that you follow this convention when assigning the digits of your toolkit's version number:

  • increment the first digit of the version number when you make changes that are incompatible with the previous version, and will break applications that depend upon your code. For example, if you change the name or namespace of your operator, or add a required parameter to an operator, or delete a deprecated function, then your users will need to change their applications when they upgrade to your new version. A change in the first digit is a cue that this version may require some rework on their part.

  • increment the second digit whenever you add new capabilities that are compatible with previous versions. For example, if you add a new function or operator to your toolkit, or add an optional parameter to an old operator with a default value that does not change the operator's behavior, then your users will not need to change their applications when they upgrade to your new version, but they may want to, to take advantage of its new capabilities. A change in the second digit is a cue to review its documentation.

  • increment the third digit whenever you make changes that don't require any attention from your users. For example, if you fix a bug in a function, or improve the performance of an operator, then your users can upgrade to your new version without revising their code or reviewing your documentation.
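The three rules above can be written as a small Python function, shown here as a sketch. The change-type labels are hypothetical names for the three cases. The function also resets the lower digits to 0 when it increments a higher one, so 1.4.2 becomes 2.0.0 after an incompatible change. That reset is common semantic-versioning practice but is our assumption, not something these guidelines state.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version number under the three-digit convention.

    change is one of:
      "incompatible": breaks applications that depend on the toolkit (first digit)
      "compatible":   adds capabilities without breaking anything (second digit)
      "fix":          needs no attention from users (third digit)
    """
    major, minor, micro = (int(part) for part in version.split("."))
    if change == "incompatible":
        return f"{major + 1}.0.0"
    if change == "compatible":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.0.0", "compatible"))  # 1.1.0
```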

This convention follows the OSGi Semantic Versioning definitions (see page 7), slightly modified for SPL project context.

SPL projects containing sample applications have their own version numbers, independent of the SPL project containing the toolkit's functions and operators. Their versions should follow the same convention.
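Under this convention, a dependent application can accept any compatible release of your toolkit, for example "at least 1.2.0, but below 2.0.0". The Python sketch below checks a version against an interval written in OSGi-style notation such as [1.2.0,2.0.0), where a square bracket includes the endpoint and a parenthesis excludes it. It assumes that notation and only illustrates the idea. It is not the Streams compiler's own algorithm, and the function names are hypothetical. It compares versions as tuples of integers, so 1.10.0 correctly sorts after 1.2.0, which a plain string comparison would get wrong.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def in_range(version: str, interval: str) -> bool:
    """Check version against an interval such as '[1.2.0,2.0.0)'.

    '[' and ']' include the endpoint; '(' and ')' exclude it.
    """
    interval = interval.strip()
    low_text, high_text = interval[1:-1].split(",")
    v, low, high = parse_version(version), parse_version(low_text), parse_version(high_text)
    above_low = v >= low if interval[0] == "[" else v > low
    below_high = v <= high if interval[-1] == "]" else v < high
    return above_low and below_high


print(in_range("1.10.3", "[1.2.0,2.0.0)"))  # True
print(in_range("2.0.0", "[1.2.0,2.0.0)"))   # False
```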


demo repositories

To be done.