Flow-based Programming (Part 1)

Flow-based programming is a way of coding that’s slightly different from imperative programming languages. Its peculiarities let us avoid some problems that are typical to imperative languages. In this article we’ll look through basic concepts, features and strengths of flaw-based programming, and how we can use them to our advantage

Introduction

Flow-Based Programming (FPB) is programming paradigm created by Paul Morrison in the end of 1960s. FBP describes program as a directed graph (network). Nodes of the network are called processes. They can receive, process and send information through the network in from of Information Packets (IPs). IPs are transferred from process to process through directed arcs of the graph called connections.

FBP is somewhat similar to conveyor production of some item. Before its completion, each item should go through a certain set of treatment procedures, which are being undertaken simultaneously and independently. The item itself can be compared to IP. Each processing operation is performed by a single worker or device - it’s an FPB process. In between operations the item is being transferred by a conveyor belt, which is similar to FBP connection.

Just as at the factory, the main attention is given to the product. It means that the paramount focus in FBP is data and transformations it endures.

FBP is characterized by:

parallelism
asynchrony
multithreading
code reuse
simplicity in testing
repairability
natural visual representation
joint development

Bash-flow

Prime examples of FBP are Unix pipelines. With their help, for instance, we can count the number of errors in log file of a server:

cat server.log | grep "^ERROR" | wc -l

where:

data being transferred through the pipeline (bites or file’s lines of code) are IPs;
commands are components;
unix-processes are processes;
stdin, stdout are ports;
pipeline is a connection.

What FBP is made of?

Information Packets (IPs)

In imperative programming languages we use variables for transmitting and storing data. They’re like a data storage area. Working with variables allows us to:

read data from a variable as many times as needed and get same data each time (without reporting you did it before);
write data in a variable over the previous, in so doing, losing previous data
between reading and writing data in a variable by one process, other process may change data in the variable without reporting it to the first process.

The way the variable behaves leads to errors in development. It requires complex mechanisms of blocking and synchronization of data during parallel computations.

Let’s compare this to how real-life objects behave - to give you an idea, let’s think of variable as a bookshelf, and data as books, which are stored on this bookshelf:

If a book is on the shelf, you can take it from it only once without putting it back.
To put the book back, you have to make sure that the shelf is empty - and clear it out if it’s not.
If you took the book, no one else can take it till you put it back.

It’s clear that the variable and data inside it behave in a completely different way.

FBP refuses the concept of variable. Data in FBP is transmitted in form of Information Packets (IPs). IPs are supposed to be worked with as real objects. IP is not a data storage area - it’s a real objects’ descriptor.

IP should comply with the following properties:

Before usage IP should be created explicitly;
At any moment in time IP may belong to only one process or be in a middle of transferring from one process to another (in other words, it must remain in a connection).
To leave the system, it must be destroyed explicitly.

Component

Component is analogue of subroutine, and it’s a main building block of FBP. Component describes how future processes will work in a network. If you are familiar with OOP, then you can draw a parallel between component and class. Process, which is an instance of this component, can be compared to an instance of this class. Components are commonly written in high-level languages.

Process

Process is an instance of a component located in network.

Process is able to:

receive IP;
create IP;
read data from IP;
change data in IP;
send IP;
delete IP;

FBP program is actually a set of processes with IP’s being transferred between them. All processes are to work concurrently and independently. It’s possible to run multiple processes for a single component.

Before finishing its work, a process has to get rid of all IPs it holds, by passing them further through the network or by explicitly destroying them. We say that process holds IP, when it received IP or created it, but hasn’t sent it further or destroyed it yet.

Ports

We’ve already mentioned that process can receive or send IP. But where does it receive IPs from? And where does it send them? Let’s recall that work of a process is described in a component. It doesn’t have access to information about network. That’s why a process can’t receive IPs directly from other processes or connections. For this task we need a special interface that will allow process to interact with a network. This interfaces are called ports.

Process may have ports of two types:

input port, that allows process to receive IPs;
output port, that lets process send IPs.

Process may have different combinations of ports. The way ports interact with the process may also wary. These attributes are described in a component. Depending on a way you arrange it, the following options are possible:

1. A component with one input port and one output port. In this case each of the ports may be either single port (like an output port on a picture on the right), or an array port (like an input port on the same picture).

With a single port all you need is to specify its type. As for an array port - it has several indexed elements, and each of them can be seen as a single port. For an array port you should specify its type and element’s index.

2. A component with multiple input and output ports. In this case, each port should have its own unique name so that the component could distinguish them.

Connections

Now we know how a process receives and sends IPs with a help from ports. Now there’s the issue of how an IP send by one process may be received by another process? There has to be a link between an output port of a sending process and an input port of a receiving process. This link is called a connection.

Connection is a limited buffer or a FIFO-queue. Size of a buffer or a maximum number of IPs that can be held in a queue is called connection’s capacity.

Each connection has two ends:

Input end - through it IPs get inside the connection. It has to be connected to an output port of process;
Output end - they work as an “exit” through which IPs leave connection. It needs to be linked with an input port of process.

When process sends IP, it gets into connection. The connection stores it until the receiving process accepts this IP.

Process may be blocked in two situations: if it tries to read data out of empty connection, or if it tries to write data in a full connection. To ensure that data processing goes right, connections should comply with the following constraints:

“flow”. This constraint states that each IP that was send via connection, must be delivered;
“order-preserving”. Any two different IPs sent through an input of connection in a certain order must be received on an output of this connection in the same order.

To sum it up, when we want to send an IP from one process to another, we should establish a connection between them. We also need to explicitly determine, that the output port of a sending process is linked to an input port of the receiving process by a connection with a capacity of “n”. One port may be wired only to a single connection.

However, a connection may have multiple input endings and only one output ending. If it’s so, on the output ending IPs will be received in a FIFO order.

In some situations you might want to use connections with zero capacity. In this kind of systems processes don’t have an ability to acquire IPs on their own. Just as an IP is being sent on a port of the process, the application autonomously causes execution of that process.

Composite components

composite-components

Components are a great mechanism of reusing code, but usually they are written in high-level languages. Practically, we don’t have this option on FBP-level. It’s like using FDK function without an opportunity to create your own.

The approach of composite components is a way to combine a subgraph of a network. Further on it will act like a simple component. This gives us a tool to develop components and allows us to reuse code on a level of FBP.

To make a composite component look like other components, it must have its own ports. Any port of composite component has to be connected to a corresponding port inside on a subgraph.

Steams

A stream is a series of IPs passing through a connection. A stream doesn’t exist at any particular time. It’s being generated continuously by one process and being consumed by the other. If we observe the network, we will see only a part of the stream, which consists of IPs that stay in a connection right in that moment. This concept lets us process a very long stream of data while involving a rational volume of resources.

Merging Streams

Multiple streams can be united by wiring them to one connection with several input endings.

Splitting Streams

Stream can be splitted in two or even more. It’s done by a special component called splitter. Splitter receives a source stream and sends each of its IPs to one of its output ports under some rule, thus dividing the IPs in two new streams. It may also copy IPs and send there copies to an output port, creating a copy of a source stream.

Conclusion

Now let’s have a look at the characteristics, we described back in the introduction to this article, and see how these characteristics are ensured by our components:

Paralellism. Processes work concurrently;
Asynchrony. Processes work concurrently and independently, and that’s why when process A sends an IP to process B, it doesn’t wait, till process B recieves and processes that IP.
Multithreading. Processes may work in multiple streams. FBP supports multithreading concept, but doesn’t strictly demand it, so it’s up to developer to use it in her code or not.
Code reuse. In FBP network developer may reuse components and create new components basing on the existing ones. You may build different networks using same components. Or, say, to create new applications without changing the components themself.
Simplicity in testing. Work of a process doesn’t depend on network’s condition or other processes. It only depends on incoming IPs and their order. It lets us test any component independently.
Repairability. If any process works incorrectly, we can solve this problem by fixing a relevant component or even replacing it. It doesn’t affect other elements of the network.
Natural visual representation. We can picture the whole structure of an FBP program in a form of a graph, which makes it quite human readable.
Joint development. There are two types of FBP coders - components developers and network configurators. You may develop components independently.

Implementations of FBP

JavaFBP - suggested by Paul Morrison himself, it’s a classic implementation of FBP. He also designed a visualization system DrawFBP.

Apache NiFi - Another classic implementation of FBP - this time from Apache Software Foundation. It’s written on Java. Comes in a form of a web server and has a grahic IDE. Horizontal scaling available!

https://nifi.apache.org/

Node-Red - FBP-like system designed by IBM and written in node.js. Delivered as a web server. It has a handy IDE for constructing FBP networks right out-of-the-box.

https://nodered.org/

NoFlo - one more FBP-inspired system written in node.js.

https://github.com/noflo/noflo/

Flow-based Programming (Part 1)

Introduction