As a programmer, you will often need to process data that is being continuously produced by one or more sources. For example, you might need to process logs generated by a running system. The data here is not static: logs are continuously generated and continuously processed. This is a data stream. In this series, we are going to understand how to handle data streams in Golang.

Data Streams

As already introduced, a data stream is a continuous flow of data packets from one or more sources.
Suppose you have to read a file. There are two cases:

  1. The file is dynamic (new content is added to it while you are reading).
    In this case, you make a stream with the file as the source, and consume data as it flows through the stream.
  2. The file is static (no new content is added while you are reading).
    This case can be further divided into two sub-cases.
    a. Small file size: You can read the entire content of the file into RAM at once.
    b. Large file size (comparable to the free memory in RAM): Because the file is too large to load at once, you create a stream and read it buffer by buffer.

Buffer

A buffer is a temporary region of RAM where data is held while it is being read. In the case of a simple read, the entire content is loaded into the buffer (because the document is small enough to fit in RAM). In the case of streams, we read a chunk of data no larger than the buffer, store it in RAM, and process it; we then read the next chunk into the same buffer and process it, and so on until the source is exhausted.

Golang comes with many APIs that support streaming (reading and writing) from various sources, such as network connections (used to make network calls), in-memory data structures, and files stored on the hard disk.

Let us now focus on creating Go programs that are capable of streaming data. To make life easier, Golang provides two interfaces: io.Reader and io.Writer.

io.Reader

In technical terms, io.Reader is an interface for reading data from a data source, as bytes, into a buffer. Once the data is in the buffer in byte form, it can be consumed or transferred.

As mentioned earlier, io.Reader is an interface. It does not provide a concrete implementation. It just declares a single method that a type must implement in order to satisfy the interface.

type Reader interface {
	Read(p []byte) (n int, err error)
}

In order to implement the interface, a type must implement the method Read(p []byte) (n int, err error).

This method returns the number of bytes it has read into p and, if something went wrong, an error.

 

Guidelines for using io.Reader:

  1. Read is passed a buffer p. It tries to read up to len(p) bytes from the data source into p and returns the number of bytes read as n.
  2. The number of bytes available can be less than len(p), for example near the end of the source. In that case, n will be less than len(p).
  3. When an error occurs, there are two scenarios: either the reader read some bytes before the error occurred, in which case it returns n > 0 along with the error, or it was not able to read anything at all, in which case it returns n = 0 along with the error. Callers should therefore process the n bytes first and look at the error afterwards.
  4. When the whole source has been read, the reader should return io.EOF as the error.
  5. n = 0 with err == nil does not mean the source has been exhausted; the reader may still return data on subsequent calls.

 

 

Many packages in the Go standard library provide implementations of the Reader interface. Let us look at one of them.

Using Reader

To read a data stream through a Reader, we call Read in a loop over the data source. In every iteration, it reads a chunk of data into the buffer p. The loop ends when the source is fully exhausted (or read), at which point Read returns an io.EOF error.

The strings package exposes a reader; let us see how it looks.

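package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	reader := strings.NewReader("A for apple, B for ball")

	// Buffer of 4 bytes: each Read call fills at most 4 bytes.
	p := make([]byte, 4)

	for {
		n, err := reader.Read(p)
		if err != nil {
			if err == io.EOF {
				// Print any bytes returned together with io.EOF, then stop.
				fmt.Println(string(p[:n]))
				break
			}
			// Any other error is unexpected: report it and exit.
			fmt.Println(err)
			os.Exit(1)
		}

		// Only the first n bytes of p hold data from this call.
		fmt.Println(string(p[:n]))
	}
}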

 

In this code, we make a buffer of size 4. We call reader.Read repeatedly until it returns an error. We then check the error: if it is io.EOF, we print any bytes that the call might have returned (using n, the number of bytes read by the call) and leave the loop; otherwise we print the error and exit.

If there is no error, we print the bytes that were read into the buffer and move on to the next iteration.

 

This is how we read data streams in Golang. We will discuss writers in the next article.

 

 

