As a programmer, you will often need to process data that is being continuously produced by one or more sources. For example, you might need to process logs generated by a running system. The data here is not static: logs are continuously generated and continuously processed. This is a data stream. In this series, we are going to understand how to handle data streams in Golang.
Data Streams
As already introduced, data streams are packets of data flowing continuously from one or more sources.
Suppose you have to read a file. There are two cases:
- The file is dynamic (new content is added to it while you are reading). In this case, you make a stream with the file as the source and consume data as it flows through the stream.
- The file is static (no new content is added while you are reading). This case can be further divided into two:
a. Small file size: you can read the entire content of the file into RAM.
b. Large file size (comparable to the free memory in RAM): the file is too large to read at once, so you create a stream and read it buffer by buffer.
Buffer
A buffer is a temporary region of RAM where data is stored while it is being read. In a simple read, the entire content is loaded into the buffer (because the document is small enough to fit in RAM). With streams, we read a chunk of data no larger than the buffer, store it in RAM and process it; then we read the next chunk into the same buffer and process that, and so on.
Golang comes with many APIs that support streaming (reading and writing) from multiple resources, such as network connections (used to make network calls), in-memory data structures and files stored on the hard disk.
Let us now focus on creating Go programs that are capable of streaming data. To make life easier, Golang provides two interfaces: io.Reader and io.Writer.
io.Reader
In technical terms, io.Reader is an interface for reading data from a data source, as bytes, into a buffer. Once the data is in the buffer in byte form, it can be consumed or transferred.
As mentioned earlier, io.Reader is an interface. It does not provide a concrete implementation; it only declares a method that a type must implement in order to satisfy the interface.
type Reader interface {
	Read(p []byte) (n int, err error)
}
In order to implement the interface, we must implement the method Read(p []byte). It returns the number of bytes read and, if something went wrong, an error.
Guidelines for using io.Reader:
- The Read function is passed a buffer p of length len(p). It reads up to len(p) bytes from the data source into p.
- Near the end of the source, fewer than len(p) bytes may be left to read. In that case, n will be less than len(p). A reader may also return fewer bytes simply because that is all that is available at the moment.
- When an error occurs, there are two scenarios: either the reader read some bytes before the error occurred, in which case it returns n > 0 (the bytes read so far) along with the error, or the reader was not able to read anything at all, in which case it returns n = 0 along with the error. Callers should therefore process the n bytes returned before looking at the error.
- When the whole source has been read, the reader should return io.EOF as the error.
- n = 0 and err == nil does not mean the source has been exhausted; the reader may return data on subsequent calls.
Many libraries in Golang provide an implementation of the Reader interface. Let us look at one of them.
Using Reader
To read a data stream through the Reader interface, we call the Read function in a loop over the data source. In every iteration, it reads a chunk of data into the buffer p. The loop ends when the source is exhausted, at which point Read returns an io.EOF error.
The strings package exposes a reader; let us see how it looks.
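package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	reader := strings.NewReader("A for apple, B for ball")
	p := make([]byte, 4) // buffer: at most 4 bytes are read per call
	for {
		n, err := reader.Read(p)
		if err != nil {
			if err == io.EOF {
				// Source exhausted: print any bytes returned with EOF, then stop.
				fmt.Println(string(p[:n]))
				break
			}
			fmt.Println(err)
			os.Exit(1)
		}
		fmt.Println(string(p[:n])) // print only the n bytes just read
	}
}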
In this code, we make a buffer of size 4. We call reader.Read repeatedly until it returns an error. We then check the error: if it is io.EOF, we print any bytes that the call might have returned (using n, the number of bytes read by the call) and stop; otherwise we print the error and exit. If there is no error, we print the n bytes that were read into the buffer and move on to the next iteration.
This is how we read data streams in Golang. We will discuss writers in the next article.