Concurrency

Aug 20, 2020

Concurrency refers to multiple computations happening at the same time. It is closely related to parallel computation: concurrent tasks make progress during overlapping time periods, whether or not they literally run simultaneously on separate cores. Many programs we use today involve some sort of concurrent process, including the operating system itself. For instance, how does the operating system handle users running multiple programs at the same time? (Concurrency!)

Imagine that you have a surprise presentation tomorrow and you have just started preparing. Unfortunately, it just so happens that today you have a DMV appointment! California is notorious for long wait times, so you decide to bring your laptop with you to the DMV office. You start working on your presentation after checking in and pause when you’re called for your appointment. After your appointment is done, you’re able to go home and rest early, since you finished the majority of your presentation at the DMV office.

So, what does this mini-story tell us? Suppose the character you play represents a CPU core that carries out computations. The core has two distinct tasks:

  1. Attend the DMV appointment
  2. Finish preparing for the surprise presentation

You could have done these two tasks separately – instead of starting the preparation at the DMV office, you could have started after your appointment. You are, however, very smart and realize that this is not an efficient way to complete your tasks. It’s the exact same way with programs! Whittling away at several tasks in small chunks, switching between them while one is waiting, can make the overall work finish sooner.

Concurrent Models

There are two main concurrent models used in modern programming:

  1. Shared Memory
  2. Channels

Shared Memory

Imagine this scenario: you have a global string variable in your program, and you record its length in an integer. Say the string is 8 characters long, so the integer starts at 8. You then launch a concurrent loop with three iterations, each of which reads the integer, subtracts 1, and writes the result back. The program prints the integer after the loop finishes. You expect the output to be 5 when you run it, but you get 7 instead.

So, what went wrong?

What could have happened in our example is that, because the three iterations run at the same time rather than one after another, some of the subtractions are lost. For instance, two threads both read the number 8, so both write back 7 when, in reality, we want 6: the second write simply overwrites the first. This is a good illustration of shared memory state, and it makes it easy to see that we need some sort of synchronization to stop concurrent processes from overwriting each other.
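
Here is a minimal sketch of that scenario in Go (the string and the goroutine count are just illustrative). Each goroutine reads the shared length, subtracts 1, and writes it back without any synchronization, so some decrements can be lost:

package main

import (
    "fmt"
    "sync"
)

func main() {

    word := "abcdefgh"  // hypothetical 8-character string
    length := len(word) // shared integer, starts at 8

    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            length = length - 1 // unsynchronized read-modify-write
        }()
    }
    wg.Wait()

    // Expected 5, but lost updates can leave 6 or 7.
    fmt.Println(length)
}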

This synchronization is achieved through mutual exclusion locks, commonly known as mutexes. Imagine that each thread in a concurrent program must acquire the “mutex lock” before it can perform an operation on a shared resource. In other words, it’s like a game of hot potato: only the person holding the potato can touch the resource.

The benefit of this system is that multiple threads can perform computations on the same memory resource without any worry of corruption. We know this because the mutex only allows one thread to update the resource at a time.
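
As a rough sketch of the fix, building on the hypothetical example above, each goroutine acquires a sync.Mutex before touching the shared integer, so the read-subtract-write happens as one uninterrupted step:

package main

import (
    "fmt"
    "sync"
)

func main() {

    var mu sync.Mutex
    length := 8 // shared resource

    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock() // only one goroutine may update at a time
            length = length - 1
            mu.Unlock()
        }()
    }
    wg.Wait()

    fmt.Println(length) // always prints 5
}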

There are two different types of locks:

  • Read Lock
  • Read & Write Lock

Multiple threads can hold a read lock to read data from the resource at the same time. Only one thread can hold a write lock at a time.

If threads are currently holding read locks and a write lock is queued, the write lock waits for all of those reads to finish before it takes effect. While a write lock is held, no other reads or writes can access the resource – they must wait until the write lock is released.
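
In Go, these two lock types correspond to sync.RWMutex. Here is a minimal sketch (the cache type is hypothetical): readers take RLock and can overlap, while a writer takes Lock and excludes everyone else:

package main

import (
    "fmt"
    "sync"
)

// cache is a hypothetical shared resource guarded by a read/write lock.
type cache struct {
    mu   sync.RWMutex
    data map[string]string
}

// Get takes a read lock; many readers may hold it at once.
func (c *cache) Get(key string) string {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.data[key]
}

// Set takes the write lock; it waits for in-flight readers to finish
// and blocks new readers and writers until it is released.
func (c *cache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = value
}

func main() {
    c := &cache{data: make(map[string]string)}
    c.Set("greeting", "hello")
    fmt.Println(c.Get("greeting"))
}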

Read this post for an in-depth example on how to use mutexes in Go.

Channels

Channels are a different way of managing concurrency. Imagine a channel as a pipe: one concurrent routine sends a message into the pipe, and another routine receives it on the other end.

I admittedly do not have as many examples for channels because they haven’t matched my use cases. I can, however, say that channels are a good fit for problems in which you’re waiting on multiple concurrent processes.

For example, consider the following Go program:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {

    // initialize some channels
    channelOne := make(chan int, 1)
    channelTwo := make(chan int, 1)

    // goroutine 1
    go func() {
        time.Sleep(time.Duration(rand.Intn(2)) * time.Second)
        channelOne <- 1
    }()

    // goroutine 2
    go func() {
        time.Sleep(time.Duration(rand.Intn(2)) * time.Second)
        channelTwo <- 2
    }()

    // wait until we receive a value from either goroutines
    select {
    case num := <-channelOne:
        fmt.Printf("Num: %d", num)
    case num := <-channelTwo:
        fmt.Printf("Num: %d", num)
    }

}

In the code example, we create two goroutines that are purposely set to sleep for a random duration (rand.Intn(2) picks 0 or 1 second). The select statement at the end of the function waits for whichever goroutine sends an integer value first and prints it. Feel free to play with this example here.

All in all, channels combined with select are a good way to simulate pub-sub design patterns in web servers.
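
As a loose illustration of that idea (not a production pub-sub system), a single goroutine can own the subscriber list and use select to multiplex subscribe and publish requests:

package main

import "fmt"

func main() {

    subscribe := make(chan chan string)
    publish := make(chan string)
    done := make(chan struct{})

    // The "broker" goroutine multiplexes requests with select.
    go func() {
        var subscribers []chan string
        for {
            select {
            case sub := <-subscribe:
                subscribers = append(subscribers, sub)
            case msg := <-publish:
                for _, sub := range subscribers {
                    sub <- msg
                }
            case <-done:
                return
            }
        }
    }()

    // One subscriber signs up, then a message is published.
    inbox := make(chan string, 1)
    subscribe <- inbox
    publish <- "new event"
    fmt.Println("subscriber got:", <-inbox)

    close(done)
}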

Race Conditions

The last important topic to know about concurrency is race conditions. A race condition is essentially a synchronization error: the result of your program depends on the timing and interleaving of concurrent operations. These errors may or may not crash your program and can be very hard to pin down. For this reason, concurrent programs are quite hard to debug!

Fortunately, most modern languages, including Go, have tools to catch potential errors. These tools are not foolproof (they can catch data races but not all race conditions), so I recommend taking precautions whenever possible.

Go’s race detection happens dynamically at runtime – you enable the race detector with the -race flag (for example, go test -race or go run -race). In other languages, such as Rust, this checking can happen at compile time.

A classic example of a data race is the joint bank account problem. Suppose two people have access to the same bank account and no mutex is implemented. What would happen if both people accessed the account at the same time? Logically, nothing bad would happen if both individuals were just checking the balance. But what if both were depositing and withdrawing money at the same time? Say the initial balance in the account is $100.

One person withdraws $50.

The other deposits $30.

The bank sees this:

  1. Person A checks balance ($100)
  2. Person B checks balance ($100)
  3. Person A withdraws $50
  4. Person B adds $30
  5. A reports result (balance = $50)
  6. B reports result (balance = $130)

We can clearly see an error here: the two results cannot be merged sensibly – the account cannot end up at both $50 and $130. This situation can easily be solved by using a mutex lock so that each read-and-update of the balance happens as a single, safe step.
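
As a hedged sketch of that fix in Go (the account type is just for illustration), each balance update holds a mutex so the read and the write happen as one atomic step:

package main

import (
    "fmt"
    "sync"
)

// account is a hypothetical joint bank account guarded by a mutex.
type account struct {
    mu      sync.Mutex
    balance int
}

// adjust applies a deposit (positive) or withdrawal (negative) while
// holding the lock, so no other update can slip in between read and write.
func (a *account) adjust(amount int) {
    a.mu.Lock()
    defer a.mu.Unlock()
    a.balance += amount
}

func main() {
    acct := &account{balance: 100}

    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); acct.adjust(-50) }() // Person A withdraws $50
    go func() { defer wg.Done(); acct.adjust(30) }()  // Person B deposits $30
    wg.Wait()

    fmt.Println(acct.balance) // always 80, regardless of ordering
}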

Performance

While concurrency is a much-loved feature, there are times when you shouldn’t use it – namely, when the cost of creating a thread-safe implementation doesn’t justify the gains. For example, I used Java streams to filter, transform, and parse 11,000 lines of Twitter data, and implemented two versions: an iterative one and a concurrent one. Granted, benchmark results vary with machine specifications, but I ran both benchmarks under the same conditions on my laptop. Even across multiple runs, the iterative version was consistently faster.

Why?

The simple answer is that creating threads and making sure all processes are thread-safe is expensive. It’s just not worth the time and energy to spin up concurrent processes for small amounts of computation, because the CPU already handles such workloads efficiently on its own.

In other words, don’t use concurrency to write a loop that prints out integer numbers from 1 to 10!

Resources

Still interested in more information? I find the following resources good reads:

Yiping Su
Software Engineer

I am interested in data, software engineering, and the application of computer science concepts in real-world scenarios.
