Asynchrony, Concurrency, and Parallelism
With how often these words are used, async likely being the most common of the three, it's easy to get confused about what they really mean. It's also usually unclear how async relates to multithreading, or whether it does at all, so let's talk about it!
Prologue
There is a pressing need to make one thing clear: threads are not a hardware concept. They exist purely in the kernel. On the hardware level, you only have CPU cores. CPU cores contain the actual circuitry (execution units) that performs calculations and runs instructions, whereas a thread is an isolated sequence of instructions that needs to be run (along with saved state from when it was last preempted/paused so that it can be safely resumed).
Although it is an oversimplification, you can think of a thread as an array of instructions that need to be run, something like ["const a = 5;", "a += 1;", "a /= 5;"], but in machine code.
A thread needs to run on a CPU core to make progress toward completion. For simplicity, let's think of it in terms of percentages: each thread has some progress (like 20% complete) and state from when it was last paused. A thread can be executing instructions on the execution units, or waiting for I/O or memory fetches to complete.
Since it is possible to have more threads than the number of CPU cores, the OS needs to do some “thread management”, so to speak, in order to make sure they all progress and none of them is stuck. This is done via scheduling.
Scheduling refers to passing control between threads when required, in such a way that none of the threads are starved or stuck. Let's talk more about this later, and for now proceed to the main topics of this blog.
Async

Async, Asynchronicity, or Asynchrony, is inherently a simple concept. All it means is that you do not necessarily have to wait for one action or line of code to complete executing before moving on to the next task/action/line.
The most common example of this, one that we as developers likely use every day, is waiting "asynchronously" for a network request, file read, or any other kind of I/O. This lets other things, such as rendering the UI, happen while you wait for the result.
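To make this concrete, here's a minimal Node.js sketch (the file name data.txt is just a placeholder): we start reading a file, do some other work while the read is in flight, and only then wait for the result.

```js
const fs = require("fs/promises");

async function main() {
  // Start the read, but don't wait for it yet.
  const filePromise = fs.readFile("data.txt", "utf8"); // placeholder file

  // Other work (rendering, handling events, ...) can happen here
  // while the read is still in progress.
  console.log("doing other work while the file is read...");

  // Only now do we actually wait for the result.
  const contents = await filePromise;
  console.log("file contents:", contents);
}

main();
```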
Now for the question of whether or not it is multi-threaded: it depends. Let's talk about syscalls for a little bit to understand this. Syscalls are a way for your program to talk to the kernel and ask for something, for example, the contents of a file on the disk. Syscalls are required because programs are not allowed to interact with many system resources directly. There exist blocking and non-blocking syscalls: blocking syscalls block the thread until the result is ready, and while that happens, the thread does not get any CPU time. With non-blocking I/O, the syscall returns immediately without blocking the thread; the I/O is still in progress, and the kernel takes care of it.
In the context of NodeJS, for example, the poll phase of the event loop checks whether any of the I/O it started has finished, and if it has, calls the respective callback. Any I/O that has not finished simply continues in the background, and the event loop moves on without blocking the thread.
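As a rough sketch of the difference from a Node.js program's point of view (again, data.txt is a placeholder; under the hood Node may use non-blocking syscalls or its worker thread pool depending on the operation, but either way your thread is not blocked):

```js
const fs = require("fs");

// Blocking: the thread can do nothing else until the read completes.
const data = fs.readFileSync("data.txt", "utf8");
console.log("readFileSync finished");

// Non-blocking (from our thread's perspective): the read is started,
// and the callback runs later, once the event loop sees it has finished.
fs.readFile("data.txt", "utf8", (err, contents) => {
  if (err) throw err;
  console.log("readFile callback ran");
});
console.log("this line runs before the callback above");
```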
In most cases, if a program can use non-blocking syscalls to do something, that is what it will do. NodeJS is able to use libuv to spawn worker threads to perform operations that do not have system-level non-blocking syscalls, such as DNS queries.
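For instance, per the Node.js docs, dns.lookup() uses the system resolver (getaddrinfo), which is a blocking call, so libuv runs it on a worker thread, while dns.resolve4() performs the query over the network with non-blocking I/O. A small sketch:

```js
const dns = require("dns");

// Runs getaddrinfo on a libuv worker thread (no non-blocking syscall exists for it).
dns.lookup("example.com", (err, address) => {
  if (err) throw err;
  console.log("lookup:", address);
});

// Sends the DNS query itself over the network using non-blocking sockets.
dns.resolve4("example.com", (err, addresses) => {
  if (err) throw err;
  console.log("resolve4:", addresses);
});
```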
Non-blocking syscalls are preferred over blocking syscalls because the alternative, spawning a new thread for every file read or network wait, quickly results in a lot of threads, which, as we'll see later on, is not good for performance.
Concurrency

Concurrency refers to a situation where two or more things are in progress, but not necessarily progressing at the same point in time. Sound familiar? That's because this is exactly what we did with asynchrony: when we proceed to Thing 2 while waiting for Thing 1 to finish, we are indeed making progress on two different tasks. This means that asynchrony implies concurrency. Simple as that!
The difference between asynchrony and concurrency is that with asynchrony, the reason you might do two things at a time is that one of them is performing a slow operation which you do not want to block the thread for, whereas concurrency is about doing multiple things at once because that is inherently the objective.
Concurrency does not necessarily mean that multiple tasks have to be making progress at the exact same moment in time. They can make progress whenever possible. This means that time slicing, or progressing in turns, is totally fine and still counts as concurrency. The only condition is that more than one task has to be in progress at a given point in time.
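Here's a small illustrative sketch (the task names and delays are made up): two async tasks interleave on a single thread, each making progress whenever the other is waiting.

```js
// Simulate a slow step.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function task(name) {
  for (let step = 1; step <= 3; step++) {
    await sleep(100); // while this task waits, the other one can run
    console.log(`${name}: step ${step} done`);
  }
}

// Both tasks are "in progress" over the same period of time,
// even though only one of them executes at any given instant.
Promise.all([task("A"), task("B")]).then(() => console.log("both done"));
```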
Parallelism

This is where things get tricky. Parallelism is a situation where multiple tasks are making progress at the same point in time. This is possible when different CPU cores are executing different tasks together. Each CPU core has its own execution units for calculation, so each CPU core can make progress independently.
Before we continue, let's talk a little more about threads. When you launch a program, it is given a thread. The program also has the capability to ask for more threads. The maximum number of threads your system can run is usually way higher than the number of cores you have; for example, it's 250,000 for me, and I have a 16-core CPU. The only way to run this many threads on a meagre number of actual cores is, as you might have guessed, sharing the resources. This is done by the scheduler, which is a part of the OS kernel. The scheduler makes sure that all the existing threads get a chance to use the CPU. Imagine a queue of people at a store with only one booth: the booth is just one, but it can service multiple people one by one. Scheduling is something like that, except there are as many booths as you have CPU cores, and the decision of whose turn comes next depends on a much bigger variety of factors. Once a thread has exhausted its turn, context switching occurs, which saves the thread's current progress and gives up control to the next thread.
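For example, here's a minimal Node.js sketch of a program asking for one extra thread via the worker_threads module (the loop inside is just a stand-in for real work); which core that thread runs on, and when, is up to the kernel's scheduler:

```js
const { Worker } = require("worker_threads");

// Spawn a new OS thread; with eval: true, the first argument is the code it runs.
const worker = new Worker(
  `
  const { parentPort } = require("worker_threads");
  let sum = 0;
  for (let i = 0; i < 1e8; i++) sum += i; // stand-in for real work
  parentPort.postMessage(sum);
  `,
  { eval: true }
);

worker.on("message", (result) => {
  console.log("worker finished with:", result);
});

console.log("main thread keeps running while the worker computes");
```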
You might have heard terms like "UI thread" or "background thread", but this doesn't mean those threads are specially made for UI or anything like that. It might just be called the UI thread because that's where some program updates its UI. Threads are all structurally identical, except for "metadata" like priority.
Getting back to the topic: when you create a thread and run some operation on it (assuming your programming language does not use green threads, which we'll talk about in a bit), which CPU core the thread runs on depends on the scheduler, unless you manually set the thread affinity. This is decided based on a variety of conditions, like how long the thread has been running for, what the priority of the thread is, and so on. The finer details of scheduling and affinity are very complex and out of scope for this blog; they are very well-researched topics with decades of development behind them. It also depends on the type of processor. For example, Intel's newer CPUs use a hybrid design with performance and efficiency cores: only some of the cores are suitable for CPU-intensive tasks, and are called performance cores, whereas the rest are called efficiency cores and are meant to run background tasks with low system load. The scheduler has to make active decisions about which thread runs on which core based on the type of work they do.
Linux has long used CFS (the Completely Fair Scheduler) by default, which tries to give each thread a fair chance to finish by giving them short turns of CPU time. After a turn, it performs what's known as "context switching", where the current state of the registers is saved for the thread so that it can resume when it next gets a turn, and the next thread (chosen based on several factors) is given a chance to use the CPU. Context switching is a relatively expensive operation, since the state of all the registers has to be stored, and once the next thread is determined, the registers have to be repopulated with the values from when that thread last ran.
Hypothetically, you could have an extremely large number of threads, since the kernel is able to time-slice, context switch, and give each of them some time with the CPU. But past a point, none of the threads would make meaningful progress, because each of them would get so little CPU time that it would take an eternity for any of them to finish. Hence, OS threads are not scalable, and operating systems usually have an upper limit on thread count.
This is where green threads come in! One example of a popular programming language that uses green threads is Golang, and it calls them goroutines.

Goroutines are like threads, and run on OS threads. The Go runtime creates one OS thread for every CPU core you have, although the OS is probably running hundreds or thousands of threads already, created by other running programs.
Since you are allowed to have more goroutines than the number of OS threads the runtime creates, the Go runtime also comes with its own scheduler! It handles which OS thread runs which goroutine and for how long. What OS threads are to CPU cores, goroutines are to OS threads: OS threads are mapped onto CPU cores, and goroutines are mapped onto OS threads.
Summing it up, when you create multiple goroutines, the Go scheduler schedules them on OS threads, and the kernel schedules those OS threads on the CPU cores where the code finally runs.
There are three benefits to this approach:
- The Go runtime knows what is going on in your program. It has access to the call stack and information about the execution state, which allows it to make more informed scheduling decisions.
- A simple language-native communication method between goroutines, which we know as channels, is made possible. The inner workings of channels are out of scope for this blog.
- The ability to create millions of goroutines worry-free, since goroutines are faster to start, and switching between goroutines is way cheaper than OS thread context switching.
Speaking of threads, you might have noticed CPUs advertising that they have N cores and N * 2 threads, something like "6 cores, 12 threads". But since threads are a software concept, why does a processor, something that is purely hardware, advertise this? This is probably where most of the confusion about threads being hardware comes from.
The reason CPUs are able to advertise a thread count double the core count is Simultaneous Multithreading (SMT), which Intel calls Hyper-Threading. Modern processors are very powerful, and one thread is often unable to saturate all the resources on a core, since instructions might have dependency chains or experience high memory latency. Hence, modern CPUs expose two "logical" cores per physical core to the operating system. This keeps the hungry execution units fed with instructions at all times and improves resource utilization. Since one physical core is represented as two logical cores to the OS, the scheduler can run two separate threads on them, which ends up with, for example, 16 cores being able to run 32 threads at once.
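You can see the logical cores from a program's point of view; for example, Node's os module reports one entry per logical core, so on an SMT machine this is typically twice the physical core count (a tiny sketch, and the numbers depend on your machine):

```js
const os = require("os");

// One entry per *logical* core the OS exposes; with SMT this is
// usually 2x the number of physical cores.
console.log("logical cores:", os.cpus().length);
```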
Anyway, we got a bit sidetracked. In summary, asynchrony is being able to perform a task while waiting for another task to finish a slow operation like I/O; concurrency is about having multiple tasks in progress over the same period of time, with that being the inherent goal; and parallelism is about doing multiple things at the exact same point in time. Asynchrony implies concurrency, but not necessarily parallelism (JS is able to achieve asynchrony and concurrency without parallelism thanks to the event loop). Concurrency might imply asynchrony (doing two things at once may be because that was the inherent goal, or because we moved on while waiting for a slow operation to finish), but not necessarily parallelism (concurrency might come from things running on different cores or from time slicing). Parallelism implies concurrency, but not necessarily asynchrony (again, the tasks might simply be performing computations on different cores and not waiting on anything).
And there you have it! Hopefully you now know a bit more about what Asynchrony, Concurrency, and Parallelism mean, along with a little bit about what threads are and how they work. Note that there are several topics mentioned here that are very complex, especially scheduling, and what I’ve written about them is an oversimplification at best. There are always many, many more details to dive into, should you wish to.
Until next time!