Threading in C#

Threading

The Crash Course

  • Threading allows you to run sections of code simultaneously.
  • Starting a new thread is relatively simple in C#: Thread thread = new Thread(MethodNameHere); thread.Start();
  • The method you use should have a return type of void and take no parameters (this matches the ThreadStart delegate that the Thread constructor expects). If you want to return information from a thread, or pass in parameters, you typically create an object with that information as instance variables or properties, plus a single void, parameterless method for the thread to run.
  • You can wait for a thread to finish with the Join method: thread.Join();
  • If you need to worry about thread safety (preventing problems when multiple threads are modifying the same data), you can use the lock keyword: lock(aPrivateObject) { /* code in here is thread safe */ }. Typically, you lock on a private instance variable if the code that needs to be thread safe is all within the same class, or on a private static variable if it needs to be thread safe across all instances of the class.
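Putting those bullets together, here is a minimal sketch. The class name SumTask, its properties, and the SumWithThreads helper are made up for illustration. Each thread gets its parameters through properties on its own task object, runs a void, parameterless method, and adds its partial result to a shared total inside a lock. Because the total is shared by every instance, the lock object is static. The main thread calls Join on every thread before reading the total.

```csharp
using System;
using System.Threading;

// Hypothetical example class: each instance sums one slice of a shared array.
public class SumTask
{
    // Static lock object, because the data it protects (Total) is shared by
    // every SumTask instance, not just one.
    private static readonly object totalLock = new object();

    // The combined result that every thread contributes to.
    public static long Total { get; private set; }

    // The thread's "parameters", set before the thread starts.
    public int[] Values { get; set; }
    public int StartIndex { get; set; }
    public int Count { get; set; }

    // void return type and no parameters, so it can be passed to new Thread(...).
    public void Run()
    {
        long partial = 0;
        for (int index = StartIndex; index < StartIndex + Count; index++)
        {
            partial += Values[index];
        }

        // Only one thread at a time may add to the shared total.
        lock (totalLock)
        {
            Total += partial;
        }
    }

    // Splits the array into equal slices (assumes values.Length is a multiple
    // of threadCount), runs one thread per slice, and waits for all of them.
    public static long SumWithThreads(int[] values, int threadCount)
    {
        if (values.Length % threadCount != 0)
        {
            throw new ArgumentException("values.Length must be a multiple of threadCount.");
        }

        Total = 0;
        int sliceSize = values.Length / threadCount;
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            SumTask task = new SumTask
            {
                Values = values,
                StartIndex = i * sliceSize,
                Count = sliceSize
            };
            threads[i] = new Thread(task.Run);
            threads[i].Start();
        }

        // Join every thread before reading Total, so all the work is done.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        return Total;
    }
}
```

For example, filling an array with the numbers 1 through 1000 and calling SumTask.SumWithThreads(values, 4) returns 500500. Since Total is static, this sketch isn't meant to be called from several places at once.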

Introduction

Back in the day, computers had a single processor, or CPU. (If you didn't already know, this is the "brain" of the computer, which does all the work.) Nowadays, they usually have more than one. I'm currently working on a computer with two processors, each of which is "hyperthreaded", making it appear that I have a total of four processors. This isn't even a particularly fancy computer, either. There are machines out there with 8 or 16 processors (and there are ways to get other computers to do work for you, which puts far more processors at your disposal).

Anyway, the point is that computers have a lot of power, but when you run your programs, unless you make it happen, your code runs in a single thread, which can only use one processor at a time. Think of all of that raw computing power going to waste!
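If you're curious how many processors your own machine reports, .NET exposes the number of logical processors (hyperthreaded ones count separately) through Environment.ProcessorCount. The helper below is a small sketch with a made-up name that uses it to pick a thread count. Matching the thread count to the processor count is a common starting point, not a rule.

```csharp
using System;

public static class ProcessorInfo
{
    // Hypothetical helper: one thread per logical processor, but never more
    // threads than there are pieces of work, and always at least one.
    public static int SuggestedThreadCount(int workItems)
    {
        // Counts logical processors, so a hyperthreaded core counts as two.
        int processors = Environment.ProcessorCount;
        return Math.Max(1, Math.Min(processors, workItems));
    }
}
```

On a machine like the two-processor, hyperthreaded one described above, Environment.ProcessorCount would typically report 4.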

In this tutorial, we're going to take a look at threading. The basic idea is that we take chunks of code that are independent of each other and run them in separate "threads". A thread is almost like its own program, in that the computer runs multiple threads at the same time, on the different processors that your computer has. (For the record, a thread is not its own program. It shares memory with the program, or process, that created it.)

When many threads are running, the operating system lets one thread (or a few, depending on the number of processors in your computer) run for a little while, and then it suddenly switches it out for a different one. The operating system decides when to make the switch and which thread to switch to, and it does a pretty good job, so that's one less thing we need to worry about.

The important thing to know is that multiple threads can run at the same time, and they get switched out at the whim of the Evil Overlord (err… operating system). All threads will be treated fairly equally, but they will get switched around from time to time. This seemingly random switching, by the way, causes problems in some cases, and we'll talk about that in the section on Thread Safety.

There's a lot that goes into threading, and we simply don't have enough time to discuss all of the ins and outs of it here. So we won't. Instead, we'll take a look at the basics of threading, and I'll allow you to dig further into threading as you need it.

The Task

Threading can be used for many, many tasks. There's no real limit to what you can use threads for, but there are two broad categories where threading works especially well. The first is running work "in the background": something is happening, but the GUI can keep updating at the same time. You see this all the time when a progress bar shows how much work has been done, or how much is left.

The second broad category is when you have a very large pile of work to do, and it can be broken down into parts that are independent of each other. In this case, multiple threads can take the different parts and work on them without stepping on the toes of the other threads.

While there's no limit to what you can do with threads, we're going to stick with a relatively simple example from this second group. Imagine that we have a program that has an array of ints, and we want to fill it with random numbers between 1 and 100. Uh… wait… let's go with 0 to 99, for simplicity. (C# loves 0-based indexing, and random.Next(100) returns a number from 0 to 99.)

Without threading, this program might look something like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace Threading
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] numbers = new int[10000];
 
            Random random = new Random();
 
            for (int index = 0; index < numbers.Length; index++)
            {
                numbers[index] = random.Next(100);
            }
        }
    }
}

In this tutorial, we're going to "refactor" (a fancy programming word for "rearrange" or "reorganize") this code to use threads. If you take a close look at this, you'll see that while we have a very large list of numbers to fill up, each and every number can be determined (randomly) on its own—completely independent of the number before or after it. Because of this, it will be easy to create multiple threads, and give each of them their own block of the numbers array to work on.

Pulling Out the Work to be Threaded

Our first step will be to pull the work that we want threaded out into a place where we can assign it to a separate thread. We'll see in a minute that we start a thread by giving it a method to start working in, and that method can't take any parameters or return anything. (There is also a form that takes a single parameter of type object, but we won't need it here.) So if we want to give our thread parameters (the actual stuff to work on) or have it return any results, our best approach is to create a class to store the work and the results in, give that class a method with no parameters and no return value, and have the thread run that method.

Before jumping ahead, take a minute and think about what this class might look like. What kind of properties or instance variables should it have? What will the method look like that our thread will work in?

Below is the version of this class that I've created:

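using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Threading
{
    public class GenerateNumbersTask
    {
        // A single shared Random hands out seeds, so that tasks created at nearly
        // the same moment don't all get the same time-based seed (and therefore
        // the same "random" numbers). System.Random is not thread safe, so the
        // shared instance is only ever used inside a lock (see Thread Safety, below).
        private static readonly Random seedSource = new Random();
        private static readonly object seedLock = new object();

        public int[] Numbers { get; set; }
        public int StartIndex { get; set; }
        public int Count { get; set; }

        public void GenerateNumbers()
        {
            // Each task gets its own Random, used only by the thread running it.
            Random random;
            lock (seedLock)
            {
                random = new Random(seedSource.Next());
            }

            for (int index = StartIndex; index < StartIndex + Count; index++)
            {
                Numbers[index] = random.Next(100);
            }
        }
    }
}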

I have three properties here. Numbers is going to ultimately contain a reference to the array that we are trying to fill with random numbers. Then, we'll need each thread to know what parts of the array it should work on, so we create a StartIndex, which is the index in the array that the thread should start working at, and Count, which indicates how many numbers to do.

As an example, we might create 10 threads, and give each of them 1000 numbers to generate. They'd all have a reference to the same Numbers array, and since they all happen to be doing the same amount (1000), the Count will be the same. Our first thread will get a StartIndex of 0, our second thread will get a StartIndex of 1000, our third thread will get a StartIndex of 2000, and so on.

Finally, we have the GenerateNumbers method. In this method, we actually do the work of generating random numbers, working only in the range that we've told the thread to work with. Notice that this method doesn't return anything (void) and has no parameters, so we'll be able to use it to start our thread in a minute.
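The StartIndex arithmetic in the example with 10 threads works out neatly because 10,000 divides evenly by 10. When it doesn't, one option is to give the leftover items, one each, to the first few threads. The sketch below computes a StartIndex and Count for each thread that way; the WorkSplitter name and its tuple return type are assumptions, not part of this tutorial's code.

```csharp
using System;

public static class WorkSplitter
{
    // Hypothetical helper: splits totalCount items among threadCount threads.
    // Returns one (StartIndex, Count) pair per thread. Every item belongs to
    // exactly one range, and the counts differ by at most one.
    public static (int StartIndex, int Count)[] Split(int totalCount, int threadCount)
    {
        if (threadCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(threadCount));
        }

        int baseCount = totalCount / threadCount; // every thread gets at least this many
        int leftover = totalCount % threadCount;  // the first 'leftover' threads get one extra

        var ranges = new (int StartIndex, int Count)[threadCount];
        int start = 0;
        for (int i = 0; i < threadCount; i++)
        {
            int count = baseCount + (i < leftover ? 1 : 0);
            ranges[i] = (start, count);
            start += count;
        }
        return ranges;
    }
}
```

For 10,000 items and 10 threads this produces the same ranges as the text: (0, 1000), (1000, 1000), and so on. For 10,003 items, the first three threads get 1,001 items each, so the fourth thread starts at index 3003.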

Starting the Thread

The threading code that is built into the .NET Framework and C# lives in the System.Threading namespace, so in order to use it, we'll have to add another using directive at the top of our code with the rest of them:

using System.Threading;

The basic approach to starting a new thread is very easy, so I'm going to start there, before showing you the code that we'll use to complete the example that we're following in this tutorial.

Basically, you'll need these three lines, in some fashion or another:

Thread thread = new Thread(MethodNameHere);
thread.Start();
thread.Join();

In the first line, we create the new thread. In the Thread constructor, we pass in the name of a method that matches the ThreadStart delegate (it returns void and has no parameters, like we've discussed).

In the second line, we kick off the new thread, which will start running the method we pointed out in the first line.

Finally, Join makes the thread that calls it (here, the main thread) wait until the worker thread finishes before continuing on. Without it, the main thread would simply continue with whatever comes next in the code. Now, I'd better clarify: that's not necessarily a bad thing. In fact, it may be exactly what you want. But then you won't know when the thread finishes, unless you set up an event somewhere for the worker thread to raise when it is done. As usual, there are lots of ways to write the code to make it work the way you want.
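To see what Join actually does, the sketch below starts a thread that sleeps briefly and then stores a result. SlowSquare and its Demo method are illustrative names. Checking the thread's IsAlive property before and after Join shows the difference: before the call the worker is usually still running (though the operating system doesn't promise that), and once Join returns it has definitely finished, so its result is safe to read.

```csharp
using System;
using System.Threading;

// Hypothetical example: a task object with a "parameter" and a "return value".
public class SlowSquare
{
    public int Input { get; set; }           // set before starting the thread
    public int Result { get; private set; }  // read after Join

    public void Run()
    {
        Thread.Sleep(200); // pretend this is a lot of work
        Result = Input * Input;
    }

    // Returns the result, plus whether the worker was alive before and after Join.
    public static (int Result, bool AliveBeforeJoin, bool AliveAfterJoin) Demo(int input)
    {
        SlowSquare task = new SlowSquare { Input = input };
        Thread thread = new Thread(task.Run);
        thread.Start();

        // The main thread keeps going right away; the worker is most likely
        // still sleeping at this point.
        bool aliveBefore = thread.IsAlive;

        thread.Join(); // block until Run() has returned

        bool aliveAfter = thread.IsAlive; // always false once Join has returned
        return (task.Result, aliveBefore, aliveAfter);
    }
}
```

Calling SlowSquare.Demo(7) gives a Result of 49 with AliveAfterJoin set to false. Reading task.Result before Join instead could easily give 0, because the worker hasn't written it yet.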

Alright. With all of that in place, let's return to our example and finish it up. I'm going to do something a little crazy here, so I'll explain it so it makes some sort of sense. Since we want so many threads (we'll go with 10, as we said earlier), and I want to be able to keep track of them all, I'm going to create two arrays of length 10. One will store the GenerateNumbersTask objects (built from the class we created earlier), and the other will store the Thread objects we create.

Once the arrays are created, we'll loop through them, preparing a new GenerateNumbersTask with the right settings and starting a new thread on it. When we're done with that, we'll join up with each of the threads in turn, making sure that all of the work is done before continuing on.

OK, here's my code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
 
namespace Threading
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] numbers = new int[10000];
 
            // Prepare the list of threads and the list of tasks for those threads to work on.
            int threadCount = 10;
            GenerateNumbersTask[] tasks = new GenerateNumbersTask[threadCount];
            Thread[] threads = new Thread[threadCount];
 
            for (int index = 0; index < threadCount; index++)
            {
                // Create the task, and set it up with the right properties--these are our
                // parameters.
                tasks[index] = new GenerateNumbersTask()
                {
                    Numbers = numbers,
                    Count = 1000,
                    StartIndex = index * 1000
                };
 
                // Create the thread
                threads[index] = new Thread(tasks[index].GenerateNumbers);
 
                // Start the thread running
                threads[index].Start();
            }
 
            // Wait around and join up with all of the threads, so that when we 
            // move on, we're sure all of the work has been done.
            for (int index = 0; index < threadCount; index++)
            {
                threads[index].Join();
            }
        }
    }
}

Thread Safety

We've already covered the basics of threading, but now I think it is worth looking at a more advanced topic. If you've had enough of threads, you'll be fine to move on, but I think the stuff I'm about to show you is worth knowing about.

In the example that we just did, each thread had its own chunk of work to do. There was no overlap at all. But what if there was something that they had to share? Remember that threads get swapped out whenever the operating system decides, and a thread may actually be in the middle of something important.

Imagine a scenario where you have hundreds of objects in a game (computer-controlled players and objects) that all need to be updated. These objects are stored in an array, and there's an index into the array indicating the next object to update.

This work could easily be split across different threads, with each thread updating some of the objects. Since some objects take longer to update than others, we take an approach where the worker threads update objects one at a time until they've all been updated. When a worker thread has nothing to do, it calls a method to find out what to work on next. Inside that method, the thread grabs a reference to the next object to work on, and then moves the index forward by one, so that the next time a (probably different) thread asks, it gets the next item in the list.

But there's a problem here. What if a thread goes to get more work, gets a reference to the next object to work on (the one at, say, index 0), but before it has a chance to actually advance the index to the next item (so setting the index to 1), the operating system decides it's time for a different thread to run, and pauses the thread in its tracks. The second thread now goes in and figures out which object to update, but since the index still hasn't been updated (it is still 0), it grabs the exact same object as the first thread. That object (the one at index 0) will get updated twice!

But it gets worse. The second thread now changes the index (so index == 1 now), so that when a third thread asks for work to do, it is in the right place. The second thread then carries on, updating its object. But in the meantime, there's another switch, and the first thread becomes active again. It already knows what object to update (the object at index 0, which thread #2 is also updating), and it moves on and changes the index again (now to 2), skipping that next object (the object at index 1) entirely! One gets updated twice, and one gets skipped.
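The scenario above can be reproduced in a small sketch. The UnsafeWorkQueue class is hypothetical: instead of real game objects, it keeps an array of counters (one per "object") and each worker thread repeatedly asks GetNextIndex for work. Reading the index and advancing it are two separate steps, which is exactly the gap described above. Thread.Yield() is only there to make a switch inside that gap more likely, and Interlocked.Increment keeps the bookkeeping exact so that any miscount comes from GetNextIndex alone.

```csharp
using System;
using System.Threading;

public class UnsafeWorkQueue
{
    // updateCounts[i] records how many times "object" i was updated.
    private readonly int[] updateCounts;
    private int nextIndex = 0;

    public UnsafeWorkQueue(int objectCount)
    {
        updateCounts = new int[objectCount];
    }

    // NOT thread safe: another thread can run between the read and the increment.
    private int GetNextIndex()
    {
        int index = nextIndex; // step 1: find the next object to work on
        Thread.Yield();        // invite the OS to switch threads right here
        nextIndex++;           // step 2: advance the index for the next caller
        return index;
    }

    private void Worker()
    {
        while (true)
        {
            int index = GetNextIndex();
            if (index >= updateCounts.Length)
            {
                return; // no work left
            }

            // "Update" the object. Interlocked keeps this count exact, so any
            // double updates or skips come from GetNextIndex alone.
            Interlocked.Increment(ref updateCounts[index]);
        }
    }

    public static int[] Run(int objectCount, int threadCount)
    {
        UnsafeWorkQueue queue = new UnsafeWorkQueue(objectCount);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(queue.Worker);
            threads[i].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return queue.updateCounts;
    }
}
```

Try calling UnsafeWorkQueue.Run(1000, 4) a few times and counting the entries that aren't exactly 1 (a 2 is an object updated twice, a 0 is one that was skipped). The number will likely vary from run to run, and on some runs it may even be zero, which is part of what makes these bugs so hard to track down.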

The underlying problem here is that two threads are trying to access or modify the same data without checking that it is safe to do so. Fortunately, there's a way to fix this: you can mark off sections of code so that only one thread can be inside them at a time. Writing code so that it behaves correctly when multiple threads use it is called "thread safety". If a second thread reaches such a section while another thread is already inside it, it has to wait until the other thread has left. On one hand, this can slow your program down. On the other hand, your data will stay consistent. So you don't want to add this protection to code that doesn't need it (code that multiple threads can't mess up anyway), but when it is needed, it is important to add it.

Anyway, on to how to actually make your code thread safe. To do this, we need two pieces of code. When a thread is about to enter a section of code that needs to be thread safe, we'll use the lock keyword. Along with the lock keyword, we need to provide an object to lock on. The best approach is to lock on a private instance variable, so you may want to add one to the class that contains the code that needs to be thread safe:

private readonly object threadLock = new object();

You can also create this variable as a static variable instead of an instance variable if you need thread safety across all instances of the class:

private static readonly object threadLock = new object();

Needless to say, you can call your variable anything you want. You don't need to use threadLock if you don't want to.

Then, to actually make a block of code thread safe using the threadLock object, you'd add code that looks something like this:

lock(threadLock)
{
    // Code in here is now thread safe.
    // Only one thread at a time can be in here.
    // Everyone else will have to wait at the
    // "lock" statement.
}
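Applied to the game-update scenario from earlier, the fix is to put both steps of handing out work (reading the index, then advancing it) inside the same lock. The sketch below is self-contained; SafeWorkQueue, GetNextIndex, and the array of counters standing in for game objects are hypothetical names. With the lock in place, every object should be updated exactly once, no matter how the threads get switched around.

```csharp
using System;
using System.Threading;

public class SafeWorkQueue
{
    private readonly object threadLock = new object();

    // updateCounts[i] records how many times "object" i was updated.
    private readonly int[] updateCounts;
    private int nextIndex = 0;

    public SafeWorkQueue(int objectCount)
    {
        updateCounts = new int[objectCount];
    }

    // Thread safe: no other thread can get in between reading and advancing.
    private int GetNextIndex()
    {
        lock (threadLock)
        {
            int index = nextIndex;
            nextIndex++;
            return index;
        }
    }

    private void Worker()
    {
        while (true)
        {
            int index = GetNextIndex();
            if (index >= updateCounts.Length)
            {
                return; // no work left
            }

            // Each index is handed out exactly once, so only this thread
            // touches this slot and a plain increment is enough.
            updateCounts[index]++;
        }
    }

    public static int[] Run(int objectCount, int threadCount)
    {
        SafeWorkQueue queue = new SafeWorkQueue(objectCount);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(queue.Worker);
            threads[i].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return queue.updateCounts;
    }
}
```

The lock only covers the short step of grabbing the next index, not the update itself, so threads spend very little time waiting on each other while the actual work still runs in parallel.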

What's Next?

Threading can be a complex thing to tackle. Many programs don't need it at all (or a framework you're using, like XNA, WinForms, or WPF, handles some of it behind the scenes), so you won't need to worry about it. But when the need for it comes up, it is helpful to know that it exists as an option, and the basics of how to do it.

Our study of C# now continues to a few final, relatively simple concepts that don't particularly have anything to do with each other. We'll talk about operator overloading next.