Threading
The Crash Course
- Threading allows you to run sections of code simultaneously.
- Starting a new thread is relatively simple in C#: Thread thread = new Thread(MethodNameHere); thread.Start();
- The method you use must have a return type of void and, in the simplest form, must not take any parameters. (A second overload accepts a method with a single object-typed parameter.) If you want to return information from a thread or pass in parameters, you typically create an object with that information as instance variables or properties and a single method with a return type of void and no parameters that the thread will run in.
- You can wait for a thread to finish with the Join method: thread.Join();
- If you need to worry about thread safety (preventing problems when multiple threads are modifying the same data), you can use the lock keyword: lock(aPrivateObject) { /* code in here is thread safe */ }. Typically, you create a private instance variable if the areas that you need to be thread safe are all within the same class or a private static variable if it needs to be thread safe between instances of the class.
Introduction
Back in the day, all computers had a single processor, or CPU, which is the brain of the computer that runs all of the instructions. Nowadays, they usually have many. (Or, more accurately, they have a single chip with multiple cores that each function largely independently of the other cores and can run instructions on their own.) I'm currently working on a computer with four cores, each of which is hyperthreaded, making it appear that I have a total of eight processors. This isn't even a particularly fancy computer, either. There are machines out there with far more cores.
The point is that computers have a lot of power, but unless we make it happen, our programs will only run on a single processor. Think of all of that raw computing power going to waste!
In this tutorial, we're going to take a look at threading. The basic process with threading is that we can take chunks of code that are independent of each other and make them run in separate threads. A thread is almost like its own program in that the computer runs multiple threads simultaneously on the different processors that your computer has. (For the record, a thread is not its own program. It still shares memory with the program/process that created it.)
When many threads are running, the computer will let one thread (or a few, depending on the number of processors you have on your computer) run for a little while, and then it will suddenly switch it out for a different one. The computer gets to decide when it is time to make the switch and which thread to switch to, but it does a pretty good job, so we don't need to micromanage it.
The important thing to know is that multiple threads can run at the same time, and they get switched out at the whim of the Evil Overlord (err… operating system). All threads will be treated fairly equally, but they will get switched around from time to time. This random switching causes a few problems in some cases, and we'll talk about that in the section about Thread Safety.
There's a lot that goes into threading, and we simply don't have enough time to discuss all of the ins and outs of it here. So we won't. Instead, we'll take a look at the basics of threading, and I'll allow you to dig further into threading as you need it.
The Task
Threading can be used for many, many tasks. There's no real limitation on what you can or cannot use threads for, but there are two broad categories that threading works especially well for. The first is running work "in the background." This means that stuff is happening, but you can continue to update the GUI at the exact same time. You see this all the time when there's a progress bar indicating how much work has been done or how much work is left.
The second broad category is when you have a very large pile of work to do, and it can be broken down into parts that are independent of each other. In this case, multiple threads can take the different parts and work on them without stepping on the toes of the other threads.
While there's no limit to what you can do with threads, we're going to stick with a relatively simple example from this second group. Imagine that we have a program that has an array of ints, and we want to fill it with random numbers between 1 and 100. Uh… wait… let's go with 0 to 99 for simplicity. (Since C# loves 0-based indexing.)
Without threading, this program might look something like this:
int[] numbers = new int[10000];
Random random = new Random();

for (int index = 0; index < numbers.Length; index++)
{
    numbers[index] = random.Next(100);
}

In this tutorial, we're going to refactor (a fancy programming word for "rearrange" or "reorganize") this code to use threads. If you take a close look at this, you'll see that while we have a very large list of numbers to fill up, each and every number can be determined (randomly) on its own, completely independent of the number before or after it. Because of this, it will be easy to create multiple threads and give each of them its own block of the numbers array to work on.
Pulling Out the Work to be Threaded
Our first step will be to pull the work that we want to be threaded out into a place where we can assign it to a separate thread. We'll see in a minute that we start a thread by giving it a method to start working in, but in the form we'll use, that method can't take any parameters or return anything. So if we want to give our thread parameters (the actual stuff to work on) or have it return any results, our best approach is to create a class to store the work and the results in. We'll give this class a method with no parameters and no return value, and have the thread run that method.
Before jumping ahead, take a minute and think about what this class might look like. What kind of properties or instance variables should it have? What will the method that our thread runs look like?
Below is the version of this class that I've created:
public class GenerateNumbersTask
{
    // We do this so that we only have one instance of the Random class
    // in use, no matter how many tasks we create.
    private static Random random = new Random();

    public int[] Numbers { get; set; }
    public int StartIndex { get; set; }
    public int Count { get; set; }

    public void GenerateNumbers()
    {
        for (int index = StartIndex; index < StartIndex + Count; index++)
        {
            Numbers[index] = random.Next(100);
        }
    }
}
I have three properties here. Numbers contains a reference to the shared array where the random numbers will go. StartIndex is the spot where the current thread will start filling the array, and Count will store the number of items this thread is responsible for.
For example, we might create 10 threads, giving each of them 1000 numbers to generate. They'd all have a reference to the same Numbers array, and since they all happen to be doing the same amount (1000), the Count will be the same. Our first thread will get a StartIndex of 0, our second thread will get a StartIndex of 1000, our third thread will get a StartIndex of 2000, and so on.
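The arithmetic behind those StartIndex and Count values can be sketched in code. The ChunkPlanner class and its Plan method below are hypothetical helpers, not part of the tutorial's example. They also hand out any leftover items when the array length doesn't divide evenly, which the 10,000 / 10 case never has to deal with.

```csharp
using System;

// Hypothetical helper (not part of the tutorial's code): splits totalCount
// items into threadCount contiguous (StartIndex, Count) ranges.
public static class ChunkPlanner
{
    public static (int StartIndex, int Count)[] Plan(int totalCount, int threadCount)
    {
        if (totalCount < 0) throw new ArgumentOutOfRangeException(nameof(totalCount));
        if (threadCount <= 0) throw new ArgumentOutOfRangeException(nameof(threadCount));

        var chunks = new (int StartIndex, int Count)[threadCount];
        int baseSize = totalCount / threadCount; // every chunk gets at least this many
        int leftover = totalCount % threadCount; // the first 'leftover' chunks get one extra
        int start = 0;

        for (int i = 0; i < threadCount; i++)
        {
            int count = baseSize + (i < leftover ? 1 : 0);
            chunks[i] = (start, count);
            start += count; // the next chunk begins where this one ends
        }

        return chunks;
    }
}
```

Calling ChunkPlanner.Plan(10000, 10) gives the ranges described above: the third entry has a StartIndex of 2000 and a Count of 1000.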
Finally, we have the GenerateNumbers method. In this method, we actually do the work of generating random numbers, working only in the range that we've told the thread to work with. Notice that this method doesn't return anything (void) and has no parameters, so we'll be able to use it to start our thread in a minute.
Starting the Thread
The basic approach to starting a new thread is very easy, so I'm going to start there before showing you the code we'll use to complete the example we're following in this tutorial.
Basically, you'll need these three lines, in some fashion or another:
Thread thread = new Thread(MethodNameHere);
thread.Start();
thread.Join();
The first line creates a new thread with the name of a method to run when the thread begins running. This is a delegate, which is something we covered in earlier tutorials. There are two overloads of the Thread constructor. One demands a method with no parameters, while the other demands a method with a single object-typed parameter. Both demand that the method is a void method. We'll use the first here, but the second also has plenty of uses.
The second line starts the thread going on the method it was given. The form shown above is the right flavor if the supplied method has no parameters, but if you used one with a single object-typed parameter, then you can supply the argument for the method via thread.Start(argumentHere);.
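As a rough sketch of that second overload, here is a method with a single object-typed parameter being handed to a thread, with the argument supplied through Start. The class and method names (ParameterizedStartDemo, FillWithSquares, Run) are made up for illustration. Note that the argument arrives as a plain object, so the method has to cast it back to the type it expects.

```csharp
using System;
using System.Threading;

// Illustrative names only; nothing here comes from the tutorial's example.
public static class ParameterizedStartDemo
{
    // Void return type and one object parameter, so it fits the second
    // Thread constructor overload.
    private static void FillWithSquares(object state)
    {
        int[] target = (int[])state; // cast the argument back to its real type
        for (int i = 0; i < target.Length; i++)
        {
            target[i] = i * i;
        }
    }

    public static int[] Run(int length)
    {
        int[] results = new int[length];

        Thread thread = new Thread(FillWithSquares);
        thread.Start(results); // the argument is passed to FillWithSquares
        thread.Join();         // wait until the array has been filled

        return results;
    }
}
```

Because the parameter is typed as object, the compiler can't check that you passed the right kind of thing. That is one reason the class-with-properties approach used in this tutorial is often more comfortable.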
The final line tells the launching thread (in this case, the program's main thread) to stop whatever else it might have been doing and wait for the thread to finish its work before continuing. Between the Start and the Join calls, you can put other work for the original thread to do in the meantime.
Let's apply this
OK, here's my code:
int[] numbers = new int[10000];

// Prepare the list of threads and the list of tasks for those threads to work on.
int threadCount = 10;
GenerateNumbersTask[] tasks = new GenerateNumbersTask[threadCount];
Thread[] threads = new Thread[threadCount];

for (int index = 0; index < threadCount; index++)
{
    // Create the task, and set it up with the right properties--these are our
    // parameters.
    tasks[index] = new GenerateNumbersTask()
    {
        Numbers = numbers,
        Count = 1000,
        StartIndex = index * 1000
    };

    // Create the thread
    threads[index] = new Thread(tasks[index].GenerateNumbers);

    // Start the thread running
    threads[index].Start();
}

// Wait around and join up with all of the threads, so that when we
// move on, we're sure all of the work has been done.
for (int index = 0; index < threadCount; index++)
{
    threads[index].Join();
}
Thread Safety
In the example above, all of our threads were working with different chunks of memory, completely independent of the others. This is usually how you want to set up multi-threaded work. Things work great when each thread works in its own space on its own stuff.
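One caveat is worth flagging: the tasks above do still share one thing, the static Random instance, and the .NET documentation does not guarantee that instance members of Random are thread-safe. One possible way around that, sketched below, is to give each task its own generator. The SeededGenerateNumbersTask name and the seed parameter are assumptions made for this sketch, not part of the original example. (On .NET 6 and later, the built-in Random.Shared property is another option.)

```csharp
using System;

// A variation on GenerateNumbersTask in which every task owns its own Random,
// so no generator is ever touched by two threads. The class name and the
// explicit seed are assumptions made for this sketch.
public class SeededGenerateNumbersTask
{
    private readonly Random random;

    public SeededGenerateNumbersTask(int seed)
    {
        // An explicit seed keeps tasks created at the same instant from
        // starting with identical generators on older runtimes.
        random = new Random(seed);
    }

    public int[] Numbers { get; set; }
    public int StartIndex { get; set; }
    public int Count { get; set; }

    public void GenerateNumbers()
    {
        for (int index = StartIndex; index < StartIndex + Count; index++)
        {
            Numbers[index] = random.Next(100);
        }
    }
}
```

When creating the tasks in a loop, something like new SeededGenerateNumbersTask(baseSeed + index) gives each task a different seed, where baseSeed is whatever starting value you choose.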
But the second that two threads need to access the same bit of data, we open ourselves up to a whole lot of potential, hard-to-detect problems.
For example, imagine you have a game with thousands of objects that all need to be updated and your strategy is to have a handful of threads work through them all to update them. They all live in an array, and your strategy is to have a variable called nextObject somewhere for threads to look at to see which object to work on next. After seeing which object number to update next, the threads should increment nextObject so that the next thread will check the following object.
This is a mostly viable strategy. One big upside compared to our strategy for generating ten thousand random numbers is that all the threads will share the load, and if one thread gets hung up updating a slow object, the other threads can simply take care of the rest.
But there's a catch. Seeing which object number is next and updating it is a multi-step process. We have to read a value out of the nextObject variable, add one to it, then store the new number back into nextObject. (That's all still true even if you do nextObject++;.)
In a single-threaded environment, there are no problems. But in a multi-threaded environment, it is possible to have Thread #1 read the value in nextObject. Let's say the current value is 4. At this point, Thread #1 assumes it will update the object at index 4. But then the operating system's scheduler pauses the thread. Thread #2 comes in and reads the current value in nextObject. It is still 4! Thread #2 says, "I'll go update the object at index #4, but now I'll increment that value and store a 5 back in nextObject for other threads." Before long, Thread #1 gets its next turn to run and still thinks it is supposed to update the object at index #4, and says, "I already knew I was doing #4, but now I'll increment that to 5 and store that back in nextObject for other threads."
At this point, both threads will end up updating the same object, which can cause all sorts of subtle and even some not-so-subtle bugs. The worst part? You'll have a very hard time repeating it because it isn't deterministic. It doesn't happen the same every time.
Any time you have threads sharing access to data, you run risks of problems like this. You must do extra work to ensure it doesn't cause problems. There are quite a few ways to handle this (a simple one we won't cover is the Interlocked.Increment method), but we'll focus on the most general-purpose solution: locks.
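For the curious, here is a minimal sketch of what the Interlocked.Increment route might look like for the nextObject counter. The InterlockedDispenser class name is made up for illustration. Interlocked.Increment performs the read, add, and store as one atomic step and returns the new value, so subtracting one recovers the number that was current before the increment.

```csharp
using System.Threading;

// Hypothetical wrapper around the nextObject counter described above.
public class InterlockedDispenser
{
    private int nextObject = 0;

    public int GetNextObjectNumber()
    {
        // Atomically adds one and returns the incremented value. Subtracting
        // one gives the number this caller is responsible for.
        return Interlocked.Increment(ref nextObject) - 1;
    }
}
```

This only works because the whole critical section is a single increment. Once the protected work involves more than one variable or step, a lock is the more general tool.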
The real problem we have is that there is some chunk of code called a critical section that must only have one thread in it at a time. Like locking a public bathroom stall (potty analogy for the win!), we want to ensure that we can get in, do our job, and get out without others getting in. The other threads should wait their turn instead.
In our case, the critical section is likely literally just copying the value out of a variable and into another and incrementing it, so it is quite short. Short critical sections are better than long ones since you destroy any benefits you were hoping to get with multiple threads if they get too big.
Locking a critical section requires having an object that is used in the lock. This object doesn't need to be anything special and is one of the rare cases where it is common to make a new instance of the object class:
private object threadLock = new object();
You usually want lock objects to be private because if you make them public, they could end up being reused in other locks, which isn't usually desirable.
Then to lock the critical section, you use a lock statement like this:
public int GetNextObjectNumber()
{
    int nextNumber;

    lock (threadLock)
    {
        nextNumber = nextObject;
        nextObject++;
    }

    return nextNumber;
}
If all threads call this method to figure out which object to update next, then, in the event that two threads would have been running in that method simultaneously, the lock will ensure one picks a number and runs that code to completion before the next thread even starts, ensuring we don't have the problem described earlier.
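To see the pieces working together, here is a small sketch in which several threads repeatedly ask for the next object number until the objects run out. The ClaimingWorkers class, its Worker method, and the timesUpdated array (which stands in for "updating" a game object) are all invented for this illustration.

```csharp
using System.Threading;

// Illustrative sketch: threads share the load by claiming one index at a time.
public class ClaimingWorkers
{
    private readonly object threadLock = new object();
    private readonly int[] timesUpdated; // stand-in for the real objects
    private int nextObject = 0;

    public ClaimingWorkers(int objectCount)
    {
        timesUpdated = new int[objectCount];
    }

    public int GetNextObjectNumber()
    {
        int nextNumber;
        lock (threadLock) // only one thread at a time gets past this line
        {
            nextNumber = nextObject;
            nextObject++;
        }
        return nextNumber;
    }

    // The method each thread runs: void, no parameters.
    private void Worker()
    {
        while (true)
        {
            int index = GetNextObjectNumber();
            if (index >= timesUpdated.Length)
            {
                return; // nothing left to claim
            }
            // No lock needed here: the lock above guarantees that each
            // index is handed to exactly one thread.
            timesUpdated[index]++;
        }
    }

    public int[] Run(int threadCount)
    {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(Worker);
            threads[i].Start();
        }
        for (int i = 0; i < threadCount; i++)
        {
            threads[i].Join();
        }
        return timesUpdated;
    }
}
```

After new ClaimingWorkers(1000).Run(4) finishes, every entry in the returned array should be exactly 1. Without the lock, some entries could end up as 2, which is the duplicate-update bug described earlier.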
What's Next?
Threading is a very big and very tricky topic. It is important to be aware of it, but it is not something you master overnight, nor is it something you should take on lightly. You should use only a single thread if you can manage it. But some problems just need more than one thread, and it is good to have the option in those situations.
Our study of C# now continues to a few final, relatively simple concepts that don't particularly have anything to do with each other. We'll talk about operator overloading next.
I followed your example, made two classes, Program.cs (auto generated) and GenerateNumbersTask.cs, but I get a red squiggly line on "lock(threadLock)" with the following messages:
Invalid token 'lock' in class, struct, or interface member declaration.
The name 'threadLock' does not exist in the current context.
namespace Threading
{
public class GenerateNumbersTask
{
// We do this so that we have only one instance of the Random class
// in use, no matter how many tasks we create.
private static Random random = new Random();
// private instance variable to lock on for safe threading,
// preventing OS to switch threads while the current thread is still runing
private object threadLock = new object();
public int[] Numbers { get; set; }
public int StartIndex { get; set; }
public int Count { get; set; }
lock(threadLock)
{
public void GenerateNumbers()
{
Console.WriteLine("New Thread");
for (int index = StartIndex; index < StartIndex + Count; index++)
{
Numbers[index] = random.Next(100);
Console.Write(Numbers[index] + " ");
}
Console.WriteLine();
}
}
}
What do you think might be the cause?
It looks like you've wrapped the method in a lock statement. The statement has to go inside of the method. So rather than:
It should look more like:
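The snippet that originally followed isn't shown here, so what follows is only a sketch of what the corrected class might look like, not the reply's exact code. It keeps the commenter's members and moves the lock statement inside the method body. One change is an assumption of this sketch: the lock object is made static so that every task shares the same lock. With a per-instance lock object, each task would lock on its own private object and the threads would never actually wait for one another.

```csharp
using System;

public class GenerateNumbersTask
{
    // One Random shared by all tasks.
    private static Random random = new Random();

    // Assumption for this sketch: static, so all tasks share one lock.
    private static readonly object threadLock = new object();

    public int[] Numbers { get; set; }
    public int StartIndex { get; set; }
    public int Count { get; set; }

    public void GenerateNumbers()
    {
        // The lock statement lives inside the method, not around it.
        lock (threadLock)
        {
            Console.WriteLine("New Thread");
            for (int index = StartIndex; index < StartIndex + Count; index++)
            {
                Numbers[index] = random.Next(100);
                Console.Write(Numbers[index] + " ");
            }
            Console.WriteLine();
        }
    }
}
```

Keep in mind that locking the entire method body means only one thread generates numbers at a time, so this version gives up the speed benefit of threading in exchange for tidy, uninterleaved console output.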
Doesn't your code need to assume thread-safety of the Random class to guarantee working correctly, since you use a static object?