What New Programmers Need To Know About the Stack and Heap
In many modern programming languages, we don’t tend to spend a lot of time thinking about memory management (unless you’re my colleague Ben, who thinks that C++ is the pinnacle of software engineering and that none of us should have ever felt a need to use anything else). However, it’s still vitally important to understand the basics of how memory is allocated while a program runs, even if your language doesn’t make you manage it manually. For example, have you ever received a “null reference exception” while running an application and wondered what it meant? In this article, we’ll explore the stack and the heap, two areas where memory is allocated for data in computer programs.
What is the stack?
The stack is a region of computer memory that operates in a “Last In, First Out” (LIFO) fashion. I once had a student who came up with the single best metaphor for the stack I’ve ever heard–a can of Pringles potato chips*. If you’re not familiar with Pringles, they are curved potato chips (the geometric term for their shape is a “hyperbolic paraboloid”) that come stacked in a narrow cardboard can with a plastic lid. The first Pringle to be put into the can ends up on the bottom of the stack, and the last Pringle to be put into the can is the one that ends up on top. In order to get to the first Pringle, you have to remove all the other Pringles, one at a time, starting with the last chip to be put into the can.
Just like the can of Pringles, the first item put onto the memory stack is the last one to be taken off, and the last item put onto the memory stack will be the first one taken off.
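To make the Pringles can concrete, here is a minimal sketch in Python that uses an ordinary list as a stack. The variable names (can, chip, removed) are made up for this illustration, and a Python list is only a stand-in for the real memory stack, which the language manages for you.

```python
# A Python list used as a stack: append() pushes, pop() removes the top item.
can = []  # hypothetical "Pringles can"

for chip in ["first chip", "second chip", "third chip"]:
    can.append(chip)  # push each chip onto the top of the stack

removed = []
while can:
    removed.append(can.pop())  # always take the chip that went in last

print(removed)  # ['third chip', 'second chip', 'first chip']
```

The chips come out in the reverse of the order they went in, which is all “Last In, First Out” means.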
The stack is mostly used for managing function calls and local variables. When a method or function is called, a new stack frame is created that contains the local variables of the function, the address to return to when the function finishes, and other relevant data. After the function completes its execution and that data is no longer needed in memory, its stack frame is “popped” off the top of the stack.
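If you’d like to see stack frames being pushed and popped, the sketch below uses Python’s standard inspect module to list the functions that are currently on the call stack. The function names (current_call_chain, inner, outer) are invented for this example, and what Python shows you here is its own view of the call stack rather than raw memory.

```python
import inspect

def current_call_chain():
    """Return the names of the functions on the call stack, innermost caller first."""
    # inspect.stack(0) lists one entry per active frame; [1:] skips this function itself.
    return [frame.function for frame in inspect.stack(0)[1:]]

def inner():
    return current_call_chain()

def outer():
    chain = inner()  # while inner() runs, its frame sits on top of outer()'s frame
    return chain

chain = outer()
print(chain[:2])  # ['inner', 'outer']

# Both functions have returned, so their frames have been popped off the stack.
after = current_call_chain()
print("inner" in after, "outer" in after)  # False False
```

While inner is running, its frame sits on top of the frame for outer. Once both have returned, neither appears on the stack any more.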
Stack memory is very fast and efficient for managing data with a limited lifespan. It is important to note, though, that the stack has a limited size (typically fixed when the program or thread starts), and exceeding its capacity leads to an error condition called a “stack overflow” (you may be familiar with a website of the same name). Stack overflows happen when a program tries to place more data onto the stack than it can hold, and they are commonly caused by infinite or runaway recursion, very deeply nested function calls, and excessively large local variables. An infinite loop on its own won’t overflow the stack unless each pass through the loop makes yet another nested call.
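Here is a small, hedged demonstration of runaway recursion in Python. The function name count_down_forever is made up for this example. Note that Python guards against a true stack overflow by enforcing a recursion depth limit and raising a RecursionError, so this shows the idea rather than an actual crash.

```python
import sys

def count_down_forever(n):
    # No base case: every call adds another frame on top of the call stack.
    return count_down_forever(n - 1)

try:
    count_down_forever(10)
    overflowed = False
except RecursionError:
    # Python stops the recursion at a configurable depth limit
    # instead of letting the underlying stack overflow.
    overflowed = True

print(overflowed)  # True
print(sys.getrecursionlimit())  # the depth limit in this interpreter
```

Adding a base case (for example, returning when n reaches 0) is the usual fix, because it lets the frames start popping off the stack again.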
What is the heap?
The heap is a region of computer memory that is more flexible and dynamic than the stack. It’s a pool of memory that is used for allocating data whose size or lifetime can’t be determined at compile time. Data stored in the heap usually persists beyond the scope of individual functions, which makes the heap suitable for storing more complex data and data with a longer anticipated lifespan–it’s where you’ll find objects, arrays, and other data structures.
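As a rough illustration of data outliving the function that created it, consider this Python sketch. The names build_high_scores and high_scores are hypothetical, and the comments assume the usual Python behavior of allocating list objects on the heap.

```python
def build_high_scores():
    # 'scores' is a local name; it disappears when the function returns.
    scores = [980, 875, 640]
    return scores  # the list object itself is handed back to the caller

high_scores = build_high_scores()
high_scores.append(512)  # still usable after build_high_scores() has finished

print(high_scores)  # [980, 875, 640, 512]
```

The stack frame for build_high_scores is gone by the time we append to the list, but the list lives on because it was never stored in that frame in the first place.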
In some languages, notably C and C++, programmers are responsible for manually allocating and deallocating memory in the heap. While powerful, this also introduces the possibility for memory leaks and other problems if not done properly. Many general-purpose languages like Java, C#, and Python now have built-in “garbage collectors” that automatically manage the heap, removing data that is no longer needed to free up space for new objects.
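You can watch automatic cleanup happen with Python’s standard weakref and gc modules. This is a sketch under a few assumptions: the Session class is invented for the example, and exactly when an object is freed depends on the interpreter (the standard CPython interpreter usually frees it as soon as the last reference goes away, before the explicit gc.collect() call).

```python
import gc
import weakref

class Session:
    """Hypothetical object used only for this illustration."""

session = Session()
watcher = weakref.ref(session)  # observes the object without keeping it alive

alive_before = watcher() is not None

del session   # drop the only strong reference to the object
gc.collect()  # ask the garbage collector to run a full collection

alive_after = watcher() is not None

print(alive_before, alive_after)  # True False
```

At no point did we free the memory ourselves. Once nothing referred to the object any more, the runtime reclaimed it.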
Reference & Value Types
If local variables for a function are stored on the stack, and complex data is stored in the heap, what happens if you instantiate complex data inside of a function?
In some programming languages (C#, for example), data is divided into “value” types and “reference” types. Value types, like integers and booleans, contain a fixed amount of data, and when they are local variables they are typically stored directly on the stack. (A value-type field that belongs to an object lives on the heap along with that object.) Reference-type data, however, is stored on the heap, and a reference to the data is placed on the stack. When the data is needed, the reference on the stack tells the program where to look in the heap in order to retrieve or manipulate the relevant data.
This means that when you instantiate an object, like a list, inside of a method, the list reference is stored on the stack, while the actual list data is allocated on the heap. In C#, you don’t have to manage any of this manually–the runtime does it for you, and the built-in garbage collector will clean up any objects in the heap that are no longer being used.
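The practical consequence of references is easiest to see in code. The sketch below is in Python rather than C#, so treat it as an analogy: Python doesn’t have C#’s value types, but integers are immutable and so behave much like values here, while lists are shared through references. The names add_item, cart, and cart_size are made up for the example.

```python
def add_item(items, count):
    items.append("new item")  # follows the reference and changes the caller's list
    count = count + 1         # rebinds the local name only; the caller's variable is untouched
    return count

cart = ["apple"]
cart_size = 1
returned = add_item(cart, cart_size)

alias = cart          # copies the reference, not the list data
alias.append("pear")  # so this change is visible through 'cart' as well

print(cart)       # ['apple', 'new item', 'pear']
print(cart_size)  # 1
print(returned)   # 2
```

Both cart and alias point at the same list, so a change made through one is visible through the other, while the number passed into the function is unaffected by what happens inside it.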
Other programming languages, like C, do not explicitly differentiate value-type and reference-type data. Instead, programmers manage memory directly: a variable holds its value, and a pointer is used explicitly whenever the program needs to refer to data stored somewhere else. Scripting languages, like Python and JavaScript, also tend not to make the distinction as explicit: in Python, every value is an object accessed through a reference, and JavaScript separates primitives from objects without exposing where either is stored. Whether a language differentiates between value-type and reference-type data often depends on how that language approaches memory management: lower-level languages, like C, may not need a clear distinction, while higher-level languages that use managed memory environments, like C# and Java, often have one.
Conclusion
If you’re going to be a programmer, you should know about the stack and the heap (so, go ahead–check that off your to-do list!). Chances are that you’re learning with a higher-level language that manages the stack and heap for you, but they are still important concepts to know. As you continue writing code and creating more complex applications, understanding the stack and heap, and how memory is allocated between them, will help you troubleshoot when things go wrong in your programs. Happy coding!
*In the United States, Pringles can’t be marketed as potato chips, because they are made from a dough and are only 42% potato. They are instead sold as potato crisps, which caused a whole other problem in the UK.