Bonus lecture: Introduction to C++


❓ No Post-Lecture Quiz for this lecture

⚠️ Heads-up: these notes are based on an old version of this lecture, from several years ago, which didn’t cover all the content we discussed in our version. Nick will update these notes to reflect the lecture content over the Thanksgiving break – this notice will go away when they have been updated!

Why are we using C++?

You will by now have started to appreciate the power of the C programming language: it gives you direct access to memory, it matches very closely with the concepts in the underlying hardware, and it is able to achieve very high performance because the language adds little to no overhead to your program.

However, writing programs in C can feel a bit like trying to build your own car from scratch: very educational, but you really have to do everything yourself. C doesn’t come with any data structures in a standard library, and as you will recall from the vectors part of Project 1, writing a generic data structure is somewhat painful. C++ tries to make things easier: it still gives you access to all the low-level power of C, but it also allows you to write code at a higher level of abstraction, using classes, objects, and various other advanced features.

So far, we’ve used the C programming language in the course. We will now increasingly start working on C++, which is a separate programming language from C. However, C++ and C are closely related: indeed, pretty much any valid C program is also a valid C++ program.

Note that C++ is a huge programming language, especially compared to C, and comes with many advanced (and some ill-advised) features. The C++ we will write in this course mostly focuses on the C-like subset of the language, with the addition of classes, objects, and some standard library data structures. If you’d like to learn more about the advanced features of C++, check out the links on our C/C++ Primer page.

Compiling C++ programs

Despite its similarity to C, C++ is a separate programming language with its own compilers and tools.

There are many C++ compilers, but the most widely-used ones are GCC’s g++ (the C++ equivalent of gcc) and LLVM’s clang++ (the C++ equivalent of clang). You can use either for the course; clang++ sometimes has easier-to-understand error messages than g++.

The good news, though, is that the command line options for these compilers are practically identical to those for their C equivalents, and that all your favorite build and debugging tools (Makefiles, GDB, sanitizers, etc.) still work for C++.

Classes and Objects

C++ is an object-oriented programming language, meaning that it includes the notion of classes that you can instantiate into objects. If you know Java, these concepts will seem very familiar to you. A class defines data and functionality associated with a specific type, and each class can have many individual instances in the form of objects of that class. Classes and objects help programmers organize their code, and allow for some data to be accessible only via specific functions – an idea known as “encapsulation”. Object-orientation also allows you to write programs that are crazy difficult to understand, and some believe it’s overkill – we won’t pass judgement on this, but rather make use of C++’s object-oriented features to make our life as systems programmers easier.

Let’s look at the specific example of a program that seeks to represent pets via the Animal class type. If you were to write a C program for this purpose, you might define a struct type that tracks information about a specific animal, such as its name, age, and weight:

typedef struct animal {
  char* name;
  int age;
  int weight;
} animal_t;

Specific instances of this structure can exist on the stack or on the heap:

#include <stdlib.h>

int main() {
  animal_t stack_cat;
  stack_cat.name = "kitty";
  stack_cat.age = 5;
  stack_cat.weight = 10;

  animal_t* heap_dog = (animal_t*) malloc(sizeof(animal_t));
  heap_dog->name = "doggy";
  heap_dog->age = 8;

  // [...]

  free(heap_dog);  // heap memory must be released explicitly
}

This works fine, but comes with several downsides:

Any piece of code can set the values of any of the struct’s members without validation; for example, nothing prevents nonsensical assignments like heap_dog->weight = 999;.
Any function that operates on an animal needs to explicitly take a pointer to the specific animal in question as an argument (recall how all the vector methods in Project 1 took a vector_t* as their first argument), so that the function body knows what memory to access.

C++ extends the C struct notion with functionality to allow instances of a type to have behavior (i.e., associated methods). To do so, you can define functions as part of the struct definition:

typedef struct Animal {
  char* name;
  int age;
  int weight;

  // new in C++: define methods on instances of this struct
  void setWeight(int w) {
    if (w > 50) {
      printf("error: unrealistic weight!\n");
      return;
    }
    this->weight = w;
  }

  int getWeight() {
    return this->weight;
  }
} animal_t;

You’ll notice the this keyword inside setWeight and getWeight methods here. this is always a pointer to the instance of the struct that the method was called on – in other words, its type is Animal* in this example. This implicit access to a pointer to the current instance allows calling methods on an animal using the same syntax as C struct member access:

int main() {
  Animal stack_cat;
  stack_cat.name = "kitty";
  stack_cat.age = 5;
  stack_cat.setWeight(10);

  animal_t stack_dog; // can still use type alias, just like in C!
  stack_dog.name = "doggy";
  stack_dog.age = 5;
  stack_dog.setWeight(999); // will report an error
}

To be backwards-compatible with C, a C++ struct without any methods has exactly the same syntax and behaves exactly the same way as the C struct would. But you may note that even though our Animal struct defines handy methods to get and set the weight of the animal, including some validation in setWeight, there is nothing preventing code from directly modifying the weight member of the struct.

C++ provides the class keyword to help you define structs whose members are protected from arbitrary access. A class definition looks exactly the same as a struct definition, except that a class’s members are private by default, while a struct’s members are public by default. In either, you can use access modifiers to mark some members as public and some as private, as follows:

typedef class Animal {
 public:
  char* name;
  int age;
 private:
  int weight;

 public:
  // to allow access to the private `weight` member via methods, these need to be public
  void setWeight(int w) {
    if (w > 50) {
      printf("error: unrealistic weight!\n");
      return;
    }
    this->weight = w;
  }

  int getWeight() {
    return this->weight;
  }
} animal_t;

These access modifiers split the definition into sections, and the compiler will prevent any access to private members from outside the methods associated with the class. (Both member variables, called fields, and member functions, called methods, can be private.)

Are there ways around access modifiers?

It’s important to realize that access modifiers are merely a helpful aid to the programmer, not a failsafe protection mechanism. Only the compiler looks at access modifiers and checks them; once the compiler has turned the C++ code into assembly or machine code, no notion of access modifier protection remains. In particular, the access modifiers are never checked at program runtime!

But even the compiler can be fooled. Since C++ is a systems programming language, it allows for direct memory access, including pointer arithmetic. This actually provides a way for programs to circumvent the private access modifier: knowing at what byte offset in a class or struct a field is located is sufficient to form a pointer to that field, and to ultimately access the memory. No C++ compiler can prove the absence of such illegal accesses without additional hints; this is an instance of the pointer aliasing problem.

Finally, what if you want to create an instance of a class? In the above examples, we’ve already seen stack-allocated objects of the Animal class. To make heap-allocated objects, C++ uses the new keyword:

int main() {
  Animal* heap_cat = new Animal;
  heap_cat->name = "kitty";
  heap_cat->age = 5;
  heap_cat->setWeight(10);
}

new here allocates sufficient heap memory for an Animal structure, much like (Animal*) malloc(sizeof(Animal)) would. On top of allocating memory, however, new also calls a special method on the class called the constructor. Constructors are helpful in order to initialize the fields of the object: recall that uninitialized memory may contain arbitrary garbage! (cpp1.cc in the lecture code shows an example of how a stack-allocated object can have surprising contents if the fields aren’t set correctly.) To define a constructor, you add a method without a return type (think about this: what would the constructor return?) and with the same name as the class/struct name:

typedef class Animal {
 public:
  char* name;
  int age;
 private:
  int weight;

 public:
  // constructor, takes two arguments
  Animal(char* name, int age) {
    this->name = name;
    this->age = age;
    this->weight = 0;  // can access private field from constructor
  }

  // ... other methods
} animal_t;

int main() {
  // calls constructor on creating stack-allocated object
  Animal stack_cat("kitty", 5);
  stack_cat.setWeight(10);

  // calls constructor on creating heap-allocated object
  Animal* heap_dog = new Animal("doggy", 5);

  // [...]
}

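Constructors help set up objects. If a struct or class declares no constructors at all, C++ implicitly provides a zero-argument default constructor (which does not initialize int or pointer fields), which is why the earlier examples without an explicit constructor call are valid. Once you declare a constructor of your own, like the two-argument one above, the compiler no longer generates the zero-argument one, so a declaration like Animal stack_cat; only compiles if you also define a zero-argument constructor yourself.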

Initializer list syntax for constructors

Rather than writing several lines of the form this->field = ... in the constructor to initialize fields, C++ permits a shorthand syntax called an initializer list. Separated from the constructor’s parameter list by a colon, the initializer list consists of a comma-separated list of field(value) entries. For example:

//                            | initializer list
//                            v
Animal(char* name, int age) : name(name), age(age), weight(0) {
  // obsoleted by initializer list
  // this->name = name;
  // this->age = age;
  // this->weight = 0;
}

Finally, how do you get rid of a heap-allocated object? For this purpose, and as a counterpoint to new, C++ provides the delete keyword. delete is to new what free() is to malloc().

You can see some more examples of C++ classes and structures in cpp1.cc.

Standard Library Data Structures

One big advantage of C++ over C is that C++ comes with a large standard library with many common data structures implemented. We will use some of these data structures in the rest of the course.

The data structure part of the C++ library is called the Standard Template Library (STL), and it contains various “container” structures that represent different kinds of collections. For example:

std::vector is a vector (dynamically-sized array) similar to the vector you implemented in Project 1.
std::map provides an ordered key-value map, with an API somewhat similar to a Python dictionary, though with much stricter rules (e.g., fixed key and value types, so a single map cannot mix values of different types the way a Python dictionary can). The ordered map is typically implemented as a balanced binary search tree (usually a red-black tree), so lookups, insertions, and removals take O(log N) time for a map of size N.
std::unordered_map provides an unordered key-value map, implemented as a hashtable with most operations taking O(1) time on average. Again, the API is somewhat similar to a Python dictionary, but with all the constraints of a map and the added constraint that the key type must be hashable (true of the primitive C++ types and of standard library types such as std::string, but requires additional implementation for your own types, such as classes you define).
std::set and std::unordered_set provide ordered and unordered set abstractions, with APIs that support addition, removal, membership checking and other set operations.

The difference between the ordered and unordered variants of these data structures matters when iterating over them: an ordered collection always iterates in the same, specific order (sorted by key), while an unordered collection makes no such guarantee.

STL collections are generic, meaning that they can hold elements of any type. This is extremely handy, because it means that we don’t need separate implementations for, say, a vector of integers and a vector of strings. Recall that in your C vector implementation for Project 1, you had to use void* pointers and explicit element size arguments to make the vector generic; fortunately, generic C++ data structures require no such things. To tell the data structure what specific types it should assume, we include the types in angle brackets when we refer to the data structure type: for example, a std::vector<int> is a vector of ints, while a std::vector<Animal> would be a vector of Animal objects, and std::vector<int*> is a vector of pointers to integers.

How do generic data structures work?

The details of how generic C++ STL data structures work are complex and related to an advanced feature of the C++ language called “templating”. You won’t need to understand how to write templated classes for this course, but you can think of this as writing a class with one or more type parameters that the compiler searches and replaces with the actual types before it compiles your code. For example, a std::vector<T> specifies a type parameter T for the type of the vector elements, and all code implementing the vector will use T to refer to the element type. Only when you actually use, e.g., a vector<int> will the compiler generate and compile code for a vector of integers and appropriately set all element sizes in the code.

You can declare both stack-allocated and heap-allocated STL container data structures, and cpp2.cc shows some examples. However, one very important thing to realize is that these C++ data structures may themselves allocate memory on the heap (in fact, they usually do!), even if the data structure itself is declared as stack-allocated. If you think about this, this makes sense: all of these data structures are dynamic in size, i.e., you can add and remove elements in your code as you wish. This means that the data structures cannot be entirely on the stack or in the static segment, since both of these segments require object storage sizes to be known at compile time.

We won’t be able to cover in detail all the APIs that STL collections offer in lectures, and we encourage you to make use of the reference links on our C++ primer page to explore them. The reference material can seem verbose and confusing at first; often, it’s easiest to look at the code examples included in the documentation for specific methods to develop an intuition for how to use them. The methods you want often have relatively obvious names (e.g., push_back(T element) on a std::vector<T> adds element to the back, and, in C++20 and later, contains(T element) checks whether a std::set<T> contains element), but not always (e.g., one convenient way to insert into a std::map<K, V> is emplace(K key, V value)).

Summary

We talked about the C++ programming language and how it adds object-oriented features to C. We saw that C++ classes are basically fancy structs with methods (member functions), and how C++ allows you to create instances (objects) of such classes.

We also looked into the handy data structures provided by the C++ standard library, and got an initial feel for how you can use them to make your life easier.