
Lab 8: Protobuf and gRPC

Due May 5th at 8pm


Introduction

We’ve learned how applications can use socket system calls to communicate with other computers on the network. In particular, a computer can act as either a server or a client for a given connection. The socket system calls help clients connect to servers, and they help servers accept connections. But you as the programmer still have to write all of the server logic to handle a connection and decode client requests, as well as the client logic to encode requests into bytes! Building distributed systems would be much easier if we could use a library for this “boilerplate” code, and focus our efforts on implementing the actual service we’re trying to build.

This lab will teach you the basics of Protobuf and gRPC, two libraries designed to make programming of networked services and distributed systems easier. Protobuf (short for “Protocol Buffers”) is a wire format, meaning that it specifies how to encode requests into bytes to send over the network, while gRPC is a code generator for remote procedure call (RPC) code that matches a high-level API specification. Protobuf and gRPC are often coupled together, but they’re actually separate frameworks that make client-server communication easier in different ways.

The lab will demonstrate why these libraries are useful, and prepare you for future assignments that use them.

Protobufs

Protobufs are a message format similar to JSON. Unlike JSON, which is a human-readable text format, Protobufs are encoded into a space-efficient binary representation, so fewer bytes need to be sent over the network. Protobuf messages can represent nested data structures containing primitive values, strings, enums (similar to C language enums), and other Protobuf messages. They are encoded by the sender, sent over the network, and decoded by the receiver.

What makes Protobufs convenient is that you only write a high-level description of the data you want to encode, and then the Protobuf compiler generates much of the encode/decode logic for you, in the language(s) of your choice. So for example, you could have a C++ server talk to clients written in Java, Go, or Python. As long as both the server and client are using the generated functions to encode/decode the Protobuf messages, they can understand each other!

gRPC

gRPC is a framework to create servers that allow clients to interact with them via a remote procedure call (RPC) interface. You define your server API in a file, and gRPC generates server and client-side code for you. The generated client code is complete, and you use it to make any API request to the server. However, the generated server code is incomplete! Since your server API is just an interface, gRPC leaves the implementation of your API functions up to you, but it does practically everything else for you.

Although gRPC generates server and client code that can read requests and send responses, it needs to base this around a message format. gRPC supports many message formats, but generates different code depending on your choice. If you want your server to use JSON for sending/receiving messages, this will generate different code than if you chose Protobufs or XML instead. The most common message format used with gRPC is Protobufs.


Assignment installation

First, ensure that your repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci1310/cs131-s20-labs.git

Then run:

$ git pull
$ git pull handout master

This will merge our Lab 8 stencil code with your previous work. If you have any “conflicts” from Lab 7, resolve them before continuing further. Run git push to save your work back to your personal repository.

Protobuf by Example

Protobufs (“protos”) are defined in .proto files, whose syntax follows the Protocol Buffer definition language. Let’s look at a simple example that defines a message containing info about a person:

message Person {
  string name = 1;
  string email = 2;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phone = 3;
}

Here, a Person message contains up to three fields: a name, an email, and an array of phone numbers. Note that each field has a type, a name, and a unique ID associated with it. Most primitive types like bool, int32, double, and string are supported. Arrays are supported with the repeated keyword, and maps are supported, too. Messages can also be nested, as the PhoneNumber message is inside Person.

Still, Protobufs are designed to be simple. In fact, the entire Protobuf language description fits on this one page. Each message is a struct-like data container. Importantly, it is not a class that allows you to add custom methods or logic!

Once you’ve defined your messages, you can run the Protobuf compiler and have it generate data classes for your messages in whatever language(s) you wish. These classes include simple getters and setters, as well as functions to encode and decode the message.

So, for example, if your chosen language is C++, running the compiler on the above .proto file will generate a Person class. You can then use this class in your application to populate, serialize, and retrieve Person protos. You might then write some code like this:

Person p;
p.set_name("John Doe");
p.set_email("jdoe@example.com");
std::string person_str = p.SerializeAsString();

This creates a person, sets their name and email, and gets the compact binary encoding of this Person as a string. Then, if you sent this encoding to someone else, they could decode it:

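std::string person_str = "xxxxxx"; // the encoded bytes received from the sender
Person person;
person.ParseFromString(person_str); // takes the string by reference; returns false if parsing fails
printf("Name: %s\n", person.name().c_str());
printf("Email: %s\n", person.email().c_str());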

Another important thing about Protobufs is that they are designed to be both forwards- and backwards-compatible. Each field is optional and contains a unique ID. To see how this achieves compatibility with prior or future versions, let’s add an address field to our Person message:

message Person {
  string name = 1;
  string email = 2;
  ...
  repeated PhoneNumber phone = 3;
  string address = 4;
}

Now suppose your server uses this updated version, but some of your clients do not. This is not an issue. In Protobuf, messages are roughly encoded as

FieldID1 Data1
FieldID2 Data2
...
FieldIDn DataN

where the “FieldID” consists of the field number and the field’s wire type (a small code that tells the decoder how to read the data that follows). What’s more, if a field is empty or not set, it is not encoded. So in reality the encoding might be like

FieldID3 Data3
FieldID7 Data7
FieldID11 Data11

If a client using the old version of our Person proto sends our server a message, then it will be missing the new address field that we added. But the address field – just like any other field – is optional, and the server will need to check whether it’s present.

On the other hand, if the server sends the client a Person message with an address, the old client code will simply ignore this new address field while parsing the message.

gRPC By Example

In gRPC, a client application can directly call a function on a server application on a different machine as if it were a local function, making it easier for you to create distributed applications and services. gRPC is based around the idea of defining a service. This requires you to specify the methods that are available for clients to call remotely, alongside their parameters and return types.

On the server side, the server implements this interface and runs a gRPC server to handle client calls. On the client side, the generated client code exposes the same methods as the server.

To better understand gRPC, let’s expand our Person example from earlier. Imagine we wanted to create an AddressBook service in the same .proto file. This service will allow you to add a contact, and search for a person in the AddressBook:

service AddressBook {
  rpc AddContact(Person) returns (Empty) {}
  rpc Search(Name) returns (People) {}
}

message Person {
  string name = 1;
  string email = 2;
  ...
}

message People { // an array of 0 or more Persons
  repeated Person people = 1;
}

message Name {
  string name = 1;
}

message Empty {} // empty message to represent a void response

Notice that our gRPC service definition is like an interface, with each method’s parameter and return types specified as Protobufs. Our AddContact RPC, for instance, takes a Person message as an input and returns an Empty message. RPC methods must have only one input argument and only one return type. Even if you don’t need an input/return type, you must provide one!

The gRPC Code You Should (And Shouldn’t) Write

Compiling our proto file generates client and server code. The specific code depends on the output language; here we use C++. Our client stub will look like the following:

class Stub final : public StubInterface {
public:
...
 ::grpc::Status AddContact(::grpc::ClientContext* context, const ::protos::Person& request, ::protos::Empty* response) override;
 ::grpc::Status Search(::grpc::ClientContext* context, const ::protos::Name& request, ::protos::People* response) override;
...
}

This client stub is a fully-functional client. It implements the StubInterface, and its methods match those in our service definition. For instance, the AddContact method takes in a Person request, and returns an Empty response. It also requires a ClientContext for additional request metadata, and returns a status code, which is like an HTTP response code. Successful requests always return with status OK; other options include INVALID_ARGUMENT, DEADLINE_EXCEEDED, PERMISSION_DENIED, etc.

Our server code, on the other hand, is incomplete. The following server code is generated:

class Service : public ::grpc::Service {
public:
 Service();
 virtual ~Service();
 virtual ::grpc::Status AddContact(::grpc::ServerContext* context, const ::protos::Person* request, ::protos::Empty* response);
 virtual ::grpc::Status Search(::grpc::ServerContext* context, const ::protos::Name* request, ::protos::People* response);
};

Here our AddContact and Search methods are virtual. Virtual methods are a C++ feature that lets a base class declare an interface whose behavior subclasses can override. The generated versions contain no useful logic (by default they simply report that the method is unimplemented), so you as the server developer have to provide the real implementation.

The Service class does not define a functional server. For us to make one, we need to make a subclass that implements the virtual methods. Everything other than that is already done for us, or can be configured to meet our needs – from the networking code, to async, security, and even load balancing!

Once we’ve implemented our server logic, how can our server and client talk to one another? From the client-side, we can use our Stub class to connect to the server and send it an AddContact RPC:

// create a client and connect to the server
std::string srv_addr = "xxxx";
std::shared_ptr<grpc::Channel> c = grpc::CreateChannel(srv_addr, grpc::InsecureChannelCredentials());
std::unique_ptr<AddressBook::Stub> client(AddressBook::NewStub(c));

// create a Person, Empty, and ClientContext
grpc::ClientContext ctx;
Empty empty;
Person p;
p.set_name("Jim Bob");
p.set_email("jim@bob.com");

// send an AddContact RPC; the returned status says whether it succeeded
grpc::Status status = client->AddContact(&ctx, p, &empty);

The resulting code is simple, but quite powerful. In the call to AddContact, a lot of code runs behind the scenes, on both the client and the server. Most of that code was generated for us by gRPC and Protobuf. On the client side, it almost seems as though we simply called AddContact on some local AddressBook class. However, our client’s AddContact function actually sets in motion a sequence of events:

  1. It encodes our Person message and sends it to the server over the network.
  2. Our AddressBook server receives and decodes the message.
  3. The server executes the AddContact implementation on an underlying AddressBook instance that has the virtual methods implemented.
  4. The server encodes the response and sends it back to the client over the network.
  5. The client parses the response and returns from AddContact.

The only code that you need to write is the client code (shown above), and the logic that runs in step 3.

You’re now ready to start working with Protobufs and gRPC!

Task

Please use the course VM to complete this lab.

In this lab, you’ll call functions that Protobuf has generated for you. Although the naming conventions for these functions will differ across programming languages, in C++ they are as follows.

Given a message

message Foo {
  string x = 1;
  bool y = 2;
  repeated string z = 3;
}

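the Protobuf compiler will generate a Foo class with the following methods, among others:

  x() returns the value of x as a const std::string&.
  set_x(value) sets x to the given string.
  y() returns the value of y as a bool.
  set_y(value) sets y to the given bool.
  clear_x() and clear_y() reset the field to its default value.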

For the repeated field z, the generated functions are more complex and documented here.

Task: For this lab, you’ll create a simple todo list application.

You’ll write the entire .proto file, and some of the client and server code.

  1. Run setup.sh to install Protobuf and gRPC. This may take a while (~20 minutes).
  2. Complete the todo.proto file.
  3. Complete the remaining functions in todo_client.cc and todo_server.cc.

Compile and Run

This lab uses the CMake build tool, which many larger C++ projects use. CMake auto-generates Makefiles for you, which is handy when your project consists of many files.

To make the lab for the first time run:

# to compile the lab
$ cmake .
$ make clean all

Afterwards, you can just run make clean all to rebuild, without running cmake again.

Note: The lab will not build until you’ve correctly completed the .proto file.

To run the lab, first open up a server with:

$ ./todo_server 

To get a client to connect to it, in a separate terminal run:

$ ./todo_client

Handin instructions

Turn in your code by pushing your git repository to github.com/csci1310/cs131-s20-labs-YOURNAME.git.

Then, head to the grading server. On the “Labs” page, use the “Lab 8 checkoff” button to check off your lab.