« Back to the main CS 131 website

Project 1: Strings + Vectors

Collaborative Hours: Jan 26 – Feb 1
TA Hours: Feb 2 – Feb 14
Due Feb 14, 2020 at 6:00pm


Introduction

In the first portion of this project, you will implement a subset of C’s string manipulation library. In the second portion of this project, you will create your own implementation of a vector datastructure (a structure similar to std::vector in the C++ Standard Library or to an ArrayList in Java). Through both portions of the project, you will gain a better understanding of pointers as well as the general layout of data in memory.

You’re unlikely to ever find yourself implementing your own string or vector library outside this course. However, this assignment is designed to teach you about data representation concepts in memory that software engineers in industry use every day. Plus, it doesn’t hurt to understand how the standard libraries does their magic!

Learning Objectives

Conceptual Questions

General Questions:

  1. What is the expected output of the following program and why?
#include <stdio.h> void func(const char* s) { s++; } int main() { char* hello = "hello"; func(hello); printf("%s", hello); }
  1. The diagram below shows 2 arrays and respective pointers to each one. ptr1 is of type char * and ptr2 is of type int *. If ptr1 points at address 0x1, what would be the address pointed to by ptr1 + 1? If ptr2 points at address 0x1, what address would ptr2 + 1 point to?
  2. What is the difference between stack allocation and heap allocation? Describe how you can do each, and when you would chose one or the other.

Project-specific questions:

  1. What is the difference betweeen a vector and an array? When would you choose to use one over the other?
  2. Consider the following program. Which segments of memory are main, vector_type, int_vector, and fun_adjective located in? Before destroy_vector on line 13 is called, how are fun_adjective and int_vector laid out in memory? Be sure to mention the sizes and locations of each field in the vector struct, and of any data it refers to.
    • Check out the optional section of Lab 1 for some hints!
#include <string.h> #include "vector131.h" size_t vector_type = 4; int main() { int val = 5; char fun_adjective[8]; vector_t int_vector; strncpy(fun_adjective, "awesome", 8); initialize_vector(&int_vector, vector_type); vector_add_back(&int_vector, &val); destroy_vector(&int_vector); }

Installation

If you have completed Lab 0, you will already have created a private GitHub repository for your project work under the csci1310 organization.

If you have not completed Lab 0, we suggest that you do it now!

Recall, to clone your repository for the projects, you run this command (inside the course VM, or on the computer you’re working on):

$ git clone git@github.com:csci1310-s20-projects-YOURUSERNAME.git

(You’ll hopefully have done this as part of Lab 0 already.)

Now, ensure that your repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci1310/cs131-s20-projects.git

Then run:

$ git pull
$ git pull handout master

This will merge our Project 1 stencil code with your repository.

Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the strvec directory in the working copy of your projects repository.

Infrastructure help

Part 1: Strings

String manipulation is an important concept in computer science, and it is something that comes up very often in systems programming. Most programming languages have a string library that relieves programmers from writing their own string operations for every program. The C Standard Library has some excellent string manipulation facilities, but we want to assemble one ourselves!

For this part of the project, you will implement a subset of C’s standard library string functions.

Stencil

Relevant Files Description
strings.c Contains the stencil for the string library you will be implementing.
strings_tests.c Contains a few basic tests for each string function. You may want to add more to ensure your code is working correctly.
test_runner.c This is the entry point for the program. It contains a few helper functions to write unit tests. It will run the corresponding unit tests according to the arguments you run the program with.
Makefile This will compile the .c files into two different test executables. Check out theCompile and Run section below for more information.

Your String Library

Task: Implement the following string functions in strings.c.

Note: Here are the details about the functions you are implementing (click to expand on each function). We’ve also included a link to the relevant man page for each function — Linux man pages provide great documentation on C library functions and are a helpful resource for understanding the behavior of edge cases.

strlen: Computes the length of a string, excluding the terminating null byte.

Check out the manual page: Man Page

Synopsis:

size_t strlen(const char *s);

Examples:

size_t len = strlen("ALGOT / SKADIS");  // len = 14
strspn: Computes the length of the largest prefix of a string that consists entirely of characters from its second argument.

Check out the manual page: Man Page

Synopsis:
Computes the number of bytes in the largest prefix of s that contains only characters from accept.

size_t strspn(const char* s, const char* accept);

Examples:

char* s = "Design your own ELVARLI storage systems"; char* accept = "Design your ELVARLI"; size_t span = strspn(s, accept); // span = 13 span = strspn("abcde", "ac"); // span = 1 span = strspn("123456", "ab"); // span = 0 span = strspn("hello", "hel"); // span = 4
strncmp: Compares the first (at most) n bytes of two strings.

Check out the manual page: Man Page

Synopsis
Returns a number less than, equal to, or greater than 0, if the bytes in s1 are found to be lexicographically less than, equal to, or greater than those in s2.

int strncmp(const char *s1, const char *s2, size t n);

Examples:

int result = strncmp("ABCXYZ", "ABCXYZAAO", 6); // result = 0
result = strncmp("abcde", "abdde", 3); // result = some negative number
result = strncmp("abcde", "abdde", 2); // result = 0
result = strncmp("1234567", "123", 100); // some positive number
strncpy: copies the first n bytes from one string to another.

Check out the manual page: Man Page

Synopsis:
Copies at most n bytes of the string pointed to by src to the buffer pointed to by dest. If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.

char *strncpy(char* dest, const char* src, size_t n);

Examples:

char dest[] = "MACKAPAR";
strncpy(dest, "TJUSIG", 3); // dest now becomes "TJUKAPAR"

Note
Think about why we might have used a char[] instead of a char* for dest.

strstr: Finds the first occurrence of one string in another.

Check out the manual page: Man Page

Synopsis:
Finds the first occurrence of the substring needle in the string haystack and returns a pointer to that occurrence.

char* strstr(const char* haystack, const char* needle);

Examples:

char* needle = "NYMANE";
char* haystack = "---HOLJES---NYMANE---NYMANE---";
char* location = strstr(haystack, needle);
// location = "NYMANE---NYMANE---"

Compile, Run, and Test

The Makefile

Test your work (run_tests)

Test against the C Standard Library (run_builtin_tests)

Write your own tests in strings_tests.c

Need help running your code?
# Make the executables
make  

# Runs tests from `test_strlen` and `test_strstr`in `strings_tests.c`
# on your strings library
./run_tests strlen strstr

# Runs all unit tests in `strings_tests.c` on your strings library
./run_tests all

# Runs all unit tests in `strings_tests.c` on the builtin library
./run_builtin_tests all 

Tips for Getting Started

Tips for Debugging

$ make
$ gdb run_tests
(gdb) b test_strlen
(gdb) b strlen
# Run the executable with strlen as the argument
(gdb) r strlen

Part 2: Vectors

One common data structure which you may be familiar with is an array. Internally, arrays are just adjacent elements of the same type in memory. This allows for fast traversal of the data since iterating though the array elements is as simple as the CPU incrementing a pointer. However, standard arrays are a fixed length and cannot be grown after they have been declared. One alternative to an array is a linked list, which has no fixed size. However, linked lists are what are known as “pointer chasing” data structures. Each element in a linked list includes a pointer to the next, but none of the elements are necessarily near each other in memory.

In this part of the project you will implement a vector, sometimes refered to as an “arraylist”, which is a compromise between the two data structures. Vectors, like arrays, hold their contents in a contiguous region of memory. However, like linked lists, they can dynamically grow in size by reallocating space for their data. To avoid reallocating data everytime a new element is added, they’ll often allocate more space than the user requested.

Assignment

In this part of the project, you will create your own implementation of a vector in C. Following this project, you will use vectors, as well as many other data structures defined in the C++ standard template library, for future assignments.

You have been provided with the basic structure for a vector in vector.h. The defintion of this struct is below:

typedef struct vector131 {
    void *array; //pointer to a region of memory containing vector's data
    size_t ele_size; //size of the elements type
    size_t capacity; //number of elements the vector can fit
    size_t length; //number of elements in the vector
} vector_t;

Notes and Resources:

Capacity Doubling Example

Task:
Complete the vector implementation by filling in the following functions within vector.c. Click on each function for more details of their implementation. This portion of the project, like the last, should be implemented in C.

All of the functions you are responsible for implementing is contained within vector.c.

initialize_vector: Initializes all the fields of a vector
void initialize_vector(vector_t* v, size_t type)
  • This function is responsible for allocating space for and initializing all of the fields of the vector pointed to by the first argument, v.
  • The second argument, type, indicates the size of the elements in v.
  • For the purposes of this project, you should initialize your vector to have a starting capacity of 1.
destroy_vector: Frees a vector
void destroy_vector(vector_t* v)

This function is responsible for freeing any space within the vector v that was allocated. It is expected that every call to malloc has a corresponding call to free. As such, you should use the destroy_vector function to free any data within your vector that was allocated using a malloc function.

vector_size: Returns vector length
size_t vector_size(vector_t* v);
  • This function simply returns the length of the vector v (in units of elements).
  • The behavior for this function should be equivalent to the size function of C++ vectors.
vector_get: Returns an element in the vector at a particular index
void* vector_get(vector_t* v, int index)
  • This function gets the element at a specified index in the vector v. A void pointer is returned to allow the vectors to be of generic type (it is expected that the caller casts the result to their desired type). The vectors are indexed into using zero indexing (meaning the first element is at index 0 and the last element is at index length-1)
  • The behaviorvector_get should be equivalent to the at function of C++ vectors; failure cases should return a null pointer.
vector_add_back: Adds an element to the end of a vector
void vector_add_back(vector_t* v, void* ele)
  • This function adds a copy of the desired element to the end of the vector v. If your vector has reached capacity, this function is also responsible for growing the vector’s array. Each time a vector reaches capacity, its capacity should be doubled.
  • The behavior of vector_add_back should be equivalent to the c++ vector push_back function
  • Note: your vector_add_back implementation will have to use memory allocation functions (such as realloc), which should be traditionally error checked. However, since your vector is meant to be modeled after the C++ vector implementation, which has no return value for push_back and will instead silently fail, it is acceptable for your vector implementation to omit error checking on allocation functions.
vector_delete_back: Removes the last element from the vector
void vector_delete_back(vector_t* v)
  • This function removes the last element (the element at index length-1), from the vector v.
    • HINT: This function is actually quite simple. You don’t need to do anything from a memory perspective.
  • The behavior of this function should be equivalent to the pop_back function of C++ vectors.
vector_add: Adds an element to the vector at a specific index
void vector_add(vector_t *v, void *ele, int index)
  • This function adds a copy of the desired element to the specified index within the vector. As vectors are just dynamically sized arrays and elements in arrays are contiguous, the valid indices for vector_add are 0 through the vector’s current length. If the index specified is not the vector’s current length (meaning you aren’t just tacking on another element to the end of the array), the new element should not overwrite the element currently at that index. Instead, the contents of the array starting at the desired index to the array’s end should be moved so that the new element can be inserted. Once again, this function is repsonsible for increasing the capacity of the vector once the current capacity is reached.
  • The functionality of vector_add should be equivalent to the insert function of C++ vectors. Just note that insert takes in an itertaor as an argument to indicate the desired position within the vector while your implementation will simply use an integer to represent the desired index.
  • Once again, you do not need to error check allocation functions for vector_add. You may allow your implementation to silently fail like the c++ vector.
vector_delete: Removes an element from the vector at a specific index
void vector_delete(vector_t *v, int index)
  • This function removes the element at the specified index. In order to keep the vector’s internal array contiguous, the contents of the array starting at the next index through the end of the array should be shifted so that there are no gaps. Valid indices for the vector_delete function are 0 through the vector’s current length - 1.
  • The functionality of vector_delete should match the erase function of c++ vectors. Just note that, like insert, erase uses an iterator rather than in integer to indicate the desired index.

A Note on Pointer Arithmetic:

Recall from lecture that when you perform operations on pointers, the pointer is incremented/decremented in units of sizeof(type) bytes.

In this project, the pointers you are working with are void (meaning they have no specified type) so that your vector implementation can be generic. However, performing pointer arithmetic on void pointers will result in compiler errors since the compiler will not know by how many bytes the pointer should actually be indexed. One way to solve this is to first cast your pointer to a char* before doing any operations on it. Once you have casted your pointer to a char*, you can then access the elements at various offsets from the pointer by adding the exact number of bytes between the pointer and the desired element.

Relevant Library Functions

malloc: Allocates a block of memory

Check out the manual page: Man Page

void *malloc(size_t size)

This function allocates a region of memory of the desired size in the heap and returns a pointer to it.

realloc: Changes the size of a memory block

Check out the manual page: Man Page

void *realloc(void *ptr, size_t size)
  • This function changes the size of the region of memory pointed to by ptr to size.
  • Realloc on a null pointer is equivalent to calling malloc
  • The ptr must be a pointer that was previously returned by another system call in the malloc family (unless it is null)
free: Frees a block of memory

Check out the manual page: Man Page

void free(void *ptr)
  • This function frees the memory pointed to by ptr so that it can be later reused by the system
  • The ptr must be a pointer that was previously returned by a function in malloc family
memcpy: Moves bytes from one region of memory to another non-overlapping region

Check out the manual page: Man Page

void *memcpy(void *dest, void *src, size_t n)
  • This function moves n bytes from the region of memory pointed to by src to the region pointed to by dest
  • For memcpy, the regions of memory indicated by src and dest cannot overlap
memmove: Moves bytes from one region of memory to another

Check out the manual page: Man Page

void *memmove(void *dest, void *src, size_t n)
  • Like memcpy, this function moves n bytes from src to dest. However, for memmove, src and dest are allowed to be overlapping regions of memory.

Compile, Run, Test

The Makefile

Testing

This test suite compares your vector implementation to the C++ standard library vector (std::vector). When your implementation is fully functional, the testsuite executable should print nothing out besides your point total.

Upon test failure, the testsuite prints out the expected state of the vector versus your vector. This output, in addition to the use of gdb, should help you track down and fix your errors. Additionally, the Makefile compiles the vector code with an address sanitizer. As such, you will get warnings if your implementation has any memory leaks or errors. A fully functional implementation should show no error output from the address sanitizer while it’s running.

Handing In Your Code

You will hand in your code using Git. In the strvec/ subdirectory, you MUST fill in the text file called README.md. The README.md file will include the following:

  1. Any design decisions you made and comments for graders, under “Design Overview”. If there’s nothing interesting to say, just list “None”.
  2. Any collaborators and citations for help that you received, under “Collaborators”. CS 131 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
  3. Your answers to the conceptual questions at the start of this handout under “Conceptual Questions”.
  4. Notes on approximately how long it took you to complete the project. We won’t use this for grading, but we collect the data to calibrate for future years.

Grading breakdown:

Your submission will not be graded until you configure grading server. You should have used the grading server for Lab 0 already.

If you were registered for CS 131 on the first day of classes, you should have received credentials by email.

If you only recently registered or you’re shopping the course but haven’t registered, you won’t have an account yet. In that case, please complete this form and wait until you receive your credentials. We will create accounts based on the form entries at 8pm every day.

Step 1: Log into the grading server with the password you received. Once you’re logged in, enter your GitHub username (at the top).

Step 2: Add the repository URL under the “Strings and Vectors” category. Click “Strings and Vectors” to initiate a fetch of your repository from GitHub.


Note: If your GitHub account is not linked to your Brown email address, the grading server will give you a command to run to verify your project repository.

Step 3: You should see your commit listed on the Labs page, as shown above. You can use the buttons below the commit list to test your code in our grading environment. Make sure it still passes all the tests!


Congratulations, you’ve completed the first project!


Acknowledgements: We are grateful to Brown’s CS 33 course for providing the basis for the strings portiton of this project. The vectors portion was developed for CS 131.