Classroom practice: Dynamic arrays, part I-II.

Gábor Horváth / Zsolt Kohári · 2020.11.06.

Dynamic memory allocation. Dynamic arrays.

Preparation for the practice:

The lecture notes on arrays should be reviewed.
The lecture notes on pointers and strings should be reviewed.
The lecture notes on dynamic memory management should be reviewed.

1. Where is the mistake?

The code snippets contain at least one (or potentially more) errors. Find them! Explain what the problem is! What makes the codes wrong? What is the principal problem with them? How to fix them?

int *fun(void) {
    int i = 2;
    return &i;
}

Solution

The local variable i is going to be destroyed right after the function returns. We have returned its address, but the calling function can not use it, since it points to a non-existing value. Such pointers are called dangling pointer. If we are interested in the value of the variable, we need to return that: the return type of the function should be int, and return i.

int *fun(void) {
    int array[100];
    array[0] = 2;
    return array;
}

Solution

The problem is the same as before. The function returns the starting address of the array, but the array gets removed from the memory right after the function returns, its memory allocation gets released. Hence, the calling function receives a useless pointer. There are two ways to fix this issue. First, the array can be allocated in the caller, as:

void fun(int *array) {
    array[0] = 2;
}

...
int array[100];
fun(array);
...

Or, it can be solved with dynamic memory allocation:

int *fun(void) {
    int *array;
    array = (int*) malloc(sizeof(int) * 100);
    array[0] = 2;
    return array;
}

/* concatenates two strings to a dynamically allocated array */
char *concatenate(char const *a, char const *b);

printf("%s", concatenate("apple", "tree"));

Solution

Memory allocated dynamically has to be released after use. The concatenate function returns a pointer to the allocated memory, but this pointer has to be used for two different purposes: to print the string in printf(), then to call the free() to release the memory. Hence, we need to introduce a variable to store the pointer:

char *s = concatenate("apple", "tree");
printf("%s", s);
free(s);

Forgetting about releasing the memory leads to memory leak.

2. Values backwards – again

The specification is very similar to a lab task of week 8, except the number of values is not known in advance: we have to ask the user to enter real numbers, and print it at the end in a reverse order. The input sequence of values is terminated (when the user enters –1, stop and print the result). To ensure that there are no restrictions on the number of real values entered by the user, increase the size of the dynamic array by one each time a new number is entered!

Hint

The dynamic array can not be allocated at once if the number of elements is not known in advance. That is why we need to resize the dynamic array after every reading of a new entry. The steps are:

Allocating a new array.
Copy existing values from the old.
Release the old array.
Make your main pointer point to the new.
Put the new value in.

Remember, a pointer can be set to point to any allocated memory segment! The main rule is to remember the address of each dynamically allocated memory segment. If we lose an address, there is no way to access the stored data, neither can the allocated memory be released.

More hint

At the very beginning, before any real value read, there is no need for an array; the pointer can be NULL. It could be malloc(0*sizeof(double)) as well (but it must be released later). malloc(0) and free(NULL) are valid calls. This way the separate treatment of the "no array" (empty) case is not necessary in several algorithms.

Solution

The steps mentioned above are marked in the code as comments.

Initialy, when there are no numbers, the array does not exist yet, in this case the pointer can be NULL. In an alternative solution we could have written malloc(0*sizeof(double)), too, since malloc(0) is a valid memory allocation, just like free(NULL) is valid, too, to make such algorithms easier to implement.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* reading the numbers */
    printf("Enter the numbers, terminated by -1!\n");
    int cnt = 0;
    double *numbers = NULL;
    double newnum;
    while (scanf("%lf", &newnum) == 1 && newnum != -1) {
        /* extend the array by one element */
        double *newarr = (double*) malloc(sizeof(double) * (cnt+1)); // 1
        for (int i = 0; i < cnt; ++i)
            newarr[i] = numbers[i]; // 2
        free(numbers); // 3
        numbers = newarr; // 4
        numbers[cnt] = newnum; // 5
        ++cnt;
    }

    for (int i = cnt-1; i >= 0; --i) {
        printf("%f\n", numbers[i]);
    }

    free(numbers);

    return 0;
}

3. Data type for storing sets

Write a program that implements various operations on a set consisting of real numbers. The number of elements in the set can be arbitrary large. The following operations must be implemented (by a function for each):

It must be possible to check whether an element is in the set or not
It must be possible to insert a new element into the set (if it is already there, nothing should happen)
It must be possible to remove an element from the set

Draw a figure about the memory layout of the set data structure!

Theoretically, it is not recommended to compare real numbers using the == operator. For simplicity, let us now ignore this issue and focus on the dynamic memory operations instead!

Solution

The requirement "the number of elements in the set can be arbitrary large" indicates that we have to rely on the dynamic memory management. The elements of the set can be stored in a dynamically growing array. A set is characterized by two properties: the pointer pointing to this array and the number of elements in the array. These two data must be managed together all the time, hence it makes sense to enclose them in a structure:

typedef struct Set {
    int len;
    double *data;
} Set;

When an instance of the set is created, both the len and the data member variables contain memory garbage. Consequently, the set must be properly initialized before the first usage. To avoid memory leaks, the memory allocated for the array must be released when the set is not used any more. Therefore, apart from the above listed three functions we need to implement two more: one for initializing and one for destroying a set.

Many functions operating of the set modify the member variables of the structure. For instance, the data array may get reallocated, the length of the array can grow, etc. For this reason, we will pass the pointer to the set structure to these functions, instead of passing the structure itself. This is only correct solution in case of the insert and the remove functions, but for convenience (to make the call interface of the functions uniform) we are going to implement all functions like this. In this case we do not have to remember the exact parameter list of each function, since they are the same. Those functions, that dont modify the set (e.g., the one that checks if a given element is included) will receive a pointer pointing to a constant structure, so that the compiler can check whether we wrote a wrong code in the function that modifies the set by mistake.

Inserting a new number to the set or removing one from it is a costly operation. Since arrays can not be resized, the a new array must be created every time the arrray is modified, and the old elements must be copied from the old place to the new one. This copy operation makes the implementation inefficient. After the new array is fully filled with the updated content, the old one must be released and the pointer in the structure pointing to the data array must be updated to point to the new array. (The realloc() function does basically the same)

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct Set {
    int len;
    double *data;
} Set;

/* receives an uninitialized set and makes a valid empty set from it */
void set_init(Set *s) {
    s->len = 0;
    s->data = NULL;
}

/* releases the set when it is not needed anymore */
void set_destroy(Set *s) {
    free(s->data);
}

/* returns true is the element is in the set */
bool set_contains(Set const *s, double what) {
    for (int i = 0; i < s->len; ++i)
        if (s->data[i] == what)
            return true;  /* found in the set */
    return false;         /* not found */
}

void set_insert(Set *s, double what) {
    /* do nothing if already in the set */
    if (set_contains(s, what))
        return;

    /* copy elements to a bigger by one array  */
    double *newarr = (double*) malloc((s->len + 1) * sizeof(double));
    for (int i = 0; i < s->len; ++i)
        newarr[i] = s->data[i];

    newarr[s->len] = what;  /* put the new element to the end */
    free(s->data);    /* the old array is not needed any more */
    s->data = newarr;     /* the pointer points to the new array from now on */
    s->len++;          /* number of elements has increased by one */
}

void set_remove(Set *s, double what) {
    /* do nothing if not in the set */
    if (!set_contains(s, what))
        return;

    /* new array, number of elements is less by one */
    double *newarr = (double*) malloc((s->len - 1) * sizeof(double));
    int j = 0;
    for (int i = 0; i < s->len; ++i)
        if (s->data[i] != what)
            newarr[j++] = s->data[i];
    free(s->data);
    s->data = newarr;
    s->len--;         /* number of elements has decreased by one */
}

/* lists the content of the set */
void set_print(Set const *s) {
    for (int i = 0; i < s->len; ++i)
        printf("%g ", s->data[i]);
    printf("\n");
}

int main(void) {
    Set s;

    set_init(&s);
    set_insert(&s, 3.14);
    set_insert(&s, 2.0);
    set_insert(&s, 3.14);
    set_print(&s);
    set_insert(&s, 6.1);
    set_print(&s);
    set_remove(&s, 3.14);
    set_print(&s);
    set_destroy(&s);

    return 0;
}

In the solution above we have exploited that free(NULL) is accepted by the standard. For some older C compilers this can be a problem, though. In this case, writing if (ptr!=NULL) free(ptr); solves the problem.

After the main() function has put number 6.1 to the set, the memory layout is as follows:

Set s is in the stack, consisting of the length and the pointer, and the pointer points to a dynamically allocated array, given by the malloc() call.

4. Set operations with dynamically allocated structures

Let us modify the previous solution such that the initializer function returns a new, dynamically created set, instead of initializing an existing one. After this modification using the set should be as simple as follows:

Set *s;

s = set_new();
set_insert(s, 3.14);
set_insert(s, 6.1);
set_print(s);
set_remove(s, 3.14);
set_print(s);
set_destroy(s);

This way the sets can be passed freely among the functions, they wont be deleted since they are not local variables any more. Observe how benefitial the modification is to the syntax: since s is always a pointer, we dont need to use the address-of operator & any more.

Solution

This approach leads to the following memory layout

The local variable in the main() is only a pointer, both the set structure and the data array are located in the heap, since they were allocated dynamically.

The difference is only in the initializing and the destroying functions. When initializing, the structure has to be created by malloc(), too, consequently, when destroying, the structure has to be released, too:

Set *set_init() {
    Set *ps;
    ps = (Set*) malloc(sizeof(Set));
    ps->data = NULL;
    ps->len = 0;
    return ps;
}

void set_destroy(Set *ps) {
    free(ps->data);
    free(ps);
}

HW:. Less than the average

Let us write a function that receives a double array and returns a new, dynamically allocated array with the elements smaller than the average! The function should return the array and its size through the parameter list, as pointers. The return value itself should be true if the operation was successful and false if not.

Write a program to demonstrate how to use the function!

Solution

The principle of the solution: we need to allocate a double array to store the less-than-average elements. But what should be the size of the array? We need to count it before allocating the array, like in the previous exercise. Hence, the steps are:

calculating the average,
counting how many elements are below the average,
allocating the target array (we know the capacity of this array from the previous step),
copy the elements to the target array.

Let us focus on the syntactical elements! The double **newarray seems horrific at the first sight, but it is actually not something new: the target array (of type double*) is returned through the parameter list, as a pointer, leading to double**.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int separate(double *source, int srclen, double **target, int *trglen) {
    /* calculating the average */
    double sum = 0;
    for (int i = 0; i < srclen; ++i)
        sum += source[i];
    double average = sum/srclen; 

    /* determining the length of the target array */
    int cnt = 0;
    for (int i = 0; i < srclen; ++i)
        if (source[i] < average)
            ++cnt;            

    double *newarr = (double *) malloc(sizeof(double)*cnt);
    if (newarr == NULL)
        return 0;

    int index = 0;
    for (int i = 0; i < srclen; ++i)
        if (source[i] < average) {
            newarr[index] = source[i];
            ++index;
        }

    *target = newarr;
    *trglen = cnt;
    return 1;
}


int main(void) {
    double numbers[5] = {3.4, 8.5, 4.6, 9.1, 1.2};

    double *narr;
    int nlen;
    separate(numbers, 5, &narr, &nlen);
    for (int i = 0; i < nlen; ++i)
        printf("%f ", narr[i]);
    free(narr);

    return 0;
}