This would not happen with Java

By | 11th April 2021

One of the main differences between C and Java is the capability of C to perform low-level memory operations. This allows for great power and flexibility but is also a constant source of bugs. In that sense, Java is considered a safer programming language.

This post presents some examples of C behaviour that would not happen with Java.

The following examples have been compiled and run in a Linux Virtual Machine:

fjab@fjab-VirtualBox:~/newc$ uname -sr
Linux 5.8.0-44-generic

fjab@fjab-VirtualBox:~/newc$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Variable overwriting in memory

The following snippet defines two global variables var1 and var2

int var1 = 1;
int var2 = 2;

int main(int argc, char const *argv[])
{
    return 0;
}

Initialised global variables are stored in the data segment. On this little-endian machine, the 4-byte values of var1 and var2 are stored as the byte sequences 01 00 00 00 and 02 00 00 00, respectively. As the objdump output below shows, the two variables occupy the consecutive addresses 0x4010 and 0x4014:

fjab@fjab-VirtualBox:~/newc$ gcc overwrite_var.c -o overwrite_var
fjab@fjab-VirtualBox:~/newc$ objdump -s -j .data overwrite_var

overwrite_var:     file format elf64-x86-64

Contents of section .data:
 4000 00000000 00000000 08400000 00000000  .........@......
 4010 01000000 02000000                    ........        

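With this knowledge, it’s easy to take a pointer to var1, advance it by one int and use it to overwrite the memory that holds var2. Strictly speaking, writing through that pointer is undefined behaviour in C, so the result shown here depends on the compiler and its settings: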

#include <stdio.h>
int var1 = 1;
int var2 = 2;

int main(int argc, char const *argv[])
{
    printf("var1=%d\n", var1);
    printf("var2=%d\n", var2);
    int *ptr = &var1;   /* pointer to var1 */
    *(ptr+=1) = 3;      /* move one int forward, onto var2 in this layout, and overwrite it */
    printf("var2=%d\n", var2);

    return 0;
}

fjab@fjab-VirtualBox:~/newc$ ./overwrite_var 
var1=1
var2=2
var2=3

Big variables in the stack

By default, local variables in C are allocated on the stack. The following example defines an array of 3 million integers:

#include <stdio.h>
#define SIZE 3000000


int main(int argc, char const *argv[])
{
    int arr[SIZE] = {1};
    long sum = 0;
    for (size_t i = 0; i < SIZE; i++)
    {
        sum += arr[i];
    }
    
    printf("sum=%ld\n", sum);
    return 0;
}

When running the program, a segmentation fault occurs because the stack is too small to hold the array: 3,000,000 × 4 bytes = 12,000,000 bytes (about 11,719 KiB), which exceeds the 8,192 KiB limit reported by ulimit -s:

fjab@fjab-VirtualBox:~/newc$ gcc big_stack.c -o big_stack
fjab@fjab-VirtualBox:~/newc$ ./big_stack 
Segmentation fault (core dumped)
fjab@fjab-VirtualBox:~/newc$ ulimit -s
8192

This would not happen with Java, as all objects, arrays included, are stored in the heap.

Big variables, big files

So what if the array is defined as a global variable instead?

#include <stdio.h>
#define SIZE 3000000

int arr[SIZE] = {1};

int main(int argc, char const *argv[])
{
    long sum = 0;
    for (size_t i = 0; i < SIZE; i++)
    {
        sum += arr[i];
    }
    
    printf("sum=%ld\n", sum);
    return 0;
}

This time the application runs successfully, but something else happens: the executable has grown to 12 MB! The reason is that the contents of the data segment are stored in the executable file:

fjab@fjab-VirtualBox:~/newc$ gcc big_data.c -o big_data
fjab@fjab-VirtualBox:~/newc$ ./big_data 
sum=1
fjab@fjab-VirtualBox:~/newc$ ll big_data
-rwxrwxr-x 1 fjab fjab 12016744 Mar 25 22:42 big_data*
fjab@fjab-VirtualBox:~/newc$ size big_data
   text	   data	    bss	    dec	    hex	filename
   1644	12000616	      8	12002268	 b723dc	big_data

This would not happen with Java either.

More big variables

We can try something else: declaring the array as a global variable without initialisation.

#include <stdio.h>
#include <unistd.h>
#define SIZE 3000000

int arr[SIZE];

int main(int argc, char const *argv[])
{
    pid_t pid = getpid();
    printf("pid=%d\n", pid);

    long sum = 0;
    arr[0] = 1;
    for (size_t i = 0; i < SIZE; i++)
    {
        sum += arr[i];
    }
    
    printf("sum=%ld\n", sum);
    return 0;
}

In this case, the variable is stored in the bss segment and the size of the file does not increase. The executable does not need to store the uninitialised data itself; it only records how much space is required.

fjab@fjab-VirtualBox:~/newc$ gcc big_bss.c -o big_bss
fjab@fjab-VirtualBox:~/newc$ ./big_bss 
sum=1
fjab@fjab-VirtualBox:~/newc$ ll big_bss
-rwxrwxr-x 1 fjab fjab 16728 Mar 25 22:51 big_bss*
fjab@fjab-VirtualBox:~/newc$ size big_bss
   text	   data	    bss	    dec	    hex	filename
   1660	    600	12000032	12002292	 b723f4	big_bss

In theory, the memory for the array only materialises when the program runs. To test this, let’s run the program under a debugger (gdb), stop it and examine the memory of the process:

fjab@fjab-VirtualBox:~/newc$ gcc -g big_bss.c -o big_bss
fjab@fjab-VirtualBox:~/newc$ gdb big_bss 
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from big_bss...
(gdb) b main
Breakpoint 1 at 0x1169: file big_bss.c, line 8.
(gdb) r
Starting program: /home/fjab/newc/big_bss 

Breakpoint 1, main (argc=21845, argv=0x7ffff7fb4fc8 <__exit_funcs_lock>)
    at big_bss.c:8
8	{
(gdb) n
9	    pid_t pid = getpid();
(gdb) 
10	    printf("pid=%d\n", pid);
(gdb) 
pid=7032
12	    long sum = 0;
(gdb) 

The program prints the pid of the process, in this case 7032, which can be used to get the total amount of memory mapped by the process: about 14 MB, in line with the 12 MB expected for the array.

fjab@fjab-VirtualBox:~/newc$ sudo pmap 7032 | tail -n 1
 total            14216K
