Table of Contents
- Related data items can be grouped together in a data structure that allows for the storage of these items in memory to be contiguous. This data structure is called an array and each item is described as an element. This makes the array an aggregate of elements.
- Elements in the array usually belong to the same data type, though there are arrays called mixed-type arrays that contain elements of different types. Normally, the array stores values of the same type, while elements of different types can be stored together in a unique aggregate called a data structure (or tuple).
- In data hierarchy, data structure and arrays are classified as records.
- The fundamental principal of the array is that its elements are stored in contiguous memory locations in the memory space. Therefore, the array can be defined as a contiguous space in memory whose registers store elements of the same type that share the same name (or array identifier).
- An array is considered a static data structure because its size remains unchanged during program execution.
Array Declaration
- The array needs to be named. This name serves as its identifier.
- The rules and convention governing the naming of variables and functions apply to the naming of arrays.
- The data types of the elements in the array need to be specified. This type is written to the left of the array identifier.
- The number of elements in the array need to be specified. This number is enclosed in square brackets [ ] which is placed right next to the identifier. Specifying the number of elements allows for appropriate memory allocation before the program is executed.
- Like the variable, the semicolon ; serves as the termination control character, and is thus is placed to the right of the closing square bracket.
- The above explanation allows for array declaration to be syntactically written as:
data_type name_of_array[number_of_elements];
- The elements of the array constitute the field of the array.
- The field width of an array is the maximum number of elements that the array can hold. It determines the size of the array, and this width does not change during program execution.
- Declaring an array is also described as defining an array.
Element Declaration
- The element in the array is defined by the name of the array and its position in this array. This position is written inside square brackets [], thus the syntax of the element definition is:
name_of_array [position_of_element_in_array]
- The first element in any array is assigned the position number zero(0). This position number is the position of the element in the array and is termed as the index or subscript of the element. As expected, this subscript is an integer value that cannot have a negative value.
- The element has a value in the array. By knowing the value of the element, the element can be fully defined using the following syntax:
name_of_array [position_of_element_in_array] = element_value
Array Initialization
- An uninitialized array has garbage values as the values of its elements.
- Each element in the array has a positional integer value whose data type is defined in the C Standard as an unsigned integral type, and whose keyword is size_t.
- The size_t type represents the size of elements in bytes, thus ensuring that it can hold a very large number of elements. This makes this type suitable for both array indexing and loop counting.
- The size_t data type is defined in the <stddef.h> which is normally included in the <stdio.h>. This type is also defined in the <stdlib.h>, <time.h>, <string.h>, and <wchar.h>. The conversion specification for displaying size_t values is %zu.
- The elements in the array can be initialized using values contained in a comma separated list that is enclosed in curly braces {}. This makes the syntax of this initialization as:
array_data_type name_of_array [number_of_elements] = {initializer_value_1, initializer_value_2, initializer_value_3};
- The comma-separated list enclosed in curly braces is called array initializers. Each value in this list is called an element initializer.
- The number of values in the array initializers must never exceed the number of elements of the array. If it exceeds, then a compilation error occurs. To prevent this, it is recommended that the number of elements in the array should not be specified so that the compiler calculates that number based on the number of values in the array initializers.
- If the number of values in the array initializers is less than the number of elements of the array, then the elements lacking initialization values are automatically assigned an initialization value of 0. This also allows for only one initialization value to be provided as the array initializer, and all remaining elements are initialized to zero:
array_data_type name_of_array [number_of_elements] = {initializer_value_1}; // All the elements except the first element are initialized to zero.
- The array can be represented in tabular format as follows:
Array Name | Index or Subscript | ||||
Array[1] //This is the definition of the first element | array[2] | array[3] | array[4] | array[5] | |
Array //This is the name of the array | Value_1 // This is the value of the first element | value_2 | value_3 | value_4 | value_5 |
- Another way to specify the number of elements in the array is to use the symbolic constant.
- In C programming, the symbolic constant is simply the name of a constant.
- A constant is any value in the program which does not change during execution i.e it is a fixed value. This fixed value is also called a literal. The data type of the constant can be an integer, character, string, or floating value or any applicable type. The data type aids in identifying the constant as an integer literal, character literal, or string literal etc.
- The symbolic constant is thus the name or label of the literal. Because the literal has a value, this value must be specified when declaring the symbolic constant.
- In C language, the symbolic constant must be defined using the #define preprocessor directive with the name written in uppercase and the value written after the name as follows:
#define SYMBOLIC_CONSTANT VALUE
- Preprocessor directives do not have a termination control character, and thus appending a semicolon next to the VALUE changes the value to VALUE; (i.e the value now contains a semicolon character). This can result in either a logic error or compilation syntax error.
- The syntax above allows the symbolic constant to be used as an identifier in the array as follows:
array_data_type array_name[SYMBOLIC_CONSTANT] = {array_initializers};
- This shows that the symbolic constant serves as the placeholder of the field width in the array syntax, and is thus described as replacement text for a literal because the compiler will replace the constant with the field width during compilation and program execution. Its utility is that it allows the programmer to only change the literal’s fixed value once and this change will be transmitted throughout the program by the symbolic constant. In the array, the value of the symbolic constant is the maximum number of elements that the array can hold.
- Assigning a value to the symbolic constant in executable statements or expressions results in compilation error because this constant is not a variable.
- When used in a function, the array is usually associated with a loop statement.
Character Array
- The elements in the array are normally of the same type. This allows for identification of an array as an integer array if it contains whole numbers and character array if it contains characters i.e letters and special characters.
- In the character array, each character is an element. A sequential list of characters is called a string. So, what differentiates a string from a character array.
- The character array is made up of two parts – a string and a string-terminating null character.
- The string-terminating null character marks the end of the string is an escape sequence that marks the end of a string. This escape sequence is made up of the escape character and the digit 0 and is thus written as \0. This means that the null character is represented as \0.
- The number of elements in the array is thus the number of characters in the string plus 1 element which is the null character. This reveals that \0 is a single element in the array, which differentiates it from all the other elements which are made up of only a single character. For this reason, the escape character plus the 0 digit form a unique compound character called the null character. Because a character is stored as a byte in memory, this null character is called the null byte. The null byte is a constant that appears in all character arrays.
- A string literal is used to initialize a character array, and the initialization syntax is:
char array_name[] = “string_literal”;
- The string literal is enclosed in double quotation marks.
- The syntax of the character array can also be written as:
char array_name [] = {‘L’ ,‘i’ ,‘t’ ,‘e’ ,‘r’ ,‘a’ ,‘l’ ,’\0’}; // This initializer list is a comma-separated list of characters.
- Each character is enclosed in single quotation marks.
- The use of quotation marks differentiates the character from the string. A character is enclosed in single quotation marks while a string is enclosed in double quotation marks.
- Each individual character in the array is assigned its own array subscript/index.
- When requesting the user to input elements into the array, there is need to specify the maximum number of elements that the array can hold and this is done by inserting the precision value into the conversion specification after the % character e.g %7s means that the array can only accept up-to 7 characters. Notably, the array needs to have the null character and thus the maximum number of elements that it can accept should be set as the precision value plus one (for the null character). Therefore, if %7s is the conversion specification, then the array should be defined as array [8];. This can be represented as follows:
field width = precision value + 1
- The scanf function allows the user to input elements into a character array, and the syntax used is:
scanf (“%precision_s”, array_name);
e.g scanf (“%7s”, array); // This allows for the string Synowl to be stored in the character array named array.
- The address operator is not used when scanf function reads the input into the array. This is because the address operator only works with variables, and is used to read the values into the memory space assigned to the variable.
- The scanf function cannot check the size of the array, and if the user inputs more characters than the array can hold, an error called buffer overflow occurs.
- The memory space assigned to the scanf function to store the values of the array is called a data buffer. It is called a buffer because it is the temporary storage of input data, and it holds this data until it is sent to another memory space or output device.
- This buffer has boundaries that define its range of memory locations.
- Buffer overflow is a security vulnerability because the characters are written in memory locations outside the array’s memory space. It is also called buffer overrun.
- Buffer overflow can be mitigated by setting the precision value in the format control string. The precision value allows for checking the user input because it defines the maximum number of characters that can be read by the function and then transferred to the array. This evaluation of the user input to ensure that it falls within the specified size of the array is called bound checking. If more characters than the number allowed by the precision value are entered by the user, this user input will be truncated to maximum number of characters set by the precision value and the rest of the user input data will be lost.
- Without using the format control string, then the buffer overflow exposes the program to a format string attack. In this attack, unevaluated input data can be executed as a code so as to read data in memory, create segmentation faults, and trigger software crash.
- To output a character array, the printf function is used with %s conversion specification included in its output statement. For example:
printf (“%s”, array); // This allows for the array to print out its elements as a string.
- The printf function terminates the output when it encounters the null character. Thus, like the scanf function, this function lacks inbuilt bound checking.
Automatic and Static Array
- An array can be declared inside a function, and this array is known as a local array. The scope of the local array is the block scope.
- The syntax of the local array is:
return_type function_identifier (parameter_type)
{
static array_type array_name [number_of_elements];
}
- By default, the local array has an automatic storage duration which means that it is destroyed every time program control exits the function, and it thus needs to be created again when the function is called.
- Like the variable, an array can be assigned a static storage duration. This array becomes a static local array.
- The static local array is initialized once when transfer of control is given to its function, and it exists throughout the duration of the program even if program control is transferred to other functions. This ensures that the elements in the array are not destroyed when the function exits, thus eliminating the need to recreate the array every time its function is called. Like the static variable, the keyword static is used to define a static array, and the elements in this array are initialized to zero by default.
- The local static array preserves the values of its elements until those values are changed during function execution, and new values are stored in the array. Therefore, the local static array allows for persistence of values obtained during a previous function call or when the array was initialized.
Pass Array to a Function
- The identifier (i.e name) of an array evaluates to the memory address of the first elements in that array. This allows for the array to be passed to a function.
- The address in memory where a value, e.g variable or element value, is stored can be printed using the %p conversion specification. This address has a hexadecimal value.
- A Hexadecimal value is a hexadecimal number. A hexadecimal number is a Base 16 number whose range is made up of digits 0-9 and the letters A-F (which represents the decimal numbers 10-15). Thus, the decimal value 10 is A in hexadecimal system.
- As an address in memory, the hexadecimal value is prefixed by the two-character string 0x, but this is dependent on the compiler used.
- Pass by reference is the method used to pass an array to a function. This means that the function can change the value of the elements in the array.
- The syntax for passing an array by reference to the header of a function is:
return_data_type function_name (array_data_type array_name[], size_t number_of_elements)
- The data contained in an element is called a scalar. This means that any integer, float, or character value in an element of an array is a scalar.
- To pass a scalar to a function, the array index is passed to that function. This method of passing an element to a function is called pass by value.
- To prevent a function from modifying the scalar of an element, the element should be declared using the type qualifier called constant, and whose keyword is const. It is declared as follows:
return_data_type function_name (const array_data_type array_name[], size_t field_width)
Searching an Array
- An array can contain a value that is of interest. This value is called the key value.
- The process of locating the key value in the array is called searching.
- During searching, the value used to search the array for the key value is called the search key. It is this search key that each scalar in the array is compared with during linear search.
- There are 2 basic techniques used for searching an array: the linear search and the binary search.
Linear Search
- In linear search, each scalar in the array is compared with the search key so as to locate the key value.
- Linear search is suitable for an unsorted array.
- To perform linear search, the search key must be declared as a variable and initialized. The printf and scanf functions can be used to capture this value from the user and then pass it to the program. This allows for the searck key to be compared to each scalar in the array. This comparison is facilitated by the loop function of C language. The boilerplate code for performing linear search is provided below:
int searchKey = 0; //This initializes the search key for an integer value.
For (size_t e = 0; e <= SIZE; ++e)// e is the array index, and this for statement ensures that the search key is compared to each scalar in the array
{
if(array_name[e] == searchKey)
{
printf(“The search key was located at element %d\n”, e);
}
}
- Linear search is slower compared to binary search, but its algorithm and code are simpler.
Binary Search
- Binary search is suitable for a sorted array. This means that an unsorted array must be sorted first before performing a binary search.
- The first step in binary search is to locate the median (or middle value or middle element) of the array which is used to divide the array into 2 parts – the lower half (whose values are less than the median) and the upper half (whose values are more than the median). The upper half and lower half of the array are the 2 subarrays obtained from the array.
- The search key is compared to the median and if it is larger than the median, then it searches the upper subarray. If the search key is less than the median, then it searches the lower subarray.
- In the selected subarray, it is divided into 2 sub-subarrays by its median. This median is then compared to the search key, and the appropriate sub-subarray is selected for searching.
- This sub-subarray is then divided into 2 by its median and the median then compared to the search key to determine which part to search.
- This division continues until the search key is equal to the median or the remaining single value; or the single value that remains is not equal to the search key which means that its value is not found in the array.
- The action of dividing the array into 2 reduces the duration and resources needed to perform the search. It also allows for the number of comparisons to be identified using the following simple equation:
2n = e; where n is the number of comparisons and e is the total number of elements in the array.
- If the array has 1,048,576 elements, then binary search performs a potential maximum of 20 comparisons while linear search performs a potential maximum of 1,048,576 comparisons. This highlights the performance dividend of binary search over linear search.
- Binary search uses iteration during execution.
CODE FOR BINARY EXECUTION HERE
Multidimensional Array
- The elements considered above have only a single subscript. These elements belong an array known as a single-dimension or unidimensional array because it is made up of only a single row of elements.
- An element can have multiple subscripts if its array is made up of multiple rows. An array made up of multiple rows of elements is called a multidimensional array.
- The multiple rows in the multidimensional arrays create columns, with the number of columns being equal to the number of elements in a row.
- A column is made up of elements in multiple rows which share the same row subscript (i.e positional value). For this reason, the first column in the multidimensional array is assigned the subscript (or index) 0 because all its elements have the subscript 0 in their respective rows.
- To differentiate the rows from each other, each row is assigned a row index, with the first row being assigned the index 0.
- The element is assigned its subscripts as follows in the multidimensional array:
array_name [row_index] [column_index]
- An array whose elements have two subscripts is called a two-dimensional array. The naming of elements in the two-dimensional array preserves the subscript value of each element per row but adds the row index to its left e.g if e[3] is the subscript in a single-row array, and this array is duplicated 3 times to create 3 rows which are then joined together to form a 3 row array, then the e[3] value in each row will be the same and they will share the same column index i.e e[0][3] in the first row, e[1][3] in the second row, and e[2][3] in the third row.
- Declaring the element as array_name [row_index, column_index] is a logic error as the brackets hold a comma-separated list of values that are transformed into an expression by the comma operator. This operator ensures that the list of values in the comma-separated list are evaluated from left-to-right, and the value of the list is set as the rightmost expression in this list.
- A multidimensional array can have more than 2 subscript values.
- The two-dimensional array is a table because it has rows and columns, with the simplest table being a single column table of only two rows.
- A two-dimensional array is defined by the number of rows and columns that it has. Thus, the simplest table is called a 2-by-1 two dimensional array. Thus, a two-dimensional array is described as an m-by-n array with m representing the number of rows and n representing the number of columns.
- To initialize a two dimensional array as a double subscripted array, the syntax used is:
array_name [number_of_rows] [number_of_columns] = { {comma_separated_values_for elements_in_row0} , {comma_separated_values_for elements_in_row1} }
- The array initializers of each row are enclosed in curly braces to create a sublist. Sublists are then separated from each other by a comma. Then the sublists are all enclosed together in curly brackets.
- The values in the sublists can be less than the number of columns because the elements that have not been initialized will be automatically assigned the initialization value of 0.
- The best way to initialize an array is to use a custom function which is then called to create the array. This function can be written as follows:
void function_array_table (array_name [number_of_rows] [number_of_columns] ) // number of rows is NOT mandatory, but the number of columns is MANDATORY because it defines the maximum number of elements per row.
{
for (size_t r = 0; r <= number_of_rows; ++r ) // r represents row
{
for (size_t c = 0; c <= number_of_columns; ++c ) //c represents column
{
printf(“%d ”, array_name [r] [c] );
}
puts(“”) // Create a line space after the array
}
- The above function can then be called to print an array named table using the following syntax:
function_array_table (table);
- In memory, the rows are stored contiguously so that there values form a contiguous chain of values. It is for this reason that the number of columns in the array must be declared because it allows the compiler to know how many skips to make per row in order to access the subsequent row.
- Manipulation of values in the multidimensional array frequently uses the iteration statement. For example, to set all values in a row to zero, the following code is used:
for (size_t c = 0; c <= number_of_columns; ++c ) //c represents column
{
array_name [selected_row_index] [c] = 0;
}
- To sum the elements in the array, the following code is used:
int sum = 0; //A new variable called sum is declared and initialized to 0.
for (size_t r = 0; r <= number_of_rows; ++r ) // r represents row. This allows for the elements in the first row to be summed before the second row is processed.
{
for (size_t c = 0; c <= number_of_columns; ++c ) //c represents column
{
sum += array_name [r] [c] ;
}
}
Variable Length Array (VLA)
- An array whose number of rows and columns has not been defined during program startup is called a variable length array.
- The sizes of the rows and columns of the VLA are determined at execution time. For this reason, it is also called a runtime sized array.
- Variables are used to set the number of rows and columns. These variables can take values from the users, and then use this input values to set the number of rows and columns. For this reason, it is also called a variable sized array.
- The operator used to check the size of the VLA is the sizeof operator.
- In the VLA, the size of the row (and the size of the column) is declared by a variable before the array is declared, and this allows for the variables to be passed to the array declaration.
- Not all compilers support a VLA.