代做Data structure | 代写assembly | C代写 | 代写assignment – Programming Assignment 3 (PA3) – Querygram

Programming Assignment 3 (PA3) – Querygram

代做Data structure | 代写assembly | C代写 |  代写assignment – 这是利用C语言进行训练的代写, 对Data structure的流程进行训练解析, 涵盖了Data structure/assembly等程序代做方面, 这个项目是assignment代写的代写题目

ass代做 assignment代写 代写assignment

 assignment Overview Getting Started Walkthrough and Example Input Detailed Overview Milestone Functions
Post-Milestone Functions Unit Testing README File Extra Credit Turnin Summary

Assignment Overview

The purpose of this programming assignment is to gain familiarity with structs, pointers, dynamic memory allocation, resizing arrays, string operations, sorting, searching, and reading/writing text/binary files. We will be working with dynamic memory allocation functions like calloc, realloc, and free; string functions like strncat, strncpy, strncmp, and strncasecmp; stdio functions like fopen, fread, fwrite, fclose; and stdlib functions like qsort and bsearch. We will get further practice with looping constructs, conditional constructs, function calls, and working with global variables.

Remember to read the man pages (RTFM–Read The Friendly Manual) in order to lookup specific C functions. For example, if you would like to know what parameters qsort() takes, type man -s3 qsort. Also, make good use of the tutors in the lab. They are there to help you learn more, point you to useful resources, and help you get through the course!

Make sure you start early! This assignment will likely take longer than the

previous assignments and will be more difficult. If you do not start early, you

will very likely not finish this assignment on time!

(many students have tried testing this hypothesis in the past--please learn from their mistakes!)

Grading

README: 10 points – See README Requirements here and questions below http://cseweb.ucsd.edu/~ricko/CSE30READMEGuidelines.pdf Compiling: 5 points – Using our Makefile; no warnings. If what you turn in does not compile with the given Makefile, you will receive 0 points for this assignment. NO EXCEPTIONS! Style: 10 points – See Style Requirements here http://cseweb.ucsd.edu/~ricko/CSE30StyleGuidelines.pdf Correctness: 75 points Milestone (20 points) – To be distributed across the Milestone functions (see below) Make sure you have all files tracked in Git. Extra Credit: 4 points – View Extra Credit section for more information. Wrong Language: You will lose 10 points for each module in the wrong language, C vs. assembly or vice versa. NOTE: If what you turn in does not compile with given Makefile, you will receive 0 points for this assignment.

Getting Started

Follow these steps to acquire the starter files and prepare your Git repository.

Gathering Starter Files: The first step is to gather all the appropriate files for this assignment. Connect to pi-cluster via ssh. $ ssh [email protected]

Create and enter the pa3 working directory. $ mkdir ~/pa $ cd ~/pa

Copy the starter files from the public directory. $ cp ~/../public/pa3StarterFiles/* ~/pa3/

Starter files provided:
pa3.h
test.h
pa3Strings.h
testanagramCmp.c
pa3Globals.c
Makefile

Important Notes About Starter Files: Do NOT edit pa3Globals.c. Do NOT #include pa3Globals.c in any of your files. You should never #include a C file, as this copies the contents of that C file into whatever file you are #including it in. This will most definitely cause compilation errors.

Preparing Git Repository: You are required to use Git with this and all future programming assignments. Refer to the PA0 writeup for how to set up your local git repository.

Walkthrough and Example Input

Anagrams are words which can be spelled by rearranging the letters of one another. For example, “three”, “there”, and “ether” are anagrams. For PA3, we will consider only single-word anagrams containing only the letters A-Z, and we will be case insensitive (“tHRee” and “ether” are still anagrams).

In PA3 you will be writing a pair of executables that work together to give you the ability to find anagrams in any set of words. These two executables are called generate and query.

The generate executable will take a set of words from a file where each word is on a separate line. It does some preprocessing on each word to make the anagram search easier (finding the number of occurrences of letters in each word, skipping words that contain punctuation, etc). Then it outputs the words along with this extra metadata into an anagram output (.ano) file.

The query executable uses the .ano files generated by generate to search for anagrams. It takes in the metadata from the .ano file and builds a Data structure (a hashtable) to make searches more efficient. This executable will then prompt the user for words whose anagrams the user wants to look for, and prints out those anagrams.

The generate executable Here are the arguments that generate supports. $ ./generate [-i inputWordsFile] [-o outputAnagramFile] [-h] Here the -i flag lets us provide an input file containing all the words we want to be able to search for, and the -o flag is the name of the output file. The -h flag tells the executable to print a usage statement to describe these arguments. The square brackets around the arguments indicate that the argument is optional. This means you could execute generate in a number of different ways: $ ./generate $ ./generate -h $ ./generate -i wordsFile $ ./generate -o test.ano $ ./generate -i wordsFile -o test.ano

If the flag -i or the flag -o is not given, generate will use default names for the input and output files. These names are /usr/share/dict/words and anagrams.ano, respectively. Note that /usr/share/dict/words contains words with non-alphabetic characters. Those words should be ignored when building the anagram file.

The query executable Here are the arguments that query supports. $ ./query [-s tableSize] [-h] -f anagramFile Here the -s flag lets us define the size of the hashtable which query should use in its searching algorithm. The

  • f flag lets the user specify the .ano file from which query should read. The -h flag also prints a usage statement as in the generate executable. Notice that the -s and -h flags are optional here, but the -f flag is not. This means that if we want to search for anagrams, we need to at least supply the -f flag. The only time it can be left out is if you want to use the -h flag: $ ./query -f test.ano $ ./query -s 10 -f test.ano $ ./query -h

Example Walkthrough Let’s run through a simple flow of how to use generate and query. Let’s say we have a file with some words in it, called myWords:

File myWords

three
ear
are
not
THERE
...why?

We can generate the .ano file for this by using generate (bolded text is typed user input): $ ./generate -i myWords -o myAnagrams.ano

Generated anagram file (myAnagrams.ano) with 5 anagrams.

Here generate tells us that it found 5 words, converted them into anagram data, and stored that data into the file myAnagrams.ano. Notice that the “…why?” was not included as an anagram because it contained non- alphabetic characters.

If we’re curious, we can take a peek at the output file: $ hexdump -C myAnagrams.ano 00000000 74 68 72 65 65 00 00 00 00 00 00 00 00 00 00 00 |three………..| 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 |…………….| 00000030 00 01 00 00 00 00 00 00 00 00 00 01 00 01 00 00 |…………….| 00000040 00 00 00 00 88 f6 17 35 65 61 72 00 00 00 00 00 |…….5ear…..| 00 000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….|

00000070 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 |…………….| 00000080 00 00 00 01 00 00 00 00 00 00 00 00 06 f2 da ad |…………….| 00000090 61 72 65 00 00 00 00 00 00 00 00 00 00 00 00 00 |are………….| 000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….| 000000b0 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 |…………….| 000000c0 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |…………….| 000000d0 00 00 00 00 06 f2 da ad 6e 6f 74 00 00 00 00 00 |……..not…..| 000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….|

00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |…………….| 00000110 01 00 00 00 00 01 00 00 00 00 00 00 68 5c 52 bc |…………h\R.| 00000120 54 48 45 52 45 00 00 00 00 00 00 00 00 00 00 00 |THERE………..| 00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….| 00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 |…………….| 00000150 00 01 00 00 00 00 00 00 00 00 00 01 00 01 00 00 |…………….| 00000160 00 00 00 00 88 f6 17 35 |…….5| 00000168

This file definitely contains more data than just the input file, but you should be able to see that the original words are stored in the .ano file (look on the right column for the ASCII representation of the file’s bytes).

Now we are ready to search. Load the .ano file using query: $ ./query -f myAnagrams.ano

Enter a word to search for anagrams [^D to exit]:

Note that since we don’t specify the -s flag, query will use the default table size. You can see query is now prompting us for a word so that it can search for that word’s anagrams. Let’s enter the word “ear”: Enter a word to search for anagrams [^D to exit]: ear

Anagrams found: are

Notice that query found the word “are” because “are” was in the myWords file from above. Also notice that query did not print out “ear” since it was the word we entered in the search query. If we type another word that is an anagram of both “ear” and “are”, query will show both in its results: Enter a word to search for anagrams [^D to exit]: era

Anagrams found: ear are

Notice that even though the queried word “era” did not exist in the myWords file, query will still print out words that are anagrams of “era”. This goes for any other word: Enter a word to search for anagrams [^D to exit]: ton

Anagrams found: not

Remember that anagrams are case insensitive, so the case of the queried word does not matter: Enter a word to search for anagrams [^D to exit]: eThEr

Anagrams found: three THERE

If there are no anagrams for a given word, query will say so: Enter a word to search for anagrams [^D to exit]: hello

No anagrams found.

Finally, you can exit this program by entering ^D (Ctrl+D): Enter a word to search for anagrams [^D to exit]: ^D

allWords and fewWords In the public directory (~/../public), we provide you two starter files called allWords and fewWords. The file allWords is very large, containing many (72,786) words. fewWords is shorter.

You may use these files to test and stress-test your programs. We recommend that you make some of your own files as well to do some preliminary tests (e.g. a simple file with one word) or to test any edge cases that might come up.

Warning: These files are large and may impact your disk quota if you copy them into your pa3 folder. We recommend you avoid doing so. Overstepping your disk quota can lead to the inability to save changes to your code, among other things.

Sample executables A pair of sample stripped executables provided for you to try and compare your output against is available in the public directory. Note that you cannot copy them to your own directory; you can only run them using the following commands (where you will also pass in the command line arguments): $ ~/../public/pa3generate $ ~/../public/pa3query

NOTE:
  1. The output of your programs MUST match exactly as it appears in the public executables’ output. You need to pay attention to everything in the output, from the order of the error messages to the small things like extra newlines at the end (or beginning, or middle, or everywhere)!
  2. We are not giving you any sample outputs, instead you are provided some example inputs. You are responsible for trying out all functionality of the program; the list of example inputs is not exhaustive or complete. It is important that you fully understand how the programs work and that you test your final solution thoroughly against the executables.

For the below inputs, assume that the file noExist does not exist, and the file noPerm has permissions set so the user cannot read the file (man -s1 chmod). Assume that the files allWords and anagrams.ano exist in the same directory as the generate and query executables.

Example input that has error output: cs30xyz@pi-cluster-001:pa3$ ./generate -i noExist cs30xyz@pi-cluster-001:pa3$ ./generate -i noPerm cs30xyz@pi-cluster-001:pa3$ ./generate – o anagrams cs30xyz@pi-cluster-001:pa3$ ./query cs30xyz@pi-cluster-001:pa3$ ./query -s 0x5 -f anagrams.ano

Example input that has normal output: cs30xyz@pi-cluster-001:pa3$ ./generate -h cs30xyz@pi-cluster-001:pa3$ ./generate cs30xyz@pi-cluster-001:pa3$ ./query -h cs30xyz@pi-cluster-001:pa3$ ./query -f anagrams.ano

Testing tips Since the query executable is an interactive program, making sure that your output matches may seem difficult to do since you need to type the words as query prompts. However, you can write all of the words you plan to query in a file and redirect this file to stdin to simulate typing words in for query. Let’s say you have a file:

File commands

era
ton
ether

Running query while redirecting the commands file to stdin looks like this: $ ./query -f myAnagrams.ano < commands

Enter a word to search for anagrams [^D to exit]: Anagrams found: ear are

Enter a word to search for anagrams [^D to exit]: Anagrams found: not

Enter a word to search for anagrams [^D to exit]: Anagrams found: three THERE

Enter a word to search for anagrams [^D to exit]:

Notice that we did not need to specify ^D anywhere in the commands file. This is because once we reach the end of the commands file, the end of file itself acts as ^D, signaling that there is no more keyboard input to read for query.

We can take advantage of this automated typing and redirect the output of this command to a file: $ ./query -f myAnagrams.ano < commands > output

Now we can diff this output with the public executable’s output for testing.

Detailed Overview

The function prototypes for the various C and Assembly functions are as follows.

C routines: void countChars( unsigned char counts[], const char * word ); int generateAnagramFile( const char * inFilename, const char * outFilename ); int main( int argc, char * argv[] ); // in generateAnagramMain.c int insertIntoBucket( bucket_t * bucket, anagram_t * anagram ); int loadQueryTable( const char * anagramFilename, queryTable_t * queryTable ); void executeQuery( queryTable_t * queryTable, anagram_t * needle ); void queryLoop( queryTable_t * queryTable ); int main( int argc, char * argv[] ); // in queryAnagramMain.c

Assembly routines: unsigned int getHashcode( unsigned char counts[] ); void assembleAnagram( const char * originalWord, anagram_t * anagram ); int hashcodeCmp( const void * p1, const void * p2 ); int countsCmp( const void * p1, const void * p2 ); int anagramCmp( const void * p1, const void * p2 ); unsigned int getURemainder( unsigned int dividend, unsigned int divisor ); unsigned int getBucketIndex( unsigned int hashcode, size_t tableSize ); int isInBounds( int value, int minBound, int maxBound );

For the Milestone, you will need to complete:
countChars.c
getHashcode.s
assembleAnagram.s
hashcodeCmp.s
countsCmp.s
anagramCmp.s
getURemainder.s
getBucketIndex.s

Process Overview:

Generate Anagrams
  1. Parse the command line arguments to determine the name of the input words file and output anagrams file.
  2. For every word in the words file, create an anagram struct from the word, and write the struct to the output anagram file.
Query Anagrams
1. Parse the command line arguments to determine
the name of the input anagram file and the size of
the query table to create.
2. Create the query table by reading in the anagram
structs from the anagram file.
3. Continuously prompt the user to enter a word to
query for anagrams, printing the results after each
word is entered.

Given an input text file of words (where the words are separated by newlines), the anagram file generated should look like the following:

The following diagram shows the various structs used in this assignment and their relationships:

Milestone Functions to be Written

Listed below are the modules to be written for the milestone.

countChars.c void countChars( unsigned char counts[], const char * word );

This function will count the occurrences of each character in the passed in word and store them in counts, an array of 26 unsigned characters that should just be thought of as regular bytes (counts is not a string).

Each byte in counts represents the number of times the letter at that index in the alphabet appears in the passed in word. The count for each letter should be non-case sensitive. This means that the byte at index 0 represents the number of times ‘a’ and ‘A’ appear in the word, the byte at index 1 represents the number of times ‘b’ and ‘B’ appear in the word, and so on (hint: man -s3 tolower).

To accomplish this, you should iterate over each character in the word, calculate that character’s corresponding index in counts (hint: man ascii), and increment the count at that index in the counts array.

Hints: What values are variables initialized to by default? How will that affect your counts array? What marks the end of a string in C?

Reasons for error:

 word is null  Immediately return, counts should be all 0s.
Return Value: None, but counts should be populated with the results.

getHashcode.s unsigned int getHashcode( unsigned char counts[] );

Generate a hashcode from the counts array. The purpose of a hashcode is to generate a somewhat-unique number that represents an object. To do so, translate the following C code into assembly. Use the global variables defined in pa3Globals.c to access constants in assembly.

unsigned int hash = HASH_START_VAL;

for ( int i = 0; i < ALPHABET_SIZE; i++ ) { hash = hash * HASH_PRIME + counts[i]; }

return hash;

Return Value: The hashcode that corresponds to the counts.

assembleAnagram.s void assembleAnagram( const char * originalWord, anagram_t * anagram );

Assemble an anagram_t struct given an originalWord and a pointer to the anagram you are assembling.

  1. Copy the originalWord passed in into the originalWord field in the anagram, making sure to null terminate the word (man -s3 strncpy). If the word is too long to fit in the originalWord array, you MUST null terminate it, which may end up truncating the word (this is fine).
  2. Assemble the counts field of the anagram using countChars().
  3. Assemble the hashcode field of the anagram using getHashcode().
Return Value: None, but anagram should be assembled based on originalWord.
Note on comparison functions:
In this assignment you will be using two of the C standard library array sorting and searching functions:
qsort() and bsearch(). Both of these functions take in a pointer to a comparison function (in order to
sort the array, you need to provide instructions on how to compare the elements of the array--this is the
purpose of the following 3 comparison functions). The comparison function must take in two const void
*s as parameters.
Your comparison functions should behave according to the following standard conventions:
 Return -1 if the first argument should be ordered earlier than the second
 Return 0 if the arguments are equal
 Return +1 if the first argument should be ordered later than the second
Refer to man -s3 qsort and man -s3 bsearch for more information.

hashcodeCmp.s int hashcodeCmp( const void * p1, const void * p2 );

This function takes two void pointers in its signature, but you will pass in pointers to anagrams (const anagram_t *) and should compare them using only the hashcodes.

Return Value: - 1 if the anagram that p1 points to has a smaller hashcode than the one p2 points to
+1 if the anagram that p1 points to has a larger hashcode than the one p2 points to
0 if they have the same hashcode

countsCmp.s int countsCmp( const void * p1, const void * p2 );

This function takes two void pointers in its signature, but you will pass in anagram pointers (const anagram_t *) and should compare them using only their counts using memcmp() (man -s3 memcmp). For this comparator function, just return the value that memcmp returns when comparing the two counts arrays.

Return Value: The return value of memcmp.

anagramCmp.s int anagramCmp( const void * p1, const void * p2 );

This function takes two void pointers in its signature, but you will pass in anagram pointers (const anagram_t *) and compare them. This comparator should first compare the anagrams hashcodes using

hashcodeCmp. If they have the same hashcodes, then, and only then, it should compare the letter counts as a tie breaker using countsCmp.

Return Value: - 1 if the anagram that p1 points to either:
 Has a smaller hashcode than does the anagram that p2 points to or
 Has the same hashcode but has a smaller countsCmp result than does the anagram
that p2 points to
+1 if the anagram that p1 points to either:
 Has a larger hashcode than does the anagram that p2 points to or
 Has the same hashcode but has a larger countsCmp result than does the anagram
that p2 points to
0 if the two anagram pointers point to equal anagrams (same hashcode and counts)

getURemainder.s unsigned int getURemainder( unsigned int dividend, unsigned int divisor );

Gets the unsigned remainder when dividend is divided by divisor. This will need to make use of unsigned division, not signed division.

Return Value: The unsigned remainder.

getBucketIndex.s unsigned int getBucketIndex( unsigned int hashcode, size_t tableSize );

Given an anagrams hashcode and the size of the hash table, return the index in the hash table where the anagram should index to. The index is hashcode % tableSize.

Hint: You should use another milestone function to do this.

Return Value: The index into the hash table of an anagram with the given hashcode.

Post-Milestone Functions to be Written

Listed below are the modules to be written after the milestone functions are complete.

isInBounds.s int isInBounds( int value, int minBound, int maxBound );

Copy over from PA1. No changes necessary.

generateAnagramFile.c int generateAnagramFile( const char * inFilename, const char * outFilename );

Read in one word at a time from the words file named inFilename, create an anagram struct for the word read in, and then write the anagram struct to the anagram file named outFilename. Repeat for all words in the words file.

  1. Open the input words file (man -s3 fopen). Youll need to set errno to 0 first for error checking later. Open inFilename with mode “r” since you are only reading from it (man -s3 fopen). If there was an issue opening the input file (again, man -s3 fopen) , youll have to print out an error message to stderr. Search for ENOENT and EACCES in the man page for errno (man -s3 errno) to determine what these error values mean. Print out STR_ERR_IN_FILE_PERMISSION if permission was denied, otherwise print out STR_ERR_IN_FILE_INVALID if the file doesn’t exist. Return -1 immediately.
  2. Open (and therefore create) the output file named outFilename with mode “wb”. If there was an issue opening the output file, youll have to print out STR_ERR_CREATE_ANAGRAM_FILE to stderr, close the input file (man -s3 fclose), and return -1 immediately.
  3. Before you start generating the anagram file, you will need to make a stack-allocated anagram_t to hold each anagram you assemble, as well as a char buffer of MAX_WORD_SIZE to store each word you read in from the input file.
  4. Read in each word, one word at a time, using fgets. If the newline character is included in the word you read in, replace it with the null termination character (hint: man -s3 strchr). If a non-alphabetic character is detected in the word you read in, skip that word and continue on to the next word. Otherwise, assemble an anagram with assembleAnagram using the word you read in. It might be helpful to make a helper function to check if a word contains non-alphabetic characters. You can use isalpha to check if a character is alphabetic or not (man -s isalpha). Write the anagram to the output file (man -s3 fwrite). If there is an error writing the anagram, print out STR_ERR_WRITE_ANAGRAM to stderr, close both the input and the output files (man -s3 fclose), and return -1 immediately. See the diagram in the Detailed Overview section for what the anagram file should look like. Keep count of the number of valid anagrams read (don’t count words that contain non- alphabetic characters).

Before you return, make sure to close both the input and the output files (man -s3 fclose). Return the number of anagrams read.

Reasons for error: Permission denied for reading input file: Print error message and return -1. Invalid input filename: Print error message and return -1. Error with creating output file: Print error message, close the input file, and return -1. Error with writing anagram to output file: Print error message, close both the files, and return -1.

Return Value: If an error occurred, return -1. If there were no errors, return the number of anagrams you
assembled. Words with non-alphabetic characters are thrown out so they dont count as
anagrams.

generateAnagramMain.c int main( int argc, char * argv[] );

This function will drive the main behavior of generating the anagrams from the input file. Feel free to make helper functions in this file!

  1. Use getopt() to parse the flags described in the long usage statement for generating anagrams. See the man page for a very helpful example of how to do this (man -s3 getopt). Make sure you use the correct type when storing the return value from getopt().
Flag Required Argument How to handle
  • i (^) Input filename Save the name of the input file for later use.
  • o (^) Output filename Save the name of the output file for later use.
  • h (^) None Print the STR_GENERATE_ANAGRAM_USAGE_LONG to stdout and return EXIT_SUCCESS. Note that “Required Argument” here means the argument required by the flag in the column to the left. It does not mean that the flag (and argument together) itself is required. If a flag is entered that is not one of the above flags, print out STR_GENERATE_ANAGRAM_USAGE_SHORT (referred to as “the short usage string” from now on) to stderr and return EXIT_FAILURE. If the input filename is not specified, the default should be IN_FILE_DEFAULT. If the output filename is not specified, the default should be OUT_FILE_DEFAULT. If there are any more args leftover after parsing (look up optind in man -s3 getopt) then print out STR_ERR_EXTRA_ARGS and the short usage string to stderr and return EXIT_FAILURE.
  1. After parsing the command line arguments, you must validate that the output filename ends in “.ano”. To do this, you can use strchr (man -s3 strchr) to search for the dot character (‘.’) in the filename. From there, you must verify that the file extension starting with the dot matches ANAGRAM_FILE_EXT (hint: man – s3 strncmp). If the filename does not end in “.ano”, print out STR_ERR_ANAGRAM_FILE_EXT and return EXIT_FAILURE.
  2. Generate the anagram file using generateAnagramFile. If there was an error in generateAnagramFile, return EXIT_FAILURE. Otherwise, print out the name of the generated anagram file along with the number of anagrams written to it using STR_ANAGRAM_FILE to stdout, and return EXIT_SUCCESS.

Reasons for error: When checking for errors, note that you should only print the short usage following an error message in the cases where the command line arguments themselves were wrong, and not in cases where the argument is correct, but the value entered is wrong. Unknown flag entered: Print short usage and return EXIT_FAILURE (getopt will print an error message for you) Too many command line arguments entered: Print error message, short usage, and return EXIT_FAILURE Outfile does not end in “.ano”: Print error message and return EXIT_FAILURE Error in generateAnagramFile: return EXIT_FAILURE (generateAnagramFile will print an error message for you)

Return Value: EXIT_FAILURE on error, EXIT_SUCCESS on success.

insertIntoBucket.c int insertIntoBucket( bucket_t * bucket, anagram_t * anagram );

This function will insert the passed in anagram to the passed in bucket, appending it to the end of the array of anagrams in bucket. This is one of the only two places where you will dynamically allocate memory.

  1. man -s3 realloc is your new best friend.
  2. Use realloc to increase the size of the bucket’s anagram array so that we can fit the new anagram at the end. For the purpose of error checking, realloc the array to a new pointer. If something goes wrong, free the actual dataPtr in bucket.
  3. If the memory allocation is successful, then you can reassign dataPtr to point to the new reallocated memory, then store the anagram at the end of the array now that there is room for it.

Reasons for error: If realloc fails, print STR_ERR_MEM to stderr, free the bucket’s dataPtr and return -1.

Return Value: - 1 if any error occurred, 0 otherwise.

loadQueryTable.c int loadQueryTable( const char * anagramFilename, queryTable_t * queryTable);

This function will read one anagram at a time from the anagram file named anagramFilename (generated from generateAnagramMain()), and set up a queryTable of anagrams.

  1. Open the file named anagramFilename using mode “rb”. This is a similar process to how you opened the other files in generateAnagramFile().
  2. Before you proceed to reading from the file, check the size of the file (man -s3 stat). You need to check if the file size is evenly divisible by the size of a single anagram. If not, close the input file you just opened, print STR_ERR_ANAGRAM_FILE_CORRUPT to stderr and return -1.
  3. Next, read in one anagram at a time using fread (man -s3 fread). Use getBucketIndex() to get the index of the bucket in the queryTable this anagram belongs to, and use insertIntoBucket() to insert the anagram into that bucket. If insertIntoBucket() failed, print STR_ERR_INSERT to stderr, close the file you opened, and return -1.
  4. After reading all the anagrams and inserting all of them into the queryTable, we need to sort each bucket’s anagram array in the table with qsort (man -s3 qsort). You will use anagramCmp() as the comparator function.
  5. Close the file you opened and return 0 at the end to indicate success.

Reasons for error: If opening anagramFilename fails, print appropriate error message and return -1. If the anagram file does not have the correct size, print error message and return -1. If insertIntoBucket fails, print error message and return -1.

Return Value: - 1 if any error occurred, 0 otherwise.

executeQuery.c void executeQuery( queryTable_t * queryTable, anagram_t * needle );

This function handles the actual query for the anagrams. Its job is to search for the needle in the proverbial anagram haystack (queryTable). Given the anagram needle to search for, this function prints any matching anagrams of that word, except for the word itself. A word is not an anagram of itself, regardless of the case (upper or lower).

Steps for finding and printing matched anagrams:

  1. Calculate the index into queryTable that the needle would be found at using getBucketIndex().
  2. Now search for anagrams in the sorted array of anagrams in the bucket at the index you calculated. This will help us find all the anagrams of the input word stored in needle: a. To search the array we will use bsearch (man -s3 bsearch). b. If bsearch() returns a null pointer, then no anagrams were found and we need to print STR_NO_ANAGRAMS_FOUND to stdout. There is no need to continue searching, return immediately. c. If bsearch() found an anagram, we need to find the first anagram_t that matches the needle passed in. Note that bsearch() does not guarantee that we will get the first match in a sequence of multiple matches. As a result, we need to step backwards in the array until we reach the first match (this is why we needed to sort the array). To do this, use anagramCmp() to determine if the current anagrams counts is the same as the anagram immediately before it in the array. If so, move the starting point of your search backwards one index in the array. Repeat until the current and previous anagrams counts do not match or until you reach the first element in the array. d. Once you hit the beginning of the sequence of matching anagrams, you need to traverse forward through the array of anagram_t structs and print out each anagram (the originalWord member of the struct) of the input needle. Do not print out any words that are the same as the originalWord in the input needle (more on this below). (Hint: it will be easier to build a string of space-separated anagrams in a buffer first before printing, rather than printing one word at a time. See man -s3 strncat). Continue doing this as long as the anagram following the current one is the same (anagramCmp()), being careful not to walk past the end of the array. e. Finally you will need to print the anagrams found: first printing STR_ANAGRAMS_FOUND, then the anagrams, then printing STR_NEWLINE_CHAR — all to stdout. Note: The printed anagram words should be case insensitive regarding the word you enter. For example, if the queried word is Santa and the word satan is in the dictionary, then satan should be printed out as an anagram of Santa , although they are not case-consistent. When preventing the same word that you entered from being printed, your comparison should also be case insensitive. For example, on query word Santa , if the words Santa , satan , and santa are in the dictionary, then you should only print out satan as an anagram, avoiding Santa and santa since they are the same as the query word. You might find (man -s3 strncasecmp) to be useful here. Walking past the beginning or end of the anagrams array is the most common mistake to make in this file. What check should you write to avoid walking past the beginning of the array of anagrams? What
check should you write to avoid walking past the end of the array? If you didn't write specific checks for
these two cases, you will very likely run into problems!

queryLoop.c void queryLoop( queryTable_t * queryTable );

This function handles the interactive mode of the program. We will continuously get the next input word from the user until the user enters Ctrl-D (^D) to signify the end of input (EOF).

An overview of the user-interactive process:

  1. Print STR_USER_PROMPT to stdout to prompt the user for input.
  2. Read the word entered by the user (man -s3 fgets) into a buffer (size can be BUFSIZ). Note that fgets() will include the newline character as part of the string read, so as soon as you get a new input, replace the newline character with the null character (man -s3 strchr).
  3. If the word entered by the user contains non-alphabetic characters (man -s3 isalpha), print STR_NO_ANAGRAMS_FOUND. We recommend creating a helper function to detect non-alphabetic characters in a string, like we recommended for generateAnagramFile.
  4. Otherwise, create a new anagram struct for the word entered by the user using assembleAnagram(), then find and print all matching anagrams in the queryTable passed in using executeQuery().
  5. Print the prompt again, and continue from step 2 as long as there is input to be read (man -s fgets).

Print STR_NEWLINE_CHAR to stdout at the end of this function. This is to keep the terminal’s prompt from printing on the same line as the last STR_USR_PROMPT.

queryAnagramMain.c int main( int argc, char * argv[] );

This is the main driver for the query program. Its main tasks are to parse the command line arguments and enter interactive mode to query anagrams. Feel free to make helper functions in this file!

Parse the command line arguments:

  1. We will use getopt() to parse the flags described in the long usage statement (STR_QUERY_USAGE_LONG). See the man page for a very helpful example of how to do this (man -s getopt). Make sure to use the constants for argument parsing defined in pa3.h. The implementation details for the flags are listed below.
Flag Required Argument How to handle
  • f (^) anagram filename Save the name of the anagram file for later use.
  • s (^) queryTable size Save the string of the queryTable size for later use.
  • h (^) None Print STR_QUERY_USAGE_LONG to stdout and return indicating success. Note that “Required Argument” here means the argument required by the flag in the column to the left. It does not necessarily mean that the flag (and argument together) itself is required.

In the case where getopt() detects an invalid flag or missing flag argument, it will automatically print an error message for you. If this happens, you also need to print the short usage (STR_QUERY_USAGE_SHORT) and return EXIT_FAILURE.

  1. If all flags are parsed in getopt() without error, make sure you also check for extra arguments. If there are extra arguments, print STR_ERR_EXTRA_ARGS and the short usage to stderr then return EXIT_FAILURE. Think about what will help you determine if there are extra arguments (hint: look at optind in the man page for getopt).
  2. Using the size string you saved during your getopt() loop, convert the queryTable size from a string in base 10 to an unsigned int. If no size flag is entered, use the default value for size (QUERY_TABLE_DEFAULT_SIZE in pa3.h).
  3. Use your isInBounds() function to make sure the table size is within 1 – 1000 inclusive (QUERY_TABLE_MIN_SIZE and QUERY_TABLE_MAX_SIZE in pa3.h). If the size is not in bounds, print STR_ERR_TABLE_SIZE_BOUNDS to stderr and return EXIT_FAILURE.
  4. Check that the filename argument (-f) was entered. You should be able to do this by checking if you encountered the -f flag in your getopt() loop. If no filename was given, print STR_ERR_FILENAME_NOT_ENTERED and the short usage to stderr, then return EXIT_FAILURE.
  5. After parsing the command line arguments, you must validate that the anagram filename ends in “.ano”. This is the exact same logic as you have in generateAnagramMain(). If the filename does not end in “.ano”, print out STR_ERR_ANAGRAM_FILE_EXT and return EXIT_FAILURE.

Build the queryTable: From parsing the command line arguments, we should now know the anagram filename and the size of the queryTable (should be set to default size if the user didnt enter one — see pa3.h).

With that in mind, we now need to first allocate the array of bucket_t structs using calloc() to zero-fill the memory. If that was successful, we want to go ahead and allocate the queryTable_t struct and initialize its members appropriately. If the allocation wasn’t successful, print STR_ERR_MEM to stderr and return indicating failure.

We then want to load anagrams into the queryTable by calling loadQueryTable(). After calling loadQueryTable(), your queryTable should be filled like the diagram above in the Detailed Overview section. If loadQueryTable() returns with an error, return indicating failure.

Enter Interactive Mode: Next, enter interactive mode by calling queryLoop().

Afterwards, you need to free ALL the memory (the memory allocated by calloc() and realloc()) you allocated for the queryTable and anything else. Watch out for dangling pointers. To free the hashtable structure, start by freeing each array of anagrams first, then free the array of buckets.

Reasons for error: When checking for errors, note that you should only print the short usage following an error message in the cases where the command line arguments themselves were wrong, and not in cases where the argument is correct, but the value entered is wrong. Unknown flag entered: Print short usage and return EXIT_FAILURE (getopt will print an error message for you) Too many command line arguments entered: Print error message, short usage, and return EXIT_FAILURE tableSize Too big to be converted: use snprintf() to build the error string using STR_ERR_CONVERTING, use perror() to print the error string, print a newline, then return EXIT_FAILURE Not an integer: print error message and return EXIT_FAILURE Not in bounds: print error message and return EXIT_FAILURE Anagram filename isn’t provided: print error message, short usage, and return EXIT_FAILURE Anagram filename doesn’t have the proper file extension: print error message and return EXIT_FAILURE Cannot allocate memory for the buckets array: print error message and return EXIT_FAILURE

Return Value: EXIT_SUCCESS on success, EXIT_FAILURE on failure.

Unit Testing

You are provided with only one basic unit test file for the Milestone function, anagramCmp(). This file only has minimal test cases and is only meant to give you an idea of how to write your own unit test files. You must write unit test files for each of the Milestone functions, as well as add several of your own thorough test cases to all of the unit test files. You need to add as many unit tests as necessary to cover all possible use cases of each function. You will lose points if you dont do this! You are responsible for making sure you thoroughly test your functions. Make sure you think about boundary cases, special cases, general cases, extreme limits, error cases, etc. as appropriate for each function. The Makefile includes the rules for compiling these tests. Keep in mind that your unit tests will not build until all required files for that specific unit test have been written. These test files will be collected with the Milestone, they must be complete before you submit your Milestone.

Unit tests you need to complete:
testanagramCmp.c
testassembleAnagram.c
testcountChars.c
testcountsCmp.c
testgetBucketIndex.c
testgetHashcode.c
testgetURemainder.c
testhashcodeCmp.c
To compile:
$ make testanagramCmp
To run:
$ ./testanagramCmp
(Replace testanagramCmp with the appropriate file
names to compile and run the other unit tests)

We also supply you an extra set of “optimized” Makefile targets for your tester files. These targets will compile your tester file with all optimizations turned on, similar to how your Milestone functions will be compiled for grading. Sometimes you will get (un)lucky where your program appears to work even when there is something

wrong (like incorrect memory layout or modifying a register you shouldn’t). Compiling with optimizations will expose some of these hidden problems. Again, this is how the milestone will be tested during grading, so use the optimization targets to try to expose these issues.

However, compiling this way prevents you from using gdb to debug, so make sure to compile the regular way when debugging and try the optimized targets after your tests pass.

To compile with optimizations on: $ make testanagramCmp-opt To run: $ ./testanagramCmp-opt

README File

Remember to follow all of the guidelines outlined in the README Guidelines. If you did the extra credit, write a program description for it in the README file as well.

Questions to Answer in your README: Unix

  1. You are trying to open your pa3Strings.h file. You have already typed “vim pa3S”. Which single key can you press to autocomplete the command to “vim pa3Strings.h”?
  2. What is the command to search for tabs in all the C files in your current directory? Your command should print out the line numbers of the tabs as well. C
  3. Suppose that we define the variable anagramPtr like so:
anagram_t * anagramPtr = calloc( 5, sizeof( anagram_t ) );
(a) What is the value of sizeof(anagramPtr) and
(b) Why?
  1. Give two different situations in which dynamic memory allocation (use of malloc/calloc/realloc) would be necessary.
  2. Rewrite the following expression such that the *, – >, and + operators are not used: ((((table.tablePtr + i)->dataPtr + j)).counts + k) Vim
  3. What is the command to turn a lowercase character into an uppercase character and vice versa?
  4. What is the command to repeat the last executed command? Academic Integrity
  5. Its the day that the assignment is due and your friend is really stressing out. They want to copy your code. What would you tell them to convince them to act with integrity?

Extra Credit

There are 4 points total for extra credit on this assignment. Early turnin: [4 Points] 48 hours before regular due date and time [2 Points] 24 hours before regular due date and time (its one or the other, not both)

Turnin Summary

See the turnin instructions here. Your file names must match the below exactly otherwise our Makefile will not find your files.

Milestone Turnin: Due: Wednesday night, Feb 20 @ 11:59 pm

Files required for the Milestone:
anagramCmp.s
assembleAnagram.s
countChars.c
countsCmp.s
getBucketIndex.s
getHashcode.s
getURemainder.s
hashcodeCmp.s

Final Turnin: Due: Wednesday night, Feb 27 @ 11:59 pm

Files required for the Final Turn-in:
anagramCmp.s
assembleAnagram.s
countsCmp.s
getBucketIndex.s
getHashcode.s
getURemainder.s
hashcodeCmp.s
isInBounds.s
countChars.c
executeQuery.c
generateAnagramFile.c
generateAnagramMain.c
insertIntoBucket.c
loadQueryTable.c
queryAnagramMain.c
queryLoop.c
testanagramCmp.c
testassembleAnagram.c
testcountChars.c
testcountsCmp.c
testgetBucketIndex.c
testgetHashcode.c
testgetURemainder.c
testhashcodeCmp.c
pa3.h
pa3Strings.h
pa3Globals.c
test.h
Makefile
README

If there is anything in these procedures which needs clarifying, please feel free to ask any tutor, the instructor, or post on the Piazza Discussion Board.