March 2021
Summary: An explanation of how to implement a simple hash table data structure using the C programming language. I briefly demonstrate linear and binary search, and then design and implement a hash table. My goal is to show that hash table internals are not scary, but – within certain constraints – are easy enough to build from scratch.
Go to: Linear search | Binary search | Hash tables | Implementation | Discussion
Recently I wrote an article comparing a simple word-frequency-counting program across various languages, and one of the things that came up was that C doesn’t have a hash table data structure in its standard library.
There are many things you can do when you realize this: use linear search, use binary search, grab someone else’s hash table implementation, or write your own hash table. Or switch to a richer language. We’re going to take a quick look at linear and binary search, and then learn how to write our own hash table. This is often necessary in C, but it can also be useful if you need a custom hash table when using another language.
Linear search
The simplest option is to use linear search to scan through an array. This is actually not a bad strategy if you’ve only got a few items – in my simple comparison using strings, it’s faster than a hash table lookup up to about 7 items (but unless your program is very performance-sensitive, it’s probably fine up to 20 or 30 items). Linear search also allows you to append new items to the end of the array. With this type of search you’re comparing an average of num_keys/2 items.
Let’s say you’re searching for the key bob in the following array (each item is a string key with an associated integer value):
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
Key | foo | bar | bazz | buzz | bob | jane | x |
Value | 10 | 42 | 36 | 7 | 11 | 100 | 200 |
You simply start at the beginning (foo at index 0) and compare each key. If the key matches what you’re looking for, you’re done. If not, you move to the next slot. Searching for bob takes five steps (indexes 0 through 4).
Here is the algorithm in C (assuming each array item is a string key and integer value):
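The original listing isn’t reproduced here, but a sketch of the algorithm might look like this (the item struct with a string key and integer value follows the description above; the function name is my choice):

```c
#include <stddef.h>
#include <string.h>

// An array item: a string key with an associated integer value.
typedef struct {
    const char* key;
    int value;
} item;

// Return a pointer to the item with the given key, or NULL if not found.
// Compares keys one at a time from the start of the array.
item* linear_search(item* items, size_t size, const char* key) {
    for (size_t i = 0; i < size; i++) {
        if (strcmp(items[i].key, key) == 0) {
            return &items[i];
        }
    }
    return NULL;
}
```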
Binary search
Another simple approach is to put the items in an array which is sorted by key, and use binary search to reduce the number of comparisons. This is kind of how we might look something up in a (paper) dictionary.
C even has a bsearch function in its standard library. Binary search is reasonably fast even for hundreds of items (though not as fast as a hash table), because you’re only comparing an average of log(num_keys) items. However, because the array needs to stay sorted, you can’t insert items without copying the rest down, so insertions still require an average of num_keys/2 operations.
Assume we’re looking up bob again (in this pre-sorted array):
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
Key | bar | bazz | bob | buzz | foo | jane | x |
Value | 42 | 36 | 11 | 7 | 10 | 100 | 200 |
With binary search, we start in the middle (buzz), and if the key there is greater than the key we’re looking for, we repeat the process with the lower half. If it’s less, we repeat the process with the higher half. In this case it results in three steps, at indexes 3, 1, 2, and then we have it. This is 3 steps instead of 5, and the improvement over linear search gets (exponentially) better the more items you have.
Here’s how you’d do it in C (with and without bsearch). The definition of the item struct is the same as above.
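A sketch of both variants follows (not necessarily the repo’s exact code; note the up-front overflow check discussed in the note below):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char* key;
    int value;
} item;

// Hand-rolled binary search over an array sorted by key.
// Returns the matching item, or NULL if not found.
item* binary_search(item* items, size_t size, const char* key) {
    if (size + size < size) {
        return NULL;  // size too big; avoid overflow in low + high below
    }
    size_t low = 0;
    size_t high = size;
    while (low < high) {
        size_t mid = (low + high) / 2;
        int c = strcmp(items[mid].key, key);
        if (c == 0) {
            return &items[mid];
        }
        if (c < 0) {
            low = mid + 1;  // key, if present, is in the upper half
        } else {
            high = mid;     // key, if present, is in the lower half
        }
    }
    return NULL;
}

// Comparison function for use with the standard library's bsearch.
int cmp(const void* a, const void* b) {
    item* item_a = (item*)a;
    item* item_b = (item*)b;
    return strcmp(item_a->key, item_b->key);
}

// The same lookup using bsearch: pass a "key item" with only the key set.
item* bsearch_item(item* items, size_t size, const char* key) {
    item key_item = {key, 0};
    return bsearch(&key_item, items, size, sizeof(item), cmp);
}
```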
Note: in binary_search, it would be slightly better to avoid the up-front “half size overflow check” and allow the entire range of size_t. This would mean changing the mid calculation to low + (high-low)/2. However, I’m going to let the code stand for educational purposes – with the initial overflow check, I don’t think there’s a bug, but it is non-ideal that I’m only allowing half the range of size_t. Not that I’ll be searching a 16-exabyte array on my 64-bit system anytime soon! For further reading, see the article Nearly All Binary Searches and Mergesorts are Broken. Thanks to Seth Arnold and Olaf Seibert for the feedback.
Hash tables
Hash tables can seem quite scary: there are a lot of different types, and a ton of different optimizations you can do. However, if you use a simple hash function together with what’s called “linear probing” you can create a decent hash table quite easily.
If you don’t know how a hash table works, here’s a quick refresher. A hash table is a container data structure that allows you to quickly look up a key (often a string) to find its corresponding value (any data type). Under the hood, they’re arrays that are indexed by a hash function of the key.
A hash function turns a key into a random-looking number, and it must always return the same number given the same key. For example, with the hash function we’re going to use (64-bit FNV-1a), the hashes of the keys above are as follows:
Key | Hash | Hash modulo 16 |
---|---|---|
bar | 16101355973854746 | 10 |
bazz | 11123581685902069096 | 8 |
bob | 21748447695211092 | 4 |
buzz | 18414333339470238796 | 12 |
foo | 15902901984413996407 | 7 |
jane | 10985288698319103569 | 1 |
x | 12638214688346347271 | 7 (same as foo) |
The reason I’ve shown the hash modulo 16 is because we’re going to start with an array of 16 elements, so we need to limit the hash to the number of elements in the array – the modulo operation divides by 16 and gives the remainder, limiting the array index to the range 0 through 15.
When we insert a value into the hash table, we calculate its hash, modulo by 16, and use that as the array index. So with an array of size 16, we’d insert bar at index 10, bazz at 8, bob at 4, and so on. Let’s insert all the items into our hash table array (except for x – we’ll get to that below):
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Key | . | jane | . | . | bob | . | . | foo | bazz | . | bar | . | buzz | . | . | . |
Value | . | 100 | . | . | 11 | . | . | 10 | 36 | . | 42 | . | 7 | . | . | . |
To look up a value, we simply fetch array[hash(key) % 16]. If the array size is a power of two, we can use array[hash(key) & 15]. Note how the order of the elements is no longer meaningful.
But what if two keys hash to the same value (after the modulo 16)? Depending on the hash function and the size of the array, this is fairly common. For example, when we try to add x to the array above, its hash modulo 16 is 7. But we already have foo at index 7, so we get a collision.
There are various ways of handling collisions. Traditionally you’d create a hash array of a certain size, and if there was a collision, you’d use a linked list to store the values that hashed to the same index. However, linked lists normally require an extra memory allocation when you add an item, and traversing them means following pointers scattered around in memory, which is relatively slow on modern CPUs.
A simpler and faster way of dealing with collisions is linear probing: if we’re trying to insert an item but there’s one already there, simply move to the next slot. If the next slot is full too, move along again, until you find an empty one, wrapping around to the beginning if you hit the end of the array. (There are other ways of probing than just moving to the next slot, but that’s beyond the scope of this article.) This technique is a lot faster than linked lists, because your CPU’s cache has probably fetched the next items already.
Here’s what the hash table array looks like after adding the “collision” x (with value 200). We try index 7 first, but that’s holding foo, so we move to index 8, but that’s holding bazz, so we move again to index 9, and that’s empty, so we insert it there:
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Key | . | jane | . | . | bob | . | . | foo | bazz | x | bar | . | buzz | . | . | . |
Value | . | 100 | . | . | 11 | . | . | 10 | 36 | 200 | 42 | . | 7 | . | . | . |
When the hash table gets too full, we need to allocate a larger array and move the items over. This is absolutely required when the number of items in the hash table has reached the size of the array, but usually you want to do it when the table is half or three-quarters full. If you don’t resize it early enough, collisions will become more and more common, and lookups and inserts will get slower and slower. If you wait till it’s almost full, you’re essentially back to linear search.
With a good hash function, this kind of hash table requires an average of one operation per lookup, plus the time to hash the key (but often the keys are relatively short strings).
And that’s it! There’s a huge amount more you can do here, and this just scratches the surface. I’m not going to go into a scientific analysis of big O notation, optimal array sizes, different kinds of probing, and so on. Read Donald Knuth’s TAOCP if you want that level of detail!
Hash table implementation
You can find the code for this implementation in the benhoyt/ht repo on GitHub, in ht.h and ht.c. For what it’s worth, all the code is released under a permissive MIT license.
I got some good feedback from Code Review Stack Exchange that helped clean up a few sharp edges, not the least of which was a memory leak due to how I was calling strdup during the ht_expand step (fixed here). I confirmed the leak using Valgrind, which I should have run earlier. Seth Arnold also gave me some helpful feedback on a draft of this article. Thanks, folks!
API design
First let’s consider what API we want: we need a way to create and destroy a hash table, get the value for a given key, set a value for a given key, get the number of items, and iterate over the items. I’m not aiming for a maximum-efficiency API, but one that is fairly simple to implement.
After a couple of iterations, I settled on the following functions and structs (see ht.h):
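As a rough sketch, the API looks something like the following declarations; ht.h in the repo is the authoritative version, and the iterator’s internal fields here are my guess for illustration:

```c
// Sketch of the public API (compare with ht.h in the benhoyt/ht repo).
#include <stdbool.h>
#include <stddef.h>

// Hash table structure: create with ht_create, free with ht_destroy.
typedef struct ht ht;

ht* ht_create(void);         // Create hash table; returns NULL if out of memory.
void ht_destroy(ht* table);  // Free hash table, including allocated keys.

// Get item with given key (NUL-terminated) from hash table.
// Return value (which was set with ht_set), or NULL if key not found.
void* ht_get(ht* table, const char* key);

// Set item with given key (NUL-terminated) to value (which must not be NULL).
// The key is copied; the value is not. Returns the address of the copied key,
// or NULL if out of memory.
const char* ht_set(ht* table, const char* key, void* value);

// Return number of items in hash table.
size_t ht_length(ht* table);

// Hash table iterator: create with ht_iterator, iterate with ht_next.
typedef struct {
    const char* key;  // current key
    void* value;      // current value
    ht* _table;       // table being iterated (assumed internal field)
    size_t _index;    // current index into entries (assumed internal field)
} hti;

hti ht_iterator(ht* table);  // Return new iterator (for use with ht_next).
bool ht_next(hti* it);       // Advance iterator; returns false when done.
```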
A few notes about this API design:
- For simplicity, we use C-style NUL-terminated strings. I know there are more efficient approaches to string handling, but this fits with C’s standard library.
- The ht_set function allocates and copies the key (if inserting for the first time). Usually you don’t want the caller to have to worry about this, or about ensuring the key memory stays around. Note that ht_set returns a pointer to the duplicated key. This is mainly used as an “out of memory” error signal – it returns NULL on failure.
- However, ht_set does not copy the value. It’s up to the caller to ensure that the value pointer is valid for the lifetime of the hash table.
- Values can’t be NULL. This makes the signature of ht_get slightly simpler, as you don’t have to distinguish between a NULL value and one that hasn’t been set at all.
- The ht_length function isn’t strictly necessary, as you can find the length by iterating the table. However, that’s a bit of a pain (and slow), so it’s useful to have ht_length.
- There are various ways I could have done iteration. Using an explicit iterator type with a while loop seems simple and natural in C (see the example below). The value returned from ht_iterator is a value, not a pointer, both for efficiency and so the caller doesn’t have to free anything.
- There’s no ht_remove to remove an item from the hash table. Removal is the one thing that’s trickier with linear probing (due to the “holes” that are left), but I don’t often need to remove items when using hash tables, so I’ve left that out as an exercise for the reader.
Demo program
Below is a simple program (demo.c) that demonstrates using all the functions of the API. It counts the frequencies of unique, space-separated words from standard input, and prints the results (in an arbitrary order, because the iteration order of our hash table is undefined). It ends by printing the total number of unique words.
Now let’s turn to the hash table implementation (ht.c).
Create and destroy
Allocating a new hash table is fairly straightforward. We start with an initial array capacity of 16 (stored in capacity), meaning it can hold up to 8 items before expanding. There are two allocations, one for the hash table struct itself, and one for the entries array. Note that we use calloc for the entries array, to ensure all the keys are NULL to start with, meaning all slots are empty.
The ht_destroy function frees this memory, but also frees the memory for the duplicated keys that were allocated along the way (more on that below).
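A sketch of these two functions follows; the struct field names and constants are my reconstruction of what the text describes, so check ht.c in the repo for the real thing:

```c
#include <stdlib.h>

#define INITIAL_CAPACITY 16  // must not be zero

typedef struct {
    const char* key;  // key is NULL if this slot is empty
    void* value;
} ht_entry;

struct ht {
    ht_entry* entries;  // hash slots
    size_t capacity;    // size of entries array
    size_t length;      // number of items in hash table
};
typedef struct ht ht;

ht* ht_create(void) {
    // Allocate space for the hash table struct itself.
    ht* table = malloc(sizeof(ht));
    if (table == NULL) {
        return NULL;
    }
    table->length = 0;
    table->capacity = INITIAL_CAPACITY;

    // Allocate (zeroed) space for entry slots, so all keys start NULL.
    table->entries = calloc(table->capacity, sizeof(ht_entry));
    if (table->entries == NULL) {
        free(table);  // error: free table before returning
        return NULL;
    }
    return table;
}

void ht_destroy(ht* table) {
    // First free the keys that were copied (with strdup) on insert.
    for (size_t i = 0; i < table->capacity; i++) {
        free((void*)table->entries[i].key);
    }
    // Then free the entries array and the table struct itself.
    free(table->entries);
    free(table);
}
```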
Hash function
Next we define our hash function, which is a straight-forward C implementation of the FNV-1a hash algorithm. Note that FNV is not a randomized or cryptographic hash function, so it’s possible for an attacker to create keys with a lot of collisions and cause lookups to slow way down – Python switched away from FNV for this reason. For our use case, however, FNV is simple and fast.
As far as the algorithm goes, FNV-1a simply starts the hash with an “offset” constant, and for each byte in the string, XORs the hash with the byte, and then multiplies it by a big prime number. The offset and prime are carefully chosen by people with PhDs.
We’re using the 64-bit variant, because, well, most computers are 64-bit these days and it seemed like a good idea. You can tell I don’t have one of those PhDs. :-) Seriously, though, it seemed better than using the 32-bit version in case we have a very large hash table.
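A sketch of the hash function; the offset and prime below are the standard 64-bit FNV-1a parameters, and you can check it against the hash table of keys shown earlier (for example, “foo” hashes to 15902901984413996407):

```c
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL  // 64-bit FNV-1a offset basis
#define FNV_PRIME 1099511628211ULL          // 64-bit FNV-1a prime

// Return 64-bit FNV-1a hash for key (NUL-terminated): for each byte,
// XOR it into the hash, then multiply by the FNV prime.
uint64_t hash_key(const char* key) {
    uint64_t hash = FNV_OFFSET;
    for (const char* p = key; *p; p++) {
        hash ^= (uint64_t)(unsigned char)(*p);
        hash *= FNV_PRIME;
    }
    return hash;
}
```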
I won’t be doing a detailed analysis here, but I have included a little statistics program that prints the average probe length of the hash table created from the unique words in the input. The FNV-1a hash algorithm we’re using seems to work well on the list of half a million English words (average probe length 1.40), and also works well with a list of half a million very similar keys like word1, word2, and so on (average probe length 1.38).
Interestingly, when I tried the FNV-1 algorithm (like FNV-1a but with the multiply done before the XOR), the English words still gave an average probe length of 1.43, but the similar keys performed very badly – an average probe length of 5.02. So FNV-1a was a clear winner in my quick tests.
Get
Next let’s look at the ht_get function. First it calculates the hash, modulo the capacity (the size of the entries array), which is done by ANDing with capacity - 1. Using AND is only possible because, as we’ll see below, we’re ensuring our array size is always a power of two, for simplicity.

Then we loop till we find an empty slot, in which case we didn’t find the key. For each non-empty slot, we use strcmp to check whether the key at this slot is the one we’re looking for (it’ll be the first one unless there had been a collision). If not, we move along one slot.
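Putting that together, ht_get might look like the following. This is a self-contained sketch, so the entry struct and FNV-1a hash function described above are repeated here to let it compile on its own:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Minimal context so this sketch compiles on its own; in ht.c these
// definitions live alongside ht_create and friends.
typedef struct {
    const char* key;  // NULL if this slot is empty
    void* value;
} ht_entry;

typedef struct {
    ht_entry* entries;
    size_t capacity;  // always a power of two
    size_t length;
} ht;

static uint64_t hash_key(const char* key) {  // 64-bit FNV-1a
    uint64_t hash = 14695981039346656037ULL;
    for (const char* p = key; *p; p++) {
        hash ^= (uint64_t)(unsigned char)(*p);
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Get item with given key from hash table. Return value, or NULL if not found.
void* ht_get(ht* table, const char* key) {
    // AND hash with capacity-1 to ensure it's within the entries array.
    uint64_t hash = hash_key(key);
    size_t index = (size_t)(hash & (uint64_t)(table->capacity - 1));

    // Loop till we find an empty slot (key not found).
    while (table->entries[index].key != NULL) {
        if (strcmp(key, table->entries[index].key) == 0) {
            return table->entries[index].value;  // found key, return value
        }
        // Key wasn't in this slot, move to next (linear probing).
        index++;
        if (index >= table->capacity) {
            index = 0;  // at end of entries array, wrap around
        }
    }
    return NULL;
}
```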
Set
The ht_set function is slightly more complicated, because it has to expand the table if there are too many elements. In our implementation, we double the capacity whenever it gets to be half full. This is a little wasteful of memory, but it keeps things very simple.

First, the ht_set function itself. It simply expands the table if necessary, and then inserts the item:

The guts of the operation are in the ht_set_entry helper function (note how the loop is very similar to the one in ht_get). If the plength argument is non-NULL, it’s being called from ht_set, so we allocate and copy the key and update the length:

What about the ht_expand helper function? It allocates a new entries array of double the current capacity, and uses ht_set_entry with plength NULL to copy the entries over. Even though the hash value is the same, the indexes will be different, because the capacity has changed (and the index is hash modulo capacity).
Length and iteration
The ht_length function is trivial – we update the number of items in length as we go, so we just return that:
Iteration is the final piece. To create an iterator, a user will call ht_iterator, and to move to the next item, call ht_next in a loop while it returns true. Here’s how they’re defined:
Discussion
That’s it – the implementation in ht.c is only about 200 lines of code, including blank lines and comments.
Beware: this is a teaching tool and not a library, so I encourage you to play with it and let me know about any bugs I haven’t found! I would advise against using it without a bunch of further testing, checking edge cases, etc. Remember, this is unsafe C we’re dealing with. Even while writing this I realized I’d used malloc instead of calloc to allocate the entries array, which meant the keys may not have been initialized to NULL.
As I mentioned, I wanted to keep the implementation simple, and wasn’t too worried about performance. However, a quick, non-scientific performance comparison with Go’s map implementation shows that it compares pretty well – with half a million English words, this C version is about 50% slower for lookups and 40% faster for insertion.
Speaking of Go, it’s even easier to write custom hash tables in a language like Go, because you don’t have to worry about handling memory allocation errors or freeing allocated memory. I recently wrote a counter package in Go which implements a similar kind of hash table.
There’s obviously a lot more you could do with the C version. You could focus on safety and reliability by doing various kinds of testing. You could focus on performance, and reduce memory allocations, use a “bump allocator” for the duplicated keys, store short keys inside each item struct, and so on. You could improve the memory usage, and tune ht_expand to not double in size every time. Or you could add features such as item removal.
After I’d finished writing this, I remembered that Bob Nystrom’s excellent Crafting Interpreters book has a chapter on hash tables. He makes some similar design choices, though his chapter is significantly more in-depth than this article. If I’d remembered his chapter before I started, I probably wouldn’t have written this one!
In any case, I hope you’ve found this useful or interesting. If you spot any bugs or have any feedback, please let me know. You can also go to the discussions on Hacker News, programming Reddit, and Lobsters.
Summary
In this homework, you will implement a hash dictionary (also known as a hash map) and a hash set. We will be using these two data structures extensively in the next project.
This entire project is due on Wednesday, July 31 at 11:59pm.
You will use these files from your prior assignments
src/main/java/datastructures/dictionaries/ArrayDictionary.java
src/main/java/datastructures/lists/DoubleLinkedList.java
If you have chosen a new partner for this assignment, choose either of your submissions from HW2 and verify that these are functioning properly.
You will be modifying the following files:
src/main/java/datastructures/dictionaries/ArrayDictionary.java
src/main/java/datastructures/dictionaries/ChainedHashDictionary.java
src/main/java/datastructures/sets/ChainedHashSet.java
Additionally, here are a few more files that you might want to review while completing the assignment (note that this is just a starting point, not necessarily an exhaustive list):
src/test/java/datastructures/dictionaries/BaseTestDictionary.java
src/test/java/datastructures/dictionaries/TestChainedHashDictionary
src/test/java/datastructures/sets/TestChainedHashSet.java
src/main/java/datastructures/dictionaries/IDictionary.java
src/main/java/datastructures/sets/ISet.java
src/main/java/analysis/experiments/*
Here's another video overview. Note: this video is from 19wi, so some info in this video may be a little outdated.
Expectations
Here are some baseline expectations we expect you to meet in all projects:
Follow the course collaboration policies
DO NOT use any classes from java.util.*. There are only two exceptions to this rule:

- You may import and use the following classes:
  - java.util.Iterator
  - java.util.NoSuchElementException
  - java.util.Objects
  - java.util.Arrays
- You may import and use anything from java.util.* within your testing code.
DO NOT make modifications to instructor-provided code (unless told otherwise). If you need to temporarily change our code for debugging, make sure to change it back afterwards.
Section a: Set up project
Clone the starter code from GitLab and open the project in your IDE. See the instructions from Project 0 if you need a reminder on how to do this.
Copy your DoubleLinkedList.java and ArrayDictionary.java files from Project 1 to this new one.

Copy your DoubleLinkedList delete tests from Project 1 and paste them directly into TestDoubleLinkedList.java.

Next, make sure everything works:

- Try running SanityCheck.java, and try running Checkstyle. Checkstyle should still report the same 5 errors with SanityCheck.java as it did with Project 0.
- Try running TestDoubleLinkedList and TestArrayDictionary, and make sure the tests still pass.
Section b: Implement new ArrayDictionary constructor
In order to run one of the upcoming experiments, you will add an extra constructor to the existing ArrayDictionary class. This constructor will be used when you implement ChainedHashDictionary in the next part. The constructor should take in an integer representing the initial capacity of the pairs array.
Below is the constructor stub you should implement:
Tip: to make sure this constructor is working, we can refactor our code to always use this new constructor. Try replacing your existing 0-argument constructor with the following code that will call your new constructor:
Afterwards, make sure the tests in TestArrayDictionary still pass.
Section c: Implement ChainedHashDictionary
Task: Complete the ChainedHashDictionary class.
In this task, you will implement a hash table that uses separate chaining as its collision resolution strategy.
Correctly implementing your iterator will be tricky—don't leave it to the last minute! Try to finish the other methods in ChainedHashDictionary as soon as possible so you can move on to implementing iterator().
In class, when we covered separate chaining hash tables, we used LinkedList as the chaining data structure. In this task, instead of LinkedList, you will use your ArrayDictionary (from Project 1) as the chaining data structure.
When you first create your chains array, it will contain null pointers. As key-value pairs are inserted in the table, you need to create the chains (ArrayDictionarys) as required. Let's say you created an array of size 5 (you can create an array of any size), and you inserted the key-value pair ('a', 11).

Your hash table should look something like the following figure. In this example, the key 'a' lands in index 2, but it might be in a different index depending on your table size. Also, in this example, the ArrayDictionary (chain) is of size 3, but you can choose a different size for your ArrayDictionary.
Now, suppose you inserted a few more keys:
Your internal hash table should now look like the figure below. In this example, keys 'a' and 'f' both hash to the same index (2).
Notes:
- The constructor you implement will take in a few parameters:
  - resizingLoadFactorThreshold: if the ratio of items to buckets exceeds this, you should resize
  - initialChainCount: how many chains/buckets there are initially
  - chainInitialCapacity: the initial capacity of each ArrayDictionary inner chain
- For the other, 0-argument constructor, you'll need to define some reasonable defaults in the final fields at the top of the class.
- Use ArrayDictionary for your internal chains/buckets. Whenever you make a new ArrayDictionary, be sure to use your new ArrayDictionary constructor to correctly set its initial capacity.
- If your ChainedHashDictionary receives a null key, use a hashcode of 0 for that key.
- You may implement any resizing strategy covered in lecture—we recommend doubling the number of chains on every resize, since it's the simplest to implement.
- We will be asking about your implementation design decisions later on, so it may be helpful to read ahead so you can keep this in mind while you implement ChainedHashDictionary.
- Correctly implementing your iterator will be tricky—don't leave it to the last minute! Try to finish the other methods in ChainedHashDictionary as soon as possible so you can move on to implementing iterator().
- Do not try to implement your own hash function. Use the hashCode() method that Java provides for all classes: to get the hash of a key, use keyHash = key.hashCode(). This method returns an integer, which can be negative or greater than chains.length. How would you handle this?
- Recall that operations on a hash table slow down as the load factor increases, so you need to resize (expand) your internal array. When resizing your ArrayDictionary, you just copied over items from the old array to the new one. Here, how would you move items from one hash table to another?
Notes on the ChainedHashDictionaryIterator
Restrictions, assumptions, etc.:
- You may not create any new data structures. Iterators are meant to be lightweight and so should not be copying the data contained in your dictionary to some other data structure.
- You may (and probably should) call the .iterator() method on each IDictionary inside your chains array, however, as instantiating an iterator from an existing data structure is both low-cost in space and time.
array, however, as instantiating an iterator from an existing data structure is both low cost in space and time. - You may and should add extra fields to keep track of your iteration state. You can add as many fields as you want. If it helps, our reference implementation uses three (including the one we gave you).
- Your iterator doesn't need to yield the pairs in any particular order.
- You should assume that a client will not modify your underlying data structure (the ChainedHashDictionary) while you iterate over it. Note that there are some tests that do something that looks similar but is different: they modify the dictionary in between creating new iterator objects, which is allowed behavior—it's okay to modify your data structure and then loop over it again, as long as you do not modify it while looping over it.
Tips for planning your implementation:
Before you write any code, try designing an algorithm using pencil and paper and run through a few examples by hand. This means you should draw the chains array with some varying number of ArrayDictionary objects scattered throughout, and try to simulate what your algorithm does.
objects scattered throughout, and you should try simulate what your algorithm does.Try to come up with some invariants for your code. These invariants must always be true after the constructor finishes, and must always be true both before and after you call any method in your class.
Having good invariants will greatly simplify the code you need to write, since they reduce the number of cases you need to consider while writing code. For example, if you decide that some field should never be null and write your code to ensure that it always gets updated to be non-null before the method terminates, you'll never need to do null checks for that field at the start of your methods.
As another example, it's possible to pose the DoubleLinkedList iterator's implementation in terms of invariants:

- As long as the iterator has more values, the next field is always non-null and contains the next node to output.
- When the iterator has no more values, the next field is null.

Additional notes:

- Once you've decided on some invariants, write them down in a comment somewhere so that you don't forget about them. We'll ask about these again in the writeup as well.
- You may also find it useful to write a helper method that checks your invariants and throws an exception if they're violated. You can then call this helper method at the start and end of each method if you're running into issues while debugging. (Be sure to disable this method once your iterator is fully working.)
- It may be helpful to revisit your main ChainedHashDictionary code and add additional invariants there to reduce the number of cases for the chains array.
- We strongly recommend you spend some time designing your iterator before coding. Getting the invariants correct can be tricky, and running through your proposed algorithm using pencil and paper is a good way of helping you iron them out.
Section d: Implement ChainedHashSet
Task: Complete the ChainedHashSet class.
In section c, you implemented the dictionary ADT with a hash table. You can also implement the set ADT using hash tables. Recall that sets store only a key, not a key-value pair. In sets, the primary operation of interest is contains(key)
, which returns whether a key is part of the set. Hash tables provide an efficient implementation for this operation.
Notes:
- To avoid code duplication, we will use an internal dictionary of type ChainedHashDictionary<KeyType, Boolean> to store items (keys) in your ChainedHashSet. Since there are no key-value pairs (only keys) in sets, we will completely ignore the values in your dictionary: use a placeholder boolean whenever necessary.
- Your code for this class should be very simple: your inner dictionary should be doing most of the work.
Section e (highly recommended): Consider more test cases
In this homework assignment, we won't be grading the tests that you write. But, to be thorough and to foster good habits, we strongly encourage you to write additional tests for your code, since they may help you spot different bugs or make you more confident in the correctness of your implementation. (Also remember that we have 'secret' tests that you will be graded on in addition to the provided tests, so it's in your best interest to test more cases.)
For this assignment, you should focus in particular on edge cases. Whenever you see conditional logic in your code (if statements, loop conditions, etc.) you should consider writing a test to check its edge cases. To make sure your tests are truly comprehensive, you should make sure that your test cases end up running every possible conditional branch in your code.
Although you can see the inner workings of your own code, it may sometimes be difficult to write code to check the actual state of your data structures by calling their public methods. Instead, you can test the state of your data structures by accessing your private fields directly; see the blue box below for more details.
Just like in Project 1, we've specified your fields to be package-private, which means the tests located in the same package can actually access the internal fields of your ChainedHashDictionary and ChainedHashSet.
The constructor tests in TestChainedHashDictionary.java
use a helper method to access the array of chains in your ChainedHashDictionary
; feel free to use the same helper method in your own tests.
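As an example of the edge-case style of testing suggested above, here is a small hedged sketch. java.util.HashMap stands in for your ChainedHashDictionary, since that class lives in the project's codebase; the same cases are worth checking against your own implementation (with JUnit assertions instead of bare asserts).

```java
import java.util.HashMap;
import java.util.Map;

// Edge cases worth covering for any dictionary implementation.
public class EdgeCaseExamples {
    public static void main(String[] args) {
        Map<String, Integer> dict = new HashMap<>();

        // Edge case: lookups and removals on an empty dictionary
        assert dict.get("missing") == null;
        assert dict.remove("missing") == null;

        // Edge case: overwriting an existing key replaces the value
        // without growing the size
        dict.put("a", 1);
        dict.put("a", 2);
        assert dict.size() == 1;
        assert dict.get("a") == 2;

        // Edge case: the dictionary is usable again after emptying it
        dict.remove("a");
        assert dict.size() == 0;
        dict.put("b", 3);
        assert dict.get("b") == 3;
    }
}
```

Each assert here corresponds to a conditional branch a typical implementation contains (missing key, duplicate key, empty structure), which is the coverage goal described above.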
Section f: Complete group write-up
Task: Complete a write-up containing answers to the following questions.
You and your partner will work together on this write-up: your completed write-up MUST be in PDF format. You will submit it to Gradescope. Log into Gradescope with your @uw.edu email address. When you submit, mark which pages correspond to the questions we ask, and afterwards, the partner who uploaded the PDF must add the other partner as a group member on Gradescope. Do not have each member submit individually. A video showing how to do this can be found here. We may deduct points if you do not correctly submit as a group or do not properly assign your pages/problems.
Design decisions
Before we get to the experiments, here are some questions about design decisions you made while doing the programming portion of the assignment.
ChainedHashDictionary
For this first prompt, reflect on a design decision you deliberately made while implementing your ChainedHashDictionary. The specifications for this assignment are deliberately loose, so you should have needed to make some decisions on your own. Consider one such decision you had to make, and answer the following questions:
- What was the situation—what functionality were you implementing, and what details about it were left unspecified in the instructions?
- Describe at least two viable implementations that you considered.
- Describe the pros and cons of each solution you listed.
- What was your final choice? Why did you choose it over the other(s)—why did its pros and cons outweigh the pros and cons of the other(s)?
Your responses will be graded partially based on effort, and partially based on their clarity and how well they described your design decision. If you choose a design decision that had an obviously-best solution, or a problem in which you arbitrarily decided on a final solution, you may lose points.
ChainedHashDictionaryIterator
For this prompt, briefly describe the invariant(s) you chose for your ChainedHashDictionary iterator. Did you find them useful while implementing the iterator? Also note down any invariants you discarded for any reason (e.g., they were too inefficient or impossible to enforce, or they simply weren't useful).
This section will be graded based on completion.
Experiments
For each of the experiments, answer the bolded questions (and questions in the orange boxes) in your write-up. Just like before, a plot will automatically be generated to display the results of the experiments; include PNGs of the plots inside your write-up PDF.
The hypothesis/predict-based-on-the-code portions of your write-up will be graded based on completion, so just write down your honest thoughts before running the experiment. The post-experiment analysis portions will be graded based on the clarity of your explanations and whether they match up with your plot.
Experiment 1: Chaining with different hashCodes vs. AVL trees
This experiment explores how different hash code functions affect the runtime of a ChainedHashDictionary, and compares that to the runtime of an AVLTreeDictionary.
First, we'll look at the tests involving the ChainedHashDictionary: test1, test2, and test3. Each uses a different class (FakeString1, FakeString2, and FakeString3, respectively) as keys for a ChainedHashDictionary. Each of the fake string objects represents a string (by storing an array of chars), but each class has a different implementation of the hashCode method. Read over these implementations of hashCode and take a look at the corresponding histograms (each plot shows the distribution of outputs of the hashCode methods across 80,000 randomly-generated fake strings).
Below is the histogram for FakeString1, the type of key used in test1.
Below is the histogram for FakeString2 (test2).
Below is the histogram for FakeString3 (test3).
Now, predict which test method will have the fastest and slowest asymptotic runtime growths.
- Note that the distributions for the hash codes used in test1 and test2 look very similar in shape, but in the graphs you produce, you should see that test2 (FakeString2) tends to run much faster than test1 (FakeString1). Why is that? Hint: look at the x-axis scale labels.
- You should also see that FakeString3 produces the fastest runtimes when used as keys for ChainedHashDictionary. Why is this? Explain using the histogram above, citing at least 1 observation about the histogram.
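To make the contrast concrete, here is a hedged sketch of two kinds of hashCode implementations. These classes are illustrative assumptions only, not the assignment's FakeString classes; they show why a hash function's output range and sensitivity to character order matter for chain lengths.

```java
// Two hashCode strategies over a char array, for comparison.
class NarrowHash {
    private final char[] chars;
    NarrowHash(String s) { chars = s.toCharArray(); }

    @Override
    public int hashCode() {
        int sum = 0;
        for (char c : chars) {
            sum += c; // summing chars: outputs land in a tiny range, and
        }             // anagrams collide, so buckets get long chains
        return sum;
    }
}

class SpreadHash {
    private final char[] chars;
    SpreadHash(String s) { chars = s.toCharArray(); }

    @Override
    public int hashCode() {
        int h = 0;
        for (char c : chars) {
            h = 31 * h + c; // polynomial hash (same scheme as String.hashCode):
        }                   // order-sensitive, outputs spread across the int range
        return h;
    }
}
```

Note that NarrowHash gives "ab" and "ba" the same hash code, while SpreadHash does not; a narrow, collision-heavy output distribution is exactly what a compressed x-axis on a histogram would reveal.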
Now, we'll consider test4, which uses the AVLTreeDictionary. This test uses a fake string class that does not provide a working hashCode method, but is comparable in a way that mimics regular String comparisons. You should see that the AVL-tree-based implementation performs much better than the chained-hash implementation when used with bad key objects, but looks like it's only about a constant factor worse than chained hashing with good keys.
- What functionality does ChainedHashDictionary require from its keys? What else must be true of its keys in order for the dictionary to perform well, if anything?
- What functionality does AVLTreeDictionary require from its keys? What else must be true of its keys in order for the dictionary to perform well, if anything?
- Which of these two has a runtime with a better (slower) asymptotic growth: the AVLTreeDictionary, or the ChainedHashDictionary (with good keys)? (Your answer should be based on the properties of these implementations, not just the results of this graph.)
Experiment 2: Load factor thresholds for resizing
This experiment tests the runtime of ChainedHashDictionary's put method across different values of the load factor threshold for resizing.
First, answer the following prompts:
- Briefly describe the difference between test1 and test2.
- Which test do you expect to run faster? What about asymptotically faster?
- Why is using a load factor threshold of 300 slow? Explain at a high level how this affects the behavior of ChainedHashDictionary.put.
- This was not a part of this experiment, but explain why very small load factor thresholds (much less than 0.75; e.g., 0.05) might be wasteful.
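To make the role of the threshold concrete, here is a minimal sketch of a chained hash table whose put checks the load factor before inserting. All names (chains, threshold, resize) and the starting capacity are illustrative assumptions, not the assignment's actual API; with a threshold like 300, the resize step would almost never run, so chains would grow to hundreds of entries and lookups inside put would degrade toward linear time.

```java
import java.util.LinkedList;

// Sketch: chained hashing with a configurable load factor threshold.
class ResizeSketch<K, V> {
    static class Entry<K, V> {
        K key; V value;
        Entry(K k, V v) { key = k; value = v; }
    }

    double threshold = 0.75;  // resize when size / chains.length exceeds this
    LinkedList<Entry<K, V>>[] chains = newChains(8);
    int size = 0;

    @SuppressWarnings("unchecked")
    static <K, V> LinkedList<Entry<K, V>>[] newChains(int n) {
        LinkedList<Entry<K, V>>[] a = new LinkedList[n];
        for (int i = 0; i < n; i++) a[i] = new LinkedList<>();
        return a;
    }

    void put(K key, V value) {
        // The load factor check: a huge threshold (e.g. 300) means this
        // almost never triggers, so chains keep getting longer.
        if ((double) (size + 1) / chains.length > threshold) resize();
        int i = Math.floorMod(key.hashCode(), chains.length);
        for (Entry<K, V> e : chains[i]) {          // scan the chain: this loop is
            if (e.key.equals(key)) {               // what gets slow with long chains
                e.value = value;
                return;
            }
        }
        chains[i].add(new Entry<>(key, value));
        size++;
    }

    void resize() {
        LinkedList<Entry<K, V>>[] old = chains;
        chains = newChains(old.length * 2);        // double and rehash everything
        for (LinkedList<Entry<K, V>> chain : old)
            for (Entry<K, V> e : chain)
                chains[Math.floorMod(e.key.hashCode(), chains.length)].add(e);
    }
}
```

The very small thresholds asked about in the last prompt sit at the other extreme: resizing constantly keeps chains near length 1 but leaves most buckets empty, trading memory for little extra speed.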
Experiment 3: Initial internal chain capacities
This experiment tests the runtime of inserting elements into a ChainedHashDictionary with different initial capacities for the internal ArrayDictionary chains.
Briefly describe the differences between the three tests.
Note that although the runtimes when using the three initial sizes are similar, using an initial capacity of 2 results in the fewest spikes and is generally the fastest. Why would a lower initial ArrayDictionary capacity result in a more consistent and faster runtime?
Experiment 4: Data structure memory usage, take 2
This last experiment will estimate the amount of memory used by DoubleLinkedList, ArrayDictionary, and AVLTreeDictionary as they grow in size. Predict the complexity class (constant, logarithmic, linear, n log(n), quadratic, exponential) of memory usage for each of the 3 data structures as its size increases.
Note: You may get the following warnings when running the experiment; just ignore them:
WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
- Describe the overall shapes of the graphs. Explain why two of them are similar but one is different.
- You should see that test1 uses less memory than test3. Is the actual difference on your plot a difference in complexity classes or in constant factors? What are some possible reasons that the memory usage of DoubleLinkedList is less than that of AVLTreeDictionary?
Section g: Complete individual feedback survey
Task: Submit a response to the feedback survey.
After finishing the project, take a couple minutes to complete this individual feedback survey on Canvas. (Each partner needs to submit their own individual response.)
Deliverables
The following deliverables are due on Wednesday, July 31 at 11:59pm.
Before submitting, be sure to double-check that:
Submit by pushing your code to GitLab and submitting your writeup to Gradescope. If you intend to submit late, fill out this late submission form when you submit.