A* is a popular traversal algorithm used for finding paths between vertices (nodes) in a graph. It's also a topic which is often asked about in more advanced coding interviews, and in interviews for certain relevant areas of software engineering (gaming, social networks, search, etc...). This project gives a sample implementation of A*, and looks at how different heuristic functions can affect the speed of finding a path through the graph.
I'll use the following terminology throughout the rest of the article...
Term | Meaning |
---|---|
Vertex | A point within a graph which represents some data entity. It's also often alternately referred to as a node. As the sample graph in the project is a graph of adjacent words (see below), I often use 'word' to mean a vertex in the code and documentation (source word, destination word, adjacent word, etc...). |
Edge | A direct path between two vertices. |
Source vertex | (or 'source word') - The 'start' vertex in a path through the graph. |
Destination vertex | (or 'destination word') - The 'end' vertex in a path through the graph. |
Current vertex | (or 'current word') - A vertex which is currently being traversed while finding a path. |
Candidate vertex | (or 'candidate word') - A vertex adjacent to the current vertex which is assessed for 1) whether it should be traversed to, and 2) the priority for traversing to it (as compared to other candidate vertices). |
Breadth-first search is (along with depth-first search) one of the fundamental graph traversal algorithms. It works by 'radiating' outward from the source vertex... i.e. first all the vertices directly adjacent to the source vertex are traversed, then all the vertices two edges away from the source, and so on until the destination vertex is found.
This uses the same approach as for standard breadth-first search, but runs two traversal processes in parallel... one radiating out from the source vertex and one radiating from the destination. All the vertices encountered in each process are recorded, and when a vertex is found which has already been encountered by the other process, a path between the source and destination is found. Typically bidirectional breadth-first search is significantly faster at finding a path than standard breadth-first search, as the total number of vertices traversed to find a path is far less. This is because the 'radiating' process of traversal grows exponentially as the process gets further from the initial vertex, so keeping that distance from the initial vertex short (by running two parallel processes) results in far less vertices needing to be traversed. The Wikipedia article for bidirectional search explains this in terms of big O notation.
Dijkstra's algorithm is able to find the shortest path between a source and destination vertex, and is best suited to weighted graphs, where different edges have a different cost to traverse (e.g. as would be the case with a graph of roads between cities, where each road has a different distance). It's base implementation is typically similar to breadth-first search, in that it uses a queue to store the candidate vertices to traverse to next. In breadth-first search this queue is a simple FIFO queue, and hence the vertices are traversed in order depending on their distance (in terms of number of edges) from the source. In later variations of Dijkstra's algorithm, instead a priority queue is used, and the weighting of edges is taken into account... i.e. vertices whose total distance from the source (taking edge weight into account) is less are traversed to before vertices whose total distance is greater. It also maintains a data structure recording the shortest distance found so far from the source to each encountered vertex. Finally, in order to guarantee it has found the shortest path, it does not stop traversing when the destination vertex is first encountered... but continues until all vertices adjacent to the destination have been traversed to.
A* is somewhat like an enhancement to Dijkstra's algorithm, which uses a heuristic function in addition to the distance from the source vertex, to better prioritize where to traverse to next, and to find a path between the source and destination vertices more quickly. If you look at the animation on the A* Wikipedia page, you'll see that the algorithm seems to 'know' where the destination vertex is, and seems to 'seek' towards it (unlike the similar animation for Dijkstra's algorithm which more 'radiates' evenly from the source vertex)... and this is exactly what the heuristic function allows A* to do.
The heuristic function generally works by providing additional information on the relationship between a candidate vertex and the destination vertex. In graphs representing maps of cities and roads, a heuristic function could give the straight line distance ('as the crow flies') between each candidate city and the destination... then candidates further from the destination would be given lower priority for traversal. This provides an advantage over Dijkstra's algorithm, as candidate cities whose path (by road) from the source was short, but were far (in a straight line) from the destination would be given lower priority for traversal. Another example in the city map context could be to have a heuristic function which calculates the compass direction between each candidate city and the destination... then candidate cities which are in similar compass directions as the destination could be prioritized over cities which are in a different direction.
In the graph in A* Experiment, each vertex holds a four letter word. Edges connect words which are adjacent, that is, words which differ by one letter (for example 'barn' and 'burn'). A small example portion of the graph appears below...
A path between words 'purr' and 'bark' in the above sample would be...
purr > burr > burn > born > barn > bark
The shortest path between the same words is...
purr > burr > burn > barn > bark
Using the recommended source word dictionary, the entire graph contains just over 3000 vertices, and almost 13000 edges. However, one key difference between usual methods of representing graphs, and the one used in A* Experiment, is the the entire graph is not pre-built (e.g. as they would usually be with an adjacency list or similar). All the words in the graph are stored in a character trie, and edges calculated 'on the fly' (i.e. during graph traversal) via an efficient search through the trie (this is performed by method CharacterTrieUtilities.FindAdjacentWords()).
Since an edge always joins two words which differ by one character, the weights of all edges are the same (i.e. the graph is not weighted). This means the benefits of Dijkstra's algorithm over breadth-first search are not as apparent as they would be in a weighted graph.
When prioritizing candidate vertices during traversal, A* considers two components in determining the priority... g(n) and h(n). g(n) is the distance from the source vertex (with lower distance giving higher priority), which is the same prioritization method as the later variations of Dijkstra's algorithm outlined above. h(n) is the heuristic function which gives some additional information about the relationship between the candidate vertex and the destination. In A* Experiment there are 3 separate heuristic functions in addition to g(n), and all can be assigned different weightings to give an overall candidate vertex priority. Weightings of the g(n) and h(n) functions are controlled by constructor parameters to the CandidateWordPriorityCalculator class. The g(n) weighting is set by parameter 'sourceWordToCandidateWordDistanceWeight'. The available heuristic functions are...
This function looks at the number of matching characters (i.e. same characters in the same position) between the candidate and destination words. For example for candidate words 'burn' and 'turf', and destination 'horn', 'burn' would be prioritzed over 'turf' as it has 2 characters in common with 'horn', as opposed to 1 for 'turf'. This function's weight is set by constructor parameter 'numberOfCharactersMatchingDestinationWeight'.
This function compares the character that is changed between the current word and the candidate word (the character that is 'changed to'), and prioritizes characters that occur more frequently as 'changed from' characters throughout the graph. For example, for current word 'turn', and candidate words 'burn' and 'turf', and assuming the following frequencies of 'changed from' characters in adjacent words...
...'burn' would be prioritized over 'turf', as 'b' is more frequently substituted ('changed from') in adjacent words than 'f'. The thinking behind this is that moving to words with characters that are more frequently the 'changed from' character in adjacent words, should provide more possible candidate words in future traversal steps, and help to get to the destination more quickly. Strictly speaking this may not constitute a true A* heuristic function, as whilst it can help to move through the graph more quickly, it doesn't say anything specifically about the relationship between the candidate and destination. The frequency of change in adjacent words for each character is supplied to the CandidateWordPriorityCalculator class via constructor parameter 'fromCharacterFrequencies'. The weight of this function is set by constructor parameter 'popularityOfChangeToCharacterWeight'.
This function looks at the 'from' and 'to' character pair in the current and candidate words, and prioritizes candidates from pairs which occur more frequently in adjacent words throughout the graph. For example, if we assume the following frequencies of substitutions...
...and current word 'turn' and candidate words 'burn' and 'turf' as above, 'burn' would be prioritized over 'turf', as substitution 't > b' occurs more frequently than 'n > f'. Again there could be some question as to whether this constitutes a true A* heuristic function, as it doesn't convey anything specifically about the relationship between the candidate and destination words. More importantly, the effectiveness of this function is questionable, as whilst it conveys how popular the substitution to move to each candidate word is, that doesn't necessarily help future traversal steps (and hence doesn't specifically help to move towards the destination word more quickly). I built this function routine very early in the development of A* Experiment before I realized its limited effectiveness. However, I decided to leave it in, as a point of comparison between the other functions. The frequency of character pair substitutions is supplied to class CandidateWordPriorityCalculator in constructor parameter 'characterSubstitutionFrequencies', and the weight of this function by parameter 'popularityOfCharacterChangeWeight'.
Different weights can be assigned to g(n) and the different heuristic functions through constructor parameters on class CandidateWordPriorityCalculator. For example, to assign even weight to g(n) and each of the 3 heuristic functions, the constructor should be called as follows...
Similarly, to evenly assign weight between g(n) and h(n), the constructor would be...
Running each of the 3 heuristic functions in isolation (i.e. with 100% weighting to only that heuristic function... and 0% g(n) ) with 4 sets of random source and destination words produces the following results ('Edges Explored' = the number of edges which had to be traversed to find the path and 'Path Length' = the number of vertices in that resulting path)...
Heuristic Function | Edges Explored | Path Length |
---|---|---|
Source word ='role', Destination word = 'band' | ||
Similarity between candidate word and destination word | 61 | 5 |
Popularity of the candidate word's 'change to' character | 3066 | 7 |
Popularity of character substitution pair between the current and candidate words | 2566 | 24 |
Source word ='pack', Destination word = 'sill' | ||
Similarity between candidate word and destination word | 51 | 5 |
Popularity of the candidate word's 'change to' character | 1193 | 14 |
Popularity of character substitution pair between the current and candidate words | 2424 | 26 |
Source word ='debt', Destination word = 'tyre' | ||
Similarity between candidate word and destination word | 102 | 9 |
Popularity of the candidate word's 'change to' character | 1591 | 18 |
Popularity of character substitution pair between the current and candidate words | 1886 | 28 |
Source word ='duct', Destination word = 'grid' | ||
Similarity between candidate word and destination word | 340 | 25 |
Popularity of the candidate word's 'change to' character | 6504 | 23 |
Popularity of character substitution pair between the current and candidate words | 7504 | 64 |
What's interesting is that the 'similarity between candidate word and destination word' function overwhelmingly requires fewer edge explorations than the other functions to find a path. This is likely due to the fact that it makes an assessment of the relationship between the candidate and destination words (unlike the other two which focus on the relationship between current and candidate, and candidate to candidate, as described above. Both the 'popularity of the candidate word's 'change to' character' and 'popularity of character substitution pair between the current and candidate words' functions explore roughly similar number of edges to find a path, but the 'popularity of the candidate word's 'change to' character' function finds significantly shorter paths. This tends to reinforce the assumption that the 'popularity of character substitution pair between the current and candidate words' function is not that effective. However one strength of A* is that it lets you blend prioritization of the heuristic function with g(n), and also blend different heuristic functions.
In A* Experiment there is definite potential value in blending heuristic functions. Whilst 'similarity between candidate word and destination word' is clearly optimal in isolation, in a graph of 4 letter words it can only have 5 possible values (i.e. 0, 1, 2, 3, or 4). This means that used in isolation, every candidate word can only be assigned one of 5 possible priority values, and will result in many candidates sharing the same priority. Hence using a blending of the 'popularity of the candidate word's 'change to' character' function will allow further sub-prioritization between two candidate words which share the same 'similarity between candidate word and destination word' score. Blending of heuristic functions is explored further below.
This is the 'shell' or 'harness' class that sets up all the underlying dependent classes, and runs the graph traversals using varying parameters, weightings, etc...
This class exposes methods to find paths through the graph using breadth-first search, Dijkstra's algorithm, and A*. Both methods FindPathViaAStar() and FindShortestPathViaDijkstrasAlgorithm() use a priority queue to hold candidate words which haven't been traversed to yet. Both methods also traverse through the graph in a similar way... i.e. by finding the words adjacent to the current word, and adding them to the priority queue (and then setting the current word by removing the highest priority word from the queue). A key difference though, is how each method calculates the priority... FindPathViaAStar() uses the CandidateWordPriorityCalculator member, which takes into account heuristic functions and weighting described above...
FindShortestPathViaDijkstrasAlgorithm() instead calculates candidate word priorities based on the distance from the source word...
Since the priority queue prioritizes lower numbers over higher, the second line above 'inverts' the distance by dividing by an arbitrarily big denominator (i.e. as long as the denominator is larger than any possible numerator, the distances from the source word will be prioritized accordingly... 1/3000 is higher priority than 2/3000).
The other key difference between the methods is the way they identify completion / exit criteria. FindPathViaAStar() checks whether each candidate word is the destination word, and stops as soon as it finds the destination...
...because of this FindPathViaAStar() will not always find the shortest path between the source and destination words. On the other hand FindShortestPathViaDijkstrasAlgorithm() is guaranteed to find the shortest path... it continues traversing until all edges leading from the destination have been traversed to (i.e. the destination vertex has been fully explored/visited)...
Another important point to note... since the graph is effectively unweighted, Dijkstra's algorithm ends up becoming equivalent to breadth-first search. Although FindShortestPathViaDijkstrasAlgorithm() uses a priority queue, as the traversal 'radiates' out from the source vertex and every edge has the same weight, candidate vertices will only ever be enqueued with effectively the 'Queue.MaxPriority()' value or 'Queue.MaxPriority() + 1'. Hence the order of the candidate vertices in the priority queue will be no different to the order had they been enqueued in a standard FIFO queue... the same data structure used by standard breadth-first search.
This class is used to calculate candidate word priority for A*. It takes all the aforementioned h(n) function weights as constructor parameters aswell as several data structures containing all the vertex words and statistics pertaining to the vertex words. In method CalculatePriority() individual g(n) and h(n) scores are calculated by protected methods CalculateSourceWordToCandidateWordDistancePriority(), CalculateNumberOfCharactersMatchingDestinationPriority(), etc... (which each return a score between 0.0 and 1.0) before the relevant weighting is applied to produce the overall weighting.
This class is used to find edges of the graph 'on the fly' during traversal, by recursing through a trie containing all the vertex words (in method FindAdjacentWords()).
This class implements the priority queue, using an underlying binary search tree (which sorts the elements/items by priority).
The source code doesn't include a word dictionary file, but the documented samples were run using the dictionary file at...
One downside of the above dictionary is that it tends to include a lot of acronyms (e.g. 'deat' = 'Department of Environmental Affairs and Tourism') and unusual (e.g. 'suet') and occasionally questionable words (e.g. 'sait'). Whichever word dictionary is used, the words need to be saved to a plain text file (one word per line), and constant 'dictionaryFilePath' in the AStarExperiment class set to the full file path...
The same class contains a variable 'wordFilterFunction' which is used to filter the dictionary to only four letter words, and to ignore any words containing punctuation, whitespace, non-printing characters, etc... The same filter function could be modified to allow only three or five letter words (or similar).
In the main Program class, uncomment the following lines to run the path finding...
The following will be output to the console...
The results of different path searches for 4 different pairs of source and destination words are summarized in the tables below. The column abbreviations are as follows...
BFS = Breadth-first search
Dijkstra = Dijkstra's shortest path algorithm
2dBFS = Bidirectional breadth-first search
A* (x% x% x% x%) = A* where the 4 percentages represent the weighting of g(n), 'similarity between candidate word and destination word', 'popularity of the candidate word's 'change to' character', and 'popularity of character substitution pair between the current and candidate words' respectively.
Path | BFS | Dijkstra | 2dBFS | A* (50% 16.6% 16.6% 16.6%) | A* (0% 33% 33% 33%) | A* (25% 25% 25% 25%) | A* (25% 50% 25% 0%) |
---|---|---|---|---|---|---|---|
'role' > 'band' | 1516 | 4675 | 164 | 400 | 343 | 300 | 51 |
'pack' > 'sill' | 3330 | 7815 | 205 | 62 | 62 | 62 | 48 |
'debt' > 'tyre' | 8739 | 12207 | 319 | 87 | 87 | 87 | 148 |
'duct' > 'grid' | 10541 | 12119 | 384 | 3331 | 1077 | 3675 | 83 |
Traversal Algorithm | Path | Path Vertex Count | Edges Explored |
---|---|---|---|
Source word ='role', Destination word = 'band' | |||
BFS | role > bole > bale > bald > band | 5 | 1516 |
Dijkstra | role > bole > bale > bald > band | 5 | 4675 |
2dBFS | role > bole > bold > bond > band | 5 | 164 |
A* (50% 16.6% 16.6% 16.6%) | role > rote > rate > sate > sane > sand > band | 7 | 400 |
A* (0% 33% 33% 33%) | role > rote > rate > late > lane > land > band | 7 | 343 |
A* (25% 25% 25% 25%) | role > rote > rate > late > lane > land > band | 7 | 300 |
A* (25% 50% 25% 0%) | role > bole > bold > bald > band | 5 | 51 |
Source word ='pack', Destination word = 'sill' | |||
BFS | pack > pick > sick > silk > sill | 5 | 3330 |
Dijkstra | pack > pick > sick > silk > sill | 5 | 7815 |
2dBFS | pack > sack > sick > silk > sill | 5 | 205 |
A* (50% 16.6% 16.6% 16.6%) | pack > pick > sick > silk > sill | 5 | 62 |
A* (0% 33% 33% 33%) | pack > pick > sick > silk > sill | 5 | 62 |
A* (25% 25% 25% 25%) | pack > pick > sick > silk > sill | 5 | 62 |
A* (25% 50% 25% 0%) | pack > sack > sick > silk > sill | 5 | 48 |
Source word ='debt', Destination word = 'tyre' | |||
BFS | debt > deat > peat > pert > pere > pyre > tyre | 7 | 8739 |
Dijkstra | debt > deat > peat > pert > pere > pyre > tyre | 7 | 12207 |
2dBFS | debt > deat > peat > pert > pere > pyre > tyre | 7 | 319 |
A* (50% 16.6% 16.6% 16.6%) | debt > dent > tent > tant > tart > tare > tyre | 7 | 87 |
A* (0% 33% 33% 33%) | debt > dent > tent > tant > tart > tare > tyre | 7 | 87 |
A* (25% 25% 25% 25%) | debt > dent > tent > tant > tart > tare > tyre | 7 | 87 |
A* (25% 50% 25% 0%) | debt > deat > teat > team > term > tera > tara > tare > tyre | 9 | 148 |
Source word ='duct', Destination word = 'grid' | |||
BFS | duct > dust > bust > bast > bait > brit > grit > grid | 8 | 10541 |
Dijkstra | duct > dust > bust > bast > bait > brit > grit > grid | 8 | 12119 |
2dBFS | duct > duet > suet > suit > sait > gait > grit > grid | 8 | 384 |
A* (50% 16.6% 16.6% 16.6%) | duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid | 10 | 3331 |
A* (0% 33% 33% 33%) | duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid | 15 | 1077 |
A* (25% 25% 25% 25%) | duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid | 15 | 3675 |
A* (25% 50% 25% 0%) | duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid | 10 | 83 |
A few of my own observations and assumptions. Just to make discussion of the different h(n) functions easier to follow, I'll use the following abbreviations for those...
CandDestSim = similarity between candidate word and destination word
CandChangeToPop = popularity of the candidate word's 'change to' character
SubstitutionPop = popularity of character substitution pair between the current and candidate words
I'm happy to hear any thoughts / comments / feedback, so feel free to contact me.