A* Experiment

A* is a popular traversal algorithm used for finding paths between vertices (nodes) in a graph. It's also a topic which is often asked about in more advanced coding interviews, and in interviews for certain relevant areas of software engineering (gaming, social networks, search, etc...). This project gives a sample implementation of A*, and looks at how different heuristic functions can affect the speed of finding a path through the graph.

Terminology

I'll use the following terminology throughout the rest of the article...

Term Meaning
Vertex A point within a graph which represents some data entity. It's also often alternately referred to as a node. As the sample graph in the project is a graph of adjacent words (see below), I often use 'word' to mean a vertex in the code and documentation (source word, destination word, adjacent word, etc...).
Edge A direct path between two vertices.
Source vertex (or 'source word') - The 'start' vertex in a path through the graph.
Destination vertex (or 'destination word') - The 'end' vertex in a path through the graph.
Current vertex (or 'current word') - A vertex which is currently being traversed while finding a path.
Candidate vertex (or 'candidate word') - A vertex adjacent to the current vertex which is assessed for 1) whether it should be traversed to, and 2) the priority for traversing to it (as compared to other candidate vertices).

Graph Traversal Basics

Breadth-first Search

Breadth-first search is (along with depth-first search) one of the fundamental graph traversal algorithms. It works by 'radiating' outward from the source vertex... i.e. first all the vertices directly adjacent to the source vertex are traversed, then all the vertices two edges away from the source, and so on until the destination vertex is found.

Bidirectional Breadth-first Search

This uses the same approach as for standard breadth-first search, but runs two traversal processes in parallel... one radiating out from the source vertex and one radiating from the destination. All the vertices encountered in each process are recorded, and when a vertex is found which has already been encountered by the other process, a path between the source and destination is found. Typically bidirectional breadth-first search is significantly faster at finding a path than standard breadth-first search, as the total number of vertices traversed to find a path is far less. This is because the 'radiating' process of traversal grows exponentially as the process gets further from the initial vertex, so keeping that distance from the initial vertex short (by running two parallel processes) results in far less vertices needing to be traversed. The Wikipedia article for bidirectional search explains this in terms of big O notation.

Dijkstra's Shortest Path Algorithm

Dijkstra's algorithm is able to find the shortest path between a source and destination vertex, and is best suited to weighted graphs, where different edges have a different cost to traverse (e.g. as would be the case with a graph of roads between cities, where each road has a different distance). It's base implementation is typically similar to breadth-first search, in that it uses a queue to store the candidate vertices to traverse to next. In breadth-first search this queue is a simple FIFO queue, and hence the vertices are traversed in order depending on their distance (in terms of number of edges) from the source. In later variations of Dijkstra's algorithm, instead a priority queue is used, and the weighting of edges is taken into account... i.e. vertices whose total distance from the source (taking edge weight into account) is less are traversed to before vertices whose total distance is greater. It also maintains a data structure recording the shortest distance found so far from the source to each encountered vertex. Finally, in order to guarantee it has found the shortest path, it does not stop traversing when the destination vertex is first encountered... but continues until all vertices adjacent to the destination have been traversed to.

A*

A* is somewhat like an enhancement to Dijkstra's algorithm, which uses a heuristic function in addition to the distance from the source vertex, to better prioritize where to traverse to next, and to find a path between the source and destination vertices more quickly. If you look at the animation on the A* Wikipedia page, you'll see that the algorithm seems to 'know' where the destination vertex is, and seems to 'seek' towards it (unlike the similar animation for Dijkstra's algorithm which more 'radiates' evenly from the source vertex)... and this is exactly what the heuristic function allows A* to do.

The heuristic function generally works by providing additional information on the relationship between a candidate vertex and the destination vertex. In graphs representing maps of cities and roads, a heuristic function could give the straight line distance ('as the crow flies') between each candidate city and the destination... then candidates further from the destination would be given lower priority for traversal. This provides an advantage over Dijkstra's algorithm, as candidate cities whose path (by road) from the source was short, but were far (in a straight line) from the destination would be given lower priority for traversal. Another example in the city map context could be to have a heuristic function which calculates the compass direction between each candidate city and the destination... then candidate cities which are in similar compass directions as the destination could be prioritized over cities which are in a different direction.

Sample Graph and Context

Graph

In the graph in A* Experiment, each vertex holds a four letter word. Edges connect words which are adjacent, that is, words which differ by one letter (for example 'barn' and 'burn'). A small example portion of the graph appears below...

Graph sample

A path between words 'purr' and 'bark' in the above sample would be...

purr > burr > burn > born > barn > bark

The shortest path between the same words is...

purr > burr > burn > barn > bark

Using the recommended source word dictionary, the entire graph contains just over 3000 vertices, and almost 13000 edges. However, one key difference between usual methods of representing graphs, and the one used in A* Experiment, is the the entire graph is not pre-built (e.g. as they would usually be with an adjacency list or similar). All the words in the graph are stored in a character trie, and edges calculated 'on the fly' (i.e. during graph traversal) via an efficient search through the trie (this is performed by method CharacterTrieUtilities.FindAdjacentWords()).

Since an edge always joins two words which differ by one character, the weights of all edges are the same (i.e. the graph is not weighted). This means the benefits of Dijkstra's algorithm over breadth-first search are not as apparent as they would be in a weighted graph.

Heuristics

When prioritizing candidate vertices during traversal, A* considers two components in determining the priority... g(n) and h(n). g(n) is the distance from the source vertex (with lower distance giving higher priority), which is the same prioritization method as the later variations of Dijkstra's algorithm outlined above. h(n) is the heuristic function which gives some additional information about the relationship between the candidate vertex and the destination. In A* Experiment there are 3 separate heuristic functions in addition to g(n), and all can be assigned different weightings to give an overall candidate vertex priority. Weightings of the g(n) and h(n) functions are controlled by constructor parameters to the CandidateWordPriorityCalculator class. The g(n) weighting is set by parameter 'sourceWordToCandidateWordDistanceWeight'. The available heuristic functions are...

Similarity between candidate word and destination word

This function looks at the number of matching characters (i.e. same characters in the same position) between the candidate and destination words. For example for candidate words 'burn' and 'turf', and destination 'horn', 'burn' would be prioritzed over 'turf' as it has 2 characters in common with 'horn', as opposed to 1 for 'turf'. This function's weight is set by constructor parameter 'numberOfCharactersMatchingDestinationWeight'.

Popularity of the candidate word's 'change to' character

This function compares the character that is changed between the current word and the candidate word (the character that is 'changed to'), and prioritizes characters that occur more frequently as 'changed from' characters throughout the graph. For example, for current word 'turn', and candidate words 'burn' and 'turf', and assuming the following frequencies of 'changed from' characters in adjacent words...

a: 1357 b: 1126 c: 993 d: 1431 e: 1342 f: 860 g: 966

...'burn' would be prioritized over 'turf', as 'b' is more frequently substituted ('changed from') in adjacent words than 'f'. The thinking behind this is that moving to words with characters that are more frequently the 'changed from' character in adjacent words, should provide more possible candidate words in future traversal steps, and help to get to the destination more quickly. Strictly speaking this may not constitute a true A* heuristic function, as whilst it can help to move through the graph more quickly, it doesn't say anything specifically about the relationship between the candidate and destination. The frequency of change in adjacent words for each character is supplied to the CandidateWordPriorityCalculator class via constructor parameter 'fromCharacterFrequencies'. The weight of this function is set by constructor parameter 'popularityOfChangeToCharacterWeight'.

Popularity of character substitution pair between the current and candidate words

This function looks at the 'from' and 'to' character pair in the current and candidate words, and prioritizes candidates from pairs which occur more frequently in adjacent words throughout the graph. For example, if we assume the following frequencies of substitutions...

m > u: 5 n > f: 48 n > i: 21 p > h: 67 p > t: 119 t > b: 95

...and current word 'turn' and candidate words 'burn' and 'turf' as above, 'burn' would be prioritized over 'turf', as substitution 't > b' occurs more frequently than 'n > f'. Again there could be some question as to whether this constitutes a true A* heuristic function, as it doesn't convey anything specifically about the relationship between the candidate and destination words. More importantly, the effectiveness of this function is questionable, as whilst it conveys how popular the substitution to move to each candidate word is, that doesn't necessarily help future traversal steps (and hence doesn't specifically help to move towards the destination word more quickly). I built this function routine very early in the development of A* Experiment before I realized its limited effectiveness. However, I decided to leave it in, as a point of comparison between the other functions. The frequency of character pair substitutions is supplied to class CandidateWordPriorityCalculator in constructor parameter 'characterSubstitutionFrequencies', and the weight of this function by parameter 'popularityOfCharacterChangeWeight'.

Comparison of Heuristics

Weighting Setup

Different weights can be assigned to g(n) and the different heuristic functions through constructor parameters on class CandidateWordPriorityCalculator. For example, to assign even weight to g(n) and each of the 3 heuristic functions, the constructor should be called as follows...

Int32 sourceWordToCandidateWordDistanceWeight = 1; Int32 numberOfCharactersMatchingDestinationWeight = 1; Int32 popularityOfChangeToCharacterWeight = 1; Int32 popularityOfCharacterChangeWeight = 1; CandidateWordPriorityCalculator priorityCalculator = new CandidateWordPriorityCalculator(maximumSourceWordToCandidateWordDistance, sourceWordToCandidateWordDistanceWeight, numberOfCharactersMatchingDestinationWeight, popularityOfChangeToCharacterWeight, popularityOfCharacterChangeWeight, allWordsTrieRoot, allWords, fromCharacterFrequencies, characterSubstitutionFrequencies);

Similarly, to evenly assign weight between g(n) and h(n), the constructor would be...

Int32 sourceWordToCandidateWordDistanceWeight = 3; Int32 numberOfCharactersMatchingDestinationWeight = 1; Int32 popularityOfChangeToCharacterWeight = 1; Int32 popularityOfCharacterChangeWeight = 1; CandidateWordPriorityCalculator priorityCalculator = new CandidateWordPriorityCalculator(maximumSourceWordToCandidateWordDistance, sourceWordToCandidateWordDistanceWeight, numberOfCharactersMatchingDestinationWeight, popularityOfChangeToCharacterWeight, popularityOfCharacterChangeWeight, allWordsTrieRoot, allWords, fromCharacterFrequencies, characterSubstitutionFrequencies);

Heuristic Functions in Isolation

Running each of the 3 heuristic functions in isolation (i.e. with 100% weighting to only that heuristic function... and 0% g(n) ) with 4 sets of random source and destination words produces the following results ('Edges Explored' = the number of edges which had to be traversed to find the path and 'Path Length' = the number of vertices in that resulting path)...

Heuristic Function Edges Explored Path Length
Source word ='role', Destination word = 'band'
Similarity between candidate word and destination word 61 5
Popularity of the candidate word's 'change to' character 3066 7
Popularity of character substitution pair between the current and candidate words 2566 24
Source word ='pack', Destination word = 'sill'
Similarity between candidate word and destination word 51 5
Popularity of the candidate word's 'change to' character 1193 14
Popularity of character substitution pair between the current and candidate words 2424 26
Source word ='debt', Destination word = 'tyre'
Similarity between candidate word and destination word 102 9
Popularity of the candidate word's 'change to' character 1591 18
Popularity of character substitution pair between the current and candidate words 1886 28
Source word ='duct', Destination word = 'grid'
Similarity between candidate word and destination word 340 25
Popularity of the candidate word's 'change to' character 6504 23
Popularity of character substitution pair between the current and candidate words 7504 64

What's interesting is that the 'similarity between candidate word and destination word' function overwhelmingly requires fewer edge explorations than the other functions to find a path. This is likely due to the fact that it makes an assessment of the relationship between the candidate and destination words (unlike the other two which focus on the relationship between current and candidate, and candidate to candidate, as described above. Both the 'popularity of the candidate word's 'change to' character' and 'popularity of character substitution pair between the current and candidate words' functions explore roughly similar number of edges to find a path, but the 'popularity of the candidate word's 'change to' character' function finds significantly shorter paths. This tends to reinforce the assumption that the 'popularity of character substitution pair between the current and candidate words' function is not that effective. However one strength of A* is that it lets you blend prioritization of the heuristic function with g(n), and also blend different heuristic functions.

In A* Experiment there is definite potential value in blending heuristic functions. Whilst 'similarity between candidate word and destination word' is clearly optimal in isolation, in a graph of 4 letter words it can only have 5 possible values (i.e. 0, 1, 2, 3, or 4). This means that used in isolation, every candidate word can only be assigned one of 5 possible priority values, and will result in many candidates sharing the same priority. Hence using a blending of the 'popularity of the candidate word's 'change to' character' function will allow further sub-prioritization between two candidate words which share the same 'similarity between candidate word and destination word' score. Blending of heuristic functions is explored further below.

Included Classes

AStarExperiment

This is the 'shell' or 'harness' class that sets up all the underlying dependent classes, and runs the graph traversals using varying parameters, weightings, etc...

AdjacentWordGraphPathFinder

This class exposes methods to find paths through the graph using breadth-first search, Dijkstra's algorithm, and A*. Both methods FindPathViaAStar() and FindShortestPathViaDijkstrasAlgorithm() use a priority queue to hold candidate words which haven't been traversed to yet. Both methods also traverse through the graph in a similar way... i.e. by finding the words adjacent to the current word, and adding them to the priority queue (and then setting the current word by removing the highest priority word from the queue). A key difference though, is how each method calculates the priority... FindPathViaAStar() uses the CandidateWordPriorityCalculator member, which takes into account heuristic functions and weighting described above...

Double candidateWordPriority = priorityCalculator.CalculatePriority(currentWord, currentCandidateWord, destinationWord, distanceFromSourceToCurrentVertex + 1);

FindShortestPathViaDijkstrasAlgorithm() instead calculates candidate word priorities based on the distance from the source word...

Int32 currentDistanceToCandidateWord = distanceFromSourceToCurrentVertex + 1; Double currentCandidateWordPriority = Convert.ToDouble(currentDistanceToCandidateWord) / priorityDenominator;

Since the priority queue prioritizes lower numbers over higher, the second line above 'inverts' the distance by dividing by an arbitrarily big denominator (i.e. as long as the denominator is larger than any possible numerator, the distances from the source word will be prioritized accordingly... 1/3000 is higher priority than 2/3000).

The other key difference between the methods is the way they identify completion / exit criteria. FindPathViaAStar() checks whether each candidate word is the destination word, and stops as soon as it finds the destination...

foreach (String currentCandidateWord in trieUtilities.FindAdjacentWords(wordDictionaryTrieRoot, currentWord)) { if (currentCandidateWord.Equals(destinationWord)) { previousVertices.Add(currentCandidateWord, currentWord); numberOfEdgesExplored++; pathFound = true; break; }

...because of this FindPathViaAStar() will not always find the shortest path between the source and destination words. On the other hand FindShortestPathViaDijkstrasAlgorithm() is guaranteed to find the shortest path... it continues traversing until all edges leading from the destination have been traversed to (i.e. the destination vertex has been fully explored/visited)...

// If the current word is the destination word, the shortest path has been found if (currentWord.Equals(destinationWord)) break;

Another important point to note... since the graph is effectively unweighted, Dijkstra's algorithm ends up becoming equivalent to breadth-first search. Although FindShortestPathViaDijkstrasAlgorithm() uses a priority queue, as the traversal 'radiates' out from the source vertex and every edge has the same weight, candidate vertices will only ever be enqueued with effectively the 'Queue.MaxPriority()' value or 'Queue.MaxPriority() + 1'. Hence the order of the candidate vertices in the priority queue will be no different to the order had they been enqueued in a standard FIFO queue... the same data structure used by standard breadth-first search.

CandidateWordPriorityCalculator

This class is used to calculate candidate word priority for A*. It takes all the aforementioned h(n) function weights as constructor parameters aswell as several data structures containing all the vertex words and statistics pertaining to the vertex words. In method CalculatePriority() individual g(n) and h(n) scores are calculated by protected methods CalculateSourceWordToCandidateWordDistancePriority(), CalculateNumberOfCharactersMatchingDestinationPriority(), etc... (which each return a score between 0.0 and 1.0) before the relevant weighting is applied to produce the overall weighting.

CharacterTrieUtilities

This class is used to find edges of the graph 'on the fly' during traversal, by recursing through a trie containing all the vertex words (in method FindAdjacentWords()).

PriorityQueue

This class implements the priority queue, using an underlying binary search tree (which sorts the elements/items by priority).

Running

The source code doesn't include a word dictionary file, but the documented samples were run using the dictionary file at...

One downside of the above dictionary is that it tends to include a lot of acronyms (e.g. 'deat' = 'Department of Environmental Affairs and Tourism') and unusual (e.g. 'suet') and occasionally questionable words (e.g. 'sait'). Whichever word dictionary is used, the words need to be saved to a plain text file (one word per line), and constant 'dictionaryFilePath' in the AStarExperiment class set to the full file path...

// The path to a file containing a dictionary of words const String dictionaryFilePath = @"C:\Temp\words2.txt";

The same class contains a variable 'wordFilterFunction' which is used to filter the dictionary to only four letter words, and to ignore any words containing punctuation, whitespace, non-printing characters, etc... The same filter function could be modified to allow only three or five letter words (or similar).

Func<String, Boolean> wordFilterFunction = new Func<String, Boolean>((inputString) => { foreach (Char currentCharacter in inputString) { if (Char.IsLetter(currentCharacter) == false) return false; } if (inputString.Length == 4) return true; else return false; });

In the main Program class, uncomment the following lines to run the path finding...

AStarExperiment aStarExperiment = new AStarExperiment(); aStarExperiment.Run();

The following will be output to the console...

----------------------------------------------- Finding paths for strings 'role' and 'band' ----------------------------------------------- Using breadth-first search... Path: role > bole > bale > bald > band Edges explored: 1516 Using Dijkstra's algorithm... Path: role > bole > bale > bald > band Edges explored: 4675 Using bidirectional breadth-first search... Path: role > bole > bold > bond > band Edges explored: 164 Using A* ( 50% g(n) and 50% h(n) )... Path: role > rote > rate > sate > sane > sand > band Edges explored: 400 Using A* ( 0% g(n) and 100% h(n) )... Path: role > rote > rate > late > lane > land > band Edges explored: 343 Using A* ( 25% g(n) and 75% h(n) )... Path: role > rote > rate > late > lane > land > band Edges explored: 300 Using A* ( 25% g(n) and 75% h(n) with custom h(n) weighting )... Path: role > bole > bold > bald > band Edges explored: 51 ----------------------------------------------- Finding paths for strings 'pack' and 'sill' ----------------------------------------------- Using breadth-first search... Path: pack > pick > sick > silk > sill Edges explored: 3330 Using Dijkstra's algorithm... Path: pack > pick > sick > silk > sill Edges explored: 7815 Using bidirectional breadth-first search... Path: pack > sack > sick > silk > sill Edges explored: 205 Using A* ( 50% g(n) and 50% h(n) )... Path: pack > pick > sick > silk > sill Edges explored: 62 Using A* ( 0% g(n) and 100% h(n) )... Path: pack > pick > sick > silk > sill Edges explored: 62 Using A* ( 25% g(n) and 75% h(n) )... Path: pack > pick > sick > silk > sill Edges explored: 62 Using A* ( 25% g(n) and 75% h(n) with custom h(n) weighting )... Path: pack > sack > sick > silk > sill Edges explored: 48 ----------------------------------------------- Finding paths for strings 'debt' and 'tyre' ----------------------------------------------- Using breadth-first search... Path: debt > deat > peat > pert > pere > pyre > tyre Edges explored: 8739 Using Dijkstra's algorithm... Path: debt > deat > peat > pert > pere > pyre > tyre Edges explored: 12207 Using bidirectional breadth-first search... Path: debt > deat > peat > pert > pere > pyre > tyre Edges explored: 319 Using A* ( 50% g(n) and 50% h(n) )... Path: debt > dent > tent > tant > tart > tare > tyre Edges explored: 87 Using A* ( 0% g(n) and 100% h(n) )... Path: debt > dent > tent > tant > tart > tare > tyre Edges explored: 87 Using A* ( 25% g(n) and 75% h(n) )... Path: debt > dent > tent > tant > tart > tare > tyre Edges explored: 87 Using A* ( 25% g(n) and 75% h(n) with custom h(n) weighting )... Path: debt > deat > teat > team > term > tera > tara > tare > tyre Edges explored: 148 ----------------------------------------------- Finding paths for strings 'duct' and 'grid' ----------------------------------------------- Using breadth-first search... Path: duct > dust > bust > bast > bait > brit > grit > grid Edges explored: 10541 Using Dijkstra's algorithm... Path: duct > dust > bust > bast > bait > brit > grit > grid Edges explored: 12119 Using bidirectional breadth-first search... Path: duct > duet > suet > suit > sait > gait > grit > grid Edges explored: 384 Using A* ( 50% g(n) and 50% h(n) )... Path: duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid Edges explored: 3331 Using A* ( 0% g(n) and 100% h(n) )... Path: duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid Edges explored: 1077 Using A* ( 25% g(n) and 75% h(n) )... Path: duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid Edges explored: 3675 Using A* ( 25% g(n) and 75% h(n) with custom h(n) weighting )... Path: duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid Edges explored: 83

Results and Observations

The results of different path searches for 4 different pairs of source and destination words are summarized in the tables below. The column abbreviations are as follows...

BFS = Breadth-first search
Dijkstra = Dijkstra's shortest path algorithm
2dBFS = Bidirectional breadth-first search
A* (x% x% x% x%) = A* where the 4 percentages represent the weighting of g(n), 'similarity between candidate word and destination word', 'popularity of the candidate word's 'change to' character', and 'popularity of character substitution pair between the current and candidate words' respectively.

Number of Graph Edges Explored for Different Traversal Algorithms

Path BFS Dijkstra 2dBFS A* (50% 16.6% 16.6% 16.6%) A* (0% 33% 33% 33%) A* (25% 25% 25% 25%) A* (25% 50% 25% 0%)
'role' > 'band' 1516 4675 164 400 343 300 51
'pack' > 'sill' 3330 7815 205 62 62 62 48
'debt' > 'tyre' 8739 12207 319 87 87 87 148
'duct' > 'grid' 10541 12119 384 3331 1077 3675 83

Paths Found for Different Traversal Algorithms

Traversal Algorithm Path Path Vertex Count Edges Explored
Source word ='role', Destination word = 'band'
BFS role > bole > bale > bald > band 5 1516
Dijkstra role > bole > bale > bald > band 5 4675
2dBFS role > bole > bold > bond > band 5 164
A* (50% 16.6% 16.6% 16.6%) role > rote > rate > sate > sane > sand > band 7 400
A* (0% 33% 33% 33%) role > rote > rate > late > lane > land > band 7 343
A* (25% 25% 25% 25%) role > rote > rate > late > lane > land > band 7 300
A* (25% 50% 25% 0%) role > bole > bold > bald > band 5 51
Source word ='pack', Destination word = 'sill'
BFS pack > pick > sick > silk > sill 5 3330
Dijkstra pack > pick > sick > silk > sill 5 7815
2dBFS pack > sack > sick > silk > sill 5 205
A* (50% 16.6% 16.6% 16.6%) pack > pick > sick > silk > sill 5 62
A* (0% 33% 33% 33%) pack > pick > sick > silk > sill 5 62
A* (25% 25% 25% 25%) pack > pick > sick > silk > sill 5 62
A* (25% 50% 25% 0%) pack > sack > sick > silk > sill 5 48
Source word ='debt', Destination word = 'tyre'
BFS debt > deat > peat > pert > pere > pyre > tyre 7 8739
Dijkstra debt > deat > peat > pert > pere > pyre > tyre 7 12207
2dBFS debt > deat > peat > pert > pere > pyre > tyre 7 319
A* (50% 16.6% 16.6% 16.6%) debt > dent > tent > tant > tart > tare > tyre 7 87
A* (0% 33% 33% 33%) debt > dent > tent > tant > tart > tare > tyre 7 87
A* (25% 25% 25% 25%) debt > dent > tent > tant > tart > tare > tyre 7 87
A* (25% 50% 25% 0%) debt > deat > teat > team > term > tera > tara > tare > tyre 9 148
Source word ='duct', Destination word = 'grid'
BFS duct > dust > bust > bast > bait > brit > grit > grid 8 10541
Dijkstra duct > dust > bust > bast > bait > brit > grit > grid 8 12119
2dBFS duct > duet > suet > suit > sait > gait > grit > grid 8 384
A* (50% 16.6% 16.6% 16.6%) duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid 10 3331
A* (0% 33% 33% 33%) duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid 15 1077
A* (25% 25% 25% 25%) duct > dust > lust > last > list > lilt > tilt > till > tall > toll > told > gold > goad > grad > grid 15 3675
A* (25% 50% 25% 0%) duct > dust > gust > gest > gelt > geld > gold > goad > grad > grid 10 83

Observations and Assumptions

A few of my own observations and assumptions. Just to make discussion of the different h(n) functions easier to follow, I'll use the following abbreviations for those...

CandDestSim = similarity between candidate word and destination word
CandChangeToPop = popularity of the candidate word's 'change to' character
SubstitutionPop = popularity of character substitution pair between the current and candidate words

  • Standard breadth-first search and Dijkstra's algorithm always find the same path. This is because (as discussed above) the lack of weighting in the graph means that Dijkstra's algorithm becomes equivalent to BFS. Dijkstra's algorithm tends to explore somewhere between 1.2 to 3 times as many edges to find a path due to it doing additional traversal to fully explore the destination vertex (also discussed above). In a properly weighted graph the benefits of Dijkstra's algorithm would be apparent... in that it's guaranteed to find the shortest path between two vertices. This implementation of BFS will just return the first path it finds... which in a weighted graph could be far more costly than the shortest path.
  • Bidirectional breadth-first search is very efficient compared to standard BFS, requiring at least an order of magnitude less edge explorations to find a path. Given the graph is not weighted, it will also find a path equivalent to the shortest length between the vertices (although not necessarily the same path as Dijkstra or standard BFS due to the different traversal mechanism).
  • In most cases some blending of g(n) and heuristic functions gives a better result than the heuristic functions in isolation. Although the results for CandDestSim in isolation are very strong, they are beaten (in terms of edges explored) in all cases by some blended combination of g(n) and heuristics. In terms of length of path found, blending of g(n) and h(n) tends to give much better results than individual heuristics in isolation... particularly for words pairs which are further away (i.e. 'debt' > 'type' and 'duct' > 'grid').
  • As hypothesized above, the SubstitutionPop function doesn't seem very effective, and in all but one case paths are found more quickly (sometimes significantly so) when its weighting is set to 0 (especially the case for 'role' > 'band' and 'duct' > 'grid')... although there is one case ('debt' > 'tyre') where its removal seems to produce a worse result.
  • Although A* generally produces better results than BFS and Dijkstra, and within A*, a blending of g(n) and several heuristics gives better results than a single heuristic in isolation, there is no 'magic bullet' blending that works best in every case. The A* (25% 50% 25% 0%) blending seems to be the best choice of traversal algorithms here, but there is one case ('debt' > 'tyre') where other A* blendings give better results. It seems in a real-world case you'd need to select a 'best fit' blending of g(n) and h(n) which gave consistently good (but not always best) results. Alternately you may be able to vary the A* blending for each different path search depending on assumptions about the source and destination words.
  • The sample graph in A* Experiment contains just ~3000 vertices which is very small compared to real-world contexts (e.g. a social network graph with hundreds of millions of vertices). In this case bidirectional BFS seems like a pretty attractive option in terms of consistently finding a short path quickly. However in a very large graph, the explored edge count could increase significantly, and the benefit of A* in that case may become more pronounced.
  • Another consideration for real-world contexts is around the heuristic functions. The CandDestSim function provides a very strong quantifiable heuristic, as it tells us something about candidate words which are extremely likely to get us closer to the destination word (although this may not be the case in graphs which with a lesser degree of connectivity... e.g. graphs of 5, 6, or 7 letter words). Think about how you would define such a strong heuristic for a social network... you could use commonality between the source person and destination person such as interests, area of study, area of work, but would these allow you to move so strongly towards a random destination person living on the other side of the world to the source (as the CandDestSim function does for four letter words)?

Contact

I'm happy to hear any thoughts / comments / feedback, so feel free to contact me.