Calculating the Erdős Number

One of the community challenges of CodinGame is to calculate the Erdős Number:

The Erdős number describes the minimum “collaborative distance” between the scientist Paul Erdős and another person, as measured by co-authored publications. The Hungarian mathematician is the most prolific author of his field since he wrote more than 1500 papers in his lifetime.

If a person has an Erdős number of 1, it means that he has written a research paper with Erdős; a Erdős number equal to 2 means that he co-authored an article with a direct collaborator of Erdős but not with Erdős himself, etc.

If a person is not a direct or indirect collaborator with Erdős, their Erdős number is infinite.

From a list of publications and their authors, you need to give the Erdős number of the scientist given in the input. If this number is not zero or infinity, you must output the list of ordered article titles (from the scientist’s paper to the paper written by Erdős).

My approach was to calculate the Dijkstra-distance between the scientists and Erdős using Python 3.

The idea of the solution is to represent the authors of articles as graphs. Graphs can be represented with adjacency matrices the best, which can be implemented as 2 dimensional lists. However in this case we deal with names, so a 2D list is not useful but we can create a dictionary where each key has a set of values representing the edges where the vertex (the key) is connected.

So when reading in a list of authors, make sure to add the other authors for each one to the graph. For example if the authors are: Adams, Eggleton, MacDougall and Mahmodian, then add Eggleton, MacDougall, Mahmodian as values to the key Adams; Adams, MacDougall, Mahmodian to the key of Eggleton and so on.

After the graph was ready I implemented Dijkstra’s algorithm. Implementing the algorithm was easy because there are plenty of resources available on the internet — even using Python 3, but I sticked with the pseudo code from Wikipedia. I included the optimisation too because if we find the target we do not need to work on the remaining vertices.

Here is my code:

def dijkstra(graph, source, target=None):
    dist = {}
    prev = {}
    visited = set()
    for v in graph:
        dist[v] = float('inf')
        prev[v] = None
    dist = 0

while sum(1 for k,v in dist.items() if k not in visited and v < float('inf')):
    u = [v for v in sorted(dist, key=dist.get) if v not in visited][0]
    if u in visited: continue
    visited.add(u)
    if u == target: break
    for v in graph[u]:
        alt = dist[u] + 1
        if alt < dist[v]:
            dist[v] = alt
            prev[v] = u
return dist, prev

The algorithm works well, it returns the right distance and the path between the source and target. What took me the most time was that I got back the right distance, but the article names would not come out properly: either I have had the wrong article or my entry in the list of previous names was not found.

The problem was that I started to go from the scientist provided by the exercise towards Erdős but the right answer came after I switched the direction and went from Erdős to the target scientist. With this approach I have finished the exercise.

You can find the algorithm at my GitHub repository too.

My conclusion from this exercise is: always visualise your problem (or a subset of the problem) to see where can you be wrong.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s