Creating a Profile Matrix

I previously wrote about Greedy Motif Search, and in response I got a question about how to create the profile matrix. Let’s take a look at how it is calculated.

What is the profile matrix?

It is a matrix built from how often each nucleotide occurs at each position across the given strings. Each count is divided by the number of strings, so the values in each column add up to 1.

To make this clearer, let’s look at an example:

The example above shows a list of nucleotide strings. If we generate the profile for them, we get the following result:
A: 0.2, 0.8, 0.4, 0.2, 0.4, 0.4, 0.2, 0.8, 0.0, 0.0, 0.2, 0.4
C: 0.6, 0.0, 0.4, 0.0, 0.0, 0.2, 0.4, 0.0, 0.0, 0.4, 0.6, 0.4
G: 0.2, 0.2, 0.2, 0.6, 0.2, 0.0, 0.2, 0.0, 0.4, 0.2, 0.2, 0.2
T: 0.0, 0.0, 0.0, 0.2, 0.4, 0.4, 0.2, 0.2, 0.6, 0.4, 0.0, 0.0
As you can see, each profile entry is the count of a nucleotide at the given position divided by the number of nucleotide strings provided. Every value in this example is a multiple of 0.2, which is consistent with five input strings.
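To make the division concrete, here is a minimal sketch for a single column. The five letters are made up for illustration (they are not taken from the post's example), chosen so that the result matches the first column of the profile shown here. It assumes Python 3, where `/` is true division.

```python
from collections import Counter

# Hypothetical column: the first letter of each of five motifs (made-up data).
column = ['C', 'A', 'C', 'G', 'C']

# Count how often each nucleotide occurs in this column.
counts = Counter(column)

# Divide each count by the number of strings to get the frequency.
# A nucleotide that never occurs has a count of 0 in a Counter.
frequencies = {n: counts[n] / len(column) for n in 'ACGT'}

print(frequencies)  # {'A': 0.2, 'C': 0.6, 'G': 0.2, 'T': 0.0}
```

The four frequencies add up to 1, as they do for every column of a profile.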
If we want to code a profile generator in Python, we could write a function like this:
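def generate_profile(motifs):
    """Return the profile of equal-length, uppercase A/C/G/T strings."""
    k = len(motifs[0])
    # One row per nucleotide, one entry per position.
    profile = {'A': [0] * k, 'C': [0] * k, 'G': [0] * k, 'T': [0] * k}
    # A float divisor keeps the division exact under Python 2 as well.
    div = float(len(motifs))
    for i in range(k):
        # Count each nucleotide at position i ...
        for motif in motifs:
            profile[motif[i]][i] += 1
        # ... then turn the counts into frequencies.
        for key in profile:
            profile[key][i] /= div
    return profile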
This function returns the profile matrix as a dictionary: each key is a nucleotide, and each value is a list holding that nucleotide's frequency at every position.
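A short usage sketch may help here. The function is repeated so that the snippet runs on its own, and the four motifs are hypothetical input invented for this illustration.

```python
def generate_profile(motifs):
    k = len(motifs[0])
    profile = {'A': [0] * k, 'C': [0] * k, 'G': [0] * k, 'T': [0] * k}
    div = float(len(motifs))
    for i in range(k):
        for motif in motifs:
            profile[motif[i]][i] += 1
        for key in profile:
            profile[key][i] /= div
    return profile


# Hypothetical input: four motifs of length 4 (made-up data).
motifs = ['ACGT', 'ACGA', 'AGGT', 'TCGT']
profile = generate_profile(motifs)

# Print one row per nucleotide.
for nucleotide in 'ACGT':
    print(nucleotide, profile[nucleotide])
# A [0.75, 0.0, 0.0, 0.25]
# C [0.0, 0.75, 0.0, 0.0]
# G [0.0, 0.25, 1.0, 0.0]
# T [0.25, 0.0, 0.0, 0.75]

# Sanity check: the four values at each position add up to 1.
column_sums = [sum(profile[n][i] for n in 'ACGT') for i in range(len(motifs[0]))]
```

Note that the function assumes all motifs have the same length and contain only uppercase A, C, G and T. Any other character would raise a `KeyError`.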
This was easy. Now you can use this function to generate the profile for a Greedy Motif Search.
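One typical way a motif search uses a profile is to score how likely a k-mer is under it, by multiplying the profile entries that match each letter. The sketch below assumes that is how you want to use the profile. The helper name `kmer_probability` and the small hard-coded profile are hypothetical and only serve the illustration.

```python
def kmer_probability(kmer, profile):
    """Multiply the profile entries matching each letter of the k-mer."""
    probability = 1.0
    for i, nucleotide in enumerate(kmer):
        probability *= profile[nucleotide][i]
    return probability


# Hypothetical profile in the format returned by generate_profile (made-up values).
profile = {
    'A': [0.75, 0.0, 0.0, 0.25],
    'C': [0.0, 0.75, 0.0, 0.0],
    'G': [0.0, 0.25, 1.0, 0.0],
    'T': [0.25, 0.0, 0.0, 0.75],
}

print(kmer_probability('ACGT', profile))  # 0.421875
print(kmer_probability('TTTT', profile))  # 0.0
```

A single zero in the profile makes the whole product zero, as the second call shows.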
