Sunday, July 18, 2021

How does Bowtie2 calculate a minimum alignment score?

 



The Bowtie2 manual does a fine job at explaining what the minimum score is and how it is computed. So consider this blog post a supplement to the manual, if you wish to be generous. Otherwise, denounce it as redundant. 

For relevant parts of the manual see the sections on (i) scoring, and particularly the sub-section therein on valid alignments; (ii) the section on setting function options ; and (iii) the descriptions of scoring options in the options section.

It may also be useful to review the following subsections in my aging blog on how Bowtie2 computes MAPQ scores (for single-end reads in end-to-end mode): (i) how bowtie2 scores a mismatch, and (ii) how bowtie2 decides if an alignment is valid. For more information on computing MAPQ scores, see my more recent posts on paired-end reads and local mode.


The short and simple is this:
Bowtie2 calculates a minimum score to define a valid alignment. This minimum score is defined by the 3 comma-separated parameters given to the --score-min option. 



The rest is this:
Below I discuss those 3 comma-separated parameters in more detail, and give examples on (i) the defaults in end-to-end and local modes, (ii) defining a function where the minimum score is purely dependent on read length, (iii) defining a function where the minimum score is independent of read length.



--score-min Parameter Description:

- --score-min F,B,M :: F=transform function (C=constant, G=log, L=linear, S=sqrt); B=b in mx+b; M=m in mx+b

        - computed min_score(read_length) = B + M*F(read_length)

        - Minimum score = absolute minimum + length-dependent addition

        - If the "absolute minimum" is set to 0, then it is purely dependent on read length.

        - Otherwise, it requires first meeting a constant minimum like BWA, followed by becoming more stringent with read length.



DEFAULT IN END-TO-END MODE:

    - Default for end-to-end mode = L,-0.6,-0.6

        - L,-0.6,-0.6(read_length) = -0.6 + -0.6*read_length

        - Read_length -> MinScore

            - 20 -> -12.6

            - 30 -> -18.6

            - 50 -> -30.6

            - 75 -> -45.6

            - 100 -> -60.6

            - 150 -> -90.6

            

DEFAULT IN LOCAL MODE:

    - Default for local mode = G,20,8

        - G,20,8(read_length) = 20 + 8*log(read_length)

        - Read_length -> MinScore

            - 20 -> 43.9659

            - 30 -> 47.2096

            - 50 -> 51.2962

            - 75 -> 54.5399

            - 100 -> 56.8414

            - 150 -> 60.0851

            


DEFINING A FUNCTION PURELY DEPENDENT ON READ LENGTH:

Purely dependent on read length:

        - End-to-end mode - similar to default without absolute minimum:

            - L,0,-0.6

        - Local mode - similar to default without absolute minimum:

            - G,0,8

            


DEFINING A FUNCTION INDEPENDENT ON READ LENGTH:

    - Only requires passing absolute minimum / independent of read length.

        - This is similar to the default in another popular short read aligner, BWA, where the alignment score is calculated independent of read length, and the default minimum score is 30 (-T 30).

        - If you wanted Bowtie2 to have a similar approach, the following codes work by analogy (Bowtie2 uses +2 for matches by default, BWA +1):

            - -T 20 ==> C,40,0

                - C,40,0(read_length) = 40 + 0*read_length = 40

            - -T 30 ==> C,60,0

                - C,60,0(read_length) = 60 + 0*read_length = 60

            - ETC

        - If you wanted it even more similar, the following should do the trick in theory, but differences are found anyway:

            - -T 20 ==> --ma 1 --mp 4,4 --rdg 6,1 --rfg 6,1 --np 4 --score-min C,20,0

            - -T 30 ==> "--ma 1 --mp 4,4 --rdg 6,1 --rfg 6,1 --np 4 --score-min C,30,0

            - ETC


THE END:

Well, that is all folks. Best wishes to you. 



No comments:

Post a Comment