Epitope Prediction Using R

The package EpitopePrediction implements the stabilized matrix method (SMM), developed by Kim, Peters and colleagues [@kim09], to predict the binding strength between peptides and major histocompatibility complex (MHC) class I molecules. This is an essential step in vaccine design and is also used for many other purposes in computational immunology.

SMM is a simple and fast method for peptide-MHC binding, but is slightly less accurate than other methods like NetMHCpan. Future versions of this package may implement more binding prediction methods.

Getting started

As usual, load the package first.

library(EpitopePrediction)

To predict which 9-mers from the CORE protein of the hepatitis C virus would bind to the HLA-A-02:01 molecule, we can use:

## This is the CORE protein from the Hepatitis C virus reference sequence available
## at https://hcv.lanl.gov/content/sequence/LOCATE/locate.html
hcv.core <- paste("MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGRR",
    "QPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIP",
    "LVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLSCLTVPASA",sep="")
binders( hcv.core, "HLA-A-02:01", 9 )
     peptide start end       ic50
1  QIVGGVYLL    29  37 342.132318
2  AQPGYPWPL    77  85  72.588630
3  DLMGYIPLV   132 140   6.880586
4  YIPLVGAPL   136 144 419.487766
5  ALAHGVRVL   150 158 398.367041
6  VLEDGVNYA   157 165 218.595089
7  NLPGCSFSI   168 176  72.074278
8  FSIFLLALL   174 182 215.146309
9  FLLALLSCL   177 185   7.590588
10 LLSCLTVPA   181 189  76.435547

The number of results shown can be controlled by using the ic50.threshold or quantile.threshold parameters.

## All peptides with a predicted IC50 of <100
binders( hcv.core, "HLA-A-02:01", 9, ic50.threshold=100 )
    peptide start end      ic50
1 AQPGYPWPL    77  85 72.588630
2 DLMGYIPLV   132 140  6.880586
3 NLPGCSFSI   168 176 72.074278
4 FLLALLSCL   177 185  7.590588
5 LLSCLTVPA   181 189 76.435547
## The top 2% binders from the protein
binders( hcv.core, "HLA-A-02:01", 9, quantile.threshold=0.02 )
    peptide start end      ic50
1 AQPGYPWPL    77  85 72.588630
2 DLMGYIPLV   132 140  6.880586
3 NLPGCSFSI   168 176 72.074278
4 FLLALLSCL   177 185  7.590588

The binders function works by calling a lower-level function smm for all peptides in the given protein. Of course you can also call that function directly to predict binding for specific peptides.

## Predict IC50 binding values for two famous peptides
smm( c("SLYNTVATL","SYFPEITHI"), "HLA-A-02:01" )
[1]   66.98965 4776.75371

The nomenclature around MHC molecules is not rigorously standardized, and various formats are in use to name MHC molecules at 4-digit resolution. The package tries to be as permissive as possible in allowing different notations, for example "HLA-A0201", "HLA-A-0201", "HLA-A*0201", and "HLA-A*02:01 will all work.

Supported MHC molecules and peptide lengths

Currently, the package supports several MHC molecules from four different species – humans, mice, rhesus macaques, and chimpanzees. To show a list of currently supported MHC molecules and peptide lengths, use the following R command:

mhcs <- supportedMHCs()
[1] "EpitopePrediction"
levels( mhcs$mhc )
 [1] "H-2-Db"       "H-2-Dd"       "H-2-Kb"       "H-2-Kd"      
 [5] "H-2-Kk"       "H-2-Ld"       "HLA-A-01:01"  "HLA-A-02:01" 
 [9] "HLA-A-02:02"  "HLA-A-02:03"  "HLA-A-02:06"  "HLA-A-02:11" 
[13] "HLA-A-02:12"  "HLA-A-02:16"  "HLA-A-02:19"  "HLA-A-02:50" 
[17] "HLA-A-03:01"  "HLA-A-11:01"  "HLA-A-23:01"  "HLA-A-24:02" 
[21] "HLA-A-24:03"  "HLA-A-25:01"  "HLA-A-26:01"  "HLA-A-26:03" 
[25] "HLA-A-29:02"  "HLA-A-30:01"  "HLA-A-30:02"  "HLA-A-31:01" 
[29] "HLA-A-32:01"  "HLA-A-33:01"  "HLA-A-68:01"  "HLA-A-68:02" 
[33] "HLA-A-69:01"  "HLA-A-80:01"  "HLA-B-07:02"  "HLA-B-08:01" 
[37] "HLA-B-08:02"  "HLA-B-15:01"  "HLA-B-15:02"  "HLA-B-15:03" 
[41] "HLA-B-15:17"  "HLA-B-18:01"  "HLA-B-27:05"  "HLA-B-35:01" 
[45] "HLA-B-39:01"  "HLA-B-40:02"  "HLA-B-44:02"  "HLA-B-44:03" 
[49] "HLA-B-45:01"  "HLA-B-46:01"  "HLA-B-48:01"  "HLA-B-51:01" 
[53] "HLA-B-53:01"  "HLA-B-54:01"  "HLA-B-58:01"  "HLA-B-73:01" 
[57] "Mamu-A-01"    "Mamu-A-02"    "Mamu-A-11"    "Mamu-A-22:01"
[61] "Mamu-A-26:01" "Mamu-B-01"    "Mamu-B-03"    "Mamu-B-08"   
[65] "Mamu-B-17"    "Mamu-B-48"    "Mamu-B-52"    "Patr-A-01:01"
[69] "Patr-A-03:01" "Patr-A-04:01" "Patr-A-07:01" "Patr-A-09:01"
[73] "Patr-B-01:01" "Patr-B-13:01" "Patr-B-24:01"

For all of these MHC molecules, binding strengths can be predicted at least for peptides of length 9. For some of the more frequent molecules, also other peptide lengths are supported:

mhc.table <- table( mhcs$mhc )
mhc.table[mhc.table > 1]

      H-2-Db       H-2-Kb       H-2-Kk       H-2-Ld  HLA-A-01:01 
           4            4            4            2            2 
 HLA-A-02:01  HLA-A-02:02  HLA-A-02:03  HLA-A-02:06  HLA-A-03:01 
           4            4            3            4            2 
 HLA-A-11:01  HLA-A-24:02  HLA-A-26:01  HLA-A-30:01  HLA-A-30:02 
           2            2            2            2            2 
 HLA-A-31:01  HLA-A-32:01  HLA-A-68:01  HLA-A-68:02  HLA-B-07:02 
           2            2            2            3            3 
 HLA-B-08:01  HLA-B-18:01  HLA-B-27:05  HLA-B-35:01  HLA-B-44:02 
           2            2            4            2            2 
 HLA-B-51:01  HLA-B-53:01  HLA-B-54:01  HLA-B-58:01    Mamu-A-01 
           2            2            2            2            4 
   Mamu-A-02    Mamu-A-11 Mamu-A-22:01 Mamu-A-26:01    Mamu-B-01 
           4            3            2            2            4 
   Mamu-B-03    Mamu-B-08    Mamu-B-17 Patr-A-01:01 Patr-A-03:01 
           4            4            3            2            2 
Patr-A-04:01 Patr-A-07:01 Patr-A-09:01 Patr-B-01:01 Patr-B-13:01 
           2            2            3            3            2 
Patr-B-24:01 
           2 

For example, these are the peptide lengths currently supported for HLA-A-02:01.

mhcs$l[mhcs$mhc=='HLA-A-02:01']
[1] 10 11  8  9

This reflects a general picture: Most often, the anchor positions are the C-terminus and the second position or a position next to it.

hist( sapply( supportedMHCs(9)$mhc, anchorPositions ), breaks=seq(0.5,9.5,1), 
    main="Anchor positions for 9-mers", xlab="Position" )
[1] "EpitopePrediction"

plot of chunk unnamed-chunk-10

Accessing the SMM matrices

A great advantage of the SMM method above other method is that the prediction is easy to understand – it is based on simple scoring matrices, one matrix per combination of MHC and peptide length. These matrices can be directly accessed.

## load prediction matrix for HLA-A-02:01 9-mers
M <- smmMatrix( "HLA-A-02:01", 9 )$M

We can interrogate the matrices to learn about the binding preferences of MHC molecules. Keep in mind that a low IC50 value means good binding, hence negative numbers indicate binding preferences and positive numbers indicate incompatible amino acids.

## How well is Valine liked by HLA-A02:01 at each position?
M["V",]
[1] -0.1147540 -0.6744100  0.0404313  0.1642890 -0.0873646 -0.3028550
[7] -0.0725807  0.1574830 -1.6088400
## Which amino acid is most preferred at the C-terminal position?
names( which.min( M[,9] ) )
[1] "V"

The package provides an auxiliary function to estimate the structural anchor positions of the MHC molecules, which are the positions that “grab” the peptides, from the variance in matrix columns.

## Which are the anchor positions for HLA-B-27:05?
anchorPositions( "HLA-B-27:05" )
[1] 2 9

Generating binding motifs

A common way to visualize the binding preferences of an MHC molecule is a so-called binding motif [@lund05]. The package provides a function to generate such binding motifs from the SMM matrices. In a binding motif, each column has the height of the information content at that position, which is log(20)-H where H is the entropy at that position. Each column is subdivided into blocks that stand for the amino acids, and each amino acid has a height proportional to its contribution to a low IC50 value at that position.

## Compare binding motifs of HLA-A02 at all supported peptide lengths
par( mfrow=c(2,2) )
for( l in 8:11 ){
   plotBindingMotif( "HLA-A-02:01", l )
}

plot of chunk unnamed-chunk-16

References


references: