The package EpitopePrediction
implements the stabilized matrix method (SMM),
developed by Kim, Peters and
colleagues [@kim09], to predict the binding strength between peptides and major
histocompatibility
complex (MHC) class I molecules. This is an essential step in vaccine design and is also
used for many other purposes in computational immunology.
SMM is a simple and fast method for peptide-MHC binding, but is slightly less accurate than other methods like NetMHCpan. Future versions of this package may implement more binding prediction methods.
As usual, load the package first.
library(EpitopePrediction)
To predict which 9-mers from the CORE protein of the hepatitis C virus would bind to the HLA-A-02:01 molecule, we can use:
## This is the CORE protein from the Hepatitis C virus reference sequence available
## at https://hcv.lanl.gov/content/sequence/LOCATE/locate.html
hcv.core <- paste("MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGRR",
"QPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIP",
"LVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLSCLTVPASA",sep="")
binders( hcv.core, "HLA-A-02:01", 9 )
peptide start end ic50
1 QIVGGVYLL 29 37 342.132318
2 AQPGYPWPL 77 85 72.588630
3 DLMGYIPLV 132 140 6.880586
4 YIPLVGAPL 136 144 419.487766
5 ALAHGVRVL 150 158 398.367041
6 VLEDGVNYA 157 165 218.595089
7 NLPGCSFSI 168 176 72.074278
8 FSIFLLALL 174 182 215.146309
9 FLLALLSCL 177 185 7.590588
10 LLSCLTVPA 181 189 76.435547
The number of results shown can be controlled by using the
ic50.threshold
or quantile.threshold
parameters.
## All peptides with a predicted IC50 of <100
binders( hcv.core, "HLA-A-02:01", 9, ic50.threshold=100 )
peptide start end ic50
1 AQPGYPWPL 77 85 72.588630
2 DLMGYIPLV 132 140 6.880586
3 NLPGCSFSI 168 176 72.074278
4 FLLALLSCL 177 185 7.590588
5 LLSCLTVPA 181 189 76.435547
## The top 2% binders from the protein
binders( hcv.core, "HLA-A-02:01", 9, quantile.threshold=0.02 )
peptide start end ic50
1 AQPGYPWPL 77 85 72.588630
2 DLMGYIPLV 132 140 6.880586
3 NLPGCSFSI 168 176 72.074278
4 FLLALLSCL 177 185 7.590588
The binders
function works by calling a lower-level function smm
for all peptides
in the given protein. Of course you can also call that function directly to predict
binding for specific peptides.
## Predict IC50 binding values for two famous peptides
smm( c("SLYNTVATL","SYFPEITHI"), "HLA-A-02:01" )
[1] 66.98965 4776.75371
The nomenclature around MHC molecules is not rigorously standardized, and various formats
are in use to name MHC molecules at 4-digit resolution. The package tries to be as
permissive as possible in allowing different notations, for example
"HLA-A0201"
, "HLA-A-0201"
, "HLA-A*0201"
, and "HLA-A*02:01
will all work.
Currently, the package supports several MHC molecules from four different species – humans, mice, rhesus macaques, and chimpanzees. To show a list of currently supported MHC molecules and peptide lengths, use the following R command:
mhcs <- supportedMHCs()
[1] "EpitopePrediction"
levels( mhcs$mhc )
[1] "H-2-Db" "H-2-Dd" "H-2-Kb" "H-2-Kd"
[5] "H-2-Kk" "H-2-Ld" "HLA-A-01:01" "HLA-A-02:01"
[9] "HLA-A-02:02" "HLA-A-02:03" "HLA-A-02:06" "HLA-A-02:11"
[13] "HLA-A-02:12" "HLA-A-02:16" "HLA-A-02:19" "HLA-A-02:50"
[17] "HLA-A-03:01" "HLA-A-11:01" "HLA-A-23:01" "HLA-A-24:02"
[21] "HLA-A-24:03" "HLA-A-25:01" "HLA-A-26:01" "HLA-A-26:03"
[25] "HLA-A-29:02" "HLA-A-30:01" "HLA-A-30:02" "HLA-A-31:01"
[29] "HLA-A-32:01" "HLA-A-33:01" "HLA-A-68:01" "HLA-A-68:02"
[33] "HLA-A-69:01" "HLA-A-80:01" "HLA-B-07:02" "HLA-B-08:01"
[37] "HLA-B-08:02" "HLA-B-15:01" "HLA-B-15:02" "HLA-B-15:03"
[41] "HLA-B-15:17" "HLA-B-18:01" "HLA-B-27:05" "HLA-B-35:01"
[45] "HLA-B-39:01" "HLA-B-40:02" "HLA-B-44:02" "HLA-B-44:03"
[49] "HLA-B-45:01" "HLA-B-46:01" "HLA-B-48:01" "HLA-B-51:01"
[53] "HLA-B-53:01" "HLA-B-54:01" "HLA-B-58:01" "HLA-B-73:01"
[57] "Mamu-A-01" "Mamu-A-02" "Mamu-A-11" "Mamu-A-22:01"
[61] "Mamu-A-26:01" "Mamu-B-01" "Mamu-B-03" "Mamu-B-08"
[65] "Mamu-B-17" "Mamu-B-48" "Mamu-B-52" "Patr-A-01:01"
[69] "Patr-A-03:01" "Patr-A-04:01" "Patr-A-07:01" "Patr-A-09:01"
[73] "Patr-B-01:01" "Patr-B-13:01" "Patr-B-24:01"
For all of these MHC molecules, binding strengths can be predicted at least for peptides of length 9. For some of the more frequent molecules, also other peptide lengths are supported:
mhc.table <- table( mhcs$mhc )
mhc.table[mhc.table > 1]
H-2-Db H-2-Kb H-2-Kk H-2-Ld HLA-A-01:01
4 4 4 2 2
HLA-A-02:01 HLA-A-02:02 HLA-A-02:03 HLA-A-02:06 HLA-A-03:01
4 4 3 4 2
HLA-A-11:01 HLA-A-24:02 HLA-A-26:01 HLA-A-30:01 HLA-A-30:02
2 2 2 2 2
HLA-A-31:01 HLA-A-32:01 HLA-A-68:01 HLA-A-68:02 HLA-B-07:02
2 2 2 3 3
HLA-B-08:01 HLA-B-18:01 HLA-B-27:05 HLA-B-35:01 HLA-B-44:02
2 2 4 2 2
HLA-B-51:01 HLA-B-53:01 HLA-B-54:01 HLA-B-58:01 Mamu-A-01
2 2 2 2 4
Mamu-A-02 Mamu-A-11 Mamu-A-22:01 Mamu-A-26:01 Mamu-B-01
4 3 2 2 4
Mamu-B-03 Mamu-B-08 Mamu-B-17 Patr-A-01:01 Patr-A-03:01
4 4 3 2 2
Patr-A-04:01 Patr-A-07:01 Patr-A-09:01 Patr-B-01:01 Patr-B-13:01
2 2 3 3 2
Patr-B-24:01
2
For example, these are the peptide lengths currently supported for HLA-A-02:01.
mhcs$l[mhcs$mhc=='HLA-A-02:01']
[1] 10 11 8 9
This reflects a general picture: Most often, the anchor positions are the C-terminus and the second position or a position next to it.
hist( sapply( supportedMHCs(9)$mhc, anchorPositions ), breaks=seq(0.5,9.5,1),
main="Anchor positions for 9-mers", xlab="Position" )
[1] "EpitopePrediction"
A great advantage of the SMM method above other method is that the prediction is easy to understand – it is based on simple scoring matrices, one matrix per combination of MHC and peptide length. These matrices can be directly accessed.
## load prediction matrix for HLA-A-02:01 9-mers
M <- smmMatrix( "HLA-A-02:01", 9 )$M
We can interrogate the matrices to learn about the binding preferences of MHC molecules. Keep in mind that a low IC50 value means good binding, hence negative numbers indicate binding preferences and positive numbers indicate incompatible amino acids.
## How well is Valine liked by HLA-A02:01 at each position?
M["V",]
[1] -0.1147540 -0.6744100 0.0404313 0.1642890 -0.0873646 -0.3028550
[7] -0.0725807 0.1574830 -1.6088400
## Which amino acid is most preferred at the C-terminal position?
names( which.min( M[,9] ) )
[1] "V"
The package provides an auxiliary function to estimate the structural anchor positions of the MHC molecules, which are the positions that “grab” the peptides, from the variance in matrix columns.
## Which are the anchor positions for HLA-B-27:05?
anchorPositions( "HLA-B-27:05" )
[1] 2 9
A common way to visualize the binding preferences of an MHC molecule is a so-called binding motif [@lund05]. The package provides a function to generate such binding motifs from the SMM matrices. In a binding motif, each column has the height of the information content at that position, which is log(20)-H where H is the entropy at that position. Each column is subdivided into blocks that stand for the amino acids, and each amino acid has a height proportional to its contribution to a low IC50 value at that position.
## Compare binding motifs of HLA-A02 at all supported peptide lengths
par( mfrow=c(2,2) )
for( l in 8:11 ){
plotBindingMotif( "HLA-A-02:01", l )
}
references:
id: lund05 type: book author:
id: kim09 type: article-journal author: