Novel high-throughput T-cell receptor (TCR) sequencing techniques offer unparalleled potential for querying the state of an individual's immune system. However, while it is easy to generate large quantities of TCR data, the analysis of these large data sets is far from trivial. In particular, due to the immense variation and degeneracy that exists within TCR sequences, determining a T-cell's function, or even which epitope or disease a single TCR sequence targets, is almost impossible without the aid of complex molecular models. Indeed, there are over 10^30 possible T-cell receptor-epitope combinations, which is more than the estimated number of stars in the universe.
At the AUDACIS consortium, we have developed several computational techniques that can query large TCR sequencing data sets of this type in order to understand the long-term protection that a person's immune system has established against different diseases.
Key to this development is a machine learning technique that is trained on known pairs of T-cells and their epitopes and is then used to find the T-cells that target the same epitope within the large repertoires of individuals. In our first proof of concept, we have demonstrated that we can predict T-cells that target specific HIV epitopes with a mean precision between 80% and 90%, even when they are hidden among thousands of irrelevant TCR sequences.
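To make the train-then-screen idea concrete, the sketch below shows one simple way such a workflow could look. It is not the model used in this work, which is not specified here: a smoothed k-mer log-odds score stands in for the machine learning technique, and all function names, the threshold, and the CDR3 sequences are invented for illustration only.

```python
from collections import Counter
from math import log


def kmers(cdr3, k=3):
    """Return all overlapping k-mers of a CDR3 amino-acid sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def train(positives, background, k=3):
    """Learn a smoothed log-odds weight per k-mer (epitope-specific vs. background)."""
    pos = Counter(km for seq in positives for km in kmers(seq, k))
    neg = Counter(km for seq in background for km in kmers(seq, k))
    vocab = set(pos) | set(neg)
    # Add-one smoothing so k-mers seen in only one class get a finite weight.
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        km: log((pos[km] + 1) / pos_total) - log((neg[km] + 1) / neg_total)
        for km in vocab
    }


def score(cdr3, weights, k=3):
    """Mean log-odds over the k-mers of a sequence; unseen k-mers contribute 0."""
    kms = kmers(cdr3, k)
    return sum(weights.get(km, 0.0) for km in kms) / len(kms) if kms else 0.0


def screen(repertoire, weights, threshold=0.0, k=3):
    """Return the TCRs in a repertoire whose score exceeds the threshold."""
    return [seq for seq in repertoire if score(seq, weights, k) > threshold]


def precision(predicted, true_binders):
    """Fraction of predicted TCRs that are true binders."""
    if not predicted:
        return 0.0
    return sum(seq in true_binders for seq in predicted) / len(predicted)


# Toy, invented sequences (assumption: NOT real epitope-specific TCRs).
known_binders = ["CASSGQGAYEQYF", "CASSLGQGAYEQYF", "CASTGQGAYGYTF"]
background = ["CASSLAPGATNEKLFF", "CASSPRDRNTEAFF", "CASSQETQYF", "CAWSVGTSGNTIYF"]
repertoire = ["CASSVGQGAYEQYF", "CASSPDRNTGELFF", "CAWSVGTDTQYF", "CASSGQGAYGYTF"]
hidden_binders = {"CASSVGQGAYEQYF", "CASSGQGAYGYTF"}

weights = train(known_binders, background)
hits = screen(repertoire, weights)
print(hits, precision(hits, hidden_binders))
```

Precision here is the share of flagged TCRs that truly target the epitope, which is the quantity reported above. In practice the threshold would be tuned on held-out data rather than fixed at zero.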
Further, we have used the insight gained from the development of this TCR recognition model to screen a set of individuals known to be either cytomegalovirus (CMV) seropositive or seronegative. Based solely on their characterized T-cell repertoires, we were able to predict 87% of individuals correctly as being CMV seropositive. This indicates the massive potential of these techniques, combined with the right data analysis models, for medical diagnostics.
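As a rough illustration of how repertoire-level screening of this kind could be evaluated, the sketch below calls an individual seropositive when the fraction of flagged TCRs in their repertoire exceeds a cutoff, and then compares the calls with the known serostatus. The decision rule, the cutoff, the donor labels, and the placeholder TCR identifiers are all assumptions made for this example; the actual procedure used in this work is not described here.

```python
def hit_fraction(repertoire, is_hit):
    """Fraction of TCRs in one repertoire flagged by a per-TCR predictor."""
    if not repertoire:
        return 0.0
    return sum(1 for tcr in repertoire if is_hit(tcr)) / len(repertoire)


def call_serostatus(repertoire, is_hit, cutoff):
    """Call an individual seropositive when the hit fraction exceeds the cutoff."""
    return hit_fraction(repertoire, is_hit) > cutoff


def accuracy(calls, labels):
    """Share of individuals whose predicted status matches the known status."""
    return sum(c == l for c, l in zip(calls, labels)) / len(labels)


# Hypothetical stand-in for a trained per-TCR predictor: a fixed set of
# placeholder identifiers (assumption, not real sequences or results).
flagged = {"TCR_A", "TCR_B"}


def is_hit(tcr):
    return tcr in flagged


# Toy cohort: (repertoire, known serostatus) per invented donor.
cohort = {
    "donor1": (["TCR_A", "TCR_B", "TCR_X", "TCR_Y"], True),
    "donor2": (["TCR_X", "TCR_Y", "TCR_Z", "TCR_W"], False),
    "donor3": (["TCR_A", "TCR_X", "TCR_Y", "TCR_Z"], True),
}

calls = [call_serostatus(rep, is_hit, cutoff=0.2) for rep, _ in cohort.values()]
labels = [label for _, label in cohort.values()]
print(calls, accuracy(calls, labels))
```

A single hit-fraction cutoff is the simplest possible rule; a real classifier would likely be calibrated on a separate training cohort.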
Currently, we are developing these computational techniques into a combined toolbox, which we have called TCRex. This will be a web application that allows large TCR data sets to be queried for the presence of T-cells targeting specific epitopes of relevance to a wide variety of diseases.
Our research thus represents a significant step forward in the computational analysis of peptide immunogenicity, which is crucial to rational vaccine design as well as to understanding complex immunological research areas such as autoimmunity and tumor susceptibility.