A one-dimensional parameter-free model for carcinogenesis in gene expression space

08/08/2022

Cancer is a complex multifactorial phenomenon whose understanding remains a challenge. Current knowledge of carcinogenesis emphasizes a sequence of special (driver) mutations that leads to progression to the tumor state. Epigenetic changes, microenvironment effects and other factors are also recognized to play important roles¹. There is also a plausible hypothesis that cancer is a remnant of an ancient multicellular state encoded in our genes².

Existing theories face difficulties and must introduce additional assumptions. Consider, for example, the prototype of multistep theory: Vogelstein’s idea of progression in colon cancer³. To implement it in an algorithm, we must introduce as additional parameters the number of intermediate steps and their transition rates.
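To make the role of these extra parameters concrete, the following is a minimal sketch of a generic multistep progression, not the model of Ref.³. It assumes that each intermediate step is an independent exponential waiting time; the function names, the number of steps and the rate values are all hypothetical and chosen only for illustration.

```python
import random

def time_to_tumor(rates, rng):
    """Total waiting time: one exponential draw per intermediate step.

    `rates` is a list of hypothetical transition rates (per year);
    its length is the number of steps, itself a free parameter.
    """
    return sum(rng.expovariate(r) for r in rates)

def fraction_transformed(rates, horizon, n_trials=10000, seed=0):
    """Monte Carlo estimate of the probability that all steps
    are completed within `horizon` years."""
    rng = random.Random(seed)
    hits = sum(time_to_tumor(rates, rng) <= horizon for _ in range(n_trials))
    return hits / n_trials

# Illustrative values only: the outcome depends strongly on both
# the number of steps and the (assumed) rates.
two_steps = fraction_transformed([0.05, 0.05], horizon=70)
five_steps = fraction_transformed([0.05] * 5, horizon=70)
```

In this sketch, changing either the length of `rates` or its entries changes the predicted risk, which is precisely the freedom that a multistep description has to fix by additional assumptions.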

In the present paper, we advance a model of tumorigenesis in which all parameters are either calculated from processed gene expression data or taken from compilations of experimental results. In other words, the model contains no free parameters. The starting point is a gene expression (GE) description⁴, in which small portions of a tissue define microstates in GE space. In this picture, the normal (homeostatic) and tumor states are seen as distant regions (attractors)⁵,⁶. The high dimensionality of the GE space, which comes from the large number of differentially expressed genes, can be reduced by means of principal component analysis⁷,⁸,⁹. This procedure was recently applied in Refs.¹⁰,¹¹ to the analysis of gene expression data for 15 types of cancer from The Cancer Genome Atlas portal¹². In particular, the first principal component axis was found to measure progression to cancer. Based on the results of Refs.¹⁰,¹¹, especially the case of colon adenocarcinoma (COAD), which is discussed in detail in this paper as a prototype, we aim to build a simplified parameter-free one-variable model for the cancer risk.
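As an illustration of the dimensional reduction step, the sketch below projects samples onto the first principal component. It is not the pipeline of the cited references: the data are synthetic (Gaussian noise plus a shift in a subset of genes), the centering on the overall mean is an assumption, and the names `first_principal_component`, `n_normal`, `n_tumor` and `shift` are introduced here only for the example.

```python
import numpy as np

def first_principal_component(X):
    """Return the first principal axis and the sample projections.

    X has shape (samples, genes). Centering on the overall mean is
    an assumption of this sketch.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Vt[0]
    return pc1, Xc @ pc1

# Synthetic stand-in for expression data (NOT real TCGA values).
rng = np.random.default_rng(0)
n_normal, n_tumor, n_genes = 40, 40, 200
shift = np.zeros(n_genes)
shift[:20] = 3.0  # hypothetical set of differentially expressed genes

normal = rng.normal(0.0, 1.0, size=(n_normal, n_genes))
tumor = rng.normal(0.0, 1.0, size=(n_tumor, n_genes)) + shift
X = np.vstack([normal, tumor])

pc1, scores = first_principal_component(X)

# The sign of a principal axis is arbitrary; orient it so that
# tumor samples have the larger coordinate.
if scores[n_normal:].mean() < scores[:n_normal].mean():
    pc1, scores = -pc1, -scores
```

With this construction, the one-dimensional coordinate `scores` separates the two synthetic groups, which is the sense in which a single principal component can serve as a progression variable.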