A one-dimensional parameter-free model for carcinogenesis in gene expression space

Cancer is a complex multifactorial phenomenon whose understanding remains a challenge. Current knowledge of carcinogenesis emphasizes a sequence of special (driver) mutations that leads to progression to the tumor state. Epigenetic changes, microenvironment effects and other factors are also recognized to play important roles1. There is also a plausible hypothesis that cancer is a remnant of an ancient multicellular state encoded in our genes2.

Existing theories face difficulties and must make additional assumptions. Consider, for example, the prototype of multistep theory: Vogelstein’s idea of progression in colon cancer3. To implement it as an algorithm, one must introduce additional parameters: the number of intermediate steps and their transition rates.
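To make the parameter count concrete, the following is a minimal sketch of a generic sequential-step process, not Vogelstein's model itself nor any algorithm from the cited work. It assumes that each step has an exponentially distributed waiting time; the function name `prob_tumor_by` and all rate values are hypothetical and chosen only for illustration.

```python
import numpy as np

def prob_tumor_by(t, rates, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the probability that all sequential steps
    are completed by time t.

    Assumption (illustrative only): step i has an exponential waiting
    time with rate rates[i]; len(rates) is the number of steps.
    """
    rng = np.random.default_rng(seed)
    scales = 1.0 / np.asarray(rates, dtype=float)
    # One row per simulated lineage, one column per step.
    waits = rng.exponential(scale=scales, size=(n_samples, scales.size))
    # The final state is reached once the summed waiting times fit within t.
    return float((waits.sum(axis=1) <= t).mean())

# Hypothetical inputs: the result depends on both free choices,
# the number of steps and the value of each rate.
p_one_step = prob_tumor_by(1.0, [1.0])
p_two_steps = prob_tumor_by(1.0, [1.0, 1.0])
```

Both the length of `rates` and its entries must be supplied by hand, which is exactly the kind of additional assumption the text refers to.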

In the present paper, we advance a model of tumorigenesis in which parameters are either calculated from processed gene expression data or taken from compilations of experimental results. In this sense, it is a parameter-free model. The starting point is a gene expression (GE) description4, where small portions of a tissue define microstates in GE space. In this picture, the normal (homeostatic) and tumor states are seen as distant regions (attractors)5,6. At the same time, the high dimensionality of the GE space, which comes from the large number of differentially expressed genes, can be reduced by means of principal component analysis7,8,9. This procedure was recently applied in Refs. 10,11 to the analysis of gene expression data for 15 types of cancer from The Cancer Genome Atlas portal12. In particular, the first principal component axis was found to measure progression to cancer. Based on the results from Refs. 10,11, especially the case of colon adenocarcinoma (COAD), which is discussed in detail in this paper as a prototype, we aim to build a simplified parameter-free one-variable model for cancer risk.
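The reduction to a single coordinate can be illustrated with a minimal sketch, assuming a samples-by-genes expression matrix. The data below are synthetic (two Gaussian clouds separated along a few hypothetical genes), not TCGA data, and the helper `first_principal_component` is an illustrative name rather than the procedure of Refs. 10,11.

```python
import numpy as np

def first_principal_component(expr):
    """Return the unit-length PC1 axis and the PC1 coordinate of each sample.

    expr: array of shape (n_samples, n_genes).
    """
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    return axis, centered @ axis

# Synthetic example: 20 "normal" and 20 "tumor" samples, 50 genes,
# with 10 hypothetical genes shifted in the tumor group.
rng = np.random.default_rng(0)
n_genes = 50
shift = np.zeros(n_genes)
shift[:10] = 3.0
normal = rng.normal(0.0, 0.5, size=(20, n_genes))
tumor = rng.normal(0.0, 0.5, size=(20, n_genes)) + shift
expr = np.vstack([normal, tumor])

axis, scores = first_principal_component(expr)
# The sign of a principal axis is arbitrary; orient it from normal to tumor.
if scores[20:].mean() < scores[:20].mean():
    axis, scores = -axis, -scores
```

In this toy setting the two groups occupy separate ranges of the PC1 coordinate, which is the sense in which a single axis can serve as a progression variable.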