Article 6 (written in 2018)

1. Epistasis is the phenomenon where the effect of one gene (locus) is dependent on the presence of one or more 'modifier genes', i.e. the genetic background.

2. Epistatic mutations have different effects in combination than individually.

3. Epistasis arises due to interactions, either between genes, or within them, leading to non-linear effects.

4. The expression of any one allele depends in a complicated way on many other alleles.

5. Most genes exhibit at least some level of epistatic interaction.

6. When a double mutation has a fitter phenotype than expected from the effects of the two single mutations, it is referred to as positive epistasis. Positive epistasis between beneficial mutations generates greater improvements in function than expected. Positive epistasis between deleterious mutations protects against the negative effects to cause a less severe fitness drop.

7. Conversely, when two mutations together lead to a less fit phenotype than expected from their effects when alone, it is called negative epistasis.

8. When the effect on fitness of two mutations is more radical than expected from their effects when alone, it is referred to as synergistic epistasis. In the opposite situation, when the fitness difference of the double mutant from the wild type is smaller than expected from the effects of the two single mutations, it is called antagonistic epistasis. Therefore, for deleterious mutations, negative epistasis is also synergistic, while positive epistasis is antagonistic; conversely, for advantageous mutations, positive epistasis is synergistic, while negative epistasis is antagonistic. [This is what happens with disease phenotypes.]
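
The definitions in items 6 to 8 can be sketched in a few lines of Python. This is only an illustration: it assumes an additive expectation (the effects of the two single mutations simply add), that both single mutations act in the same direction, and fitness values that are made up for the example. The function name and the tolerance are hypothetical.

```python
def classify_epistasis(w_wt, w_a, w_b, w_ab, tol=1e-9):
    """Classify magnitude epistasis from four fitness values.

    w_wt: wild type, w_a / w_b: single mutants, w_ab: double mutant.
    Assumption: the expected double-mutant fitness is additive.
    Returns (direction, magnitude).
    """
    expected = w_a + w_b - w_wt      # wild type plus both single effects
    epsilon = w_ab - expected        # deviation from the expectation
    if abs(epsilon) <= tol:
        return ("none", "none")
    # Positive: fitter than expected. Negative: less fit than expected.
    direction = "positive" if epsilon > 0 else "negative"
    # Synergistic: the double mutant is further from the wild type than expected.
    observed_shift = abs(w_ab - w_wt)
    expected_shift = abs(expected - w_wt)
    magnitude = "synergistic" if observed_shift > expected_shift else "antagonistic"
    return (direction, magnitude)


# Two deleterious mutations (hypothetical fitness values).
print(classify_epistasis(1.0, 0.9, 0.9, 0.7))   # worse than expected
print(classify_epistasis(1.0, 0.9, 0.9, 0.9))   # milder than expected
# Two beneficial mutations (hypothetical fitness values).
print(classify_epistasis(1.0, 1.1, 1.1, 1.3))   # better than expected
```

With these inputs the deleterious pair comes out as negative and synergistic, or positive and antagonistic, and the beneficial pair as positive and synergistic, which matches the pairing stated in item 8.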

9. Sign epistasis occurs when one mutation has the opposite effect when in the presence of another mutation. This occurs when a mutation that is deleterious on its own can enhance the effect of a particular beneficial mutation. For example, a large and complex brain is a waste of energy without a range of sense organs, but sense organs are made more useful by a large and complex brain that can better process the information.

At its most extreme, reciprocal sign epistasis occurs when two deleterious genes are beneficial when together. For example, producing a toxin alone can kill a bacterium, and producing a toxin exporter alone can waste energy, but producing both can improve fitness by killing competing organisms.
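
Sign epistasis can be checked the same way, by asking whether a mutation's effect changes direction depending on whether the other mutation is present. The sketch below is an illustration under assumptions: the function names are hypothetical, the fitness numbers for the toxin example are invented, and reciprocal sign epistasis is treated as the case where both mutations change direction.

```python
def effect_sign(delta, tol=1e-9):
    """Return +1 for a beneficial effect, -1 for a deleterious one, 0 for neutral."""
    if delta > tol:
        return 1
    if delta < -tol:
        return -1
    return 0


def classify_sign_epistasis(w_wt, w_a, w_b, w_ab):
    """Compare each mutation's effect alone with its effect next to the other."""
    # Effect of A on the wild type versus its effect when B is already present.
    a_flips = effect_sign(w_a - w_wt) * effect_sign(w_ab - w_b) < 0
    # Effect of B on the wild type versus its effect when A is already present.
    b_flips = effect_sign(w_b - w_wt) * effect_sign(w_ab - w_a) < 0
    if a_flips and b_flips:
        return "reciprocal sign epistasis"
    if a_flips or b_flips:
        return "sign epistasis"
    return "no sign epistasis"


# Toxin alone is lethal, exporter alone wastes energy, both together help
# (hypothetical fitness values).
print(classify_sign_epistasis(1.0, 0.0, 0.9, 1.2))
# A is deleterious alone but enhances the beneficial mutation B.
print(classify_sign_epistasis(1.0, 0.9, 1.1, 1.3))
```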

10. When two mutations are viable alone but lethal in combination, it is called synthetic lethality or unlinked non-complementation.

11. Most genes interact with hundreds of thousands of other genes.

12. Epistasis within the genomes of organisms occurs due to interactions between the genes within the genome. This interaction may be direct if the genes encode proteins that, for example, are separate components of a multi-component protein (such as the ribosome), inhibit each other's activity, or if the protein encoded by one gene modifies the other (such as by phosphorylation). Alternatively the interaction may be indirect, where the genes encode components of a metabolic pathway or network, developmental pathway, signaling pathway or transcription factor network. For example, the gene encoding the enzyme that synthesizes penicillin is of no use to a fungus without the enzymes that synthesize the necessary precursors in the metabolic pathway.

13. Diploid organisms contain two copies of each gene. If these are different (heterozygous/heteroallelic), the two different copies of the allele may interact with each other to cause epistasis. This is sometimes called allelic complementation, or interallelic complementation.

14. In evolutionary genetics, the sign of epistasis is usually more significant than the magnitude of epistasis. This is because magnitude epistasis (positive and negative) simply affects how beneficial mutations are together, whereas sign epistasis affects whether mutation combinations are beneficial or deleterious.

Wikipedia “Epistasis”

15. Epigenetics most often denotes changes in a chromosome that affect gene activity and expression.

The term also refers to the changes themselves: functionally relevant changes to the genome that do not involve a change in the nucleotide sequence. Examples of mechanisms that produce such changes are DNA methylation and histone modification, each of which alters how genes are expressed without altering the underlying DNA sequence. Gene expression can be controlled through the action of repressor proteins that attach to silencer regions of the DNA. These epigenetic changes may last through cell divisions for the duration of the cell's life, and may also last for multiple generations even though they do not involve changes in the underlying DNA sequence of the organism; instead, non-genetic factors cause the organism's genes to behave (or "express themselves") differently. [The non-genetic factors that cause genes to behave differently are the Other Architect.]

16. Historically, some phenomena not necessarily heritable have also been described as epigenetic. For example, the term epigenetic has been used to describe any modification of chromosomal regions, especially histone modifications, whether or not these changes are heritable or associated with a phenotype.

17. The term epigenetics in its contemporary usage emerged in the 1990s, but for some years has been used with somewhat variable meanings. A consensus definition of the concept of epigenetic trait as "stably heritable phenotype resulting from changes in a chromosome without alterations in the DNA sequence" was formulated at a Cold Spring Harbor meeting in 2008, although alternate definitions that include non-heritable traits are still being used.

18. The NIH "Roadmap Epigenomics Project", ongoing as of 2016, uses the following definition: "For purposes of this program, epigenetics refers to both heritable changes in gene activity and expression (in the progeny of cells or of individuals) and also stable, long-term alterations in the transcriptional potential of a cell that are not necessarily heritable."

19. Epigenetic changes modify the activation of certain genes, but not the genetic code sequence of DNA. The microstructure (not code) of DNA itself or the associated chromatin proteins may be modified, causing activation or silencing. This mechanism enables differentiated cells in a multicellular organism to express only the genes that are necessary for their own activity. Epigenetic changes are preserved when cells divide.

20. Most epigenetic changes only occur within the course of one individual organism's lifetime; however, these epigenetic changes can be transmitted to the organism's offspring through a process called transgenerational epigenetic inheritance. Moreover, if gene inactivation occurs in a sperm or egg cell that results in fertilization, this epigenetic modification may also be transferred to the next generation. [This is when parents and their children have the same epistasis.]

21. Covalent modifications of either DNA (e.g. cytosine methylation and hydroxymethylation) or of histone proteins (e.g. lysine acetylation, lysine and arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation) play central roles in many types of epigenetic inheritance. Therefore, the word "epigenetics" is sometimes used as a synonym for these processes. However, this can be misleading. Chromatin remodeling is not always inherited, and not all epigenetic inheritance involves chromatin remodeling. [Parent and offspring have the same epistasis.]

22. One way that genes are regulated is through the remodeling of chromatin. Chromatin is the complex of DNA and the histone proteins with which it associates. If the way that DNA is wrapped around the histones changes, gene expression can change as well.

23. Mechanisms of heritability of histone state are not well understood; however, much is known about the mechanism of heritability of DNA methylation state during cell division and differentiation. [It's not understood because they assume it's heredity, but it's just that the parents and offspring have the same epistasis.]

24. Although histone modifications occur throughout the entire sequence, the unstructured N-termini of histones (called histone tails) are particularly highly modified. These modifications include acetylation, methylation, ubiquitylation, phosphorylation, sumoylation, ribosylation and citrullination. Acetylation is the most highly studied of these modifications.

25. Differing histone modifications are likely to function in differing ways; acetylation at one position is likely to function differently from acetylation at another position. Also, multiple modifications may occur at the same time, and these modifications may work together to change the behavior of the nucleosome.

26. It has been suggested that chromatin-based transcriptional regulation could be mediated by the effect of small RNAs. Small Interfering RNAs can modulate transcriptional gene expression via epigenetic modulation of targeted promoters.

27. MicroRNAs (miRNAs) are members of non-coding RNAs that range in size from 17 to 25 nucleotides.

28. Each miRNA expressed in a cell may target about 100 to 200 messenger RNAs that it downregulates. Most of the downregulation of mRNAs occurs by causing the decay of the targeted mRNA, while some downregulation occurs at the level of translation into protein.

29. It appears that about 60% of human protein coding genes are regulated by miRNAs. Many miRNAs are epigenetically regulated. About 50% of miRNA genes are associated with CpG islands, which may be repressed by epigenetic methylation. Transcription from methylated CpG islands is strongly and heritably repressed. Other miRNAs are epigenetically regulated by either histone modifications or by combined DNA methylation and histone modification.

30. Eukaryotic genomes have numerous nucleosomes. Nucleosome position is not random, and determines the accessibility of DNA to regulatory proteins. This determines differences in gene expression and cell differentiation. It has been shown that at least some nucleosomes are retained in sperm cells (where most but not all histones are replaced by protamines). Thus nucleosome positioning is to some degree inheritable. Recent studies have uncovered connections between nucleosome positioning and other epigenetic factors, such as DNA methylation and hydroxymethylation. [Notice it says “to some degree inheritable”.]

31. The genome sequence is static (with some notable exceptions), but cells differentiate into many different types, which perform different functions, and respond differently to the environment and intercellular signaling. Thus, as individuals develop, morphogens activate or silence genes in an epigenetically heritable fashion, giving cells a memory.

32. Some investigators think epigenetics may ultimately turn out to have a greater role in disease than genetics.

Wikipedia “Epigenetics”

33. Non-heritable genetic disorders are disorders that do not pass from one generation to the next. Disorders such as X-linked SCID and hemophilia are examples of heritable disorders. Disorders such as cancer and some neurological disorders can be either heritable or acquired.

34. Alzheimer's disease is caused by environmental as well as genetic factors. The cause of epilepsy is unknown.

35. The heritability of most common, multifactorial diseases is rather modest and known genetic effects account for a small part of it. The remaining portion of disease aetiology has been conventionally ascribed to environmental effects, with an unknown part being stochastic.

36. Sporadic disorders are common in medicine. We wish to stress the non-heritable genetic variation as a potentially important factor behind the development of sporadic diseases.

37. Over the past three decades, projects in human genetics searching for genotype–phenotype correlations have mostly focused on analyses of the inherited genome. These include studies of genes causing monogenic disorders and more recent analyses of the association of complex diseases with single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS).

38. In recent years, the GWAS have dominated the human medical genetic landscape of complex diseases and have, notwithstanding their shortcomings, contributed to our knowledge of human genetics. They have improved our understanding of the genetic basis of many human traits, as >1200 variants associated with >165 different human traits and diseases have been described. However, to the chagrin of the field, the portion of the estimated heritability explained by the GWAS findings has been unexpectedly low. Many explanations have been proposed for the ‘missing heritability’ of complex traits, including human disease. Faced with the inefficiency with which inherited biology explains and predicts disease, we argue that the weight should shift to the non-inherited component which, until now, has routinely been thought of as synonymous with environmental factors.

39. Post-zygotic DNA sequence mutations, although known to occur in normal cells, were not considered to be a major factor behind common diseases, but recent evidence seriously challenges this belief. Our focus is to highlight the importance of somatic mosaicism as a potentially crucial factor causing complex human diseases. The phenomenon that is discussed here has many names—for example, somatic mosaicism, somatic variation, post-zygotic changes, de novo variants, aberrations acquired during lifetime, and detectable clonal mosaicism. All these terms fall into a definition of mosaicism as the presence of genetically distinct lineages of cells in a single organism that is derived from the same zygote. We use here ‘post-zygotic variation’ or ‘post-zygotic mosaicism’ as unifying terms for all DNA changes acquired during life, from single base pair mutations to aberrations at the chromosomal level.

40. Normal cells accumulate structural aberrations with age, which are readily identified using genome scanning on SNP arrays. These structural changes fall into three major categories: deletions, gains, and copy number neutral loss of heterozygosity (CNNLOH, also called acquired uniparental disomy, aUPD). The size of these aberrations is highly variable, from a few kb to entire chromosomes. The relationship between age and mosaicism is strong and other tested co-variants, such as sex, ancestry, and smoking, did not have a significant effect on the mosaic status. [These co-variants have no significant effect because specific types of mosaicism result from genetic predispositions.]

41. Sporadic disorders, defined as a lack of similar cases among the closest relatives of an affected patient, are common in medicine.

42. The non-heritable causes of human disease have traditionally been ascribed to environmental factors. With few exceptions, however, such as smoking for lung cancer or alcohol for liver cirrhosis, specific identification of most of these factors has proven elusive for common multifactorial diseases and methodological breakthroughs likely to change this are nowhere in sight.

43. Post-zygotic mutations are clearly not heritable, and cannot therefore explain the ‘missing heritability’. However, they might be a part of the non-heritable disease causality, which has, until now, been underestimated in importance and routinely ascribed to the environment.

“Non-heritable genetics of human disease: spotlight on post-zygotic genetic variation acquired during lifetime”

44. The DNA of a chromosome is associated with structural proteins that organize, compact, and control access to the DNA, forming a material called chromatin; in eukaryotes, chromatin is usually composed of nucleosomes, segments of DNA wound around cores of histone proteins.

45. Although genes contain all the information an organism uses to function, the environment plays an important role in determining the ultimate phenotypes an organism displays. The phrase “nature and nurture” refers to this complementary relationship. The phenotype of an organism depends on the interaction of genes and the environment.

46. The genome of a given organism contains thousands of genes, but not all these genes need to be active at any given moment. A gene is expressed when it is being transcribed into mRNA and there exist many cellular methods of controlling the expression of genes such that proteins are produced only when needed by the cell. Transcription factors are regulatory proteins that bind to DNA, either promoting or inhibiting the transcription of a gene.

47. Within eukaryotes, there exist structural features of chromatin that influence the transcription of genes, often in the form of modifications to DNA and chromatin that are stably inherited by daughter cells. These features are called “epigenetic” because they exist "on top" of the DNA sequence and retain inheritance from one cell generation to the next. Because of epigenetic features, different cell types grown within the same medium can retain very different properties. Although epigenetic features are generally dynamic over the course of development, some, like the phenomenon of paramutation, have multigenerational inheritance and exist as rare exceptions to the general rule of DNA as the basis for inheritance.

48. During the process of DNA replication, errors occasionally occur in the polymerization of the second strand. These errors, called mutations, can affect the phenotype of an organism, especially if they occur within the protein coding sequence of a gene. Error rates are usually very low—1 error in every 10–100 million bases—due to the "proofreading" ability of DNA polymerases. Processes that increase the rate of changes in DNA are called mutagenic: mutagenic chemicals promote errors in DNA replication, often by interfering with the structure of base-pairing, while UV radiation induces mutations by causing damage to the DNA structure. Chemical damage to DNA occurs naturally as well and cells use DNA repair mechanisms to repair mismatches and breaks. The repair does not, however, always restore the original sequence.
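
To get a feel for the error rate quoted above, the arithmetic can be written out. This is a rough sketch, not a measurement: the rate range comes from the text, while the genome size of about 3.2 billion bases (roughly one copy of the human genome) is an assumption added for illustration, and the constant and function names are hypothetical.

```python
# Error rate range quoted in the text: 1 error per 10 million to 100 million bases.
BASES_PER_ERROR_HIGH_RATE = 10_000_000
BASES_PER_ERROR_LOW_RATE = 100_000_000

# Assumption, not from the text: a genome of about 3.2 billion bases.
HYPOTHETICAL_GENOME_SIZE = 3_200_000_000


def expected_errors(genome_size, bases_per_error):
    """Expected number of copying errors in one full replication of the genome."""
    return genome_size / bases_per_error


high = expected_errors(HYPOTHETICAL_GENOME_SIZE, BASES_PER_ERROR_HIGH_RATE)
low = expected_errors(HYPOTHETICAL_GENOME_SIZE, BASES_PER_ERROR_LOW_RATE)
print(f"Expected errors per replication: between {low:.0f} and {high:.0f}")
```

Under these assumptions, even a very low per-base error rate leaves tens to hundreds of errors per copy before any later repair is taken into account.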

49. Mutations alter an organism's genotype and occasionally this causes different phenotypes to appear. Most mutations have little effect on an organism's phenotype, health, or reproductive fitness. Mutations that do have an effect are usually detrimental, but occasionally some can be beneficial.

Wikipedia “Genetics”

50. Chromatin is not an inert structure, but rather an instructive DNA scaffold that can respond to external cues to regulate the many uses of DNA. A principal component of chromatin that plays a key role in this regulation is the modification of histones. There is an ever-growing list of these modifications and the complexity of their action is only just beginning to be understood. However, it is clear that histone modifications play fundamental roles in most biological processes that are involved in the manipulation and expression of DNA.

51. Histone modifications exert their effects via two main mechanisms. The first involves the modification(s) directly influencing the overall structure of chromatin, either over short or long distances. The second involves the modification regulating (either positively or negatively) the binding of effector molecules. Our review has a transcriptional focus, simply reflecting the fact that most studies involving histone modifications have also had this focus. However, histone modifications are just as relevant in the regulation of other DNA processes such as repair, replication and recombination. Indeed, the principles described below are pertinent to any biological process involving DNA transactions.

52. The large number of possible histone modifications provides scope for the tight control of chromatin structure. Nevertheless, an extra level of complexity exists due to cross-talk between different modifications, which presumably helps to fine-tune the overall control. This cross-talk can occur via multiple mechanisms. (I) There may be competitive antagonism between modifications if more than one modification pathway is targeting the same site(s). This is particularly true for lysines that can be acetylated, methylated or ubiquitylated. (II) One modification may be dependent upon another. (III) The binding of a protein to a particular modification can be disrupted by an adjacent modification. (IV) An enzyme's activity may be affected due to modification of its substrate. (V) There may be cooperation between modifications in order to efficiently recruit specific factors.

53. There may also be cooperation between histone modifications and DNA methylation. For instance, the UHRF1 protein binds to nucleosomes bearing H3K9me3, but this binding is significantly enhanced when the nucleosomal DNA is CpG methylated. Conversely, DNA methylation can inhibit protein binding to specific histone modifications. A good example here is KDM2A, which only binds to nucleosomes bearing H3K9me3 when the DNA is not methylated.

54. From a chromatin point of view, eukaryotic genomes can generally be divided into two geographically distinct environments. The first is a relatively relaxed environment, containing most of the active genes and undergoing cyclical changes during the cell cycle. These 'open' regions are referred to as euchromatin. In contrast, other genomic regions, such as centromeres and telomeres, are relatively compact structures containing mostly inactive genes and are refractive to cell-cycle cyclical changes. These more 'compact' regions are referred to as heterochromatin. This is clearly a simplistic view, as recent work in D. melanogaster has shown that there are five genomic domains of chromatin structure based on analysing the pattern of binding of many chromatin proteins. However, given that most is known about the two simple domains described above, references below will be defined to these two types of genomic domains.

55. Both heterochromatin and euchromatin are enriched in, and indeed also depleted of, certain characteristic histone modifications. However, there appear to be no simple rules governing the localization of such modifications, and there is a high degree of overlap between different chromatin regions. Nevertheless, there are regions of demarcation between heterochromatin and euchromatin. These 'boundary elements' are bound by specific factors such as CTCF that play a role in maintaining the boundary between distinct chromatin 'types'. Without such factors, heterochromatin would encroach into and silence the euchromatic regions of the genome. Boundary elements are enriched for certain modifications such as H3K9me1 and are devoid of others such as histone acetylation. Furthermore, a specific histone variant, H2A.Z, is highly enriched at these sites. How all of these factors work together in order to maintain these boundaries is far from clear, but their importance is undeniable. [What orchestrates all of this?]

56. We are beginning to understand how some enzymes are recruited to specific locations, but our knowledge is far from complete. In addition, another question that needs to be considered relates to how different histone modifications integrate in order to regulate DNA processes such as transcription.

57. Changes in histone modifications have been linked to genome instability, chromosome segregation defects and cancer.

58. We have identified many histone modifications, but their functions are just beginning to be uncovered. Certainly, there will be more modifications to discover and we will need to identify the many biological functions they regulate. Perhaps most importantly, there are three areas of sketchy knowledge that need to be embellished in the future.

The first is the delivery and control of histone modifications by RNA. There is an emerging model that short and long RNAs can regulate the precise positioning of modifications and they can do so by interacting with the enzyme complexes that lay down these marks. Given the huge proportion of the genome that is converted into uncharacterised RNAs, there is little doubt that this form of regulation is far more prevalent than is currently considered.

The second emerging area of interest follows the finding that kinases receiving signals from external cues in the cytoplasm can traverse into the nucleus and modify histones. This direct communication between the extracellular environment and the regulation of gene function may well be more widespread. It could involve many of the kinases that are currently thought to regulate gene expression indirectly via signaling cascades.

The third and perhaps the most ill-defined process that will be of interest is that of epigenetic inheritance and the influence of the environment on this process. We know of many biological phenomena that are inherited from mother to daughter cell, but the precise mechanism of how this happens is unclear. Do histone modifications play an important role in this? The answer is yes, and as far as we know they are responsible for perpetuating these events. However, how does the epigenetic signal start off? Is the deposition of the modifications at the right place during replication enough to explain the process? Or is there a 'memory molecule', such as an RNA, transmitted from mother to daughter cell, which can deliver histone modifications to the right place? These are fundamental questions at the heart of 'true' epigenetic research, and they will take us a while longer to answer. [The question “how does the epigenetic signal start off?” will be answered in the seventh and eighth articles.]

“Regulation of chromatin by histone modifications”

59. The histone code is a hypothesis that the transcription of genetic information encoded in DNA is in part regulated by chemical modifications to histone proteins, primarily on their unstructured ends. Together with similar modifications such as DNA methylation it is part of the epigenetic code. Histones associate with DNA to form nucleosomes, which themselves bundle to form chromatin fibers, which in turn make up the more familiar chromosome. Histones are globular proteins with a flexible N-terminus (taken to be the tail) that protrudes from the nucleosome. Many of the histone tail modifications correlate very well to chromatin structure and both histone modification state and chromatin structure correlate well to gene expression levels. The critical concept of the histone code hypothesis is that the histone modifications serve to recruit other proteins by specific recognition of the modified histone via protein domains specialized for such purposes, rather than through simply stabilizing or destabilizing the interaction between histone and the underlying DNA. These recruited proteins then act to alter chromatin structure actively or to promote transcription.

60. The hypothesis is that chromatin-DNA interactions are guided by combinations of histone modifications. While it is accepted that modifications (such as methylation, acetylation, ADP-ribosylation, ubiquitination, citrullination, and phosphorylation) to histone tails alter chromatin structure, a complete understanding of the precise mechanisms by which these alterations to histone tails influence DNA-histone interactions remains elusive.

61. Both lysine and arginine residues are known to be methylated. Methylated lysines are the best understood marks of the histone code, as specific methylated lysines match well with gene expression states. Methylation of lysines H3K4 and H3K36 is correlated with transcriptional activation while demethylation of H3K4 is correlated with silencing of the genomic region. Methylation of lysines H3K9 and H3K27 is correlated with transcriptional repression.

62. Acetylation tends to define the 'openness' of chromatin as acetylated histones cannot pack as well together as deacetylated histones.

63. Every nucleosome in a cell can therefore have a different set of modifications, raising the question of whether common patterns of histone modifications exist. A recent study of about 40 histone modifications across human gene promoters found over 4000 different combinations used, over 3000 occurring at only a single promoter. However, patterns were discovered including a set of 17 histone modifications that are present together at over 3000 genes. Therefore, patterns of histone modifications do occur but they are very intricate, and we currently have detailed biochemical understanding of the importance of a relatively small number of modifications. [Patterns of histone modifications do exist.]
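
The kind of counting described in item 63 can be illustrated on toy data. The sketch below is not the study's method: the promoter names and mark sets are invented, and the two helper functions are hypothetical. It counts how many distinct combinations of modifications occur, how many occur at only one promoter, and which promoters share a given core set of marks.

```python
from collections import Counter

# Hypothetical toy data: promoter name -> histone modifications observed there.
promoter_marks = {
    "geneA": {"H3K4me3", "H3K9ac", "H3K27ac"},
    "geneB": {"H3K4me3", "H3K9ac", "H3K27ac"},
    "geneC": {"H3K4me3", "H3K9ac"},
    "geneD": {"H3K27me3"},
    "geneE": {"H3K9me3", "H3K27me3"},
}


def combination_counts(marks_by_promoter):
    """Count how many promoters carry each exact combination of marks."""
    return Counter(frozenset(marks) for marks in marks_by_promoter.values())


def promoters_sharing(marks_by_promoter, core):
    """List promoters whose marks include every mark in the core set."""
    core = frozenset(core)
    return sorted(name for name, marks in marks_by_promoter.items() if core <= marks)


counts = combination_counts(promoter_marks)
single_promoter_combos = sum(1 for n in counts.values() if n == 1)
shared = promoters_sharing(promoter_marks, {"H3K4me3", "H3K9ac"})

print("distinct combinations:", len(counts))
print("combinations seen at one promoter only:", single_promoter_combos)
print("promoters sharing the core set:", shared)
```

Even in this tiny example most combinations are unique to one promoter while a smaller core set recurs across several, which is the same shape of result the excerpt reports at a much larger scale.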

Wikipedia “Histone Code”

64. A somatic epitype is a non-heritable epigenetic alteration in a gene. It is similar to conventional epigenetics in that it does not involve changes in the DNA primary sequence. Physically, the somatic epitype corresponds to changes in DNA methylation, oxidative damage (replacement of GTP with oxo-8-dGTP), or changes in DNA-chromatin structure that are not reversed by normal cellular or nuclear repair mechanisms. Somatic epitypes alter gene expression levels without altering the amino acid sequence of the expressed protein. Current research suggests that somatic epitypes can be altered both before and after birth. There is no indication that somatic epitypes are heritable in a conventional epigenetic fashion.

Wikipedia “Somatic Epitype”

65. We propose that primary gene sequence variation is often not the immediate operator in neurobiological pathology. Instead, environment acts on the genetic substrate, producing a 'somatic epitype'. It would be this somatic epitype that directly provides gene-based influence on neuropathological etiology. Somatic epitypes are a form of epigenotype that arises through environmental influences on a genome in a single lifetime rather than the more familiar epigenetic inheritance. These somatic epitypes would correspond to physical alterations in gene promoters, be that through (hypo)methylation, chromatin structure or oxidative damage (as oxo-d8-guanosine). This could be instilled on the underlying gene sequence by conditions in utero, by maternal behavior, and/or by maternal nutrition or post-natal environmental effects.

“Genes are not our destiny: the somatic epitype bridges between the genotype and the phenotype”

66. The "missing heritability" problem can be defined as the fact that single genetic variations cannot account for much of the heritability of diseases, behaviors, and other phenotypes. This is a problem that has significant implications for medicine, since a person's susceptibility to disease may depend more on "the combined effect of all the genes in the background than on the disease genes in the foreground", or the role of genes may have been severely overestimated.

67. The 'missing heritability' problem was named as such in 2008 (after the “missing baryon problem” in physics). The Human Genome Project led to optimistic forecasts that the large genetic contributions to many traits and diseases (which were identified by quantitative genetics and behavioral genetics in particular) would soon be mapped and pinned down to specific genes and their genetic variants by methods such as candidate-gene studies which used small samples with limited genetic sequencing to focus on specific genes believed to be involved, examining the SNP kinds of variants. While many hits were found, they often failed to replicate in other studies.

68. The editorial board of Behavior Genetics noted, in setting more stringent requirements for candidate-gene publications, that "the literature on candidate gene associations is full of reports that have not stood up to rigorous replication...it now seems likely that many of the published findings of the last decade are wrong or misleading and have not contributed to real advances in knowledge". Other researchers have characterized the literature as having "yielded an infinitude of publications with very few consistent replications" and called for a phase out of candidate-gene studies in favor of polygenic scores.

This led to a dilemma. Standard genetics methods have long estimated large heritabilities such as 80% for traits such as height or intelligence, yet none of the genes had been found despite sample sizes that, while small, should have been able to detect variants of reasonable effect size such as 1 inch or 5 IQ points. If genes have such strong cumulative effects - where were they?

Wikipedia “Missing Heritability Problem”

69. Proportion of heritability explained: age-related macular degeneration, 50%; Crohn's disease, 20%; systemic lupus erythematosus, 15%; type 2 diabetes, 6%; early-onset myocardial infarction, 2.8%.

70. Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained.

71. Many common human diseases and traits are known to cluster in families and are believed to be influenced by several genetic and environmental factors, but until recently the identification of genetic variants contributing to these ‘complex diseases’ has been slow and arduous [1]. Genome-wide association studies (GWAS), in which several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed in thousands of individuals, represent a powerful new tool for investigating the genetic architecture of complex diseases [1, 2]. In the past few years, these studies have identified hundreds of genetic variants associated with such conditions and have provided valuable insights into the complexities of their genetic architecture.

72. The underlying rationale for GWAS is the ‘common disease, common variant’ hypothesis, positing that common diseases are attributable in part to allelic variants present in more than 1–5% of the population. They have been facilitated by the development of commercial ‘SNP chips’ or arrays that capture most, although not all, common variation in the genome. Although the allelic architecture of some conditions, notably age-related macular degeneration, for the most part reflects the contributions of several variants of large effect (defined loosely here as those increasing disease risk by twofold or more), most common variants individually or in combination confer relatively small increments in risk (1.1–1.5-fold) and explain only a small proportion of heritability—the portion of phenotypic variance in a population attributable to additive genetic factors. For example, at least 40 loci have been associated with human height, a classic complex trait with an estimated heritability of about 80%, yet they explain only about 5% of phenotypic variance despite studies of tens of thousands of people.

73. The questions arise as to why so much of the heritability is apparently unexplained by initial GWA findings, and why it is important. It is important because a substantial proportion of individual differences in disease susceptibility is known to be due to genetic factors, and understanding this genetic variation may contribute to better prevention, diagnosis and treatment of disease. It is important to recognize, however, that few investigators expected these studies immediately to find all of the variants associated with common diseases, or even most of them; the hope was that they would at least find some. Limitations in the design of early GWAS, such as imprecise phenotyping and the use of control groups of questionable comparability, may have reduced estimates of effect sizes while preserving some ability to identify associated variants. These studies have considerably surpassed early expectations, reproducibly identifying hundreds of variants in many dozens of traits, but for many traits they have explained only a small proportion of estimated heritability.

74. Many explanations for this missing heritability have been suggested, including much larger numbers of variants of smaller effect yet to be found; rarer variants (possibly with larger effects) that are poorly detected by available genotyping arrays that focus on variants present in 5% or more of the population; structural variants poorly captured by existing arrays; low power to detect gene–gene interactions; and inadequate accounting for shared environment among relatives. Consensus is lacking, however, on approaches and priorities for research to examine what has been termed ‘dark matter’ of genome-wide association—dark matter in the sense that one is sure it exists, can detect its influence, but simply cannot ‘see’ it (yet).

75. It is reasonable to assume that allelic architecture (number, type, effect size and frequency of susceptibility variants) may differ across traits, and that missing heritability may take a different form for different diseases, but at present our understanding is too limited to distinguish these possibilities. Age-related macular degeneration may provide the best example of a common disease in which heritability is substantially explained by a small number of common variants of large effect, but for other conditions, such as Crohn’s disease, the proportion of heritability explained is not nearly so large despite a much larger number of identified variants. There are no obvious differences between these two traits in genetic architecture as predicted from clinical and epidemiological data that would explain the differences observed in their allelic architecture. Some apparent differences may simply be due to differences in the stage of investigation across traits. Studies in several conditions have clearly demonstrated that the number of detected variants increases with increasing sample size.

76. Immune and infectious agents have been recognized as among the strongest selection pressures in human evolution, and immune-related genes have been strongly implicated in Crohn’s disease and other immune-mediated diseases, suggesting either that pleiotropic effects of these variants reduce the efficiency of negative selection, or that strong environmental perturbation in modern societies might expose the disease risk associated with these variants. Selection may thus explain why disease allele frequencies are low and allelic effects are small, but this should manifest as low, rather than missing, heritability.

77. Structural variation, including copy number variants (CNVs, such as insertions and deletions) and copy neutral variation (such as inversions and translocations), may account for some of the unexplained heritability if those variants contribute to the genetic basis of human disease and are incompletely assessed by commercial SNP genotyping arrays. Although this type of variation has not been explicitly examined in most GWAS until now, CNVs in particular (regions 1 kilobase (kb) or longer present in variable numbers across individuals) have gained attention as methods to detect them have improved. Other forms of structural variation such as inversions, translocations, microsatellite repeat expansions, insertions of new sequence, and complex rearrangements have been implicated in rare Mendelian conditions. For the most part such variation has been largely unexplored in relation to complex traits.

78. CNVs arising de novo in current cases and shown to be of importance in neuropsychiatric and developmental conditions will not contribute to family resemblance and heritability, but could explain some of the variation at present attributed to ‘environment’.

79. The nearly 400 GWAS published so far represent a wealth of data on the genetics of complex diseases. These studies have provided valuable insights into the genetics of common diseases, particularly about the underlying genetic architecture of complex traits and the predominance of non-coding variants that may have a role in their aetiology. Just as linkage studies demonstrated that complex diseases cannot be explained by a small number of rare variants with large effects, GWAS have shown that they cannot be explained by a limited number of common variants of moderate effect.

80. Investigate gene–gene interactions, including dominance and epistasis.

81. Investigate gene–environment interactions: measure environment rigorously and analyse it against GWA data. [This and the one right above it are two “steps that can be used to make the most of existing and future GWAS”.]

82. Given all that has been learned of the genetic architecture of common diseases in the past few years, it may also be worthwhile to attempt exhaustive characterization of some well-studied traits by cataloguing all the contributing variation, be it in DNA sequence, DNA structure, chromatin structure, environmental modifiers, and defining all its functional implications.

83. Explaining missing heritability, however intellectually satisfying, will probably have fewer practical applications as an end in itself than as a means to an end. The ultimate goal of this line of research, as with nearly all research in the genetics of complex disease, is to improve understanding of human physiology and disease aetiology so that more effective means of diagnosis, treatment and prevention can be developed.

“Finding the missing heritability of complex diseases”

84. With the advent of Genome-Wide Association Studies (GWAS), estimates of the heritability of a trait can be based on the collection of Single Nucleotide Polymorphisms (SNPs) from populations of unrelated individuals. In order to estimate the narrow-sense heritability of a trait, these studies gather information on thousands of genetic variants and calculate the degree of relatedness between any two individuals through genetic identity. The narrow-sense heritability (h²) is defined as the proportion of phenotypic variation that can be explained by linear (additive) genetic effects, and since GWAS associates individual SNPs it provides estimates of this type of heritability. As of today, we know more than 50,000 SNPs associated with many important human phenotypes. However, both individual and cumulative effects of these SNPs fall short of explaining the heritability of the phenotype they are associated with (Lee et al., 2011). For example, pedigree studies have shown that 80% of variation in human height comes from genetic effects. GWAS have found approximately 50 genetic variants that are associated with human height, but they are only able to explain 5% of height variation. This discrepancy between the two measurements occurs in many human traits and is known as the missing heritability problem. [In 2015, it was over 80,000 SNPs.]
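The size of the gap can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, narrow-sense heritability is taken as additive genetic variance divided by total phenotypic variance, and the inputs are the height figures quoted above (80% heritability, about 5% of variation explained by GWAS hits).

```python
def narrow_sense_heritability(additive_variance, phenotypic_variance):
    """h^2 = V_A / V_P: share of phenotypic variance due to additive genetic effects."""
    if phenotypic_variance <= 0:
        raise ValueError("phenotypic variance must be positive")
    return additive_variance / phenotypic_variance


def missing_heritability(h2_pedigree, variance_explained_by_snps):
    """Return (missing share of phenotypic variance, fraction of h^2 explained)."""
    missing = h2_pedigree - variance_explained_by_snps
    fraction_explained = variance_explained_by_snps / h2_pedigree
    return missing, fraction_explained


# Height figures quoted in the text: h^2 of about 0.80, GWAS hits explain about 0.05.
missing, fraction = missing_heritability(0.80, 0.05)
print(f"missing share of phenotypic variance: {missing:.2f}")
print(f"fraction of heritability explained: {fraction:.4f}")
```

On these numbers the identified variants account for roughly one sixteenth of the pedigree-based estimate, which is the discrepancy the passage calls missing heritability.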

85. There are many possible explanations, and no consensus, as to where this missing heritability is hiding. Epigenetics, gene interactions, RNAs, heritability overestimations, small size effect variants, GWAS experimental limitations and many other factors have been proposed as possible reasons behind this problem. [It's not hiding. It doesn't exist.]

86. It has been shown that diet and exercise vary widely among groups of people and substantially impact the development of diseases such as obesity and diabetes (Pan et al., 1997). Habits may play an important role in the calculation of the heritability of these phenotypes, since people with similar behavior may be genetically distinct and yet express a close phenotypic resemblance. [It's not the habit itself. It's that they share a predisposition. Some people eat badly, but never become obese.]

87. GWAS also disregard epistasis (gene-gene interactions) and epigenetic effects. The study of gene regulatory networks has made clear that interactions across genes, proteins, RNA and other regulatory molecules are crucial to the generation and maintenance of specific gene expression patterns. These patterns, in turn, determine phenotypes. GWAS usually report individual SNPs associated with a specific trait, but SNPs can have combined effects that are not necessarily linear.

88. Methylation, acetylation, and mRNAs are known to influence gene expression (Delcuve et al., 2009), and this fact has critical consequences for the development of certain diseases.

89. The first estimates of the heritability of body mass index (BMI), taken from twins and family studies, were around 45%. More recent studies found heritability to be between 50 and 90% (Elks et al., 2012) but when GWAS was used to search for single nucleotide polymorphisms correlated with BMI, the ones found were only able to explain 2% of the phenotypic variation. This result clearly exemplifies the missing heritability problem, given the huge gap between the two measurements. Obesity is without a doubt a multifactorial disease that involves genetic and environmental factors.

“The Human Microbiome and the Missing Heritability Problem”

90. Some researchers are now homing in on copy-number variations (CNVs), stretches of DNA tens or hundreds of base pairs long that are deleted or duplicated between individuals. Variations in these features could begin to explain missing heritability in disorders such as schizophrenia and autism, for which GWAS have turned up almost nothing. Two recent studies looked at hundreds of CNVs in normal people and in those with schizophrenia, and found strong associations between the disease and several CNVs. They commonly arise de novo — in an individual without any family history of the mutation.

These structural variants might account for a lot of the genetic variability from person to person and could account for some of those rare 'out-of-sight' mutations with moderate penetrance that GWAS can't pick up. Many CNVs go undetected because they don't alter SNP sequences. Duplicated regions can also be difficult to sequence.

91. Most genes work together with close partners, and it is possible that the effects of one on heritability cannot be found without knowing the effects of the others. This is an example of epistasis, in which one gene masks the effect of another, or where several genes work together. Two genes may each add a centimeter to height on their own, for example, but together they could add five. GWAS don't cope with epistasis very well.
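The height example above can be written out as a small calculation. This is a minimal sketch with a hypothetical function name: it assumes the purely additive expectation is the sum of the two single-gene effects, and reports how far the combined effect departs from that sum.

```python
def epistatic_deviation(effect_a, effect_b, effect_ab):
    """Difference between the observed joint effect and the additive expectation."""
    additive_expectation = effect_a + effect_b
    return effect_ab - additive_expectation


# Figures from the text: each gene adds 1 cm alone, both together add 5 cm.
deviation = epistatic_deviation(1, 1, 5)
print(f"additive expectation: {1 + 1} cm, observed: 5 cm, deviation: {deviation} cm")
```

A model that adds per-variant effects would predict 2 cm here and leave the remaining 3 cm unexplained, which is one way interacting genes could go unaccounted for.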

92. The mechanisms by which epigenetic inheritance might work are still disputed, though; marks such as methylation that direct gene expression during someone's life seem to be wiped clean in a new embryo.

93. There is a nagging worry as researchers hunt for heritability: that common diseases might not, in fact, be common. Medicine tries hard to lump together a complex collection of symptoms and call it a disease. But if thousands of rare genetic variants contribute to a single disease, and the genetic underpinnings can vary radically for different people, how common is it? Are these, in fact, different diseases? [Further investigation of my work will explain this.]

“Personal genomes: The case of the missing heritability”

94. A transposable element (TE or transposon) is a DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material.

95. McClintock found that genes could not only move, but they could also be turned on or off due to certain environmental conditions or during different stages of cell development.

96. Approximately 90% of the maize genome is made up of TEs, as is 44% of the human genome.

97. Class I TEs are copied in two stages: first, they are transcribed from DNA to RNA, and the RNA produced is then reverse-transcribed to DNA. This copied DNA is then inserted back into the genome at a new position. The reverse transcription step is catalyzed by a reverse transcriptase, which is often encoded by the TE itself.
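The two-stage copying described above can be mimicked on a text string. The following is a toy sketch, not a biological model: the function names, the example sequence and the insertion position are all made up for illustration, and strand complementarity and the enzymes involved are ignored.

```python
def transcribe(dna):
    """DNA -> RNA, simplified: copy the given strand and swap T for U."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """RNA -> DNA copy, simplified: swap U back to T."""
    return rna.replace("U", "T")


def retrotranspose(genome, te_start, te_end, insert_at):
    """Copy the element genome[te_start:te_end] to a new position via an RNA intermediate."""
    element = genome[te_start:te_end]
    rna = transcribe(element)            # stage 1: DNA -> RNA
    dna_copy = reverse_transcribe(rna)   # stage 2: RNA -> DNA
    # The original element stays in place; the copy is pasted in elsewhere.
    return genome[:insert_at] + dna_copy + genome[insert_at:]


genome = "AAAA" + "GATTACA" + "CCCC"   # hypothetical genome with one element
new_genome = retrotranspose(genome, 4, 11, insert_at=13)
print(genome, "->", new_genome)
```

Because the original copy is left where it was, each round lengthens the toy genome by the length of the element.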

98. The most common transposable element in humans is the Alu sequence. It is approximately 300 bases long and can be found between 300,000 and one million times in the human genome. Alu alone is estimated to make up 15–17% of the human genome.

99. Mariner-like elements are another prominent class of transposons found in multiple species, including humans. This Class II transposable element is known for its uncanny ability to be transmitted horizontally in many species. There are an estimated 14,000 copies of Mariner in the human genome comprising 2.6 million base pairs.

100. TEs are mutagens and their movements are often the causes of genetic disease. They can damage the genome of their host cell in different ways: a transposon or a retrotransposon that inserts itself into a functional gene will most likely disable that gene; after a DNA transposon leaves a gene, the resulting gap will probably not be repaired correctly; multiple copies of the same sequence, such as Alu sequences, can hinder precise chromosomal pairing during mitosis and meiosis, resulting in unequal crossovers, one of the main reasons for chromosome duplication.

Diseases often caused by TEs include hemophilia A and B, severe combined immunodeficiency, porphyria, predisposition to cancer, and Duchenne muscular dystrophy. LINE1 (L1) TEs that land on the human Factor VIII gene have been shown to cause haemophilia and insertion of L1 into the APC gene causes colon cancer, confirming that TEs play an important role in disease development.

Additionally, many TEs contain promoters which drive transcription of their own transposase. These promoters can cause aberrant expression of linked genes, causing disease or mutant phenotypes.

101. It is unclear whether TEs originated in the last universal common ancestor, arose independently multiple times, or arose once and then spread to other kingdoms by horizontal gene transfer. While some TEs confer benefits on their hosts, most are regarded as selfish DNA parasites. In this way, they are similar to viruses.

Wikipedia “Transposable Element”

102. The initiating event leading to a change in gene expression includes activation or deactivation of receptors.

103. In eukaryotes, the accessibility of large regions of DNA can depend on its chromatin structure, which can be altered as a result of histone modifications directed by DNA methylation, ncRNA, or DNA-binding protein. Hence these modifications may up or down regulate the expression of a gene.

104. Octameric protein complexes called nucleosomes are responsible for the amount of supercoiling of DNA, and these complexes can be temporarily modified by processes such as phosphorylation or more permanently modified by processes such as methylation. Such modifications are considered to be responsible for more or less permanent changes in gene expression levels.

105. Methylation of DNA is a common method of gene silencing. DNA is typically methylated by methyltransferase enzymes on cytosine nucleotides in a CpG dinucleotide sequence (also called “CpG islands” when densely clustered).

106. Abnormal methylation patterns are thought to be involved in oncogenesis.

107. Often, DNA methylation and histone deacetylation work together in gene silencing. The combination of the two seems to be a signal for DNA to be packed more densely, lowering gene expression.

108. Regulation of transcription thus controls when transcription occurs and how much RNA is created. Transcription of a gene by RNA polymerase can be regulated by several mechanisms. Specificity factors alter the specificity of RNA polymerase for a given promoter or set of promoters, making it more or less likely to bind to them (e.g., sigma factors used in prokaryotic transcription). Repressors bind to the Operator, coding sequences on the DNA strand that are close to or overlapping the promoter region, impeding RNA polymerase's progress along the strand, thus impeding the expression of the gene.

109. General transcription factors position RNA polymerase at the start of a protein-coding sequence and then release the polymerase to transcribe the mRNA. Activators enhance the interaction between RNA polymerase and a particular promoter, encouraging the expression of the gene. Activators do this by increasing the attraction of RNA polymerase for the promoter, through interactions with subunits of the RNA polymerase or indirectly by changing the structure of the DNA. Enhancers are sites on the DNA helix that are bound by activators in order to loop the DNA bringing a specific promoter to the initiation complex. Enhancers are much more common in eukaryotes than prokaryotes, where only a few examples exist (to date). Silencers are regions of DNA sequences that, when bound by particular transcription factors, can silence expression of the gene.

110. In vertebrates, the majority of gene promoters contain a CpG island with numerous CpG sites. When many of a gene's promoter CpG sites are methylated the gene becomes silenced. Colorectal cancers typically have 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. However, transcriptional silencing may be of more importance than mutation in causing progression to cancer. For example, in colorectal cancers about 600 to 800 genes are transcriptionally silenced by CpG island methylation. Transcriptional repression in cancer can also occur by other epigenetic mechanisms, such as altered expression of microRNAs. In breast cancer, transcriptional repression of BRCA1 may occur more frequently by over-expressed microRNA-182 than by hypermethylation of the BRCA1 promoter.

111. After the DNA is transcribed and mRNA is formed, there must be some sort of regulation on how much the mRNA is translated into proteins. Cells do this by modulating the capping, splicing, addition of a Poly(A) Tail, the sequence-specific nuclear export rates, and, in several contexts, sequestration of the RNA transcript. These processes occur in eukaryotes but not in prokaryotes. This modulation is a result of a protein or transcript that, in turn, is regulated and may have an affinity for certain sequences.

112. Three prime untranslated regions (3'-UTRs) of messenger RNAs (mRNAs) often contain regulatory sequences that post-transcriptionally influence gene expression. Such 3'-UTRs often contain binding sites both for microRNAs (miRNAs) and for regulatory proteins. By binding to specific sites within the 3'-UTR, miRNAs can decrease gene expression of various mRNAs by either inhibiting translation or directly causing degradation of the transcript. The 3'-UTR also may have silencer regions that bind repressor proteins that inhibit the expression of an mRNA.
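A rough idea of what a "specific site" means can be sketched as a string search. This goes beyond the text and rests on labeled assumptions: that a site is a perfect match to the reverse complement of the miRNA "seed" (nucleotides 2 to 8), that the function names are hypothetical, and that the miRNA and UTR sequences are illustrative inputs only. Real target prediction uses many more criteria.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    Assumption: the seed is miRNA nucleotides 2-8, i.e. the slice [1:8].
    """
    seed = mirna[seed_start:seed_end]
    target = reverse_complement(seed)
    sites = []
    pos = utr.find(target)
    while pos != -1:
        sites.append(pos)
        pos = utr.find(target, pos + 1)
    return sites


mirna = "UGAGGUAGUAGGUUGUAUAGUU"       # illustrative miRNA sequence
utr = "AAACUACCUCAAAGGGCUACCUCUU"      # made-up 3'-UTR fragment
print(find_seed_sites(mirna, utr))
```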

113. The effects of miRNA dysregulation of gene expression seem to be important in cancer. For instance, in gastrointestinal cancers, a 2015 paper identified nine miRNAs as epigenetically altered and effective in down-regulating DNA repair enzymes.

114. The effects of miRNA dysregulation of gene expression also seem to be important in neuropsychiatric disorders, such as schizophrenia, bipolar disorder, major depressive disorder, Parkinson's disease, Alzheimer's disease and autism spectrum disorders.

115. Gene Regulation can be summarized by the response of the respective system:

Inducible systems - An inducible system is off unless there is the presence of some molecule (called an inducer) that allows for gene expression. The molecule is said to "induce expression". The manner by which this happens is dependent on the control mechanisms as well as differences between prokaryotic and eukaryotic cells.

Repressible systems - A repressible system is on except in the presence of some molecule (called a corepressor) that suppresses gene expression. The molecule is said to "repress expression". The manner by which this happens is dependent on the control mechanisms as well as differences between prokaryotic and eukaryotic cells.
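The two response types above reduce to simple on/off logic. The sketch below is a deliberately minimal illustration with hypothetical function names; it ignores the control mechanisms and the prokaryote/eukaryote differences that the text says govern how the switch actually works.

```python
def inducible_expression(inducer_present):
    """Inducible system: off by default, on only when the inducer is present."""
    return inducer_present


def repressible_expression(corepressor_present):
    """Repressible system: on by default, off when the corepressor is present."""
    return not corepressor_present


for molecule_present in (False, True):
    print(
        f"molecule present={molecule_present}: "
        f"inducible on={inducible_expression(molecule_present)}, "
        f"repressible on={repressible_expression(molecule_present)}"
    )
```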

Wikipedia “Regulation of Gene Expression”

116. DNA methylation is a process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.

117. Two of DNA's four bases, cytosine and adenine, can be methylated. Cytosine methylation is widespread in both eukaryotes and prokaryotes, even though the rate of cytosine DNA methylation can differ greatly between species. Adenine methylation has been observed in bacterial, plant and recently in mammalian DNA, but has received considerably less attention.

118. In plants and other organisms, DNA methylation is found in three different sequence contexts: CG (or CpG), CHG or CHH (where H corresponds to A, T or C). In mammals however, DNA methylation is almost exclusively found in CpG dinucleotides, with the cytosines on both strands being usually methylated. Non-CpG methylation can however be observed in embryonic stem cells and has also been indicated in neural development. Furthermore, non-CpG methylation has also been observed in hematopoietic progenitor cells, and it occurred mainly in a CpApC sequence context.
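The three sequence contexts can be told apart by looking at the two bases that follow a cytosine. The sketch below is an illustration under stated assumptions: the function names and the example sequence are made up, only the given strand is scanned (the opposite strand is ignored), and cytosines too close to the end of the sequence are left unclassified.

```python
H_BASES = {"A", "T", "C"}  # H stands for A, T or C, as defined in the text


def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH' (None if undetermined)."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # not enough sequence to decide between CHG and CHH
    first, second = downstream
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None  # e.g. an ambiguous base such as N


def context_counts(seq):
    """Count cytosines in each context along the given strand."""
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i, base in enumerate(seq.upper()):
        if base == "C":
            context = cytosine_context(seq, i)
            if context is not None:
                counts[context] += 1
    return counts


counts = context_counts("ACGTCAGCTTCCA")  # made-up sequence
print(counts)
```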

119. The DNA methylation landscape of vertebrates is very particular compared to other organisms. In vertebrates, around 60–80% of CpG are methylated in somatic cells and DNA methylation appears as a default state that has to be specifically excluded from defined locations. By contrast, the genomes of most plants, invertebrates, fungi and protists show “mosaic” methylation patterns, where only specific genomic elements are targeted, and they are characterized by the alternation of methylated and unmethylated domains.

120. High CpG methylation in mammalian genomes has an evolutionary cost because it increases the frequency of spontaneous mutations.

121. Excluding repeated sequences, there are around 25,000 CpG islands in the human genome, 75% of which are less than 850 bp long. They are major regulatory units and around 50% of CpG islands are located in gene promoter regions, while another 25% lie in gene bodies, often serving as alternative promoters. Reciprocally, around 60-70% of human genes have a CpG island in their promoter region. The majority of CpG islands are constitutively unmethylated and enriched for permissive chromatin modification such as H3K4 methylation. In somatic tissues, only 10% of CpG islands are methylated, the majority of them being located in intergenic and intragenic regions.

122. DNA methylation may affect the transcription of genes in two ways. First, the methylation of DNA itself may physically impede the binding of transcriptional proteins to the gene, and second, and likely more important, methylated DNA may be bound by proteins known as methyl-CpG-binding domain proteins (MBDs). MBD proteins then recruit additional proteins to the locus, such as histone deacetylases and other chromatin remodeling proteins that can modify histones, thereby forming compact, inactive chromatin, termed heterochromatin. This link between DNA methylation and chromatin structure is very important. In particular, loss of methyl-CpG-binding protein 2 (MeCP2) has been implicated in Rett syndrome; and methyl-CpG-binding domain protein 2 (MBD2) mediates the transcriptional silencing of hypermethylated genes in cancer.

123. While DNA methylation does not have the flexibility required for the fine-tuning of gene regulation, its stability is perfect to ensure the permanent silencing of transposable elements. Transposon control is one of the most ancient functions of DNA methylation that is shared by animals, plants and multiple protists.

124. In many disease processes, such as cancer, gene promoter CpG islands acquire abnormal hypermethylation, which results in transcriptional silencing that can be inherited by daughter cells following cell division. Alterations of DNA methylation have been recognized as an important component of cancer development.

125. Global hypomethylation has also been implicated in the development and progression of cancer through different mechanisms. Typically, there is hypermethylation of tumor suppressor genes and hypomethylation of oncogenes.

126. Generally, in progression to cancer, hundreds of genes are silenced or activated. Although silencing of some genes in cancers occurs by mutation, a large proportion of carcinogenic gene silencing is a result of altered DNA methylation.

127. Altered expressions of microRNAs also silence or activate many genes in progression to cancer. Altered microRNA expression occurs through hyper/hypomethylation of CpG sites in CpG islands in promoters controlling transcription of the microRNAs.

128. Silencing of DNA repair genes through methylation of CpG islands in their promoters appears to be especially important in progression to cancer.

129. Epigenetic modifications such as DNA methylation have been implicated in cardiovascular disease, including atherosclerosis. In animal models of atherosclerosis, vascular tissue as well as blood cells such as mononuclear blood cells exhibit global hypomethylation with gene-specific areas of hypermethylation. DNA methylation polymorphisms may be used as an early biomarker of atherosclerosis since they are present before lesions are observed, which may provide an early tool for detection and risk prevention.

Wikipedia “DNA Methylation”

130. The transcriptome is the set of all RNA molecules in one cell or a population of cells. It is sometimes used to refer to all RNAs, or just mRNA, depending on the particular experiment. It differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities.

The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation.

131. One analysis method, known as gene set enrichment analysis, identifies coregulated gene networks rather than individual genes that are up- or down-regulated in different cell populations.

Wikipedia “Transcriptome”

132. The epigenome is involved in regulating gene expression, development, tissue differentiation, and suppression of transposable elements. Unlike the underlying genome which is largely static within an individual, the epigenome can be dynamically altered by environmental conditions.

Wikipedia “Epigenome”

133. Among the major players in cellular regulation are transcription factors, proteins that regulate the expression of genes. Other proteins that bind to transcription factors to form transcriptional complexes might modify the activity of transcription factors, for example blocking their capacity to bind to a promoter.

134. Signaling pathways are groups of proteins that produce an effect in a chain that transmit a signal from one part of the cell to another part, for example, linking the presence of substance at the exterior of the cell to the activation of the expression of a gene.

Wikipedia “Regulome”

135. Another extensively studied type of interactome is the protein–DNA interactome, also called a gene-regulatory network, a network formed by transcription factors, chromatin regulatory proteins, and their target genes.

136. All interactome types are interconnected. For instance, protein interactomes contain many enzymes which in turn form biochemical networks. Similarly, gene regulatory networks overlap substantially with protein interaction networks and signaling networks.

137. Genes interact in the sense that they affect each other's function. For instance, a mutation may be harmless, but when it is combined with another mutation, the combination may turn out to be lethal. Such genes are said to "interact genetically". Genes that are connected in such a way form genetic interaction networks.

Wikipedia “Interactome”
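
The genetic interaction described in excerpt 137 can be put in numbers. What follows is a minimal sketch, assuming a multiplicative null model in which the expected fitness of a double mutant is the product of the two single-mutant fitnesses, with the wild type set to 1.0. The model choice, the function names, and the fitness values are illustrative assumptions and do not come from the excerpts.

```python
def epistasis(w_a, w_b, w_ab):
    """Observed double-mutant fitness minus the multiplicative expectation."""
    return w_ab - w_a * w_b


def classify(w_a, w_b, w_ab, tol=1e-9):
    """Label the interaction as positive, negative, or none."""
    eps = epistasis(w_a, w_b, w_ab)
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "none"


# Hypothetical fitness values relative to a wild type of 1.0.
# Two mutations that are harmless alone but lethal in combination:
print(classify(1.0, 1.0, 0.0))    # double mutant is less fit than expected
# Two deleterious mutations whose combination is milder than expected:
print(classify(0.8, 0.8, 0.7))
# Two mutations that combine exactly as the null model predicts:
print(classify(0.5, 0.5, 0.25))
```

The first case corresponds to the "harmless alone, lethal together" example in the excerpt.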

138. The exome is the part of the genome composed of exons, the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing and contribute to the final protein product encoded by that gene. It consists of all DNA that is transcribed into mature RNA in cells of any type, as distinct from the transcriptome, which is the RNA that has been transcribed only in a specific cell population. The exome of the human genome consists of roughly 180,000 exons constituting about 1% of the total genome, or about 30 megabases of DNA. Though it composes a very small fraction of the genome, the exome is thought to harbor 85% of mutations that have a large effect on disease.

Wikipedia “Exome”
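
The figures in excerpt 138 can be checked against one another with a little arithmetic. The sketch below assumes a haploid human genome of about 3.1 billion base pairs; that genome size is an assumption added here and is not stated in the excerpt.

```python
EXOME_BP = 30_000_000        # "about 30 megabases" (excerpt 138)
EXON_COUNT = 180_000         # "roughly 180,000 exons" (excerpt 138)
GENOME_BP = 3_100_000_000    # assumed haploid genome size, not from the excerpt

exome_fraction = EXOME_BP / GENOME_BP
mean_exon_length = EXOME_BP / EXON_COUNT

print(f"exome share of genome: {exome_fraction:.2%}")
print(f"mean exon length: {mean_exon_length:.0f} bp")
```

Under that assumption the exome comes out at just under 1% of the genome, in line with the excerpt, and the average exon at roughly 170 base pairs.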

139. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation, and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins, and carbohydrates, constitute the four major macromolecules essential for all known forms of life. Like DNA, RNA is assembled as a chain of nucleotides, but unlike DNA it is more often found in nature as a single-strand folded onto itself, rather than a paired double-strand. Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the nitrogenous bases of guanine, uracil, adenine, and cytosine, denoted by the letters G, U, A, and C) that directs synthesis of specific proteins.
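
The base alphabet in excerpt 139 can be shown with a small sketch that writes an mRNA sequence from a DNA coding strand by replacing thymine (T) with uracil (U). The function name and the example sequence are hypothetical, and this is a text-level simplification: in the cell, the polymerase reads the complementary template strand.

```python
RNA_BASES = set("GUAC")
DNA_BASES = set("GTAC")


def transcribe(coding_strand: str) -> str:
    """Return the mRNA text for a DNA coding strand (T becomes U)."""
    dna = coding_strand.upper()
    invalid = set(dna) - DNA_BASES
    if invalid:
        raise ValueError(f"not DNA bases: {sorted(invalid)}")
    return dna.replace("T", "U")


mrna = transcribe("ATGGCTTAA")  # hypothetical 9-base sequence
print(mrna)
```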

140. Some RNA molecules play an active role within cells by catalyzing biological reactions, controlling gene expression, or sensing and communicating responses to cellular signals.

141. Non-coding RNAs ("ncRNA") can be encoded by their own genes (RNA genes), but can also derive from mRNA introns. The most prominent examples of non-coding RNAs are transfer RNA (tRNA) and ribosomal RNA (rRNA), both of which are involved in the process of translation. There are also non-coding RNAs involved in gene regulation, RNA processing and other roles.

142. Several types of RNA can downregulate gene expression by being complementary to a part of an mRNA or a gene's DNA. MicroRNAs (miRNA; 21–22 nt) are found in eukaryotes and act through RNA interference (RNAi), where an effector complex of miRNA and enzymes can cleave complementary mRNA, block the mRNA from being translated, or accelerate its degradation.
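
Complementarity between a miRNA and its target, as in excerpt 142, can be sketched as a simple string search. The example assumes the common convention that the "seed" is nucleotides 2 through 8 of the miRNA. The sequences and function names are invented for illustration, and real target prediction takes many more factors into account.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, mrna: str) -> list:
    """Return 0-based positions in `mrna` that pair with the miRNA seed.

    The seed is taken as nucleotides 2-8 of the miRNA (an assumed convention).
    """
    seed = mirna[1:8]
    target = reverse_complement(seed)
    sites = []
    pos = mrna.find(target)
    while pos != -1:
        sites.append(pos)
        pos = mrna.find(target, pos + 1)
    return sites


# Hypothetical sequences, not real transcripts.
example_mirna = "AUGCCGAUAGCUAGGCUAAUCG"
example_mrna = "GGAAUCGGCAUUCCAUCGGCAAA"
sites = seed_sites(example_mirna, example_mrna)
print(sites)
```

Each reported position marks a stretch of the mRNA that could base-pair with the seed, which is the kind of complementarity the excerpt describes.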

143. While small interfering RNAs (siRNA; 20–25 nt) are often produced by breakdown of viral RNA, there are also endogenous sources of siRNAs. siRNAs act through RNA interference in a fashion similar to miRNAs. Some miRNAs and siRNAs can cause genes they target to be methylated, thereby decreasing or increasing transcription of those genes. Animals have Piwi-interacting RNAs (piRNA; 29–30 nt) that are active in germline cells and are thought to be a defense against transposons and play a role in gametogenesis.

144. There are many long noncoding RNAs that regulate genes in eukaryotes; one such RNA is Xist, which coats one X chromosome in female mammals and inactivates it.

145. An mRNA may contain regulatory elements itself, such as riboswitches, in the 5' untranslated region or 3' untranslated region; these cis-regulatory elements regulate the activity of that mRNA. The untranslated regions can also contain elements that regulate other genes.

146. RNA can also be methylated.

Wikipedia “RNA”

147. Their paper appeared in Nature Genetics on May 7, 2018. "If we want to understand how differences in DNA methylation patterns can cause developmental defects in plants, or diseases like cancer in humans, we need to understand how DNA methylation is targeted to specific regions of the genome under normal conditions," says Salk Assistant Professor Julie Law, senior author of the paper. "Until now, factors able to control methylation in such a precise manner have been elusive."

148. RNA polymerase IV (Pol-IV) makes small molecular messages called siRNAs that act like a molecular GPS system, indicating all the locations within the genome where methylation should be targeted. However, how this polymerase might be regulated to control DNA methylation at individual genomic locations was unclear.

To address this question, Law's lab used a combined genetic-genomic approach to investigate the functions of four related proteins, the CLASSY family, that they thought might regulate Pol-IV. It turned out that disruption of each CLASSY gene resulted in different sets of genomic regions -- in different locations -- losing their siRNA signals, resulting in reduced DNA methylation levels. More dramatically, when all four CLASSY genes were disrupted, the siRNA signals and DNA methylation were lost throughout the entire genome.

149. "In the CLASSY quadruple mutants, the Pol-IV signal completely disappears -- essentially no siRNAs are made," says Ming Zhou, a Salk research associate and the paper's first author. "This is very strong evidence that CLASSYs are required for Pol-IV function."

When Law's team probed further, they discovered that the DNA methylation defects in the CLASSY mutants caused some genes to be erroneously turned on and resulted in global decreases in methylation at mobile DNA elements, increasing their potential to move around and disrupt essential gene activity.

"The CLASSYs are a part of a large superfamily that is common to both plants and animals," adds Law, who holds the Hearst Foundation Development Chair. "We hope that by understanding how specific methylation patterns are generated in plants, we can provide insights into how DNA methylation is regulated in other organisms."

Knowledge of this mechanism for regulating DNA methylation could help scientists develop strategies for correcting epigenetic defects that are associated with reduced yields in crops, or diseases -- such as cancer -- in humans.

“Understanding how DNA is selectively tagged with 'do not use' marks”

150. Until human genes are activated, they are blocked by structures known as nucleosomes, components that serve to package DNA inside cells.

For the past several decades, scientists have been trying to determine how these nucleosome roadblocks clear out to allow genes to be turned on. Now, a team of scientists led by postdoctoral researcher Jia Fei in James Kadonaga's lab at the University of California San Diego has identified a key factor that partially unravels nucleosomes and clears the way for genes to activate.

151. The identification of "NDF," or nucleosome destabilizing factor, is described May 14 in the journal Genes & Development. The researchers say the finding provides a new perspective on how genes are turned on and off—knowledge useful in the study of human diseases such as cancer, which can be caused by improper gene activity.

152. Genes are special functional segments in our DNA, which is a long molecular chain of genetic instructions. When genes are turned on, an enzyme named RNA polymerase travels along the DNA and makes a working copy (RNA) of the DNA. Here, nucleosomes, which look like beads on the DNA chain, pose a problem as they block the passage of the polymerase. This led to the question: How is the polymerase able to travel through nucleosomes?

The answer emerged with the identification of NDF, which destabilizes nucleosomes and enables the progression of the polymerase. The researchers say NDF's makeup suggests that it is broadly used in perhaps all human cells and may play a role in disease.

153. "NDF is present at abnormally high levels in breast cancer cells, and the overproduction of NDF might be partly responsible for the uncontrolled growth of these cells," said Kadonaga, Distinguished Professor of Molecular Biology and the Amylin Endowed Chair in Lifesciences Education and Research. "Thus, the identification of NDF resolves an old mystery and reveals a new factor that may have an important role in many aspects of human biology."

“Scientists find missing factor in gene activation”

154. A non-coding RNA (ncRNA) is an RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs, and the long ncRNAs such as Xist and HOTAIR.

The number of non-coding RNAs within the human genome is unknown; however, recent transcriptomic and bioinformatic studies suggest that there are thousands of them. Many of the newly identified ncRNAs have not been validated for their function. It is also likely that many ncRNAs are non-functional (sometimes referred to as junk RNA), and are the product of spurious transcription.

Non-coding RNAs contribute to diseases including cancer and Alzheimer's.

155. Y RNAs are stem loops, necessary for DNA replication through interactions with chromatin and initiation proteins (including the origin recognition complex). They are also components of the Ro60 ribonucleoprotein particle, which is a target of autoimmune antibodies in patients with systemic lupus erythematosus.

156. The expression of many thousands of genes is regulated by ncRNAs. This regulation can occur in trans or in cis. There is increasing evidence that a special type of ncRNAs called enhancer RNAs, transcribed from the enhancer region of a gene, act to promote gene expression.

157. In higher eukaryotes microRNAs regulate gene expression. A single miRNA can reduce the expression levels of hundreds of genes. The main function of miRNAs is to down-regulate gene expression.

158. The ncRNA RNase P has also been shown to influence gene expression. In the human nucleus RNase P is required for the normal and efficient transcription of various ncRNAs transcribed by RNA polymerase III. These include tRNA, 5S rRNA, SRP RNA, and U6 snRNA genes. RNase P exerts its role in transcription through association with Pol III and chromatin of active tRNA and 5S rRNA genes.

159. A number of ncRNAs are embedded in the 5' UTRs (Untranslated Regions) of protein coding genes and influence their expression in various ways.

160. Non-coding RNAs are crucial in the development of several endocrine organs, as well as in endocrine diseases such as diabetes mellitus.

161. As with proteins, mutations or imbalances in the ncRNA repertoire within the body can cause a variety of diseases.

162. A screen of 17 miRNAs that have been predicted to regulate a number of breast cancer associated genes found variations in the microRNAs miR-17 and miR-30c1 of patients; these patients were noncarriers of BRCA1 or BRCA2 mutations, raising the possibility that familial breast cancer may be caused by variation in these miRNAs.

163. The p53 protein functions as a transcription factor with a crucial role in orchestrating the cellular stress response. In addition to its crucial role in cancer, p53 has been implicated in other diseases including diabetes, cell death after ischemia, and various neurodegenerative diseases such as Huntington, Parkinson, and Alzheimer. Studies have suggested that p53 expression is subject to regulation by non-coding RNA.

164. The chromosomal locus containing the small nucleolar RNA SNORD115 gene cluster has been duplicated in approximately 5% of individuals with autistic traits. A recent small study of post-mortem brain tissue demonstrated altered expression of long non-coding RNAs in the prefrontal cortex and cerebellum of autistic brains as compared to controls.

165. The antisense RNA BACE1-AS is transcribed from the opposite strand to BACE1 and is upregulated in patients with Alzheimer's disease. BACE1-AS regulates the expression of BACE1 by increasing BACE1 mRNA stability and generating additional BACE1 through a post-transcriptional feed-forward mechanism. By the same mechanism it also raises concentrations of beta amyloid, the main constituent of senile plaques. BACE1-AS concentrations are elevated in subjects with Alzheimer's disease.

166. Variation within the seed region of mature miR-96 has been associated with autosomal dominant, progressive hearing loss.

Wikipedia “Non-coding RNA”

167. DNA-binding proteins include transcription factors which modulate the process of transcription, various polymerases, nucleases, which cleave DNA molecules, and histones which are involved in chromosome packaging and transcription in the cell nucleus.

168. Within chromosomes, DNA is held in complexes with structural proteins. These proteins organize the DNA into a compact structure called chromatin. In eukaryotes, this structure involves DNA binding to a complex of small basic proteins called histones. The histones form a disk-shaped complex called a nucleosome, which contains two complete turns of double-stranded DNA wrapped around its surface. These non-specific interactions are formed through basic residues in the histones making ionic bonds to the acidic sugar-phosphate backbone of the DNA, and are therefore largely independent of the base sequence. Chemical modifications of these basic amino acid residues include methylation, phosphorylation, and acetylation. These chemical changes alter the strength of the interaction between the DNA and the histones, making the DNA more or less accessible to transcription factors and changing the rate of transcription.

169. Transcription factors are proteins that regulate transcription. Each transcription factor binds to one specific set of DNA sequences and activates or inhibits the transcription of genes that have these sequences near their promoters. The transcription factors do this in two ways. Firstly, they can bind the RNA polymerase responsible for transcription, either directly or through other mediator proteins; this locates the polymerase at the promoter and allows it to begin transcription. Alternatively, transcription factors can bind enzymes that modify the histones at the promoter. This alters the accessibility of the DNA template to the polymerase.

These DNA targets can occur throughout an organism's genome. Thus, changes in the activity of one type of transcription factor can affect thousands of genes.

170. Protein–DNA interactions occur when a protein binds a molecule of DNA, often to regulate the biological function of DNA, usually the expression of a gene. Among the proteins that bind to DNA are transcription factors that activate or repress gene expression by binding to DNA motifs and histones that form part of the structure of DNA and bind to it less specifically. Also proteins that repair DNA such as uracil-DNA glycosylase interact closely with it.

Wikipedia “DNA-binding Protein”

The excerpts from these twenty-three articles, which are all available online, explain the mechanisms through which the Other Architect brings physical phenotypes and disease phenotypes into existence. You'll notice a few things while you read them. One thing you'll notice is the strange belief that everything is hereditary. Where there is no heredity, they say the heredity is “missing,” “hiding,” or “poorly understood.” This means they are doing things backwards. They are starting with a conclusion and expecting everything to prove that the conclusion is correct. They want the non-genetic factors mentioned in excerpt 15 to be hereditary even though they aren't. They call something inheritance in excerpts 20, 21, and 23 that isn't inheritance. The reason most of the expected heredity is missing or poorly understood is that something other than heredity explains why people have similar gene expression.

Another thing you'll notice is that no one has any idea what orchestrates and directs all of these mechanisms. They mention that “environmental factors,” by which they mean random environmental factors, can cause changes in these mechanisms. They speculate on how the mechanisms evolved. From what they focus on and speculate about, it's obvious that their stance is one in which the belief that everything about life has already been figured out and that all the major discoveries have already been made governs how they interpret any information they come across. This stance is a hindrance to science. It's also an anachronism. The discoveries of all the great scientists should have removed this kind of thinking, especially in the scientific community. When someone takes this stance, that person is taking the stance of everyone who opposed all the great scientists instead of taking the stance that the scientists themselves had.

Excerpt 40 mentions things which were found to have no influence on genetic change over time and it's clear that random environmental influences aren't as important as we'd like them to be. There's something else that predisposes genes to undergo specific changes. When you read excerpt 5, you wonder to yourself, “What orchestrates all of this?” In excerpt 58, the question (that will be answered in the next two articles of this series) is raised, “How does the epigenetic signal start off?”

Excerpt 63 shows that one of these mechanisms has patterns. Further investigation will show that all of the mechanisms have patterns and that these patterns correspond to patterns in both physical and disease phenotypes. These patterns will also be found to be the result of predispositions. Until we collectively go in this direction, we'll run into results similar to those mentioned in excerpt 67.

Consensus on research approaches and priorities is what is lacking, according to excerpt 74. What is needed is a new approach and a new priority, and this is exactly what this series of eight articles provides. Their goal is the same as the goal described in excerpt 83.

Excerpt 86 states that habits, specifically eating and exercise habits, may play an important role, because genetically distinct people with similar habits can express similar phenotypes; but obesity and diabetes occur in people with a predisposition. There are people who overeat and never become diabetic or obese, just as there are people who smoke cigarettes for their entire lives and never develop cancer.

What we read in excerpts 110 and 113 lets us know that a connection exists between the Other Architect and cancer.

What we read in excerpt 114 lets us know that a connection exists between the Other Architect and neuropsychiatric disorders.

In excerpts 131 and 137, the genetic predispositions discussed in articles two through five of this series are mentioned. In these excerpts, they are called “coregulated gene networks” and “genetic interaction networks.” All the questions that are raised in these excerpts (excerpt 93, for example) will be answered within our lifetimes. What hinders us is the belief that everything is already known and that all the important discoveries have already been made. The new areas of research opened up by this series of articles will break down the barriers that prevent us from answering these questions. Discoveries lead to more discoveries, and as long as we don't allow ourselves to fall into the trap of thinking we already know everything, the discoveries we make will do just that.

The next two articles will discuss what initiates the mechanisms presented to you in this article. They will take the same form as this article – excerpts and discussion.

The Other Architect is made up of all the mechanisms which regulate gene expression and everything which activates and directs these mechanisms. As you have seen, the mechanisms which regulate gene expression have been studied, and continue to be studied, rather thoroughly. What initiates these mechanisms has also been studied thoroughly. However, no one has seen the connection between the two. Once we see the connection and fully understand it, through investigation and experimentation, we'll finally have a workable understanding of life – one that will help us to prevent genetic disorders like cancer and diabetes, and that will also enable us to survive in extraterrestrial environments.

What initiates these mechanisms is neither hereditary nor random. There are some random environmental factors (diet, stress, chemicals, etc.) which affect us genetically, but these are not part of the Other Architect. The Other Architect does its work at very specific times and its work has specific patterns. The Other Architect is its own environment, and very specific events in that environment cause specific genetic interaction networks (genetic predispositions) to form. The formations of these genetic interaction networks occur at two very specific times.

The first time is in a very specific part of the environment, which could be called the microenvironment. The second time is when the organism leaves the microenvironment and enters what could be called the macroenvironment. There are regular, steady events in both the microenvironment and in the macroenvironment which initiate the formation of genetic interaction networks.

Knowledge of these events, and the genetic interaction networks they initiate, is what is missing in our current understanding of life. Our lack of knowledge is the reason why so many of our experiments fail to answer our questions. We've been doing them incorrectly. We've been studying life under the assumption that we already know, and have discovered, everything essential there is to know about it. The fact that every study we do, and that every experiment we do, doesn't confirm our assumptions about life should have at least made us wonder if there is something essential about life that we don't know. Any of the studies and experiments which didn't confirm our assumptions could have led the persons doing them to probe even deeper into the studies and experiments. By doing so, they may have discovered something about life that, up to that point, was unknown. What stopped them was a belief, an assumption.

Science, at least to the scientists who made the discoveries which inspire us all, is the freedom from belief and assumption. It's something that anyone can do, and it's something everyone does until they become convinced that belief and assumption are somehow a better choice. What we are going to collectively discover through further investigation of what is being presented in this series of articles is going to challenge our beliefs and assumptions. It is going to challenge our habit of clinging to beliefs and assumptions.

Many people reading these articles have already made assumptions. None of the assumptions that have been shared with me have been correct. I mention this because you may have made the same assumptions, or run into the same assumptions when you told someone about what you've been reading in these articles.

What you'll be presented with in the next two articles will dispel some of these assumptions. The rest will be dispelled during the verification of all of these findings. The majority of our studies and experiments in the life sciences have been done incorrectly, partially because of the way they were done (starting with a conclusion) and partially because there were essential factors that weren't considered (because their existence was unknown). All of the studies and experiments that were done incorrectly can be redone correctly once these findings are verified. When studies and experiments are done correctly, their results add to our knowledge and this leads to more discoveries and to practical applications. The practical applications that are going to result from the much more complete understanding that we are going to have of life are going to be far greater than those we have that have come from our partial understanding of life, and all of this begins with our knowledge and understanding of the Other Architect.