History paper footnotes: CRM Prediction and CRM Validation Approaches

CRM Prediction and CRM Validation Approaches

Since CRMs underlie the regulation of gene expression in a tissue-specific manner, understanding their characteristics helps to narrow down the potential CRM candidates for further applications such as tissue-specific gene therapy. As previously discussed, the parameters that influence CRM activity include the types and arrangement of transcription factor binding sites (TFBSs) and the epigenetic modification pattern [121, 124]. Therefore, these factors are taken into account when predicting potential CRMs.

Transcription factor binding sites are short DNA regions (6 to 10 bp in length) that are recognized and bound by various transcription factors [149]. One CRM can contain many TFBSs, depending on its function [150]. Several experimental studies have been performed to map the TFBSs in the genome. The chromatin immunoprecipitation (ChIP) assay is a common method to identify TFBSs in protein-bound DNA complexes in solution [151, 152]. In addition, DNase footprinting, which relies on the digestion of exposed DNA regions that are not protected by the target proteins, has also been used [153, 154]. The difference between these techniques mainly concerns the resolution at which transcription factor binding sites are mapped [155, 156]. To derive TFBS motifs from the raw data, the recovered DNA sequences are used as input to compute their similarity, and the potential motifs are generated. Applying TFBS motif information to CRM prediction is relatively simple, as this method requires only genomic DNA sequences. The predicted motifs are mapped to the original genome, and prospective CRMs containing clusters of TFBSs are identified [124, 157].
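The motif-mapping step described above can be illustrated with a minimal sketch. It assumes motifs are given as simple regular expressions rather than position weight matrices, scans only the forward strand, and uses a hypothetical window size and hit threshold; the function names and parameters are illustrative and do not come from the cited studies.

```python
import re

def find_motif_hits(sequence, motifs):
    """Return sorted (position, motif_name) pairs for every motif match.

    `motifs` maps a name to a regular expression (an assumed, simplified
    stand-in for a position weight matrix). Only the forward strand is scanned.
    """
    hits = []
    for name, pattern in motifs.items():
        # A lookahead group reports overlapping matches as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((match.start(), name))
    return sorted(hits)

def cluster_hits(hits, window=200, min_hits=3):
    """Group hits whose start positions fall within `window` bp of the first hit.

    Returns (first_start, last_start, motif_names) for every group that
    contains at least `min_hits` hits. Both thresholds are hypothetical.
    """
    clusters = []
    i = 0
    while i < len(hits):
        j = i
        while j + 1 < len(hits) and hits[j + 1][0] - hits[i][0] <= window:
            j += 1
        if j - i + 1 >= min_hits:
            names = [name for _, name in hits[i:j + 1]]
            clusters.append((hits[i][0], hits[j][0], names))
            i = j + 1  # continue after the reported cluster
        else:
            i += 1
    return clusters
```

Calling `cluster_hits(find_motif_hits(seq, {"GATA": "[AT]GATA[AG]", "E-box": "CA..TG"}))` on a sequence returns candidate regions in which several motif matches lie close together. A real pipeline would score both strands and use weighted motif models.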
Because motifs are so widely distributed in a large genome, many DNA regions are flagged as potential CRMs; however, only a few of these sequences are actually bound by the target transcription factors [158]. In erythroid cells of the mouse, the genome showed approximately 8 million hits for GATA-binding factor 1 (GATA1) binding site motifs, but only 15,360 motifs were bound by GATA1, and all of the bound motifs bore H3K4 monomethylation [159]. Indeed, relying on TFBS motifs alone is not sufficient to obtain the significant CRMs. Studying smaller genomes is one alternative to improve the overall process of CRM prediction [157].

Another approach to determining potential CRMs is to use the conservation of non-coding DNA among several species. The assumption is that DNA sequences associated with gene expression are highly conserved in comparison to non-essential DNA, having evolved under purifying selection over time [157]. This method does not depend on TFBS information, so it offers another route to CRM prediction in cases where tissue-specific enhancers have not been widely studied. An initial study aligned DNA sequences of more than 100 bp between human and mouse, with a minimum conservation of 70%, and identified potential enhancers for certain genes such as interleukin-4, interleukin-13 and interleukin-5 [160]. Later, this approach showed promising results, with high validation rates in transgenic mouse embryos when rigorous conservation constraints were used [160-163]. Conservation-based prediction is also applicable to discovering novel TFBSs where the information has not been extensively elaborated.
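The conservation criterion mentioned above (windows of at least 100 bp with at least 70% identity) can be sketched as follows. This is only an illustration that assumes a pairwise alignment is already available as two equal-length strings with `-` for gaps; the function names are hypothetical, and counting gap columns as mismatches is a simplifying assumption rather than the cited study's exact scoring.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same base in both sequences.

    Columns containing a gap ('-') are counted as mismatches (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)

def conserved_regions(aligned_a, aligned_b, window=100, min_identity=70.0):
    """Slide a fixed window along the alignment and merge qualifying windows.

    Returns half-open (start, end) column intervals in which every
    contributing window reaches `min_identity` percent identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    regions = []
    for start in range(len(aligned_a) - window + 1):
        end = start + window
        if percent_identity(aligned_a[start:end], aligned_b[start:end]) >= min_identity:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

The defaults mirror the 100 bp and 70% thresholds quoted in the text; stricter settings would correspond to the more rigorous conservation constraints used in later studies.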
With DNA sequence alignment between orthologous species, the short DNA sequences conserved in many species, namely phylogenetic footprints, could be possible binding sites for transcription factors [164, 165], and mutations of the conserved boxes can lead to reduced gene expression, as in the example of the altered effect of a variant E box on globin reporter gene induction [166]. Because this approach relies mainly on evolutionary constraint among species, it may overlook potential CRMs that developed recently and whose TFBS pattern cannot be aligned to earlier lineages [157]. For example, in a ChIP-seq study the GHP68 enhancer, located in an intragenic region of the mouse abhydrolase domain containing 2 (Abhd2) gene, does not contain the footprint of the GATA-binding factor 1 (GATA1) motif that is commonly found in the Abhd2 genes of other non-primate species [167]. Indeed, the GHP68 enhancer in the primate genome possesses a unique protein binding pattern [157]. Another consideration for conservation-based prediction is that even when the conservation level of selected CRMs is extremely high among orthologous species, the actual activities of the CRMs may vary from species to species in nature [168].

Given the limitations of the previous approaches, namely false-positive predictions caused by the highly redundant presence of TFBS motifs in large genomes [158] and the lineage-specific evolution of certain CRMs in different organisms [157], epigenetic regulation is considered a promising parameter for CRM prediction because of the strong correlation between DNase hypersensitivity or histone modification and enhancer activity [169-171]. Many CRMs have been found to localize at genomic regions that are highly sensitive to DNase activity [153, 172].
In addition, biochemical modification patterns at enhancers have been shown, including histone acetylation [169], high H3K4me1 together with low H3K4me3 modification [170], and occupancy by the histone acetyltransferase p300 [171, 173]. For active promoters, in contrast to typical enhancers, the major characteristics are a nucleosome-free region and a high level of H3K4me3 modification [174, 175]. Using reference genome databases that contain epigenetic marks and DNase hypersensitivity regions, obtained from ChIP-seq [176] and DNase-seq experiments, the substantial validation rates of selected CRMs, from 43 to 100% in many study models [169-171, 176, 177], indicate the robustness of the epigenetic-based approach. The idea is that this method is balanced: its prediction conditions are not as stringent as those of the evolutionary conservation method, and its output is not as large as that of TFBS-based prediction [157]. Still, some potential CRMs can be overlooked when using biochemical features [173, 178]. For instance, a study of eye enhancer identification showed that three different prediction methods yielded different numbers of outputs. Hardly any possible CRMs were obtained through comparative genomic DNA alignment, while using p300 occupancy to identify potential sequences gave rise to 130 output sequences with a 75% validation rate [173]. In another, TFBS-based study in heart by Narlikar and colleagues, a classifier whose database relied on predicted and validated TFBSs was generated to select putative CRMs from non-functional DNA [178]. This prediction allowed them to distinguish 40,000 CRMs from the genome, and the validation rate was considerable in comparison to the epigenetic approach [178].
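The chromatin signatures described above (high H3K4me1 with low H3K4me3 at enhancers, high H3K4me3 at active promoters, and p300 occupancy as supporting evidence) can be expressed as a simple rule. The sketch below is only illustrative: the enrichment values are assumed to be fold changes over background, and the threshold of 2.0 is a hypothetical cutoff, not a value taken from the cited studies.

```python
def classify_region(h3k4me1, h3k4me3, p300_bound=False, threshold=2.0):
    """Label a region from histone-mark enrichment (assumed fold over background).

    - high H3K4me3                      -> "promoter-like"
    - high H3K4me1 and low H3K4me3      -> "enhancer-like"
    - neither mark above the threshold  -> "unclassified"
    p300 occupancy is reported as supporting evidence for enhancer calls.
    """
    if h3k4me3 >= threshold:
        return "promoter-like"
    if h3k4me1 >= threshold:
        return "enhancer-like (p300-bound)" if p300_bound else "enhancer-like"
    return "unclassified"

def classify_regions(regions, threshold=2.0):
    """Apply classify_region to a dict of {name: (h3k4me1, h3k4me3, p300_bound)}."""
    return {
        name: classify_region(me1, me3, p300, threshold)
        for name, (me1, me3, p300) in regions.items()
    }
```

A rule like this is deliberately coarse; published approaches typically combine many more marks and use trained models rather than a single cutoff.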
These overlooked cases suggest the need for further study of biochemical pattern prediction to cover the missing CRMs.

Using experimental and computational studies, scientists have been able to collect extensive information about TFBSs, epigenetic modifications and the conservation of DNA among species. These data have been widely deposited in many open-access databases, which have become significant information resources for further CRM identification [179]. The Ensembl Regulatory Build was recently developed to integrate previous discoveries of epigenetic marks and transcription factor occupancy from different projects and to build better-defined regulatory regions in the human genome [180]. Another commonly used database is the University of California Santa Cruz (UCSC) Genome Browser Database, which provides all aspects of information for CRM prediction, including experimental data (DNase hypersensitivity clusters, epigenetic marks of histone proteins, and transcription factor binding from ChIP-seq) as well as computational data (conservation level among vertebrates from DNA sequence alignment) [181]. This aids the feasibility of enhancer prediction, since the use of combinatorial information should yield more significant CRM outputs with higher validation rates [182-184]. For example, the sophisticated protocol designed by Nair and team to identify liver-specific CRMs was derived from the integration of experimental data from the UCSC Genome Browser and putative TFBS motifs from computational analysis [182]. To obtain predicted liver-specific TFBS motifs, the presumptive promoters, which are 1000-bp DNA sequences located upstream of transcription start sites, of genes highly expressed in the liver were first compared with those of genes expressed at low levels, followed by computation of the potential TFBS motifs likely to be associated with liver-targeted gene induction, based on a distance difference matrix (DDM) and multidimensional scaling (MDS) [182, 185].
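As a rough illustration of what a distance difference matrix is, the sketch below computes one in the structural setting where the concept originated: two sets of matching 3D points, one pairwise distance matrix per set, and their element-wise difference. How the cited protocol adapts this idea to TFBS motifs is not shown here; the function names and the use of plain coordinate tuples are assumptions made for the example.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between all points (list of coordinate tuples)."""
    return [[math.dist(p, q) for q in points] for p in points]

def distance_difference_matrix(points_a, points_b):
    """Element-wise difference of the two distance matrices (A minus B).

    Entry [i][j] is positive when points i and j are farther apart in A
    than in B, negative when they are closer, and zero when unchanged.
    """
    if len(points_a) != len(points_b):
        raise ValueError("both point sets must contain the same number of points")
    dist_a = distance_matrix(points_a)
    dist_b = distance_matrix(points_b)
    n = len(points_a)
    return [[dist_a[i][j] - dist_b[i][j] for j in range(n)] for i in range(n)]
```

Large absolute entries point to pairs whose relationship differs most between the two inputs, which is the kind of contrast a downstream step such as multidimensional scaling can then summarize.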
The DDM was originally used to identify the differences between two protein structures by calculating distance difference values from their distance matrices [186]. Ultimately, the predicted TFBS motifs were mapped to the corresponding DNA sequences of liver-specific genes in the UCSC Genome Browser, where experimental data for these genes had previously been described [182]. The ideal CRMs were expected to show the coexistence of predicted motifs together with dense DNase clusters, a high conservation level in vertebrates, and distinct histone modification patterns. In addition, the putative motifs should be consistent with the transcription factor lists from ChIP-seq experiments. The promising liver-specific transcriptional module from the prediction was further validated and showed remarkable activity, up-regulating hFIX expression up to 15-fold compared to the control, reflecting the robustness of the prediction method [182]. The same approach has also been applied to design CRMs targeting other cell types such as cardiomyocytes, and a 10-fold increase in the expression of cardiac genes was noted upon validation in a mouse model [183]. Taken together, this suggests the increased power of using multiple parameters to determine transcriptional modules, and the combined data provided in the UCSC Genome Browser are valid because the integrated data are well standardized, so the summarized information is reliable. However, the feasibility of the combinatorial approach, which relies on both computational data and previous experimental studies, is a major concern because computing TFBS motifs requires strong bioinformatics expertise.
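The coexistence criterion described above can be sketched as an overlap count across annotation tracks. This is a minimal illustration, assuming each track has already been exported as a list of half-open (start, end) intervals on one chromosome; the track names, the support threshold, and the function names are hypothetical and are not part of the UCSC Genome Browser itself.

```python
def overlaps(region, interval):
    """True when two half-open (start, end) intervals share at least one base."""
    return region[0] < interval[1] and interval[0] < region[1]

def support_count(region, tracks):
    """Number of tracks with at least one interval overlapping the region."""
    return sum(
        any(overlaps(region, interval) for interval in intervals)
        for intervals in tracks.values()
    )

def rank_candidates(candidates, tracks, min_support=3):
    """Keep candidates supported by >= min_support tracks, best supported first.

    Returns (support, region) pairs sorted by descending support, then by
    region coordinates.
    """
    scored = [(support_count(region, tracks), region) for region in candidates]
    kept = [item for item in scored if item[0] >= min_support]
    return sorted(kept, key=lambda item: (-item[0], item[1]))
```

With tracks such as DNase clusters, vertebrate conservation, ChIP-seq transcription factor binding, and histone marks, a candidate overlapping all four would rank above one supported by a single track. Counting tracks equally is a design simplification; a real selection would likely weight the evidence types differently.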
One possible alternative that avoids the need for such bioinformatics expertise would be to use the information available in the UCSC Genome Browser directly for CRM selection, taking the associated determinants (DNase hypersensitivity, transcription factor binding, histone modification, and conservation level among vertebrates) into consideration.

Several validation assays have been performed to investigate the potency of CRMs in enhancing gene expression. In general, plasmids containing minimal core promoters and reporter genes, such as lacZ (encoding β-galactosidase), luciferase, and green fluorescent protein (GFP), serve as the backbone constructs, and the predicted CRMs are cloned into a particular position depending on the validation method [149]. Usually, CRM sequences are inserted upstream of the promoter, and the increased strength of overall construct expression is assessed after transfection or integration of the plasmids [187-196]. To identify, in downstream analysis, the target cells in which CRMs are active, heterologous barcodes have been used, so that hundreds or thousands of CRMs can be screened in high throughput [191-194, 196]. In some studies, the need for barcodes is eliminated by targeting the enhancers directly, in a method called self-transcribing active regulatory region sequencing (STARR-seq) [197]. Both transgenic animal embryos and specific cell lines [187-191, 193-196] are commonly used to study CRM activity. For example, transgenic mice or flies (D. melanogaster) containing putative CRMs together with reporter genes are first generated, and the reporter gene signals that later develop in particular parts of the embryos are identified, depending on the tissue specificity of the CRMs [198].
To improve the time and cost-effectiveness of the current approach, Gisselbrecht and colleagues developed a technique called enhancer-FACS-Seq (eFS), which makes use of the statistical distribution of the GFP signal resulting from tissue-specific CRM enhancement to sort GFP-positive cells from the negative population using fluorescence-activated cell sorting (FACS) [190]. Validation of the effect of CRMs on gene expression has also been reported in animal models, where the delivery methods for CRMs are adjusted to be tissue-specific. AAV is an example of a tissue-targeted delivery system, since its tropism depends on the serotype [182-184]. AAV vectors have been used to deliver predicted CRMs to specific organs, as done for heart and liver enhancers using AAV9, and the follow-up was carried out through expression of the reporter hFIX protein in the blood. In murine models, to reduce the cost of virus production, hydrodynamic (HD) injection of plasmids containing CRMs can be performed first for initial screening [182]. This method is distinctive since the model simulates the actual situation of CRM activity in the animal body for gene therapy applications [182-184]. Another advantage of this approach is that longevity and expression level can be observed continuously in long-term studies, as sacrificing the mouse is not required.

Biology of hepatocellular carcinoma (HCC)

Hepatocellular carcinoma (HCC) is a type of liver cancer that is highly prevalent in many regions such as East Asia, Africa, and the United States [199]. Even though the incidence of HCC ranks sixth among cancers, its mortality rate is relatively high [200].
There are several etiological factors in HCC development, including hepatitis B (HBV) and hepatitis C (HCV) infection, aflatoxin-mediated induction, alcohol consumption, accumulation of fat in the liver resulting in non-alcoholic steatohepatitis (NASH), sex-related influences, imbalance of microbes in the gastrointestinal tract, and type II diabetes [201]. Each factor has a specific mechanism for causing HCC, but in general most factors ultimately lead to liver cirrhosis and subsequently HCC [202]. A number of staging systems have been designed to classify the stage of HCC development for diagnosis; however, establishing a gold standard for staging remains challenging due to the heterogeneity of the HCC population [203].

To study the molecular mechanisms underlying HCC development, copy number [204-206], exome [207, 208], whole-genome sequencing [209, 210], and transcriptome [211, 212] studies have been conducted in liver cancer tissues. In copy number alteration analysis, both deletion (e.g. TNFAIP3, CDKN2C, WRN, PTEN, BRCA2) and duplication (e.g. MDM4, BCL9, ARNT, MET) of specific genes are found in HCC genomes [213]. Exome and whole-genome sequencing in HCC allow detailed investigation of genome structure at the level of mutations in both coding and non-coding regions [213, 214]. For example, mutations of the NFE2L2-KEAP1 and MLL genes were identified in 87 HCC cases using the exome approach [214]. Transcriptomic studies give further insight into HCC regarding changes in expression profiles compared to normal hepatocytes. Used in combination with whole-genome sequencing, transcriptome analysis revealed an RNA editing mechanism implicated in the up-regulation of gene expression in cancer development [215, 216]. Taken together, the aberrant genes found in HCC are mapped to cellular pathways to explain the molecular mechanisms underlying disease development. The pathways postulated to be key for hepatocarcinogenesis include cell cycle regulation (e.g. RB [217], CDKN2A [218]), the WNT pathway (e.g. APC [219], AXIN1 [220, 221]), chromatin remodeling (e.g. ARID2 [208, 210], MLL [222]), tyrosine kinase signaling (e.g. SOCS-1 [223], IGF [224]), and the NOTCH [225, 226] pathway.

Apart from structural genes, miRNAs, small non-coding RNAs that control gene expression at the post-transcriptional level by hybridizing with mRNA templates and subsequently leading to translation inhibition or RNA degradation [227], are implicated in HCC progression, based on evidence of differential miRNA expression between HCC and normal hepatocytes [228, 229]. In general, miR-92, miR-18 and miR-20 are significant in HCC stage progression [229]. Some alterations in miRNA expression are associated with etiological factors. For instance, there is a correlation between miR-126 down-regulation and alcohol consumption [230]. The functions of miRNAs in HCC pathogenesis are divided into two groups: oncogenic miRNAs and tumor-suppressor miRNAs. Among oncogenic miRNAs, three, namely miR-221, miR-224 and miR-21, have been shown to enhance hepatocarcinogenesis. miR-221 plays a role in cancer invasion through two mechanisms: increasing cell proliferation by targeting CDKN1B/p27 expression [231], and enhancing cell migration through AKT signaling [232]. The invasion of HCC is also supported by miR-224, but its mechanism of action involves homeobox D10 down-regulation and induction of an inflammatory pathway [233]. Another oncogenic miRNA, miR-21, is reported to suppress expression of the programmed cell death 4 (PDCD4) protein [234, 235], which functions as a tumor suppressor, and to increase cell proliferation through the regulation of mitogen-activated protein kinase kinase 3 (MAP2K3) activity [236]. Apart from individual miRNAs, certain miRNA clusters have been identified as contributing to HCC progression.
For instance, up-regulation of the miR-17-92 cluster, which is composed of miR-17, miR-18a, miR-19a, miR-20a, miR-19b-1, and miR-92a-1 [237], was found in HCC, and attenuation of its expression diminished the capacity for malignant transformation [238]. The activity of the miR-17-92 cluster affects the expression of certain genes commonly found in HCC, such as PTEN, E2F1, and E-cadherin [239]. However, the individual miRNA members may function in different ways. For example, up-regulation of miR-19 suppressed liver fibrogenesis through TGF-β signaling [240]. A number of tumor-suppressive miRNAs have also been found to diminish HCC development. miR-122 controls genes associated with tumor formation and metastasis, including VEGF [241], RHOA [241] and PKM [242], whereas miR-375 exerts its activity by suppressing ATG7 expression to inhibit autophagy [243], an essential mechanism by which cancer cells survive in a hypoxic environment. miR-125b prevents cancer cell proliferation by activating p21(WAF1/Cip1)-mediated G1/S cell cycle arrest as well as by repressing SIRT7 gene induction [244]. The G1/S transition of cancer cells is also controlled by miR-26a activity [235]. Overall, the functions of HCC-associated miRNAs are implicated in the STAT3 pathway, through modulation of Bcl-2 and Mcl-1 functions, and in the NF-κB inflammatory pathway, leading to hepatocarcinogenesis [245].

Monday, June 3, 2019
