BICOB 2020: Papers with Abstracts

Abstract. Comparison between tumoral and healthy cells may reveal abnormal regulatory behavior between a transcription factor and the genes it regulates, even when the transcription factor itself is not differentially expressed. We propose a methodology for identifying transcription factors involved in the deregulation of genes in tumoral cells. This strategy is based on the inference of a reference gene regulatory network that connects transcription factors to their downstream targets using gene expression data. Gene expression levels in tumor samples are then carefully compared to this reference network to detect deregulated target genes. Finally, a linear model is used to measure the ability of each transcription factor to explain these deregulations. We assess the performance of our method through numerical experiments on a public bladder cancer data set derived from The Cancer Genome Atlas project. We identify genes known for their involvement in the development of specific bladder cancer subtypes as well as new potential biomarkers.
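A minimal sketch of the final step, assuming the reference network is encoded as a 0/1 matrix `A` (genes × transcription factors) and each tumor sample yields a per-gene deregulation score `d`; an ordinary least-squares fit of d ≈ A·w then gives one weight per transcription factor. The matrix encoding, the use of plain least squares, and the function name `tf_deregulation_weights` are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

def tf_deregulation_weights(A: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Fit d ≈ A @ w by least squares and return one weight per TF.

    A: (n_genes, n_tfs) 0/1 matrix, A[g, t] = 1 if TF t targets gene g
       in the (hypothetical) reference network.
    d: (n_genes,) deregulation score of each gene in one tumor sample.
    """
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    return w

# Toy network: TF0 targets genes 0-1, TF1 targets genes 2-3.
A = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
d = np.array([1.0, 1.0, 0.0, 0.0])  # only TF0's targets are deregulated
print(tf_deregulation_weights(A, d))  # approximately [1, 0]
```

In the toy example only the targets of TF0 are deregulated, so TF0 receives a weight of about 1 and TF1 a weight of about 0. When many transcription factors share targets, a penalized regression may be preferable to plain least squares.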
Abstract. Gene interactions play a fundamental role in the proneness to cancer. However, detecting and ranking these interactions is a complex problem due to the high dimensionality of genomic data. Hence, we aim to find patterns composed of multiple features to molecularly characterize breast cancer subtypes by integrating different omics datasets with a data mining approach. To retrieve biological understanding from these computational results, we developed IBIF-RF (Importance Between Interactive Features using Random Forest), a new metric capable of assessing and holistically ranking the importance of genomic interactions without any prior knowledge of key feature combinations. A set of 247 top-performing features from transcriptomic, proteomic, methylation, and clinical data was used to investigate interactive patterns for classifying breast cancer subtypes using over 1,150 samples. The IBIF-RF metric allowed the extraction of 154,312, 190,481, and 463,917 combinations of variables for the TCGA, GSE20685, and GSE21653 datasets, respectively. The single genes MLPH and FOXA1 were the most frequently identified variables across all datasets, followed by two-gene interactions such as CEP55-FOXA1 and FOXC1-THSD4. Moreover, the IBIF-RF metric allowed the definition of two sets of genes frequently found together (1: FOXA1, MLPH, and SIDT1; and 2: CEP55, ASPM, CENPL, AURKA, ESPL1, TTK, UBE2T, NCAPG, GMPS, NDC80, MYBL2, KIF18B, and EXO1).
Abstract. Cancer-related health problems are increasing worldwide. The cyclooxygenase enzyme is known to be involved in cancer biology, neurological disorders, cardiovascular disease, and other diseases. It has been a promising target for developing novel anti-inflammatory drugs in breast cancer treatment. Hence, a computer-aided drug design strategy was applied to identify potent inhibitors of COX-2. For this purpose, 12,084 ligands from different databases were selected for testing based on similarity search criteria and docked against the target protein COX-2, retrieved from the Protein Data Bank. A high-throughput virtual screening protocol was performed to examine the compounds’ binding free energies. Virtual screening identified eleven compounds with better binding affinity that interacted with the protein at the known active site. The selected compounds were filtered through Lipinski’s rule of five, and their physicochemical properties and bioactivity scores were calculated. Molecular docking calculations, MD simulations, ADMET properties, and protein-ligand interactions were analyzed to determine the suitability of each ligand. Overall, the results of our study suggest that compound ZINC000039428234 could be a potent inhibitor of the COX-2 protein in breast cancer. We expect this result to be of considerable value in designing a potential drug candidate for breast cancer.
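The Lipinski filtering step can be sketched as below, assuming the descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed with a cheminformatics toolkit. The `Descriptors` class, the compound names, and their values are hypothetical, and allowing at most one violation is a common convention rather than a threshold stated in the abstract.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # Tolerating one violation is a common convention (an assumption here).
    return lipinski_violations(d) <= max_violations

# Hypothetical descriptor values, not taken from the study.
candidates = [
    Descriptors("cmpd_A", 381.4, 3.5, 1, 4),
    Descriptors("cmpd_B", 612.7, 6.1, 3, 9),
]
kept = [c.name for c in candidates if passes_rule_of_five(c)]
print(kept)  # ['cmpd_A']
```

Here cmpd_B exceeds both the 500 Da and the logP 5 limits, so it is dropped.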
Abstract. Breast cancer is the second leading cause of cancer death for women worldwide. In this study, a previously published mathematical model of breast cancer in the MCF-7 cell line is considered. The interaction among tumor cells, estradiol, natural killer (NK) cells, cytotoxic T lymphocytes (CTLs, or CD8+ T cells), and white blood cells (WBCs) is described by ordinary differential equations (ODEs). The system exhibits three coexisting stable equilibrium points, which resemble the three E’s (elimination, equilibrium, and escape) of cancer immunoediting. In this paper, a numerical method based on an adaptive grid is employed for bifurcation analysis of the mathematical model. Bifurcation analysis is performed for important parameters whose changes in value alter the stability of the steady states. The results of the bifurcation analysis may provide useful information about treatment strategies in further studies.
Abstract. This work presents the classification of functional near-infrared spectroscopy (fNIRS) signals as a tool for predicting epileptic seizures. Seizure prediction is implemented with two classifiers: a Support Vector Machine (SVM) for EEG-based prediction and a Convolutional Neural Network (CNN) for fNIRS-based prediction. Performance was measured by computing the Positive Predictive Value (PPV) and the accuracy of each classifier within the 5-minute window immediately preceding seizure onset. The objectives of this research are to show that fNIRS-based epileptic seizure prediction yields results superior to those based on EEG, and to show how deep learning can be applied to this problem.
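The two reported metrics can be computed as follows, assuming each analysis window is labeled 1 if it falls in the 5-minute pre-seizure interval and 0 otherwise. The labels and predictions below are made up for illustration.

```python
def ppv_and_accuracy(y_true, y_pred):
    """Return (PPV, accuracy) for binary labels, where 1 marks a pre-seizure window."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    correct = sum(1 for t, p in pairs if t == p)
    # PPV is undefined when the classifier makes no positive predictions.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return ppv, correct / len(pairs)

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(ppv_and_accuracy(y_true, y_pred))  # PPV = 2/3, accuracy = 4/6
```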
Abstract. Autism spectrum disorder (ASD) is a heterogeneous disorder, and diagnostic tools attempt to identify homogeneous subtypes within ASD. Previous studies found many behavioral and physiological comorbidities of ASD, but a clear association between these comorbidities and the underlying genetic mechanisms remains unknown. In this paper, we leverage machine learning to investigate the relationship between genotype and phenotype in ASD. To this end, we propose the PhGC pipeline, which uses machine learning to identify behavioral phenotypes of ASD based on the corresponding genomic data. We use unsupervised clustering algorithms to extract the core members of each cluster and profile these core-member subsets to explore their characteristics using genotype data from the same dataset. Our genome annotation results showed that most of the alleles whose frequencies differed among clusters were represented by the core members.
Abstract. A major challenge in computational biology is recognizing one or more biologically active/native tertiary protein structures among thousands of physically realistic structures generated by template-free protein structure prediction algorithms. Clustering structures based on structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism for selecting individual structures for prediction. In this paper, we present a few algorithms for this selection problem. We approach the problem under unsupervised multi-instance learning and address it in three stages: organizing structures into bags, identifying relevant bags, and drawing individual structures/instances from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. For the parametric algorithms, parameters are trained on training data and evaluated on testing data via rigorous metrics.
Abstract. A mutation to the amino acid sequence of a protein can cause a biomolecule to be resistant to the intended effects of a drug. Assessing the changes in a drug’s efficacy in response to mutations via wet-lab mutagenesis experiments is prohibitively time-consuming for even a single point mutation, let alone for all possible mutations. Existing approaches for inferring mutation-induced drug resistance are available, but all of them reason about mutations of residues at or very near the protein-drug interface. However, there are examples of mutations far from the ligand-binding region that nonetheless render a protein resistant to the effects of the drug. We present a proof-of-concept computational pipeline that generates in silico the set of all possible single point mutations in a protein-ligand complex. We assess drug resistance using a graph-theoretic rigidity analysis approach. Unlike existing methods, our approach can assess the impact of mutations far from the protein-drug interface. We introduce several visualizations for exploring how amino acid substitutions both near and far from where the ligand interacts with a protein target have a stabilizing or destabilizing effect on the protein-drug complex. We discuss our analytical approach in the context of experimental data from the literature about clinically known protein-drug interactions.
Abstract. Studying biological systems is difficult because of complexity, variability, and uncertainty. Conceptual models and diagrams are useful in conveying ideas about how a biological phenomenon is thought to be generated. However, sophisticated modeling and simulation methods are needed to discover mechanism-based explanations. Presented herein is a new and unique methodology for this application. Using virtual experiment methods, we recently provided a plausible solution to a problem that had eluded and perplexed pharmacologists and toxicologists for more than 40 years. We describe how virtual and real-world experimentation can be complementary, and propose a way to partially automate the methodology to expedite research.
Abstract. NASA’s GeneLab platform is the first omics database and platform dedicated to studies related to spaceflight. This platform provides the scientific community with a valuable resource for exploring, evaluating, and discovering hypotheses of space biology. In the platform, we offer users multiple tools to explore GeneLab data, including a comprehensive, well-annotated data repository for all space-related omics data, omics data analysis tools, space environmental metadata, and differential gene expression visualization tools. Here we focus on the latter and on how these tools can be used to explore scientific questions related to space biology. Spaceflight is known to pose many health risks to astronauts due to both microgravity and space radiation exposure. Current research focuses on the biological mechanisms underlying these health risks and on potential countermeasures to mitigate them. One specific health risk is increased liver damage during exposure to the space environment (Beheshti et al., Scientific Reports, 2019). Longer exposure to the space environment appears to cause increased lipid accumulation, and transcriptomic analysis suggests that pathways and genes related to liver disease (e.g., PPARα, insulin (INS), and glucagon (GCG)) can be activated. We use this scenario to demonstrate how GeneLab’s platform and visualization capabilities can be used to quickly reveal this process in GeneLab data. Lastly, we discuss some issues and best practices for implementing similar public-platform visualization tools.
Abstract. The paraclique algorithm provides an effective means for biological data clustering. It satisfies the mathematical quest for density while fulfilling the pragmatic need for noise abatement on real data. Given a finite, simple, edge-weighted and thresholded graph, the paraclique method first finds a maximum clique, then incorporates additional vertices in a controlled manner, and finally extracts the subgraph thereby defined. When more than one maximum clique is present, however, deciding which to employ is usually left unspecified. In practice, this frequently and quite naturally reduces to using the first maximum clique found. In this paper, maximum clique selection is studied in the context of well-annotated transcriptomic data, with ontological classification used as a proxy for cluster quality. Enrichment p-values are compared using maximum cliques chosen in a variety of ways. The most appealing and intuitive option is almost surely to start with the maximum clique having the highest average edge weight. Although there is of course no guarantee that such a strategy is any better than random choice, results derived from a large collection of experiments indicate that, in general, this approach produces a small but statistically significant improvement in overall cluster quality. Such an improvement, though modest, may be well worth pursuing in light of the time, expense and expertise often required to generate timely, high-quality, high-throughput biological data.
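The clique-selection rule favored above (start from the maximum clique with the highest average edge weight) can be sketched in pure Python. Maximal cliques are enumerated here with the Bron-Kerbosch algorithm with pivoting, which is our assumption since the abstract does not say how maximum cliques are found; the paraclique expansion step that follows is not shown, and the edge list is hypothetical. Enumeration is exponential in the worst case, so large graphs would need a dedicated maximum clique solver.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting.

    adj maps each vertex to the set of its neighbours.
    """
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def best_maximum_clique(weights):
    """Return the maximum clique with the highest average edge weight.

    weights maps frozenset({u, v}) to the weight of an edge that survived
    thresholding.
    """
    adj = {}
    for edge in weights:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = list(maximal_cliques(adj))
    size = max(len(c) for c in cliques)

    def mean_weight(clique):
        pairs = list(combinations(clique, 2))
        return sum(weights[frozenset(pair)] for pair in pairs) / len(pairs)

    return max((c for c in cliques if len(c) == size), key=mean_weight)

# Hypothetical thresholded co-expression graph: two triangles sharing gene c.
edges = [("a", "b", 0.90), ("a", "c", 0.80), ("b", "c", 0.85),
         ("c", "d", 0.60), ("c", "e", 0.70), ("d", "e", 0.65)]
weights = {frozenset((u, v)): w for u, v, w in edges}
print(sorted(best_maximum_clique(weights)))  # ['a', 'b', 'c']
```

Both triangles are maximum cliques of size 3; the rule picks {a, b, c} because its mean edge weight (0.85) exceeds that of {c, d, e} (0.65), whereas "first found" would depend on enumeration order.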
Abstract. This study is part of our ongoing effort to develop improved RNA secondary structure analysis tools and databases. In this work, we present a new Graphics Processing Unit (GPU)-based RNA structural analysis framework that supports fast comparison of multiple RNA secondary structures in very large databases. A search-based secondary structure comparison algorithm deployed on the RNASSAC website helps bioinformaticians find common RNA substructures in the underlying database. The algorithm performs two levels of binary search on the database, and its running time depends on the database size. Experiments on the RNASSAC website show that the algorithm takes seconds for a database of 4,666 RNAs; for example, it takes about 4.4 sec to compare 25 RNAs from this database. In another case, when many non-overlapping common substructures are desired, a heuristic approach requires as long as 85 sec to compare 40 RNAs from the same database. Comparisons by this sequential algorithm take at least 50% more time when RNAs are compared from a database of several million RNAs. The most recently curated databases already contain millions of RNA secondary structures, so improving the run-time performance of comparison algorithms is necessary. This study presents a GPU-based RNA substructure comparison algorithm whose running time for multiple RNA secondary structures remains feasible for large databases. Our new parallel algorithm is 12 times faster than the CPU (sequential) version of the comparison algorithm on the RNASSAC website. The significantly reduced response time is a step toward a real-time RNA comparison web service for the bioinformatics community.
Abstract. Gene co-expression networks built from gene expression data are commonly used to capture biologically significant patterns, enabling the discovery of biomarkers and the interpretation of regulatory relationships. However, the coordination of numerous splicing changes within and across genes can have a substantial impact on the function of these genes. This is particularly relevant in studies of the nervous system, where such effects can be masked in networks that only assess correlations between gene expression levels. A bioinformatics approach was developed to uncover the role of alternative splicing and associated transcriptional networks using RNA-seq profiles. Data from 40 samples, including controls and two treatments associated with sensitivity to stimuli across two central nervous system regions that can present differential splicing, were explored. Gene expression and relative isoform levels were integrated into a transcriptome-wide matrix, and Graphical Lasso was then applied to capture the interactions between genes and isoforms. Next, functional enrichment analysis enabled the discovery of pathways dysregulated at the isoform or gene level and the interpretation of these interactions within a central nervous system region. In addition, a Bayesian biclustering strategy was used to reconstruct treatment-specific networks from gene expression profiles, allowing the identification of hub molecules and the visualization of highly connected modules of isoforms and genes under specific conditions. Our bioinformatics approach can offer comparable insights into the discovery of biomarkers and therapeutic targets for a wide range of diseases and conditions.
Abstract. Large amounts of gene expression data have been collected under various environmental and biological conditions. Extracting subnetworks that recur across multiple co-expression networks has shown promise in functional gene annotation and biomarker discovery. However, frequent subgraph mining reports a large number of subnetworks. In this work, we propose to mine approximate dense frequent subgraphs. Our approach reports representative frequent subgraphs that are also dense. Our experiments on real gene co-expression networks show that frequent subgraphs are biologically interesting, as evidenced by the large percentage of biologically enriched frequent dense subgraphs.
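The two properties that define a frequent dense subgraph can be checked as follows: support (the number of co-expression networks that contain every edge of the candidate) and edge density 2|E| / (|V|(|V| - 1)). This sketch checks exact containment only; the approximate matching proposed in the abstract is not reproduced, and the gene names, networks, and thresholds are hypothetical.

```python
def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def density(nodes, edges):
    """Edge density 2|E| / (|V| (|V| - 1)) of an undirected subgraph."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0

def support(edges, networks):
    """Number of networks (sets of edges) that contain every edge of the candidate."""
    return sum(1 for net in networks if edges <= net)

def is_frequent_dense(edges, networks, min_support, min_density):
    nodes = {v for e in edges for v in e}
    return support(edges, networks) >= min_support and density(nodes, edges) >= min_density

# Hypothetical candidate: a triangle of co-expressed genes.
candidate = {edge("g1", "g2"), edge("g2", "g3"), edge("g1", "g3")}
networks = [
    set(candidate),                    # condition 1: triangle present
    candidate | {edge("g3", "g4")},    # condition 2: triangle plus an extra edge
    {edge("g1", "g2")},                # condition 3: only one edge
]
print(support(candidate, networks), density({"g1", "g2", "g3"}, candidate))  # 2 1.0
print(is_frequent_dense(candidate, networks, min_support=2, min_density=0.8))  # True
```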
Abstract. Codon usage bias is known to reflect the expression level of a protein-coding gene under the evolutionary theory that selection favors certain synonymous codons. Although measuring the effect of selection in simple organisms such as yeast and E. coli has proven effective and accurate, codon-based methods perform less well in plants and humans. In this paper, we extend a prior method that incorporates another evolutionary factor, namely mutation bias, and its effect on codon usage. Our results indicate that prediction of gene expression is significantly improved under our framework and suggest that quantifying mutation bias is essential for fully understanding synonymous codon usage. We also propose an improved method, MLE-Φ, with much greater computational efficiency and a wider range of applications. An implementation of this method is provided at Phi.
Abstract. The gene regulatory networks that comprise circadian clocks modulate biological function across a range of scales, from gene expression to performance and adaptive behaviour. These timekeepers function by generating endogenous rhythms that can be entrained to the external 24-hour day-night cycle, enabling organisms to optimally time biochemical processes relative to dawn and dusk. In recent years, computational models based on differential equations, and more recently on Boolean logic, have become useful tools for dissecting and quantifying the complex regulatory relationships underlying the clock’s oscillatory dynamics. Optimising the parameters of these models to experimental data is, however, non-trivial. The search space is continuous and increases exponentially with system size, prohibiting exhaustive search procedures, which are often emulated instead via grid-searching or random explorations of parameter space. Furthermore, to simplify the search procedure, objective functions representing fits to individual experimental datasets are often aggregated, meaning the information contained within them is not fully utilised.
Here, we examine casting this problem as a multi-objective one, and illustrate how the use of an evolutionary optimisation algorithm — the multi-objective evolution strategy (MOES) — can significantly accelerate the parameter search procedure. As a test case, we consider an exemplar circadian clock model based on Boolean delay equations — dynamic models that are discrete in state but continuous in time. The discrete nature of the model enables us to directly compare the performance of our optimiser to grid searches based on enumeration of the parameter space at a fixed resolution. We find that the MOES generates near-optimal parameterisations in computation times which are several orders of magnitude faster than the grid search. As part of this investigation, we also show that there is a distinct trade-off between the performance of the clock circuit in free-running and entrained photic environments. Importantly, runtime results indicate that the use of multi-objective evolutionary optimisation algorithms will make the investigation of larger and more complex models computationally tractable.
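The trade-off mentioned above is usually expressed through Pareto dominance. The sketch below filters a set of candidate parameterisations, each scored by two hypothetical error objectives (free-running and entrained fit, both minimised), down to its non-dominated front. It illustrates only the selection criterion a multi-objective optimiser works with, not the MOES algorithm itself, and the scores are invented.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are minimised (e.g., error against each experimental dataset).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (free-running error, entrained error) for five parameter sets.
scores = [(0.20, 0.90), (0.50, 0.50), (0.90, 0.10), (0.60, 0.60), (0.95, 0.95)]
print(pareto_front(scores))  # [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]
```

No point on the resulting front improves one objective without worsening the other, which is the sense in which free-running and entrained performance trade off; aggregating the objectives into a single sum would hide this structure.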
Abstract. Promoters drive gene expression and help regulate cellular responses to the environment. In recent research, machine learning models have been developed to predict a bacterial promoter’s transcriptional initiation rate, although these models utilize expert-labeled sequence elements across a defined set of DNA building blocks. The generalizability of these methods is therefore limited by the necessary labeling of the specific components studied. As a result, current models have not been used to predict the transcriptional initiation rates of promoters with generalized nucleotide sequences. If generalizable models existed, they could greatly facilitate the design of synthetic genetic circuits with well-controlled transcription rates in bacteria.
To address these limitations, we used a convolutional neural network (CNN) to predict a promoter’s transcriptional initiation rate directly from its DNA nucleotide sequence. We first evaluated the model on a published promoter component dataset. Trained using only the sequence as input, our model fits held-out test data with R² = 0.90, comparable to published models that fit expert-labeled sequence elements.
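Learning "directly from sequence" requires a numeric encoding of the promoter. One-hot encoding is a common choice for sequence CNNs, though the abstract does not state which encoding was used; the function below is an illustrative assumption.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA sequence as a (len(seq), 4) float32 matrix, columns A, C, G, T.

    Characters other than A/C/G/T (e.g. N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    x = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x

print(one_hot("TTGACA"))  # -35 consensus hexamer: one 1 per row
```

TTGACA, the σ70 -35 consensus hexamer, becomes a 6 × 4 matrix with a single 1 per row; a batch of such matrices, padded to a common length, would form the CNN input.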
We produced a new promoter strength dataset including non-repetitive promoters with high sequence variation, not limited to combinations of discrete expert-labeled components. Our CNN trained on this more varied dataset fits held-out promoter strength with R² = 0.61. Previously published models are intractable on a dataset like this with highly diverse inputs. The CNN outperforms classical baselines such as LASSO on a bag-of-words representation of promoter sequence elements (R² = 0.42).
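One plausible bag-of-words representation for the LASSO baseline is a vector of k-mer counts, sketched below; the abstract speaks of "promoter sequence elements", so the choice of k-mers and of k is an assumption. The resulting vectors could then be passed to any L1-regularized linear regression.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq: str, k: int = 3) -> list[int]:
    """Bag-of-words vector of DNA k-mer counts in A < C < G < T lexicographic order."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[word] for word in vocab]

vec = kmer_counts("TATAAT", k=2)  # -10 consensus hexamer
print(len(vec), sum(vec))          # 16 5
```

For TATAAT with k = 2, the non-zero entries are TA: 2, AT: 2, and AA: 1. Unlike the one-hot input of a CNN, this representation discards the positions of the k-mers, which is one reason such a baseline can underperform on positional signals like the -35 and -10 hexamers.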
We applied recent machine learning approaches to quantify the contribution of individual nucleotides to the CNN's promoter strength prediction. Learning directly from DNA sequence, our model identified the consensus -35 and -10 hexamer regions as well as the discriminator element as key contributors to σ70 promoter strength. It also replicated a finding that a perfect consensus sequence match does not yield the strongest promoter.
The model's ability to independently learn biologically-relevant information directly from sequence, while performing similarly to or better than classical methods, makes it appealing for further prediction optimization and research into generalizability. This approach may be useful for synthetic promoter design, as well as for sequence feature identification.
Abstract. In patients with epilepsy, the lesion must be located before surgery. Currently, clinical experts diagnose lesions through visual judgment. To reduce the workload of clinical experts, many automatic diagnostic methods have been proposed. However, these methods often use only one feature as the basis for diagnosis, which has certain limitations. In this paper, we use multiple feature fusion methods for automatic diagnosis. Because epilepsy is caused by abnormal discharges, we use filtering and entropy to capture the energy features of epileptic discharges. Because epileptic brain waves contain spike and sharp waveforms, the short-time Fourier transform (STFT) is used to analyze their time-frequency features. For feature fusion, we plot the entropy color map together with the spectrogram obtained from the STFT to combine the different types of features. After the feature extraction and fusion steps, each brain signal is converted into an image. Next, we use the visual analysis capabilities of a convolutional neural network (CNN) to classify these images. In our experiments, the CNN achieved a classification accuracy of 88.77%. Automatic diagnostic methods of this kind can greatly reduce the workload of clinical experts in actual clinical practice.
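The time-frequency and entropy features can be sketched with NumPy alone: a Hann-windowed short-time Fourier transform followed by the Shannon entropy of each frame's normalized power spectrum. The window length, hop size, sampling rate, and the use of spectral entropy specifically are assumptions; the abstract does not give the filter or entropy definition the authors used.

```python
import numpy as np

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude STFT using Hann-windowed frames; shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_entropy(mag):
    """Shannon entropy (bits) of each frame's normalised power spectrum."""
    power = mag ** 2
    p = power / power.sum(axis=1, keepdims=True)  # assumes no all-zero frame
    # Zero-probability bins contribute 0 (log2(1) = 0 is substituted for them).
    return -(p * np.log2(np.where(p > 0, p, 1.0))).sum(axis=1)

fs = 256                              # assumed sampling rate (Hz)
t = np.arange(4 * fs) / fs            # 4 s of signal
sine = np.sin(2 * np.pi * 10 * t)     # narrowband 10 Hz rhythm
noise = np.random.default_rng(0).standard_normal(t.size)  # broadband signal
print(spectral_entropy(stft_magnitude(sine)).mean(),
      spectral_entropy(stft_magnitude(noise)).mean())
```

A pure 10 Hz rhythm concentrates its power in a few frequency bins and yields low entropy, whereas broadband noise spreads power across bins and yields high entropy. Plotting the magnitude array and the per-frame entropy side by side gives an image of the kind the abstract feeds to the CNN.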
Abstract. With antibiotic resistance on the rise, health organizations are calling for the design of new drug templates. Naturally occurring antimicrobial peptides (AMPs) promise to serve as such templates, as bacteria are less likely to develop resistance to them. This has motivated wet and dry laboratories to seek novel AMPs. The sequence diversity of these peptides, however, renders systematic wet-lab screening studies either infeasible or too narrow in scope. Dry laboratories have focused instead on machine learning approaches. In this paper, we explore various deep neural network architectures aimed at improving antimicrobial peptide recognition. Our inquiry results in several architectures with comparable or better performance than existing state-of-the-art discriminative models.
Abstract. In vitro selection enables the identification of functional DNA or RNA sequences (i.e., active sequences) from entirely or partially random pools. Various computational tools have been developed for the analysis of sequencing data from selection experiments. However, most of these tools rely on structure-function relationships, which are usually unknown for de novo selection experiments. This largely restricts the applications of these algorithms. In this paper, we propose ASPECT (Active Sequence PrEdiCTor), an active sequence predictor based on Latent Dirichlet Allocation (LDA). ASPECT does not require a priori knowledge of the structures of active sequences. Experimental results showed that ASPECT is effective.
Abstract. The in vitro-in vivo extrapolation (IVIVE) methods currently used to predict the hepatic clearance of new chemical entities are plagued by poorly understood inaccuracies. To begin identifying plausible sources, we challenge two core hypotheses. Hypothesis-1: the intralobular micro-anatomical organization of hepatocytes (HPCs) can be abstracted away. By accepting that hypothesis, one can assume that intrinsic clearance per HPC is essentially the same in vitro and in vivo, and thus an IVIVE method can employ a simplified liver model, typically the “well-stirred” liver model. Hypothesis-2: when the simplified liver model is the “parallel tube model,” drug concentration decreases exponentially from portal to central vein. When either simplified liver model is used, a core assumption is that intrinsic clearance is directly proportional to the unbound fraction of drug. A barrier to progress has been that it is currently infeasible to challenge the two hypotheses using wet-lab experiments. In this work, we challenge virtual counterparts of the two hypotheses by experimenting on virtual mice in which hepatic disposition and clearance are consequences of concretized model mechanisms that have met several demanding requirements, including the following: the virtual liver’s structure and organization are strongly analogous to those of an actual liver, and the hepatic disposition and clearance of several virtual compounds have achieved quantitative validation targets. We study two virtual compounds. Compound-1 simulates the extreme of low-clearance, highly permeable compounds. Compound-2 simulates a highly permeable compound exhibiting maximum intrinsic clearance. We simulate changes in unbound fraction by changing the probability (pEnter) that Compound-1 or Compound-2 will enter an adjacent HPC during a simulation cycle. Compound-1 and -2 HPC exposure rates do not decrease from portal to central vein: they increase, and that contradicts both hypotheses.
Further, the relationship between exposure rates and pEnter is nonlinear. The insights achieved help explain the frequently reported underprediction of in vivo hepatic clearance values. We suggest that IVIVE methods can be improved by utilizing a liver model that couples a biomimetic representation of intralobular HPC organization with biomimetic representations of intrahepatic disposition dynamics.
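For reference, the two simplified liver models named in the hypotheses are usually written as follows (textbook forms that ignore blood-to-plasma partitioning), where Q_H is hepatic blood flow, f_u the unbound fraction, CL_int intrinsic clearance, C_in the inflow concentration, and x the distance from the portal inlet along a sinusoid of length L. Both models depend on the product f_u · CL_int, which is the proportionality to unbound fraction that the study challenges, and the parallel-tube profile C(x) is the exponential portal-to-central decline stated in Hypothesis-2.

```latex
% Well-stirred model: the liver is treated as one well-mixed compartment
CL_H = \frac{Q_H \, f_u \, CL_{int}}{Q_H + f_u \, CL_{int}}

% Parallel-tube model: concentration declines exponentially along the sinusoid
C(x) = C_{in} \exp\!\left(-\frac{f_u \, CL_{int}}{Q_H} \cdot \frac{x}{L}\right),
\qquad
CL_H = Q_H \left(1 - \exp\!\left(-\frac{f_u \, CL_{int}}{Q_H}\right)\right)
```

Setting x = L in C(x) gives the extraction ratio 1 - C(L)/C_in, and multiplying it by Q_H recovers the parallel-tube expression for CL_H.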
Abstract. C. elegans is an ideal model organism for aging research due to its simple neural connectome and relatively short lifespan. A current issue in aging research is the decoupling of lifespan and healthspan. We propose a method to efficiently measure the healthspan of a nematode by treating healthspan as a comparable characteristic of a worm’s lifespan, where the lifespan is represented as a temporal sequence. We apply Dynamic Time Warping (DTW) so that the healthspan of any set of worms can be compared based on the similarity of locomotion feature values over time. This technique allows us to compare the effects of various gene knockouts, such as daf2, on the healthspan of the worm relative to the wild-type N2. Results show that the daf2 mutation increases the worm’s lifespan, and by using DTW to compare each feature as it changes over time, we see that the proportion of its life spent in a healthy state also increases compared to N2 worms. To validate the results, we measure the period during which a worm is healthy and the period during which it is frail using a method called frailty threshold analysis. This allows us to determine the day a worm transitions from healthy to frail. We then compare the durations of the healthy and frail states between worm types and find that daf2 does extend the healthy state, since the proportion of life it spends in a healthy state is greater than that of N2.
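A minimal dynamic time warping distance is sketched below for two per-day locomotion feature trajectories (for example, mean speed of a daf2 worm and of an N2 worm). The absolute-difference cost and the absence of a warping-window constraint are assumptions; the abstract does not specify the DTW variant or which locomotion features were used.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the classic O(len(a) * len(b)) recurrence with an absolute-difference
    local cost and no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # b[j-1] matched again
                                 D[i][j - 1],      # a[i-1] matched again
                                 D[i - 1][j - 1])  # one-to-one step
    return D[n][m]

# A repeated value is absorbed by warping, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0

# Hypothetical daily mean speeds (mm/s) for one daf2 and one N2 worm.
daf2_speed = [0.21, 0.20, 0.20, 0.19, 0.17, 0.15, 0.12]
n2_speed = [0.20, 0.18, 0.14, 0.10, 0.07]
print(dtw_distance(daf2_speed, n2_speed))
```

Because DTW aligns sequences of different lengths, worms with different lifespans can be compared directly, which a point-by-point Euclidean distance could not do.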
Abstract. An improved understanding of in vivo ⇔ in vitro changes is crucial in identifying and mitigating factors contributing to in vitro–in vivo extrapolation (IVIVE) inaccuracies in predicting the hepatic clearance of substances. We argue that a model mechanism-based virtual culture (vCulture) ⇔ virtual mouse (vMouse) (or vRat or vHuman) experiment approach can identify factors contributing to IVIVE disconnects. Doing so depends on having evidence that six Translational Requirements have been achieved. We cite evidence that the first four have been achieved. The fifth Requirement is that differences in measures of vCompound disposition between vCulture and vMouse are attributable solely to the micro-architectural, physiomimetic features, and uncertainties built into the vLiver and vMouse but are absent from the vCulture. The objective of this work is to first improve on a vCulture architecture used previously and then use results of virtual experiments to verify that its use enables the fifth Translational Requirement to be achieved. We employ two different idealized vCompounds, which map to highly permeable real compounds at the extreme ends of the intrinsic clearance spectrum. Virtual intrinsic clearance = Exposure rate per vHPC. At quasi-steady state, results for vCompound-1 are independent of the dosing rate. The average per-vHPC Exposure rates (taken over the whole vLiver in vMouse experiments) are the same (within the variance of the Experiments) as those in vCulture. However, they are location dependent within the vLiver. For vCompound-2, there are dosing rate differences and average per-vHPC Exposure rates within the vLiver are also location dependent. When we account for dosing rate differences, we see again that average per-vHPC Exposure rates averaged over the whole vLiver in vMouse experiments are the same as those in vCulture. 
Thus, the differences in per vHPC Exposure rate within the vLiver for both vCompounds are attributable solely to the micro-architectural and physiomimetic features built into the vLiver and vMouse but are absent from the vCulture. The results verify that the fifth Translational Requirement has been achieved.
Abstract. Unlike mammals, adult zebrafish hearts retain a remarkable capacity to regenerate after injury. Since regeneration shares many common molecular pathways with embryonic development, we investigated myocardial remodeling genes and pathways by performing a comparative transcriptomic analysis of zebrafish heart regeneration using a set of known human hereditary heart disease genes related to myocardial hypertrophy during development. We cross-matched human hypertrophic cardiomyopathy-associated genes with a time-course microarray dataset of adult zebrafish heart regeneration. Genes in the expression profiles that were highly elevated in the early phases of myocardial repair and remodeling after injury in zebrafish were identified. These genes were further analyzed with web-based bioinformatics tools to construct a regulatory network revealing potential transcription factors and their upstream receptors. In silico functional analysis of these genes showed that they are involved in cardiomyocyte proliferation and differentiation, angiogenesis, and inflammation-related pathways. The regulatory network indicated that β2-microglobulin-mediated signaling may play an important role in myocardial remodeling after injury. This novel cross-species bioinformatics approach to uncover key modulators of zebrafish heart regeneration through human hereditary disease genomic analysis could greatly facilitate the understanding of the evolutionarily conserved cardiac remodeling process.
Abstract. Microvascular invasion (MVI) diagnosis is of vital importance in the curative treatment of hepatocellular carcinoma patients due to its close relationship with prognostic analysis. Currently, MVI detection is often based on surgical specimens, which is invasive for patients. In this study, we extracted texture features from multi-phase MR images to predict the presence of MVI. Feature extraction employed the neighboring gray level dependence matrix (NGLDM) method, a common texture feature analysis method. Next, we built an SVM classifier to predict the presence of MVI using the extracted features. In particular, multi-phase features were designed to enhance the precision of prediction. Enhanced MR images from the pre-contrast phase and the portal vein phase were used to extract features. The method was tested by 5-fold cross-validation on the dataset. The precision of prediction was 91.31%, compared with 70.71% for the baseline method. To make the prediction more interpretable, the relationship between NGLDM texture features and the presence of MVI is discussed at the end.
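The NGLDM features can be sketched as follows for one quantized 2-D region of interest. The neighborhood distance d, the gray-level tolerance a, counting the centre pixel (which avoids division by zero in the small number emphasis), and the choice of these two emphasis features are assumptions; the abstract does not give the exact settings. Feature vectors from the pre-contrast and portal vein phases could then be concatenated and passed to an SVM such as scikit-learn's `SVC`.

```python
import numpy as np

def ngldm(img, levels, a=0, d=1):
    """Neighbouring gray level dependence matrix of a quantized 2-D image.

    Q[k, s] counts pixels of gray level k whose (2d+1) x (2d+1) window (clipped at
    the image border) contains s pixels with |g - k| <= a. The centre pixel is
    counted, so s runs from 1 to (2d+1)**2 and column 0 stays empty.
    """
    h, w = img.shape
    q = np.zeros((levels, (2 * d + 1) ** 2 + 1))
    for i in range(h):
        for j in range(w):
            k = int(img[i, j])
            win = img[max(i - d, 0):i + d + 1, max(j - d, 0):j + d + 1]
            s = int(np.sum(np.abs(win.astype(int) - k) <= a))
            q[k, s] += 1
    return q

def number_emphasis(q):
    """Small and large number emphasis of an NGLDM (column s holds count s)."""
    s = np.arange(q.shape[1], dtype=float)
    total = q.sum()
    small = (q[:, 1:] / s[1:] ** 2).sum() / total
    large = (q * s ** 2).sum() / total
    return small, large

# Toy 3x3 region with a single gray level: corners see 4 similar pixels,
# edge pixels 6 and the centre 9.
q = ngldm(np.zeros((3, 3), dtype=int), levels=1)
print(number_emphasis(q))
```

Homogeneous regions push counts toward large s and raise the large number emphasis, while fine, heterogeneous texture raises the small number emphasis.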