Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford (2019) Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019) 18:1--18:19 (2019); journal version in Alg Mol Bio (2020).

Transcriptomic structural variants (TSVs) --- structural variants that affect expressed regions --- are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs.

To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangement Problem (MCAP), which seeks k genome rearrangements to maximize the number of reads that are concordant with at least one rearrangement. This directly models the situation of a heterogeneous or diploid sample.

We prove that MCAP is NP-hard and provide a 1/4-approximation algorithm for k=1 and a 3/4-approximation algorithm for the diploid case (k=2) assuming an oracle for k=1. Combining these, we obtain a 3/16-approximation algorithm for MCAP when k=2 (without an oracle). We also present an integer linear programming formulation for general k. We completely characterize the graph structures that require k>1 to satisfy all edges and show such structures are prevalent in cancer samples.

We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance from the state-of-the-art TSV-calling tool, SQUID.

View source