A Scalable Optimization Algorithm for Solving the Beltway and Turnpike Problems with Uncertain Measurements

Abstract

The Beltway and Turnpike problems entail the reconstruction of circular and linear one-dimensional point sets from unordered pairwise distances. These problems arise in computational biology when the measurements provide distances but do not associate those distances with the entities that gave rise to them. Such applications include molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes (since sequencing and mass spec technologies can give lengths or weights, usually without connecting them to end points). Practical algorithms for Turnpike are known when the distance measurements are accurate, but both problems become strongly NP-complete under any level of measurement uncertainty. This is problematic since all known applications experience some degree of uncertainty from uncontrollable factors. Traditional algorithms cope with this complexity by exploring a much larger solution space, leading to exponential blowup in terms of both time and space. To alleviate both issues, we propose a novel alternating optimization algorithm that is able to scale to large, uncertain distance sets with as many as 100,000 points. This algorithm is space and time-efficient, with each step running in $O(m log(m))$ time and requiring only $O( sqrt{m})$ work space for a distance set of size $m$. Evaluations of this approach on synthetic and partial digest data showcase improved accuracy and scalability in the presence of uncertain, duplicated, and missing distances. Our implementation of the algorithm is available at url{https://github.com/Kingsford-Group/turnpikesolvermm}.

Publication
RECOMB 2024