DeepMinimizer: A Differentiable Framework for Optimizing Sequence-Specific Minimizer Schemes

Abstract

Minimizers are $k$-mer sampling schemes designed to generate sketches for large sequences that preserve sufficiently long matches between sequences. Despite their widespread application, learning an effective minimizer scheme with optimal sketch size is still an open problem. Most work in this direction focuses on designing schemes that work well in expectation over random sequences, and such schemes have limited applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, require greedy approximations to solve an intractable discrete optimization problem on the permutation space of $k$-mer orderings. To address this challenge, we propose: (a) a reformulation of the combinatorial solution space using a deep neural network reparameterization; and (b) a fully differentiable approximation of the discrete objective. We demonstrate that our framework, \textsc{DeepMinimizer}, discovers minimizer schemes that significantly outperform state-of-the-art constructions on genomic sequences.
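
To make the optimization target concrete, the following is a minimal Python sketch of a plain minimizer scheme, not of the DeepMinimizer method itself. It assumes the common formulation in which every window of w consecutive $k$-mers contributes its smallest $k$-mer under a chosen ordering, with ties broken by leftmost position, and it measures sketch size as the fraction of $k$-mer positions selected. The names minimizer_positions and density, the window parameter w, and the order callback are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, List


def minimizer_positions(seq: str, k: int, w: int,
                        order: Callable[[str], object]) -> List[int]:
    """Return sorted start positions of the k-mers picked by a (w, k) minimizer.

    `order` maps a k-mer to a sortable key; this key plays the role of the
    k-mer ordering that a sequence-specific scheme would try to optimize.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []  # sequence too short to contain a full window
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; ties go to the leftmost position.
        best = min(window, key=lambda i: (order(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def density(seq: str, k: int, w: int,
            order: Callable[[str], object]) -> float:
    """Fraction of k-mer positions selected (assumed sketch-size measure)."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


if __name__ == "__main__":
    # Lexicographic ordering as a simple baseline.
    print(minimizer_positions("ACGTACGT", 2, 3, lambda kmer: kmer))
    print(density("ACGTACGT", 2, 3, lambda kmer: kmer))
```

Swapping the order callback changes which positions are selected, and therefore the density. Searching over such orderings for one fixed target sequence is the discrete problem the abstract describes.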

Publication
In Proceedings of RECOMB 2022.