Methods to handle large genomic data sets
Publications
Yutong Qiu,
Carl Kingsford
(2021).
Constructing smaller genome graphs via string compression.
bioRxiv.
Hongyu Zheng,
Carl Kingsford,
Guillaume Marçais
(2020).
Improved design and analysis of practical minimizers.
Proceedings of ISMB 2020 (Bioinformatics).
Prashant Pandey,
Yinjie Gao,
Carl Kingsford
(2019).
VariantStore: A Large-Scale Genomic Variant Search Index.
bioRxiv 888297.
Dan DeBlasio,
Fiyinfoluwa Gbosibo,
Carl Kingsford,
Guillaume Marçais
(2019).
Practical universal k-mer sets for minimizer schemes.
To appear in ACM-BCB 2019.
Guillaume Marçais,
Brad Solomon,
Rob Patro,
Carl Kingsford
(2019).
Sketching and Sublinear Data Structures in Genomics.
Annual Review of Biomedical Data Science 2:93-118.
Guillaume Marçais,
Dan DeBlasio,
Prashant Pandey,
Carl Kingsford
(2019).
Locality sensitive hashing for the edit distance.
Bioinformatics (ISMB) 35(14):i127-i135.
Guillaume Marçais,
Dan DeBlasio,
Carl Kingsford
(2018).
Asymptotically optimal minimizers schemes.
Bioinformatics (ISMB) 34(13):i13-i22.
Brad Solomon,
Carl Kingsford
(2018).
Improved search of large transcriptomic sequencing databases using split sequence Bloom trees.
Journal of Computational Biology 25(7):755-765.
Yaron Orenstein,
David Pellow,
Guillaume Marçais,
Ron Shamir,
Carl Kingsford
(2017).
Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.
PLoS Computational Biology 13(10):e1005777.
Brad Solomon,
Carl Kingsford
(2017).
Improved search of large transcriptomic sequencing databases using split sequence Bloom trees.
International Conference on Research in Computational Molecular Biology, pages 257-271.
David Pellow,
Darya Filippova,
Carl Kingsford
(2017).
Improving Bloom filter performance on sequence data using k-mer Bloom filters.
Journal of Computational Biology 24(6):547-557.
Guillaume Marçais,
David Pellow,
Daniel Bork,
Yaron Orenstein,
Ron Shamir,
Carl Kingsford
(2017).
Improving the performance of minimizers and winnowing schemes.
Bioinformatics (ISMB) 33(14):i110-i117.
Rob Patro,
Geet Duggal,
Michael I Love,
Rafael A Irizarry,
Carl Kingsford
(2017).
Salmon provides fast and bias-aware quantification of transcript expression.
Nature Methods 14:417-419.
Yaron Orenstein,
David Pellow,
Guillaume Marçais,
Ron Shamir,
Carl Kingsford
(2016).
Compact universal k-mer hitting sets.
International Workshop on Algorithms in Bioinformatics, pages 257-268.
Brad Solomon,
Carl Kingsford
(2016).
Fast search of thousands of short-read sequencing experiments.
Nature Biotechnology 34:300-302.
Rob Patro,
Carl Kingsford
(2015).
Data-dependent bucketing improves reference-free compression of sequencing reads.
Bioinformatics 31(17):2770-2777.
Darya Filippova,
Carl Kingsford
(2015).
Rapid, separable compression enables fast analyses of sequence alignments.
Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, pages 194-201.
Rob Patro,
Stephen M Mount,
Carl Kingsford
(2014).
Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
Nature Biotechnology 32:462-464.
Guillaume Marçais,
Carl Kingsford
(2011).
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.
Bioinformatics 27(6):764-770.