R.I.P.
👻
Ghosted
An experimental sorting method for improving metagenomic data encoding
January 03, 2024 · Entered Twilight · Data Compression Conference
Repo contents: LICENSE, MizaR.sh, Plot_channels.sh, Plot_coverage.sh, Plot_sequences.sh, README.md, RunAll.sh, Simulate.sh, VDB_MT_ALL_REF.fa.lzma
Authors
Diogo Pratas, Armando J. Pinho
arXiv ID
2401.01786
Category
cs.IT: Information Theory
Cross-listed
q-bio.GN
Citations
2
Venue
Data Compression Conference
Repository
https://github.com/cobilab/mizar
⭐ 1
Last Checked
3 months ago
Abstract
Minimizing data storage poses a significant challenge in large-scale metagenomic projects. In this paper, we present a new method for improving the encoding of FASTQ files generated by metagenomic sequencing. This method incorporates metagenomic classification followed by a recursive filter for clustering reads by DNA sequence similarity to improve the overall reference-free compression. In the results, we show an overall improvement in the compression of several datasets. As hypothesized, we show a progressive compression gain for higher coverage depth and number of identified species. Additionally, we provide an implementation that is freely available at https://github.com/cobilab/mizar and can be customized to work with other FASTQ compression tools.
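The idea in the abstract, that placing similar reads next to each other helps a reference-free compressor, can be illustrated with a small sketch. This is not the MizaR implementation: the simulated genomes, the ground-truth species labels (standing in for a real metagenomic classifier), the helper names, and the use of Python's `lzma` module in place of a FASTQ compressor are all assumptions made for illustration. The recursive similarity filter described in the abstract is not modeled here.

```python
import lzma
import random


def group_reads_by_label(reads, labels):
    """Reorder reads so that reads sharing a label become contiguous.

    `labels` is a hypothetical per-read class (e.g. a species ID that a
    classifier might assign). The sort is stable, so reads keep their
    relative order within each label.
    """
    order = sorted(range(len(reads)), key=lambda i: labels[i])
    return [reads[i] for i in order]


def compressed_size(reads):
    """Size in bytes of the newline-joined reads after LZMA compression."""
    return len(lzma.compress("\n".join(reads).encode("ascii")))


def simulate(seed=0, n_species=4, genome_len=2000, n_reads=400, read_len=80):
    """Draw error-free reads from random toy genomes (assumed setup)."""
    rng = random.Random(seed)
    genomes = [
        "".join(rng.choice("ACGT") for _ in range(genome_len))
        for _ in range(n_species)
    ]
    reads, labels = [], []
    for _ in range(n_reads):
        species = rng.randrange(n_species)
        start = rng.randrange(genome_len - read_len)
        reads.append(genomes[species][start:start + read_len])
        labels.append(species)
    return reads, labels


if __name__ == "__main__":
    reads, labels = simulate()
    grouped = group_reads_by_label(reads, labels)
    print("original order:  ", compressed_size(reads), "bytes")
    print("grouped by label:", compressed_size(grouped), "bytes")
```

On toy inputs this small, the whole dataset fits inside the compressor's window, so the difference between the two sizes may be small or absent; the abstract reports gains that grow with coverage depth and the number of identified species on real datasets.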
Similar Papers
In the same crypt: Information Theory
R.I.P.
👻
Ghosted
A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open Research Problems
R.I.P.
👻
Ghosted
Towards Smart and Reconfigurable Environment: Intelligent Reflecting Surface Aided Wireless Network
The Cartographer
Wireless Communications with Unmanned Aerial Vehicles: Opportunities and Challenges
R.I.P.
👻
Ghosted
Reconfigurable Intelligent Surfaces for Energy Efficiency in Wireless Communication
The Cartographer