
MPEG-G: Decoding the Dialogue

Prize: $5,000 USD
Status: Completed
Tags: Visualisation · Insights · Prediction
Participants: 496 joined, 26 active
Start: Jun 27, 2025
Close: Nov 02, 2025
Reveal: Dec 02, 2025
Memorial Sloan Kettering Cancer Center
🚀 Why decompress when you can dive directly into the compression?
18 Sep 2025, 14:13

🧠 Did you know that you don’t need to fully decompress MPEG-G files to build powerful AI models? (I first posted this under Challenge 1, then realized it may be more relevant to Challenge 2, Task 5.)

The Genie codec gives you structured access to metadata, read group stats, alignment summaries, and more, all without turning the .gnm file back into FASTQ.

Here’s what you can extract directly:

  • ✅ GC content, read length, and quality score histograms
  • ✅ Alignment stats, k-mer entropy, and coverage profiles
  • ✅ Platform/tech metadata and sequencing summaries

💡 These codec-level features can be used directly as model inputs, saving compute, time, and memory!
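As a rough illustration, here is a minimal Python sketch of turning such summaries into a fixed-length feature vector. It assumes you have already parsed the codec output into a plain dict; the keys used here (base_counts, read_length_hist, quality_hist, kmer_counts) are hypothetical placeholders chosen for the example, not Genie's actual output format, and the toy counts are made up.

```python
import math
from typing import Dict, List


def gc_fraction(base_counts: Dict[str, int]) -> float:
    """Fraction of G+C among A/C/G/T counts; N and other symbols are ignored."""
    acgt = sum(base_counts.get(b, 0) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (base_counts.get("G", 0) + base_counts.get("C", 0)) / acgt


def histogram_mean(hist: Dict[int, int]) -> float:
    """Mean of a histogram given as {value: count}, e.g. read length or Phred score."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(value * count for value, count in hist.items()) / total


def normalized_entropy(counts: Dict[str, int]) -> float:
    """Shannon entropy (bits) of a count table, scaled to [0, 1] by log2(table size)."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))


def summary_to_features(summary: dict) -> List[float]:
    """Map one file's (hypothetical) summary dict to a fixed-length feature vector."""
    return [
        gc_fraction(summary["base_counts"]),
        histogram_mean(summary["read_length_hist"]),
        histogram_mean(summary["quality_hist"]),
        normalized_entropy(summary["kmer_counts"]),
    ]


# Toy summary with made-up counts, only to show the expected input shape.
example = {
    "base_counts": {"A": 30, "C": 20, "G": 20, "T": 30},
    "read_length_hist": {150: 8, 100: 2},
    "quality_hist": {30: 3, 40: 1},
    "kmer_counts": {"AA": 1, "AC": 1, "AG": 1, "AT": 1},
}
print(summary_to_features(example))  # → [0.4, 140.0, 32.5, 1.0]
```

Scaling the entropy by log2 of the table size keeps it in [0, 1], so files stay comparable; if you want every one of the 4^k possible k-mers to count, include zero-count entries in kmer_counts.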

Even better? You can build a custom DataLoader in Python on top of genie print-index or decode --print-metadata and skip full reconstruction entirely. That’s compression-aware AI in action. 🧬💻
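Here is a minimal sketch of such a loader in plain Python. It assumes a hypothetical fetch_summary callable that you write yourself to turn one file's metadata output into a dict; how you invoke Genie and parse its output is deliberately left open, since this sketch does not rely on any specific Genie flag or output format. The class implements the map-style __len__/__getitem__ protocol and caches each file's features, so the codec is queried at most once per file.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple


class CompressedSummaryDataset:
    """Map-style dataset over per-file codec summaries (no read reconstruction).

    fetch_summary: hypothetical user-supplied callable, path -> summary dict
                   (e.g. parsed from Genie's metadata output); NOT a Genie API.
    featurize:     callable, summary dict -> list of floats.
    """

    def __init__(
        self,
        paths: Sequence[str],
        fetch_summary: Callable[[str], dict],
        featurize: Callable[[dict], List[float]],
        labels: Optional[Sequence[int]] = None,
    ):
        if labels is not None and len(labels) != len(paths):
            raise ValueError("labels must match paths one-to-one")
        self.paths = list(paths)
        self.fetch_summary = fetch_summary
        self.featurize = featurize
        self.labels = list(labels) if labels is not None else None
        self._cache: Dict[int, List[float]] = {}

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Tuple[List[float], Optional[int]]:
        # Fetch and featurize the summary only on first access, then reuse it.
        if idx not in self._cache:
            self._cache[idx] = self.featurize(self.fetch_summary(self.paths[idx]))
        label = self.labels[idx] if self.labels is not None else None
        return self._cache[idx], label


def iter_batches(
    dataset: CompressedSummaryDataset, batch_size: int
) -> Iterator[Tuple[List[List[float]], List[Optional[int]]]]:
    """Yield (features, labels) batches in index order."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        yield [x for x, _ in items], [y for _, y in items]
```

Because it follows the map-style protocol, the same object should also be accepted by PyTorch's torch.utils.data.DataLoader, though you would likely want a custom collate_fn to stack the feature lists into a tensor (and to handle labels=None).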

Want help building one?
