Viral Genomics
The Gap Between General Genomics and Viral Reality
Nucleotide foundation models are rapidly becoming a central tool for biological sequence modeling. They can read long DNA or RNA sequences, transfer across species, and support both discriminative and generative tasks. But viral genomics poses a harder question: can these models truly understand viral sequences under realistic evolutionary, taxonomic, and temporal shifts?
Viruses are not just another subset of biological sequences. They evolve quickly, span diverse genome types, and often appear in long-tailed, rapidly changing distributions. A model that performs well on general genomic benchmarks may still fail when asked to distinguish closely related viral taxa, predict host categories, or remain robust to newly recorded viral sequences.
We built ViroBench to make this gap measurable.
ViroBench asks a practical question: do nucleotide foundation models retain reliable viral understanding under taxonomic, evolutionary, and temporal shift, rather than only on easier in-distribution settings?
Why Viral Benchmarks Need to Be Different
ViroBench is a unified benchmark for evaluating nucleotide foundation models on viral genomics tasks. Instead of testing models on a single simplified prediction problem, it organizes evaluation around two complementary axes: biological understanding and generation diagnostics. The first asks whether a model can recognize biologically meaningful viral signals; the second asks whether a model can produce or score viral sequences in a way that reflects sequence-level fidelity, coding constraints, and long-context behavior.
At its core, ViroBench contains 58,314 curated viral samples from NCBI, enriched with taxonomy, host annotations, nucleic-acid types, and data-source information. The benchmark covers four task families: Taxonomy Classification, Host Prediction, Genome Modeling, and CDS Completion. Together, these tasks provide a structured view of how nucleotide foundation models behave across viral sequence understanding and generation-oriented evaluation.
Many existing genomic benchmarks focus on regulatory elements, human genomics, or general DNA classification. These tasks are important, but they do not fully capture the distinctive challenges of viral genomics. Viral sequences introduce several forms of difficulty at once: DNA and RNA viruses follow different sequence distributions; closely related genera may be hard to separate; and the distribution of newly recorded viral sequences can shift substantially over time.
ViroBench therefore avoids relying on a single random split. For classification, it includes genus-disjoint and temporal split settings, making it harder for models to benefit from near-duplicate or closely related sequences across train and test sets. This design better reflects practical use cases, where models may need to generalize to newly observed or evolutionarily distant viruses.
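To make the two split settings concrete, here is a minimal sketch of how a genus-disjoint split and a temporal split could be built. It is an illustration under stated assumptions, not ViroBench's actual pipeline: the record fields (`genus`, `release_date`), the function names, the test fraction, and the cutoff date are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical records; field names are assumptions, not the ViroBench schema.
RECORDS = [
    {"id": "s1", "genus": "Alphavirus", "release_date": "2018-03-01"},
    {"id": "s2", "genus": "Alphavirus", "release_date": "2022-07-15"},
    {"id": "s3", "genus": "Flavivirus", "release_date": "2019-11-30"},
    {"id": "s4", "genus": "Orthopoxvirus", "release_date": "2023-01-10"},
    {"id": "s5", "genus": "Betacoronavirus", "release_date": "2020-02-05"},
]

def genus_disjoint_split(records, test_fraction=0.25, seed=0):
    """Assign whole genera to train or test so that no genus appears in both."""
    by_genus = defaultdict(list)
    for rec in records:
        by_genus[rec["genus"]].append(rec)
    genera = sorted(by_genus)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(genera)
    n_test = max(1, round(len(genera) * test_fraction))
    test_genera = set(genera[:n_test])
    train = [r for g in genera if g not in test_genera for r in by_genus[g]]
    test = [r for g in genera if g in test_genera for r in by_genus[g]]
    return train, test

def temporal_split(records, cutoff):
    """Train on records released before `cutoff`, test on the rest.

    Dates are ISO 8601 strings (YYYY-MM-DD), which sort chronologically
    under plain string comparison.
    """
    train = [r for r in records if r["release_date"] < cutoff]
    test = [r for r in records if r["release_date"] >= cutoff]
    return train, test
```

Splitting at the genus level, rather than the sequence level, is what keeps near-duplicates of a test sequence out of the training set; a random per-sequence split would typically place members of the same genus on both sides.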
What ViroBench Evaluates
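ViroBench evaluates models through four task groups: Taxonomy Classification, Host Prediction, Genome Modeling, and CDS Completion. The first two are classification tasks that probe biological understanding; the last two are generation-oriented and serve as generation diagnostics.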
Rather than treating generation as a single score, ViroBench reports multiple diagnostic metrics, including edit distance, alignment identity, exact match accuracy, and k-mer distribution distances.
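As a rough illustration of what these diagnostics measure, the sketch below computes three of them for a generated sequence against a reference. The exact definitions ViroBench uses are not given here, so the choices are assumptions: Levenshtein distance for edit distance, Jensen-Shannon divergence as one possible k-mer distribution distance, and k = 3. Alignment identity is omitted because it requires a pairwise aligner.

```python
import math
from collections import Counter

def exact_match(generated, reference):
    """True only if the two sequences are identical."""
    return generated == reference

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def kmer_distribution(seq, k=3):
    """Relative frequency of each overlapping k-mer in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1 for two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(x):
        return sum(v * math.log2(v / m[k]) for k, v in x.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

The metrics can disagree in informative ways: a generated sequence can have a k-mer distribution close to the reference while its edit distance remains large, which is one reason to report them separately.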
A More Diagnostic Leaderboard
The ViroBench leaderboard is designed to be more than a ranking table. It lets users compare models by task family, scenario, and metric, making it easier to identify where a model succeeds and where it fails.
A model may perform well on taxonomy classification but struggle with host prediction. Another may show strong short-sequence modeling but degrade sharply on longer viral genomes. Some models may produce sequences with plausible local statistics while failing to preserve coding-level constraints. These differences are difficult to see from a single averaged score, but they are central to understanding model behavior in viral genomics.
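One way to picture "coding-level constraints" is as a set of simple checks on a generated coding sequence. The sketch below is a hypothetical example, not ViroBench's evaluation code: it assumes the standard genetic code's stop codons and an ATG start, which will not hold for every virus.

```python
# Stop codons of the standard genetic code (an assumption; some viruses differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def cds_checks(seq):
    """Return basic coding-constraint checks for a candidate CDS."""
    seq = seq.upper().replace("U", "T")   # accept RNA input as well
    in_frame = len(seq) % 3 == 0
    # Complete codons read from position 0; any trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # The final codon is the expected terminator only when the length is in frame.
    internal = codons[:-1] if in_frame else codons
    return {
        "length_multiple_of_3": in_frame,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": in_frame and seq[-3:] in STOP_CODONS,
        "no_internal_stop": all(c not in STOP_CODONS for c in internal),
    }
```

A sequence can pass k-mer level comparisons and still fail these checks, for example when a single substitution introduces a premature in-frame stop codon.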
Looking Forward
ViroBench is intended as a living benchmark for the next generation of nucleotide foundation models. As models handle longer contexts, become more generative, and see wider use in biological research, viral genomics offers a demanding testbed for both capability and reliability.
The goal is not only to ask which model ranks first. The more important question is: what kind of viral knowledge does each model actually learn, and where do its limits appear?
By providing standardized data, controlled task settings, and interpretable metrics, ViroBench aims to support more reproducible evaluation and more biologically grounded model development for viral genomics.