Large genome assembly (~30 Gb) fails due to memory requirements; prospects for distributed-memory support? #924

[Heamanthus_pumilo_hifiasm.log](https://github.com/user-attachments/files/28551484/Heamanthus_pumilo_hifiasm.log)

Hi there,

We are attempting to assemble the genome of _Haemanthus pumilo_, a plant, using Hifiasm. Its estimated genome size is approximately 30 Gb and our estimated HiFi coverage is only ~10×, so downsampling the input dataset is not a practical option.
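For scale, here is a rough back-of-envelope sketch of the input volume implied by those two numbers. The genome size and coverage come from the estimates above; the ~15 kb mean read length is an assumed typical HiFi value, not a measurement from our data. The 2-bit packed size is only a generic lower bound for storing the bases and says nothing about how Hifiasm actually lays out its data in memory.

```python
# Back-of-envelope estimate of the HiFi input volume.
# GENOME_SIZE_BP and COVERAGE are the estimates from this issue;
# MEAN_READ_LEN_BP is an ASSUMED typical HiFi read length, not measured.
GENOME_SIZE_BP = 30e9      # ~30 Gb estimated genome size
COVERAGE = 10              # ~10x estimated HiFi coverage
MEAN_READ_LEN_BP = 15_000  # assumption: ~15 kb mean read length


def input_volume(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases = genome size x coverage."""
    return genome_size_bp * coverage


def read_count(total_bp: float, mean_read_len_bp: float) -> int:
    """Approximate number of reads for a given mean read length."""
    return round(total_bp / mean_read_len_bp)


def packed_size_gb(total_bp: float, bits_per_base: int = 2) -> float:
    """Size of the raw bases if packed at 2 bits/base (a lower bound only)."""
    return total_bp * bits_per_base / 8 / 1e9


total_bp = input_volume(GENOME_SIZE_BP, COVERAGE)
n_reads = read_count(total_bp, MEAN_READ_LEN_BP)
print(f"total input: {total_bp / 1e9:.0f} Gb")
print(f"approx. reads: {n_reads / 1e6:.0f} million")
print(f"2-bit packed bases: {packed_size_gb(total_bp):.0f} GB")
```

Under these assumptions this works out to roughly 300 Gb of sequence in about 20 million reads, so the overlap and correction data structures, not the raw bases, are presumably what push the job past 1 TB.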

The assembly progresses substantially but consistently runs out of memory before completing the error-correction stage. Reaching this stage matters because it is a critical checkpoint: once it completes, the assembly could be resumed after hitting our HPC system's wall-time limit. Unfortunately, the job exhausts all available RAM (1 TB) before the stage finishes.

We understand that Hifiasm's memory requirements are expected to increase with genome size, particularly for very large and repetitive plant genomes. However, we would appreciate any guidance on the following:

1. Are there any recommended parameters, strategies, or workflow modifications that could reduce memory usage for genomes of this scale without further sacrificing our already limited coverage?
2. Is support for genomes in the ~30 Gb range considered within the intended scope of Hifiasm, or are we approaching practical memory limits of the current implementation?
3. More generally, are there any plans or prospects for future versions of Hifiasm to support distributed-memory execution across multiple compute nodes?

Our HPC center has noted that procuring single nodes with sufficient memory is becoming increasingly difficult and expensive. We are aware that distributed-memory approaches (e.g., PGAS-based frameworks and distributed hash-table implementations) exist for applications with very large in-memory data structures, and we are curious whether such approaches have ever been considered for Hifiasm, or whether architectural constraints would make this impractical.
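To make the distributed hash-table idea concrete, here is a minimal toy sketch of the general technique, not of anything Hifiasm does internally. Each k-mer is assigned to an owner "rank" by hashing, so every rank holds only its own shard and a lookup touches exactly one rank. The function names are hypothetical, the ranks are simulated as in-process lists, and reverse complements are ignored; a real PGAS or MPI implementation would replace the local dictionary writes with remote updates to the owning node.

```python
# Toy, single-process simulation of a distributed hash table for k-mer counts.
# All names here are hypothetical illustrations; "ranks" are just list slots.
import zlib
from collections import Counter


def owner_rank(kmer: str, n_ranks: int) -> int:
    """Deterministically map a k-mer to the rank that owns it."""
    return zlib.crc32(kmer.encode()) % n_ranks


def count_kmers_sharded(reads, k: int, n_ranks: int):
    """Count k-mers (forward strand only), one Counter shard per simulated rank."""
    shards = [Counter() for _ in range(n_ranks)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # In a distributed run this would be a message or remote atomic
            # update sent to the owning node, not a local dictionary write.
            shards[owner_rank(kmer, n_ranks)][kmer] += 1
    return shards


def lookup(shards, kmer: str) -> int:
    """A lookup only needs to contact the single rank that owns the key."""
    return shards[owner_rank(kmer, len(shards))][kmer]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "CGTACGTTTA"]
    shards = count_kmers_sharded(reads, k=5, n_ranks=4)
    print([len(s) for s in shards])  # distinct k-mers held by each rank
    print(lookup(shards, "ACGTA"))
```

The appeal of this design is that aggregate memory grows with the number of nodes; the cost is that every insert or lookup for a non-local key becomes network traffic, which is presumably where architectural constraints in an assembler would show up.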

Any advice or insight would be greatly appreciated.

Thank you in advance.