Commit 18a1377

Tables paper.md
1 parent b445121 commit 18a1377

1 file changed: +27 -19 lines changed
paper.md
@@ -47,30 +47,38 @@ NEAT can integrate seamlessly with existing bioinformatics workflows, providing
 
 ## Algorithmic Improvements and Methodological Changes
 
-## Table 1. Enhancements in Algorithmic Performance
-
-| Feature | Prior Implementation (v2.0) | Updated Implementation (v4.X) | Rationale for Change | Demonstrated Improvement |
-|---------|------------------------------|--------------------------------|----------------------|-------------------------|
-| **Read Quality Modeling** | Markov-based model | Binning method with an option to also implement a revised Markov-based model | Did not achieve a tapering effect on a simulated read's edges | The tapering effect was achieved with the revised Markov model |
-| **Guanine-Cytosine (GC) Bias Computation** | Used a custom script for GC bias calculation | Feature deprecated | The script has decreased relevance due to advances in sequencing technology | Reduced runtime and absence of reported bugs |
-| **Ploidy Simulation** | Limited to diploid organisms in practice | Supports unbounded ploidy levels | This is essential for the accurate modeling of tumors or polyploid organisms, such as plants | Inputs of ploidy greater than two and fractional ploidies will correctly simulate reads |
-| **Variant Insertion** | Issues with inserted variants (loss of genotype data, prevented certain valid variants from insertion) | Preserves genotype data in the final variant call format (VCF) file | Allows greater flexibility and user control over variant inclusion | Improved accuracy of inserted variants (long variant support still in progress) |
-| **Read Generation** | The sliding-window approach to generate reads resulted in artificial gaps in sequencing reads (~50 base pairs) | A new form of coordinate-based read selection eliminates these gaps | Aimed to produce datasets more representative of real sequencing patterns | Elimination of artificial gaps |
-| **Variant Type Handling** | The code structure limited the introduction of new variant types | A modular design supports generic variant handling and the separation of insertions and deletions | Paves the way for structural and copy number variant support | More flexible insertion handling and future extensibility |
-| **Binary Alignment Map (BAM) File Generation** | File generation was tightly integrated with all NEAT processes | BAM creation was isolated from core functions | Improves runtime and modularity | BAM generation can now be toggled independently |
+# Tables
+
+## Algorithmic Improvements and Methodological Changes
+
+### Table 1. Enhancements in Algorithmic Performance
+
+| # | Feature Name | Prior Implementation (v2.0) | Updated Implementation (v4.X) |
+|---|-------------|------------------------------|--------------------------------|
+| 1 | **Binary Alignment Map (BAM) File Generation** | File generation was tightly integrated with all NEAT processes | BAM creation was isolated from core functions |
+| 2 | **Guanine-Cytosine (GC) Bias Computation** | Used a custom script for GC bias calculation | Feature deprecated |
+| 3 | **Ploidy Simulation** | Limited to diploid organisms in practice | Supports unbounded ploidy levels |
+| 4 | **Read Generation** | The sliding-window approach to generate reads resulted in artificial gaps in sequencing reads (~50 base pairs) | A new form of coordinate-based read selection eliminates these gaps |
+| 5 | **Read Quality Modeling** | Markov-based model | Binning method with an option to also implement a revised Markov-based model |
+| 6 | **Variant Insertion** | Issues with inserted variants (loss of genotype data, prevented certain valid variants from insertion) | Preserves genotype data in the final variant call format (VCF) file |
+| 7 | **Variant Type Handling** | The code structure limited the introduction of new variant types | A modular design supports generic variant handling and the separation of insertions and deletions |
+
+The prior implementation of **Binary Alignment Map (BAM) File Generation** tightly integrated BAM creation with all NEAT functions, leading to inefficiencies. The new update isolates BAM creation, allowing it to be toggled independently, improving runtime and modularity. **Guanine-Cytosine (GC) Bias Computation** was removed due to redundancy, as advancements in sequencing technology rendered the custom script unnecessary. Its removal reduced runtime while eliminating associated bugs. **Ploidy Simulation** has been extended to allow accurate simulation of tumor genomes and polyploid organisms (e.g., plants), with inputs of ploidy greater than two and fractional ploidies now correctly simulating reads. **Read Generation** previously introduced artificial read gaps (~50 base pairs) due to a sliding-window approach. The updated coordinate-based selection eliminates these gaps, yielding a dataset that more accurately reflects real sequencing patterns. **Read Quality Modeling** initially did not achieve a tapering effect on a simulated read's edges. By incorporating a revised Markov model alongside the binning method, the tapering effect was successfully implemented. **Variant Insertion** suffered from loss of genotype data and an arbitrary restriction on certain valid variants. The updated version preserves genotype data in the final VCF file, improving accuracy and giving users greater control over insertions. **Variant Type Handling** has been modularized to support structural and copy number variants, increasing flexibility and ensuring future extensibility for handling more complex variants.
 
 \newpage
 
-## Table 2. Performance Enhancements and User-Centric Modifications
+## Performance Enhancements and User-Centric Modifications
+
+### Table 2. Performance and User Experience Improvements
 
-| Feature | Prior Implementation (v2.0) | Updated Implementation (v4.X) | Rationale for Change | Demonstrated Improvement |
-|---------|------------------------------|--------------------------------|----------------------|-------------------------|
-| **Modular Codebase & Installation** | Not installable as a package | Fully modular and pip-installable via Poetry | Facilitates ease of development, portability, and deployment | Reduced dependencies, improved maintainability |
-| **Code Refactoring & Unit Testing** | Monolithic, unstructured codebase | Rewritten with testable, discrete functions | Enhances maintainability and collaborative development | Improved code readability and integrity |
-| **User Experience: Configuration Management** | Required explicit command-line flags | Introduced structured configuration files | Improves usability, debugging, and reproducibility | Simplified interface, increased accessibility |
-| **Automated Testing Framework** | No formal testing framework | Implemented continuous integration with GitHub-based automated tests | Improves development efficiency and debugging capabilities | Enhanced detection of random bugs and user issues (e.g., file handling) |
+| # | Feature Name | Prior Implementation (v2.0) | Updated Implementation (v4.X) |
+|---|-------------|------------------------------|--------------------------------|
+| 1 | **Automated Testing Framework** | No formal testing framework | Implemented continuous integration with GitHub-based automated tests |
+| 2 | **Code Refactoring & Unit Testing** | Monolithic, unstructured codebase | Rewritten with testable, discrete functions |
+| 3 | **Modular Codebase & Installation** | Not installable as a package | Fully modular and pip-installable via Poetry |
+| 4 | **User Experience: Configuration Management** | Required explicit command-line flags | Introduced structured configuration files |
 
-Parallelization and memory profiling tools will be updated shortly.
+**Automated Testing Framework** was implemented to address the lack of a formal testing structure. The new continuous integration (CI) pipeline detects bugs early, streamlining development and enhancing error detection (e.g., handling of BED files and other inputs). **Code Refactoring & Unit Testing** improved debugging and maintenance by transitioning from a monolithic structure to a modular approach with testable, discrete functions, enhancing code integrity and collaboration. **Modular Codebase & Installation** was introduced to address the previous lack of package installation support, making NEAT 4.X modular and pip-installable via Poetry, which enhances portability and development ease. Lastly, **User Experience: Configuration Management** improved usability, debugging, and reproducibility by replacing cumbersome command-line flags with structured configuration files. Parallelization and memory profiling tools will be updated shortly.
 
 \newpage
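
Reviewer note on the Read Generation row of Table 1: the diff says the sliding-window approach left artificial gaps and that coordinate-based selection removes them, but it does not describe the mechanism. The toy sketch below shows one way a window-bound sampler could produce such an artifact. It is not NEAT's code, and the function names, window size, and read counts are all hypothetical. It assumes every read must lie inside a single fixed window, so no read ever crosses a window boundary, and compares that with drawing start coordinates across the whole reference.

```python
import random


def windowed_starts(ref_len, read_len, window, reads_per_window, rng):
    """Toy window-bound sampler (assumption): each read lies inside one window."""
    starts = []
    for w_start in range(0, ref_len, window):
        w_end = min(w_start + window, ref_len)
        if w_end - w_start < read_len:
            continue  # window too short to hold a whole read
        for _ in range(reads_per_window):
            # randint is inclusive, so the read ends at or before w_end - 1
            starts.append(rng.randint(w_start, w_end - read_len))
    return starts


def coordinate_starts(ref_len, read_len, n_reads, rng):
    """Toy coordinate-based sampler: start positions span the whole reference."""
    return [rng.randint(0, ref_len - read_len) for _ in range(n_reads)]


def reads_spanning(starts, read_len, boundary):
    """Count reads covering both position boundary-1 and position boundary."""
    return sum(1 for s in starts if s < boundary <= s + read_len - 1)


# Hypothetical sizes chosen only for illustration.
REF_LEN, READ_LEN, WINDOW = 1000, 50, 200
rng = random.Random(0)

windowed = windowed_starts(REF_LEN, READ_LEN, WINDOW, 40, rng)
coordinate = coordinate_starts(REF_LEN, READ_LEN, len(windowed), rng)

boundaries = range(WINDOW, REF_LEN, WINDOW)
windowed_spanning = sum(reads_spanning(windowed, READ_LEN, b) for b in boundaries)
coordinate_spanning = sum(reads_spanning(coordinate, READ_LEN, b) for b in boundaries)

print("reads crossing a window boundary (windowed):  ", windowed_spanning)
print("reads crossing a window boundary (coordinate):", coordinate_spanning)
```

In this toy model the window-bound sampler never yields a read that straddles a boundary, so coverage thins out there, whereas the coordinate-based sampler has no boundaries to avoid. Whether this matches the actual cause of the ~50 base pair gaps in v2.0 would need to be confirmed against the NEAT source.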