Architecture Overview
Repetition Calculation
PRISME's architecture organizes computation across four hierarchical levels, each with a dedicated script.
Four Levels of Organization
1. Study-Level Organization
The outermost level loops over all studies defined in the outcome structure. For each study, PRISME automatically infers the test type by analyzing the data structure:
- One-sample t-test: When contrasts between conditions are available
- Two-sample t-test: When categorical data exists for exactly two groups
- Correlation analysis: When continuous measures are present
This automatic inference removes the need to specify the test type manually and ensures each analysis is compatible with the data supplied. See Input Data Format for details on the inference logic.
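The three rules above can be read as a simple dispatch on which fields a study provides. The following is a minimal sketch in Python, not PRISME's actual inference code: the dictionary keys (`contrast`, `group_labels`, `continuous_score`) and the order in which the rules are checked are assumptions made for illustration. The real logic is described in Input Data Format.

```python
def infer_test_type(study):
    """Return a test-type label based on which fields a study record provides.

    `study` is a hypothetical dict; its keys are illustrative, not PRISME's.
    """
    # A contrast between conditions implies a one-sample t-test.
    if "contrast" in study:
        return "one-sample t-test"

    # Categorical labels with exactly two distinct groups imply a two-sample t-test.
    groups = study.get("group_labels")
    if groups is not None and len(set(groups)) == 2:
        return "two-sample t-test"

    # A continuous measure implies a correlation analysis.
    if "continuous_score" in study:
        return "correlation"

    raise ValueError("cannot infer a test type from the fields provided")
```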
2. Subject-Level Sampling
Within each study, this level loops over the sample sizes specified in Params.list_of_nsubset. Because PRISME generates one output file per study and sample size, this level also handles file-based checkpointing so that a run can resume after an interruption. If the output file for a given study and sample size already contains all repetitions, PRISME skips to the next sample size.
For each sample size, this level:
- Manages subject identifiers for subsampling
- Constructs appropriate design matrices for each repetition
- Tracks completion status by checking existing output files
- Resumes from checkpoints when calculations are interrupted
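The skip-or-resume decision described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not PRISME's implementation: the JSON file format, the `repetitions` key, the file naming scheme, and the function names are all hypothetical.

```python
import json
import os


def completed_repetitions(path):
    """Return how many repetitions the output file already holds (0 if absent)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return len(json.load(f)["repetitions"])


def plan_sample_sizes(study_name, list_of_nsubset, n_repetitions, out_dir):
    """Map each unfinished sample size to the first repetition still to run.

    Sample sizes whose output file is already complete are left out of the
    plan, which corresponds to skipping them.
    """
    plan = {}
    for n in list_of_nsubset:
        # Hypothetical naming scheme: one file per study and sample size.
        path = os.path.join(out_dir, f"{study_name}_n{n}.json")
        done = completed_repetitions(path)
        if done < n_repetitions:
            plan[n] = done  # resume from this repetition index
    return plan
```

With 500 requested repetitions, a file holding 500 entries is skipped, a file holding 200 resumes at repetition 200, and a missing file starts at 0.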
3. Batch-Level Management
This level organizes repetitions into configurable batches (set by Params.batch_size). Batching helps with memory management and checkpointing. Trade-off: Larger batches require more RAM but reduce disk I/O overhead. Smaller batches use less RAM but save to disk more frequently.
The batch manager:
- Groups repetitions into processing units
- Balances memory use against disk I/O overhead
- Implements the checkpoint system: when a batch completes, its repetitions are saved to the output file, so results accumulate incrementally
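To make the batch mechanics concrete, here is a minimal Python sketch of splitting repetitions into batches and checkpointing after each one. It is not PRISME's code: `batch_ranges`, `run_in_batches`, and the `run_repetition` and `save_batch` callbacks are hypothetical names. Only the idea of a configurable batch size comes from the text.

```python
def batch_ranges(first_rep, n_repetitions, batch_size):
    """Yield (start, stop) repetition index ranges, with stop exclusive."""
    for start in range(first_rep, n_repetitions, batch_size):
        yield start, min(start + batch_size, n_repetitions)


def run_in_batches(run_repetition, save_batch, n_repetitions, batch_size, first_rep=0):
    """Run repetitions batch by batch, saving once per completed batch."""
    for start, stop in batch_ranges(first_rep, n_repetitions, batch_size):
        # All results of the current batch are held in memory at once,
        # so a larger batch_size means higher peak RAM use.
        results = [run_repetition(i) for i in range(start, stop)]
        # One disk write per batch: a smaller batch_size means more writes.
        save_batch(start, results)
```

In this sketch, 500 repetitions with a batch size of 200 produce three batches (200, 200, and 100 repetitions), and an interrupted run loses at most the batch in progress.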
4. Parallel Execution Level
The innermost level distributes computational workload across available CPU cores. PRISME parallelizes at the repetition level because:
- GLM fitting and statistical method inference are the most computationally intensive parts of the code
- Each repetition is independent (no communication overhead)
- With 500 recommended repetitions, parallelization enables near-linear scaling
Each worker processes a complete repetition independently, including:
- GLM fitting for original and permuted data
- Applying all statistical inference methods
- Computing p-values
Set Params.parallel = true and Params.n_workers to the number of available cores to enable parallel execution.
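Because repetitions are independent, distributing them across workers amounts to a parallel map over repetition indices. The sketch below shows this pattern in Python with the standard-library `concurrent.futures` module. It is an illustration only: `run_repetition`, `run_batch`, and the per-repetition seeding scheme are assumptions, and the GLM and inference steps are replaced by a placeholder.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def run_repetition(task):
    """Stand-in for one complete repetition; returns the subsample it drew."""
    rep_index, subject_ids, n_subset, base_seed = task
    # A separate seed per repetition, so no state is shared between workers.
    rng = random.Random(base_seed + rep_index)
    subsample = sorted(rng.sample(subject_ids, n_subset))
    # GLM fitting, permutations, and statistical inference would happen here.
    return rep_index, subsample


def run_batch(rep_indices, subject_ids, n_subset, parallel=False, n_workers=1, base_seed=0):
    """Run the repetitions of one batch, serially or across worker processes."""
    tasks = [(i, subject_ids, n_subset, base_seed) for i in rep_indices]
    if not parallel:
        return [run_repetition(t) for t in tasks]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the order of the submitted tasks.
        return list(pool.map(run_repetition, tasks))
```

The `parallel` and `n_workers` arguments mirror the roles of Params.parallel and Params.n_workers. When the parallel path is used from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.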
Computational Flow
Study Loop (Study 1, Study 2, ...)
└─ Sample Size Loop (n=40, n=80, ...)
   └─ Batch Loop (Batch 1, Batch 2, ...)
      ├─ Parallel Execution (Rep 1, Rep 2, ... Rep batch_size)
      │  ├─ Subsample subjects
      │  ├─ Fit GLM + permutations
      │  ├─ Apply statistical methods
      │  └─ Return p-values
      └─ Save batch results (checkpoint)
Ground Truth Calculation
Unified Pipeline
Although the algorithm describes two computational paths (repetitions and ground truth), the architecture implements both within a unified framework. The ground truth calculation is a special case of the repetition workflow where:
- Only one repetition is performed
- The entire dataset (N subjects) is used
- A dummy statistical inference method maintains code consistency
- Only the GLM fit results are extracted
This unification simplifies maintenance: a single code change automatically affects both workflows.
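The special-case relationship can be sketched as one workflow function called with different arguments. This Python sketch is illustrative only and does not reproduce PRISME's code: the function names are hypothetical, and the "GLM" is reduced to the mean of the sampled values so that the example stays self-contained.

```python
def no_inference(glm_result):
    """Dummy inference method: does nothing, keeps the call pattern uniform."""
    return None


def fit_glm(values):
    """Placeholder fit: the mean of the values (an intercept-only model)."""
    return sum(values) / len(values)


def run_workflow(data, n_subset, n_repetitions, methods, sampler):
    """Shared pipeline: sample subjects, fit, then apply every inference method."""
    outputs = []
    for rep in range(n_repetitions):
        sample = sampler(data, n_subset, rep)
        glm = fit_glm(sample)
        p_values = {name: method(glm) for name, method in methods.items()}
        outputs.append({"glm": glm, "p": p_values})
    return outputs


def ground_truth(data):
    """Ground truth as a special case of the repetition workflow."""
    out = run_workflow(
        data,
        n_subset=len(data),               # the entire dataset (N subjects)
        n_repetitions=1,                  # only one repetition
        methods={"none": no_inference},   # dummy inference method
        sampler=lambda d, n, rep: list(d),  # no subsampling
    )
    return out[0]["glm"]                  # only the GLM fit is extracted
```

Because `ground_truth` only chooses arguments, any change to `run_workflow` or `fit_glm` applies to both paths.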