Benchmark and Time Estimation

Overview

The library includes a time estimation system that helps predict how long ORCA calculations will take. This is essential for planning large-scale QSAR descriptor calculations.

How It Works

The time estimation system uses a benchmark calculation to calibrate performance on your machine:

  1. Benchmark Calculation: A single-point calculation is run on benzene (C₆H₆) to measure:

    • Number of basis functions

    • Time per SCF cycle

    • Total calculation time

  2. Scaling Formula: For new molecules, the system estimates time using:

    • Molecular size scaling (O(N^2.5) for DFT calculations)

    • Method type (SP, Opt, Freq)

    • Number of processors

  3. Parameter Scaling: The system automatically adjusts for different:

    • Number of processors (accounts for parallel efficiency)

    • Basis sets (scales with size)

    • Functionals (accounts for computational cost)
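The scaling step can be illustrated with a small sketch. This is not the library's implementation: the function names, the method multipliers, and the parallel-efficiency model are all assumptions chosen for illustration. Only the O(N^2.5) exponent comes from the description above, and N is assumed here to be the number of basis functions.

```python
# Illustrative sketch only. The multipliers and efficiency value are
# assumptions for demonstration, not values used by orca_descriptors.

# Hypothetical cost of each method type relative to a single point (SP).
METHOD_FACTOR = {"SP": 1.0, "Opt": 10.0, "Freq": 20.0}


def parallel_speedup(n_procs, efficiency=0.8):
    """Assumed model: each extra processor contributes `efficiency` of a core."""
    return 1.0 + (n_procs - 1) * efficiency


def estimate_time(bench_time, bench_n_basis, n_basis,
                  method_type="SP", n_procs=1, exponent=2.5):
    """Scale a single-processor benchmark time (seconds) to a new molecule."""
    # Size term: ratio of basis functions raised to the scaling exponent
    size_factor = (n_basis / bench_n_basis) ** exponent
    return (bench_time * size_factor * METHOD_FACTOR[method_type]
            / parallel_speedup(n_procs))


# A molecule with twice the basis functions of the benchmark, SP on 1 core
print(f"{estimate_time(60.0, 100, 200):.1f} s")
```

Keeping the size, method, and processor terms as separate factors makes it easy to see which one dominates an estimate.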

Why It’s Needed

  • Planning: Estimate total time for large datasets

  • Resource Management: Allocate computational resources efficiently

  • Progress Tracking: Monitor calculation progress

  • Optimization: Choose optimal parameters for your hardware

Running a Benchmark

Using CLI

orca_descriptors run_benchmark --working_dir ./calculations

The benchmark uses benzene as a standard test molecule and saves the results to .orca_benchmark.json in the working directory.

Using Python

from orca_descriptors import Orca

# Run the benzene benchmark on this machine
orca = Orca(working_dir="./calculations")
benchmark_data = orca.run_benchmark()

# Report the calibration values measured by the benchmark
print(f"SCF cycle time: {benchmark_data['scf_time']:.2f} seconds")
print(f"Number of basis functions: {benchmark_data['n_basis']}")

Estimating Calculation Time

Using CLI

orca_descriptors approximate_time --molecule CCO --method_type Opt

This estimates the time without running the actual calculation.

Using Python

from orca_descriptors import Orca
from rdkit.Chem import MolFromSmiles, AddHs

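orca = Orca(working_dir="./calculations")
# Build the molecule from SMILES and add explicit hydrogens
mol = AddHs(MolFromSmiles("CCO"))

# Estimate the run time without launching the calculation
estimated_time = orca.estimate_calculation_time(mol)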
print(f"Estimated time: {estimated_time:.2f} seconds")
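For dataset planning, per-molecule estimates can be summed into a total. The sketch below is a hypothetical helper, not part of orca_descriptors: `total_estimated_time` and `format_duration` are names introduced here, and the estimator is passed in as a plain callable so the helper does not depend on any particular API.

```python
def total_estimated_time(molecules, estimator):
    """Sum per-molecule estimates (seconds) using the supplied callable."""
    return sum(estimator(mol) for mol in molecules)


def format_duration(seconds):
    """Render a number of seconds as 'Hh MMm SSs' for easier reading."""
    hours, rest = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"
```

With the objects from the example above, this could be called as total_estimated_time(mols, orca.estimate_calculation_time), where mols is a list of RDKit molecules with hydrogens added.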

Automatic Parameter Scaling

The system automatically scales benchmark data for different parameters. You don’t need to re-run the benchmark if you change:

  • Number of processors: Automatically accounts for parallel efficiency

  • Basis set: Scales based on basis set size (O(N^3.5))

  • Functional: Adjusts for relative computational cost

Example: If your benchmark was run with 1 processor and def2-SVP, you can estimate time for 4 processors and def2-TZVP without re-running the benchmark.
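A rough sketch of how such a rescaling might look is given below. It is an assumption-laden illustration, not the library's code: the function name, the 0.8 parallel-efficiency value, the linear speedup model, and the basis-function counts are all hypothetical inputs. Only the O(N^3.5) exponent is taken from the list above.

```python
# Illustrative sketch only: the efficiency value and the basis-function
# counts are assumed inputs, not values taken from orca_descriptors.

def rescale_scf_time(scf_time, n_basis_old, n_basis_new,
                     procs_old, procs_new, efficiency=0.8, exponent=3.5):
    """Rescale a benchmarked SCF cycle time to a new basis size and core count."""
    def speedup(p):
        # Assumed model: each extra processor adds `efficiency` of a core
        return 1.0 + (p - 1) * efficiency

    basis_factor = (n_basis_new / n_basis_old) ** exponent
    return scf_time * basis_factor * speedup(procs_old) / speedup(procs_new)


# Benchmark: 1 processor, smaller basis (assumed 114 functions).
# Target: 4 processors, larger basis (assumed 222 functions).
new_time = rescale_scf_time(2.0, 114, 222, procs_old=1, procs_new=4)
print(f"Rescaled SCF cycle time: {new_time:.2f} s")
```

Check the ORCA output of your own runs for the actual basis-function counts before relying on numbers like these.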

Benchmark File Location

The benchmark data is saved to:

<working_dir>/.orca_benchmark.json

This file contains:

  • Functional and basis set used

  • Number of processors

  • Number of basis functions

  • SCF cycle time

  • Total calculation time

You can share this file between working directories as long as they use the same hardware and ORCA version.
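Sharing the file amounts to copying it into the other working directory. The helpers below are a minimal sketch using only the Python standard library; `share_benchmark` and `load_benchmark` are hypothetical names, not functions provided by orca_descriptors, and the sketch makes no assumption about the JSON keys inside the file.

```python
import json
import shutil
from pathlib import Path

BENCHMARK_NAME = ".orca_benchmark.json"  # file name given in the text above


def share_benchmark(src_dir, dst_dir):
    """Copy the benchmark file from one working directory to another."""
    src = Path(src_dir) / BENCHMARK_NAME
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / BENCHMARK_NAME
    shutil.copy2(src, dst)  # copy2 also preserves the file's timestamps
    return dst


def load_benchmark(working_dir):
    """Read the benchmark file and return its contents as a dict."""
    with open(Path(working_dir) / BENCHMARK_NAME, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, share_benchmark("./calculations", "./calculations_set2") would make an existing benchmark available to a second working directory on the same machine.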