Changelog

Version 0.3.3b2

Added

  • Added remote cache support for ORCA calculation results via API

  • Added RemoteCacheClient class for interacting with remote cache service API

  • Added remote cache integration in CacheManager with hybrid local/remote caching

  • Added cache_server_url, cache_api_token, cache_timeout, and cache_only parameters to Orca class

  • Added CLI parameters --cache_server_url, --cache_api_token, --cache_timeout, and --cache_only for remote cache configuration

  • Added cache_only parameter to ORCABatchProcessing class to enable cache-only mode (no ORCA calculations are run; only cached results are used)

  • Added requests>=2.28.0 dependency for HTTP API communication

  • Added comprehensive error handling for remote cache operations:

    • RemoteCacheError for general API errors

    • RemoteCachePermissionError for access permission errors (can_read/can_upload)

    • Proper handling of HTTP errors (401, 403, 404, 500, timeouts, network errors)

    • Graceful handling of rate limiting and server errors

  • Added integration tests for remote cache functionality:

    • Client initialization and connectivity tests

    • Permission checking tests

    • Upload and retrieval tests

    • ORCA calculation integration tests

    • Error handling and fallback tests

    • Batch processing with remote cache tests
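The error handling described above can be illustrated with a minimal sketch. It reuses the class names RemoteCacheError and RemoteCachePermissionError from this entry, but the status-to-outcome mapping (2xx as a hit, 404 as a cache miss, 429 as rate limiting) and the helper names raise_for_cache_status and remote_lookup_or_miss are assumptions, not the package's actual code. In a client built on requests, requests.exceptions.Timeout and requests.exceptions.ConnectionError would be wrapped the same way.

```python
import logging

logger = logging.getLogger("remote_cache_sketch")


class RemoteCacheError(Exception):
    """General remote cache API error (sketch of the class named above)."""


class RemoteCachePermissionError(RemoteCacheError):
    """The API token lacks a required permission (can_read / can_upload)."""


def raise_for_cache_status(status_code: int, detail: str = "") -> bool:
    """Map an HTTP status to True (hit), False (miss) or an exception.

    Assumed mapping for illustration: 2xx = found, 404 = not in cache.
    """
    if 200 <= status_code < 300:
        return True
    if status_code == 404:
        return False
    if status_code == 401:
        raise RemoteCacheError(f"authentication failed: {detail}")
    if status_code == 403:
        raise RemoteCachePermissionError(f"permission denied: {detail}")
    if status_code == 429:
        raise RemoteCacheError(f"rate limited by server: {detail}")
    if status_code >= 500:
        raise RemoteCacheError(f"server error {status_code}: {detail}")
    raise RemoteCacheError(f"unexpected HTTP status {status_code}: {detail}")


def remote_lookup_or_miss(status_code: int, detail: str = "") -> bool:
    """Treat any remote cache failure as a miss so calculations can continue."""
    try:
        return raise_for_cache_status(status_code, detail)
    except RemoteCacheError as exc:
        # Logged as a warning only; the caller falls back to local caching.
        logger.warning("Remote cache unavailable, using local cache only: %s", exc)
        return False
```

Because RemoteCachePermissionError subclasses RemoteCacheError, a single except clause is enough to downgrade every remote failure to a logged warning.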

Changed

  • Enhanced CacheManager to support hybrid caching (local + remote):

    • Local cache is checked first, then remote cache if available

    • Remote cache entries are automatically downloaded and stored locally when found

    • Local cache entries are automatically uploaded to the remote server after storage

    • input_parameters are now extracted from remote cache responses and stored in the local cache index

  • Improved cache system to gracefully handle remote cache failures without interrupting calculations

  • Updated RemoteCacheClient to support both X-API-Key (default) and Authorization: Bearer authentication methods

  • Enhanced error parsing to extract detailed error messages from API responses

  • Improved cache retrieval flow: first checks cache existence, then downloads file if available

  • Improved batch processing performance by implementing pre-cache checking for all molecules:

    • All molecules are checked for cache (both local and remote) before starting calculations

    • Cached molecules are processed immediately and excluded from further calculations

    • Statistics are displayed showing the number of molecules found in cache and excluded from calculations

    • This significantly speeds up batch processing when many molecules are already cached

  • Enhanced cache integration in batch processing:

    • Remote cache is checked automatically if an API token is provided

    • Cached results from both local and remote cache are processed immediately

    • Better progress reporting with cache statistics

  • Added cache-only mode support:

    • When cache_only=True, only cached results are used (no ORCA calculations are performed)

    • Molecules not found in cache return None for descriptors instead of triggering calculations

    • Useful for quickly retrieving results from cache without running expensive calculations
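A minimal sketch of the hybrid lookup order, the graceful fallback and the cache-only behavior, assuming in-memory dictionaries in place of the on-disk cache and plain callables in place of RemoteCacheClient. HybridCache and compute_descriptor are hypothetical names; CacheManager's real interface is not shown here.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("hybrid_cache_sketch")


class HybridCache:
    """Local-first cache with an optional remote tier (illustrative only)."""

    def __init__(
        self,
        remote_get: Optional[Callable[[str], Optional[bytes]]] = None,
        remote_put: Optional[Callable[[str, bytes], None]] = None,
    ):
        self.local: Dict[str, bytes] = {}
        self.remote_get = remote_get
        self.remote_put = remote_put

    def get(self, key: str) -> Optional[bytes]:
        # 1. The local cache is always checked first.
        if key in self.local:
            return self.local[key]
        # 2. The remote cache is consulted only when configured.
        if self.remote_get is None:
            return None
        try:
            data = self.remote_get(key)
        except Exception as exc:  # remote failures must not stop calculations
            logger.warning("Remote cache lookup failed: %s", exc)
            return None
        # 3. Remote hits are stored locally so the next lookup stays local.
        if data is not None:
            self.local[key] = data
        return data

    def put(self, key: str, data: bytes) -> None:
        # Store locally first, then try to upload; upload errors are only logged.
        self.local[key] = data
        if self.remote_put is not None:
            try:
                self.remote_put(key, data)
            except Exception as exc:
                logger.warning("Remote cache upload failed: %s", exc)


def compute_descriptor(
    cache: HybridCache,
    key: str,
    calculate: Callable[[], bytes],
    cache_only: bool = False,
) -> Optional[bytes]:
    """Return a cached result, or calculate and cache it unless cache_only is set."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    if cache_only:
        return None  # cache-only mode: never start a calculation
    result = calculate()
    cache.put(key, result)
    return result
```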

Technical Details

  • Remote cache works transparently: results retrieved from the server are used exactly like local cache entries

  • If remote cache is unavailable or fails, the system falls back to local-only caching

  • API token authentication supports both X-API-Key header (recommended for programmatic access) and Authorization: Bearer token

  • Remote cache supports both read and upload operations with permission checking

  • Cache timeout is configurable (default: 30 seconds)

  • All remote cache errors are logged as warnings, allowing calculations to continue with local cache

  • Batch processing fully supports remote cache: calculation results are stored on and retrieved from the remote server when available

  • Cache operations use API v1 endpoints: /api/v1/cache/check, /api/v1/cache/upload, /api/v1/cache/{cache_id}/files/{filename}

  • Pre-cache checking in batch processing happens before any calculations start, allowing immediate processing of cached molecules

  • Cached molecules are excluded from calculation queues, reducing unnecessary work

  • Cache statistics help users understand how many calculations were skipped due to caching

  • Remote cache and pre-cache checking work with both sequential and multiprocessing modes
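Pre-cache checking in batch mode amounts to partitioning the input before any work is scheduled. The sketch below assumes a lookup callable that returns a result or None (for example a local-then-remote lookup); partition_by_cache and cache_stats_message are hypothetical helpers and the message format is illustrative. Only the pending keys would then be handed to the sequential or multiprocessing workers.

```python
from typing import Callable, Dict, List, Optional, Tuple


def partition_by_cache(
    keys: List[str],
    lookup: Callable[[str], Optional[dict]],
) -> Tuple[Dict[str, dict], List[str]]:
    """Split molecule keys into cached results and keys that still need ORCA."""
    cached: Dict[str, dict] = {}
    pending: List[str] = []
    for key in keys:
        result = lookup(key)
        if result is not None:
            cached[key] = result  # processed immediately, never queued
        else:
            pending.append(key)  # goes to the calculation queue
    return cached, pending


def cache_stats_message(n_total: int, n_cached: int) -> str:
    """Summarize how many calculations were skipped thanks to the cache."""
    return (
        f"{n_cached}/{n_total} molecules found in cache; "
        f"{n_total - n_cached} to calculate"
    )
```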

Version 0.3.4

Added

  • Added clear CLI command to remove all ORCA files in the working directory (useful for cleaning up files that weren’t removed due to errors)

  • Added purge_cache CLI command to remove ORCA cache

Changed

  • Improved cache system to preserve original file extensions (.out, .log, .smd.out) when storing files

  • Enhanced NBO stabilization energy parsing with clearer error messages stating that the NBOEXE environment variable is required

  • Updated test for AM1 Mayer bond indices to account for different values compared to DFT methods
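A sketch of extension-preserving cache file names. The suffix list (.smd.out, .out, .log) comes from this entry; the hash-plus-extension naming scheme and the function names are assumptions. Compound suffixes are checked longest first so that .smd.out is not truncated to .out.

```python
from pathlib import Path

# Known ORCA output suffixes, longest first so ".smd.out" wins over ".out".
KNOWN_SUFFIXES = (".smd.out", ".out", ".log")


def original_extension(filename: str) -> str:
    """Return the file's original extension, keeping compound suffixes intact."""
    name = Path(filename).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return suffix
    return Path(name).suffix  # fall back to the last suffix only


def cache_filename(mol_hash: str, source_file: str) -> str:
    """Hypothetical cache naming: molecule hash plus the original extension."""
    return mol_hash + original_extension(source_file)
```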

Removed

  • Removed ESP extrema descriptor (get_esp_extrema): it was not implemented because ORCA 6.0.1 cannot generate ESP cube files directly (this would require orca_plot utility integration)

Fixed

  • Fixed cache system issue where files were saved with .out extension regardless of original extension, causing file not found errors

  • Fixed NBO stabilization energy parsing to properly detect when NBOEXE environment variable is not set

  • Fixed NMR chemical shifts parsing to work correctly with cached files

  • Fixed test for AM1 Mayer bond indices to accept different value ranges compared to DFT

Technical Details

  • Cache now preserves original file extensions to ensure correct file retrieval

  • NBO analysis requires NBOEXE environment variable to be set to point to NBO executable (nbo6.exe or nbo5.exe)

  • ESP extrema calculation would require integration with orca_plot utility, which is not currently implemented

  • CLI commands clear and purge_cache help maintain clean working directories and cache management
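A minimal sketch of the NBOEXE check, assuming the lookup simply fails fast with an explanatory message when the variable is unset or empty. NBONotConfiguredError and require_nboexe are hypothetical names; the environment mapping can be injected for testing.

```python
import os
from typing import Mapping, Optional


class NBONotConfiguredError(RuntimeError):
    """Hypothetical error raised when NBO analysis is requested without NBOEXE."""


def require_nboexe(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the NBO executable path from NBOEXE or raise a descriptive error."""
    env = os.environ if environ is None else environ
    path = env.get("NBOEXE", "").strip()
    if not path:
        raise NBONotConfiguredError(
            "NBO stabilization energies require the NBOEXE environment variable "
            "to point to the NBO executable (e.g. nbo6.exe or nbo5.exe)"
        )
    return path
```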

Version 0.3.3

Changed

  • Major code refactoring: split the large orca.py file (1620 lines) into a modular structure:

    • Created base.py with OrcaBase class containing common utility methods

    • Created calculation.py with CalculationMixin for calculation execution methods

    • Created decorators.py with handle_x_molecule decorator

    • Split descriptors into separate modules by category:

        • descriptors/electronic.py: electronic property descriptors

        • descriptors/energy.py: energy-related descriptors

        • descriptors/structural.py: structural property descriptors

        • descriptors/topological.py: topological descriptors

        • descriptors/misc.py: miscellaneous descriptors

    • Main Orca class now uses multiple inheritance from mixins

    • Improved code organization and maintainability

    • Removed redundant comments throughout the codebase
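The mixin layout can be sketched as follows. OrcaBase, CalculationMixin and Orca are named in this entry; the descriptor mixin classes and their methods are hypothetical placeholders, and run_calculation is a stub rather than a real ORCA call. The point is that descriptor mixins call methods provided by other mixins through self, and Python's method resolution order combines them into one class.

```python
class OrcaBase:
    """Stand-in for base.py: shared settings and utility methods."""

    def __init__(self, method: str, basis_set: str):
        self.method = method
        self.basis_set = basis_set


class CalculationMixin:
    """Stand-in for calculation.py: runs a calculation (stubbed here)."""

    def run_calculation(self, smiles: str) -> dict:
        # A real implementation would write an ORCA input, run ORCA and parse
        # the output; this stub only echoes the settings it would use.
        return {"smiles": smiles, "method": self.method, "basis_set": self.basis_set}


class ElectronicDescriptorsMixin:
    """Hypothetical stand-in for descriptors/electronic.py."""

    def level_of_theory(self, smiles: str) -> str:
        result = self.run_calculation(smiles)  # provided by CalculationMixin
        return f"{result['method']}/{result['basis_set']}"


class StructuralDescriptorsMixin:
    """Hypothetical stand-in for descriptors/structural.py."""

    def input_smiles(self, smiles: str) -> str:
        return self.run_calculation(smiles)["smiles"]


class Orca(
    ElectronicDescriptorsMixin,
    StructuralDescriptorsMixin,
    CalculationMixin,
    OrcaBase,
):
    """Public class assembled from the mixins; __init__ comes from OrcaBase."""
```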

Technical Details

  • The new modular structure makes it easier to extend functionality and maintain code

  • Descriptors are organized by category for better code navigation

  • All functionality remains backward compatible

Version 0.3.0

Added

  • Added new ORCABatchProcessing class for efficient batch processing of molecular descriptors with pandas compatibility

  • Added support for semi-empirical methods (AM1, PM3, PM6, PM7, RM1, MNDO, MNDOD, OM1, OM2, OM3)

  • Added pre_optimize parameter (default: True) for pre-optimizing molecular geometry using MMFF94 force field before ORCA calculations

  • Added multiprocessing support for parallel batch processing via parallel_mode="multiprocessing" parameter

  • Added automatic cleanup of all ORCA files (input, output, and temporary files) after calculations, since results are cached

  • Added improved error parsing with brief summaries at logging.INFO level and detailed information at logging.DEBUG level

  • Added _pre_optimize_geometry() method for MMFF94 geometry optimization

  • Added _is_semi_empirical() method to detect semi-empirical methods
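A sketch of semi-empirical method detection and its effect on the ORCA "!" keyword line. The method list is the one given in this entry; is_semi_empirical and method_line are hypothetical stand-ins for _is_semi_empirical() and the input generator, and the case-insensitive comparison is an assumption.

```python
from typing import Optional

# Semi-empirical methods listed in this release.
SEMI_EMPIRICAL_METHODS = frozenset(
    {"AM1", "PM3", "PM6", "PM7", "RM1", "MNDO", "MNDOD", "OM1", "OM2", "OM3"}
)


def is_semi_empirical(method: str) -> bool:
    """Case-insensitive membership test against the known method list."""
    return method.strip().upper() in SEMI_EMPIRICAL_METHODS


def method_line(method: str, basis_set: str, dispersion: Optional[str] = None) -> str:
    """Build the simple-input keyword line for an ORCA input file."""
    if is_semi_empirical(method):
        # Semi-empirical methods need no basis set or dispersion correction.
        return f"! {method}"
    parts = ["!", method, basis_set]
    if dispersion:
        parts.append(dispersion)
    return " ".join(parts)
```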

Changed

  • Refactored batch processing functionality from Orca.calculate_descriptors() into dedicated ORCABatchProcessing class

  • Orca.calculate_descriptors() now uses ORCABatchProcessing internally for backward compatibility

  • calculate_descriptors() now preserves original DataFrame columns (including ‘smiles’) instead of removing and re-adding them

  • Improved molecule hash calculation to include pre_optimize parameter for proper caching

  • Updated molecule hash calculation to exclude basis_set and dispersion_correction for semi-empirical methods

  • Enhanced input file generation to support semi-empirical methods (no basis set or dispersion correction needed)

  • Improved file cleanup to remove all ORCA files (including input and output files) since results are cached
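The cache-key rules above (include pre_optimize, drop basis_set and dispersion_correction for semi-empirical methods) can be sketched as a hash over a canonical JSON payload. The choice of SHA-256 and JSON, the function name and the exact set of fields are assumptions; the real hash may include further calculation settings.

```python
import hashlib
import json
from typing import Optional

SEMI_EMPIRICAL_METHODS = frozenset(
    {"AM1", "PM3", "PM6", "PM7", "RM1", "MNDO", "MNDOD", "OM1", "OM2", "OM3"}
)


def molecule_hash(
    smiles: str,
    method: str,
    basis_set: Optional[str] = None,
    dispersion_correction: Optional[str] = None,
    pre_optimize: bool = True,
) -> str:
    """Hypothetical cache key for one molecule/calculation setup."""
    params = {"smiles": smiles, "method": method.upper(), "pre_optimize": pre_optimize}
    if method.upper() not in SEMI_EMPIRICAL_METHODS:
        # Basis set and dispersion only affect non-semi-empirical results.
        params["basis_set"] = basis_set
        params["dispersion_correction"] = dispersion_correction
    # sort_keys gives a canonical payload, so key order never changes the hash.
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```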

Fixed

  • Fixed DataFrame handling in batch processing to preserve all original columns

  • Fixed error handling to provide concise error messages at INFO level and detailed information at DEBUG level
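A sketch of the two-level error reporting, assuming the brief summary is the first output line that mentions an error. summarize_orca_error and report_failure are hypothetical helpers; the full output only appears when the logger is configured for DEBUG.

```python
import logging

logger = logging.getLogger("orca_error_sketch")


def summarize_orca_error(output_text: str) -> str:
    """Return the first line mentioning an error, else a generic message."""
    for line in output_text.splitlines():
        if "error" in line.lower():
            return line.strip()
    return "ORCA terminated abnormally"


def report_failure(molecule_id: str, output_text: str) -> str:
    """Log a one-line summary at INFO and the full output at DEBUG."""
    summary = summarize_orca_error(output_text)
    logger.info("Calculation failed for %s: %s", molecule_id, summary)
    logger.debug("Full ORCA output for %s:\n%s", molecule_id, output_text)
    return summary
```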

Technical Details

  • ORCABatchProcessing supports three parallelization modes: “sequential”, “multiprocessing”, and “mpirun”

  • Pre-optimization uses RDKit’s MMFF94 force field for fast geometry optimization before quantum chemical calculations

  • All ORCA files are automatically cleaned up after successful calculations, with results stored in cache

  • Semi-empirical methods are automatically detected and handled differently from DFT methods

  • Batch processing now includes time estimation based on benchmark machine performance

Version 0.2.2

Added

  • Added numpy>=1.20.0 to project dependencies (numpy was used but not declared)

  • Added dynamic time estimation updates in batch processing: time estimates are now refined based on the actual execution times of previous molecules

  • Added _get_available_descriptors() method to dynamically discover available descriptor methods
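Dynamic descriptor discovery and getattr dispatch can be sketched with the standard inspect module. The rule that every public method except an explicit exclusion list is a descriptor is an assumption, and the two toy descriptors operate on the SMILES string only; they stand in for real ORCA-based descriptors. DescriptorHost is a hypothetical class name.

```python
import inspect
from typing import Dict, List, Optional


class DescriptorHost:
    """Hypothetical class showing discovery plus getattr-based dispatch."""

    NON_DESCRIPTORS = frozenset({"calculate_descriptors"})

    # Toy descriptors standing in for real ORCA-based ones.
    def smiles_length(self, smiles: str) -> int:
        return len(smiles)

    def uppercase_count(self, smiles: str) -> int:
        return sum(ch.isupper() for ch in smiles)

    def _get_available_descriptors(self) -> List[str]:
        # Public functions on the class, minus known non-descriptor helpers.
        return [
            name
            for name, _ in inspect.getmembers(type(self), inspect.isfunction)
            if not name.startswith("_") and name not in self.NON_DESCRIPTORS
        ]

    def calculate_descriptors(
        self, molecules: List[str], descriptors: Optional[List[str]] = None
    ) -> List[Dict[str, int]]:
        names = descriptors or self._get_available_descriptors()
        # getattr replaces a long if-elif chain over descriptor names.
        return [{name: getattr(self, name)(mol) for name in names} for mol in molecules]
```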

Changed

  • Updated dipole moment parser to prioritize gas-phase values when available (for calculations without solvation)

  • Improved time estimation algorithm:

    • Changed scaling exponent from O(N^3.5) to O(N^2.5) for more realistic estimates

    • Uses total_time from the benchmark instead of scf_time as the base unit

    • More realistic optimization step estimation (15-35 steps instead of 10-50)

    • Removed artificial 24-hour time cap

  • Refactored calculate_descriptors() method:

    • Removed code duplication (replaced a large if-elif chain with getattr-based method calls)

    • Removed redundant all_descriptors list: descriptors are now discovered dynamically

    • Removed unnecessary comments

    • Improved code maintainability and readability
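One plausible form of the revised estimate, assuming runtime scales with the atom-count ratio to the power 2.5 relative to the benchmark molecule's total_time, and that a geometry optimization multiplies this by a step count clamped to 15-35. For a molecule four times larger than the benchmark, the factor is 4^2.5 = 32 instead of 4^3.5 = 128 under the old exponent. The function and parameter names are hypothetical.

```python
def estimate_seconds(
    n_atoms: int,
    bench_total_time: float,
    bench_n_atoms: int,
    optimize: bool = False,
    opt_steps: int = 25,
    exponent: float = 2.5,
) -> float:
    """Scale a benchmark total_time to a molecule with n_atoms atoms (sketch)."""
    if n_atoms <= 0 or bench_n_atoms <= 0:
        raise ValueError("atom counts must be positive")
    # O(N^2.5) scaling relative to the benchmark molecule.
    estimate = bench_total_time * (n_atoms / bench_n_atoms) ** exponent
    if optimize:
        # Assume roughly one calculation per step, with steps clamped to 15-35.
        estimate *= min(max(opt_steps, 15), 35)
    # No artificial upper cap is applied.
    return estimate
```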

Fixed

  • Fixed dipole moment parser to correctly extract gas-phase values from ORCA output when available

  • Fixed time estimation showing unrealistic values (e.g., 47 hours for 2 molecules); estimates are now based on actual benchmark data

Technical Details

  • Time estimator now uses exponential moving average for better prediction accuracy

  • Descriptor methods are called dynamically using getattr(self, desc_name)

  • Automatic descriptor discovery eliminates need to maintain manual descriptor lists
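A sketch of the exponential-moving-average refinement, assuming the estimator smooths the ratio between actual and predicted time per molecule and applies that correction to the remaining predictions. EmaTimeEstimator and its default alpha are hypothetical choices.

```python
from typing import Iterable


class EmaTimeEstimator:
    """Smooths actual/predicted time ratios with an exponential moving average."""

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.correction = 1.0  # start by trusting the benchmark estimate

    def observe(self, predicted: float, actual: float) -> None:
        """Fold one finished molecule into the smoothed correction factor."""
        if predicted <= 0:
            raise ValueError("predicted time must be positive")
        ratio = actual / predicted
        self.correction = self.alpha * ratio + (1.0 - self.alpha) * self.correction

    def refine(self, predicted: float) -> float:
        return predicted * self.correction

    def remaining(self, predictions: Iterable[float]) -> float:
        """Refined total time for the molecules still in the queue."""
        return sum(self.refine(p) for p in predictions)
```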