Changelog

Version 0.3.3b2

Added

Added remote cache support for ORCA calculation results via API
Added RemoteCacheClient class for interacting with remote cache service API
Added remote cache integration in CacheManager with hybrid local/remote caching
Added cache_server_url, cache_api_token, cache_timeout, and cache_only parameters to Orca class
Added CLI parameters --cache_server_url, --cache_api_token, --cache_timeout, and --cache_only for remote cache configuration
Added cache_only parameter to ORCABatchProcessing class to enable cache-only mode (no ORCA calculations, only use cached results)
Added requests>=2.28.0 dependency for HTTP API communication
Added comprehensive error handling for remote cache operations:
* RemoteCacheError for general API errors
* RemoteCachePermissionError for access permission errors (can_read/can_upload)
* Proper handling of HTTP errors (401, 403, 404, 500), timeouts, and network errors
* Graceful handling of rate limiting and server errors
Added integration tests for remote cache functionality:
* Client initialization and connectivity tests
* Permission checking tests
* Upload and retrieval tests
* ORCA calculation integration tests
* Error handling and fallback tests
* Batch processing with remote cache tests

Changed

Enhanced CacheManager to support hybrid caching (local + remote):
* Local cache is checked first, then remote cache if available
* Remote cache entries are automatically downloaded and stored locally when found
* Local cache entries are automatically uploaded to remote server after storage
* input_parameters are now extracted from remote cache responses and stored in local cache index
Improved cache system to gracefully handle remote cache failures without interrupting calculations
Updated RemoteCacheClient to support both X-API-Key (default) and Authorization: Bearer authentication methods
Enhanced error parsing to extract detailed error messages from API responses
Improved cache retrieval flow: first checks cache existence, then downloads file if available
Improved batch processing performance by implementing pre-cache checking for all molecules:
* All molecules are checked for cache (both local and remote) before starting calculations
* Cached molecules are processed immediately and excluded from further calculations
* Statistics are displayed showing the number of molecules found in cache and excluded from calculations
* This significantly speeds up batch processing when many molecules are already cached
Enhanced cache integration in batch processing:
* Remote cache is checked automatically if API token is provided
* Cached results from both local and remote cache are processed immediately
* Better progress reporting with cache statistics
Added cache-only mode support:
* When cache_only=True, only cached results are used (no ORCA calculations are performed)
* Molecules not found in cache return None for descriptors instead of triggering calculations
* Useful for quickly retrieving results from cache without running expensive calculations
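The lookup order and the pre-cache partitioning described above can be illustrated with a minimal, self-contained sketch. It does not use the library's CacheManager or RemoteCacheClient; HybridCache, remote_fetch, and partition_by_cache are hypothetical names chosen for illustration, and the real implementation may differ.

```python
import logging
from typing import Callable, Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


class HybridCache:
    """Toy local + remote cache (hypothetical; not the library's CacheManager)."""

    def __init__(self, local: Dict[str, str],
                 remote_fetch: Optional[Callable[[str], Optional[str]]] = None):
        self.local = local                # key -> cached result
        self.remote_fetch = remote_fetch  # hypothetical callable; may raise

    def get(self, key: str) -> Optional[str]:
        # 1. The local cache is checked first.
        if key in self.local:
            return self.local[key]
        # 2. The remote cache is consulted only if one is configured.
        if self.remote_fetch is None:
            return None
        try:
            result = self.remote_fetch(key)
        except Exception as exc:
            # Remote failures are logged as warnings and treated as a miss.
            logger.warning("remote cache unavailable: %s", exc)
            return None
        # 3. Remote hits are stored locally for the next lookup.
        if result is not None:
            self.local[key] = result
        return result


def partition_by_cache(keys: Iterable[str], cache: HybridCache,
                       cache_only: bool = False
                       ) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Split keys into cached results and keys that still need a calculation."""
    cached: Dict[str, Optional[str]] = {}
    to_compute: List[str] = []
    for key in keys:
        hit = cache.get(key)
        if hit is not None:
            cached[key] = hit
        elif cache_only:
            cached[key] = None  # cache-only mode: report None, never compute
        else:
            to_compute.append(key)
    return cached, to_compute
```

Running the partition once before any calculation starts is what allows cached molecules to be reported immediately and left out of the calculation queue.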

Technical Details

Remote cache works transparently: results served from the remote cache behave the same as results from the local cache
If remote cache is unavailable or fails, the system falls back to local-only caching
API token authentication supports both X-API-Key header (recommended for programmatic access) and Authorization: Bearer token
Remote cache supports both read and upload operations with permission checking
Cache timeout is configurable (default: 30 seconds)
All remote cache errors are logged as warnings, allowing calculations to continue with local cache
Batch processing fully supports remote cache: results are uploaded to and retrieved from the remote server when it is available
Cache operations use API v1 endpoints: /api/v1/cache/check, /api/v1/cache/upload, /api/v1/cache/{cache_id}/files/{filename}
Pre-cache checking in batch processing happens before any calculations start, allowing immediate processing of cached molecules
Cached molecules are excluded from calculation queues, reducing unnecessary work
Cache statistics help users understand how many calculations were skipped due to caching
Works seamlessly with both sequential and multiprocessing modes
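As a rough illustration of the two authentication styles and the v1 endpoint paths listed above, the sketch below builds request headers and URLs with the standard library only. The helper names build_auth_headers and cache_endpoints, and the example server URL, are assumptions for illustration; the HTTP methods and payloads used by RemoteCacheClient are not stated here and are not shown.

```python
from urllib.parse import quote


def build_auth_headers(api_token: str, use_bearer: bool = False) -> dict:
    """Return headers for either authentication style (hypothetical helper)."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_token}"}
    # X-API-Key is described as the default in this changelog.
    return {"X-API-Key": api_token}


def cache_endpoints(server_url: str, cache_id: str, filename: str) -> dict:
    """Build the three v1 endpoint URLs for a given server (hypothetical helper)."""
    base = server_url.rstrip("/")
    # Path segments are percent-encoded so unusual characters stay safe.
    file_path = f"{quote(cache_id, safe='')}/files/{quote(filename, safe='')}"
    return {
        "check": f"{base}/api/v1/cache/check",
        "upload": f"{base}/api/v1/cache/upload",
        "file": f"{base}/api/v1/cache/{file_path}",
    }


if __name__ == "__main__":
    # Placeholder values; replace with a real server URL and token.
    print(build_auth_headers("my-token"))
    print(cache_endpoints("https://cache.example.org/", "abc123", "job.out"))
```

With the requests dependency, such headers would typically be passed together with a timeout argument (30 seconds by default according to the notes above).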

Version 0.3.4

Added

Added clear CLI command to remove all ORCA files in the working directory (useful for cleaning up files that weren’t removed due to errors)
Added purge_cache CLI command to remove ORCA cache

Changed

Improved cache system to preserve original file extensions (.out, .log, .smd.out) when storing files
Enhanced NBO stabilization energy parsing with better error messages indicating NBOEXE environment variable requirement
Updated test for AM1 Mayer bond indices to account for different values compared to DFT methods

Removed

Removed ESP extrema descriptor (get_esp_extrema) - not implemented due to limitations in ORCA 6.0.1 for generating ESP cube files directly (would require orca_plot utility integration)

Fixed

Fixed cache system issue where files were saved with .out extension regardless of original extension, causing file not found errors
Fixed NBO stabilization energy parsing to properly detect when NBOEXE environment variable is not set
Fixed NMR chemical shifts parsing to work correctly with cached files
Fixed test for AM1 Mayer bond indices to accept different value ranges compared to DFT

Technical Details

Cache now preserves original file extensions to ensure correct file retrieval
NBO analysis requires NBOEXE environment variable to be set to point to NBO executable (nbo6.exe or nbo5.exe)
ESP extrema calculation would require integration with orca_plot utility, which is not currently implemented
The clear and purge_cache CLI commands help keep working directories clean and make the cache easier to manage
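The extension-preservation fix can be pictured with a small sketch. It assumes the cache names files by a key plus the source file's extension; original_extension and cache_filename are hypothetical helpers, not the library's functions. The compound extension .smd.out has to be matched before .out, otherwise only the last suffix would survive.

```python
from pathlib import Path

# Longest extension first so ".smd.out" is not truncated to ".out".
KNOWN_EXTENSIONS = (".smd.out", ".out", ".log")


def original_extension(path: str) -> str:
    """Return the full original extension of an ORCA output file."""
    name = Path(path).name
    for ext in KNOWN_EXTENSIONS:
        if name.endswith(ext):
            return ext
    # Unknown files fall back to the last suffix only.
    return Path(name).suffix


def cache_filename(cache_key: str, source_path: str) -> str:
    """Build a cache file name that keeps the source file's extension."""
    return f"{cache_key}{original_extension(source_path)}"


if __name__ == "__main__":
    for source in ("mol.out", "mol.log", "mol.smd.out"):
        print(source, "->", cache_filename("abc123", source))
```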

Version 0.3.3

Changed

Major code refactoring: split the large orca.py file (1620 lines) into a modular structure:
* Created base.py with OrcaBase class containing common utility methods
* Created calculation.py with CalculationMixin for calculation execution methods
* Created decorators.py with handle_x_molecule decorator
* Split descriptors into separate modules by category:
  - descriptors/electronic.py - Electronic property descriptors
  - descriptors/energy.py - Energy-related descriptors
  - descriptors/structural.py - Structural property descriptors
  - descriptors/topological.py - Topological descriptors
  - descriptors/misc.py - Miscellaneous descriptors
* Main Orca class now uses multiple inheritance from mixins
* Improved code organization and maintainability
* Removed redundant comments throughout the codebase

Technical Details

The new modular structure makes it easier to extend functionality and maintain code
Descriptors are organized by category for better code navigation
All functionality remains backward compatible
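The mixin-based layout can be sketched roughly as follows. Only OrcaBase, CalculationMixin, and Orca are names taken from this changelog; the descriptor mixin names, the method names, and the placeholder return values are assumptions made purely for illustration.

```python
class OrcaBase:
    """Common utilities shared by all mixins (base.py in the changelog)."""

    def __init__(self, working_dir: str = "."):
        self.working_dir = working_dir  # hypothetical attribute


class CalculationMixin:
    """Calculation execution methods (calculation.py in the changelog)."""

    def _run_calculation(self, mol: str) -> dict:
        # Placeholder result; a real implementation would invoke ORCA.
        return {"energy": -1.0, "dipole": 0.5}


class EnergyDescriptorsMixin:  # hypothetical name for descriptors/energy.py
    def total_energy(self, mol: str) -> float:
        return self._run_calculation(mol)["energy"]


class ElectronicDescriptorsMixin:  # hypothetical name for descriptors/electronic.py
    def dipole_moment(self, mol: str) -> float:
        return self._run_calculation(mol)["dipole"]


class Orca(ElectronicDescriptorsMixin, EnergyDescriptorsMixin,
           CalculationMixin, OrcaBase):
    """Public class assembled from the mixins via multiple inheritance."""


if __name__ == "__main__":
    orca = Orca()
    print(orca.total_energy("CCO"), orca.dipole_moment("CCO"))
```

Because callers still interact only with the Orca class, a split of this kind can keep the public interface unchanged, which is consistent with the backward-compatibility note above.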

Version 0.3.0

Added

Added new ORCABatchProcessing class for efficient batch processing of molecular descriptors with pandas compatibility
Added support for semi-empirical methods (AM1, PM3, PM6, PM7, RM1, MNDO, MNDOD, OM1, OM2, OM3)
Added pre_optimize parameter (default: True) for pre-optimizing molecular geometry using MMFF94 force field before ORCA calculations
Added multiprocessing support for parallel batch processing via parallel_mode="multiprocessing" parameter
Added automatic cleanup of all ORCA temporary files (input, output, and all temporary files) after calculations since results are cached
Added improved error parsing with brief summaries in logging.INFO and detailed information in logging.DEBUG
Added _pre_optimize_geometry() method for MMFF94 geometry optimization
Added _is_semi_empirical() method to detect semi-empirical methods

Changed

Refactored batch processing functionality from Orca.calculate_descriptors() into dedicated ORCABatchProcessing class
Orca.calculate_descriptors() now uses ORCABatchProcessing internally for backward compatibility
calculate_descriptors() now preserves original DataFrame columns (including 'smiles') instead of removing and re-adding them
Improved molecule hash calculation to include pre_optimize parameter for proper caching
Updated molecule hash calculation to exclude basis_set and dispersion_correction for semi-empirical methods
Enhanced input file generation to support semi-empirical methods (no basis set or dispersion correction needed)
Improved file cleanup to remove all ORCA files (including input and output files) since results are cached

Fixed

Fixed DataFrame handling in batch processing to preserve all original columns
Fixed error handling to provide concise error messages in INFO level and detailed information in DEBUG level

Technical Details

ORCABatchProcessing supports three parallelization modes: "sequential", "multiprocessing", and "mpirun"
Pre-optimization uses RDKit’s MMFF94 force field for fast geometry optimization before quantum chemical calculations
All ORCA files are automatically cleaned up after successful calculations, with results stored in cache
Semi-empirical methods are automatically detected and handled differently from DFT methods
Batch processing now includes time estimation based on benchmark machine performance
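The hashing rules described in this release (pre_optimize is part of the cache key; basis_set and dispersion_correction are ignored for semi-empirical methods) might look like the sketch below. The function names, the choice of SHA-256 over a JSON payload, and the example parameter values are assumptions; the library's actual hash format is not documented here.

```python
import hashlib
import json
from typing import Optional

# Semi-empirical methods listed in this release.
SEMI_EMPIRICAL = {"AM1", "PM3", "PM6", "PM7", "RM1",
                  "MNDO", "MNDOD", "OM1", "OM2", "OM3"}


def is_semi_empirical(method: str) -> bool:
    """Detect semi-empirical methods by name (case-insensitive)."""
    return method.upper() in SEMI_EMPIRICAL


def molecule_hash(smiles: str, method: str, basis_set: Optional[str],
                  dispersion_correction: Optional[str],
                  pre_optimize: bool = True) -> str:
    """Hash only the parameters that can change the calculation result."""
    params = {
        "smiles": smiles,
        "method": method.upper(),
        "pre_optimize": pre_optimize,
    }
    if not is_semi_empirical(method):
        # Basis set and dispersion only matter for non-semi-empirical methods.
        params["basis_set"] = basis_set
        params["dispersion_correction"] = dispersion_correction
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under these assumptions, two AM1 calculations that differ only in a nominal basis set share one cache entry, while toggling pre_optimize produces a different key.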

Version 0.2.2

Added

Added numpy>=1.20.0 to project dependencies (numpy was used but not declared)
Added dynamic time estimation updates in batch processing - time estimates are now refined based on actual execution times of previous molecules
Added _get_available_descriptors() method to dynamically discover available descriptor methods

Changed

Updated dipole moment parser to prioritize gas-phase values when available (for calculations without solvation)
Improved time estimation algorithm:
* Changed scaling from O(N^3.5) to O(N^2.5) for more realistic estimates
* Uses total_time from benchmark instead of scf_time as base unit
* More realistic optimization step estimation (15-35 steps instead of 10-50)
* Removed artificial 24-hour time cap
Refactored calculate_descriptors() method:
* Removed redundant code duplication (replaced large if-elif chain with getattr-based method calls)
* Removed redundant all_descriptors list - descriptors are now discovered dynamically
* Removed unnecessary comments
* Improved code maintainability and readability

Fixed

Fixed dipole moment parser to correctly extract gas-phase values from ORCA output when available
Fixed time estimation showing unrealistic values (e.g., 47 hours for 2 molecules) - now provides accurate estimates based on actual benchmark data
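A minimal sketch of how such an estimate could be computed: a benchmark total_time is scaled by (N / N_benchmark)^2.5 and then corrected by an exponential moving average of the observed-to-predicted ratio. The TimeEstimator class, its parameters, the smoothing factor 0.3, and the meaning of the system size N are all assumptions for illustration, not the library's implementation.

```python
class TimeEstimator:
    """Toy time estimator with N^2.5 scaling and EMA refinement (hypothetical)."""

    def __init__(self, benchmark_total_time: float, benchmark_size: int,
                 alpha: float = 0.3):
        self.benchmark_total_time = benchmark_total_time  # seconds
        self.benchmark_size = benchmark_size              # system size N of the benchmark
        self.alpha = alpha                                # EMA smoothing factor (assumed)
        self.correction = 1.0                             # running actual/predicted ratio

    def _raw(self, size: int) -> float:
        # Benchmark total time scaled as O(N^2.5).
        return self.benchmark_total_time * (size / self.benchmark_size) ** 2.5

    def estimate(self, size: int) -> float:
        """Predicted wall time in seconds for a molecule of the given size."""
        return self._raw(size) * self.correction

    def update(self, size: int, actual_seconds: float) -> None:
        """Blend the latest actual/predicted ratio into the correction factor."""
        ratio = actual_seconds / self._raw(size)
        self.correction = self.alpha * ratio + (1 - self.alpha) * self.correction


if __name__ == "__main__":
    est = TimeEstimator(benchmark_total_time=10.0, benchmark_size=10)
    print(est.estimate(40))   # prediction before any feedback
    est.update(10, 20.0)      # a molecule took twice as long as predicted
    print(est.estimate(40))   # later predictions are scaled up accordingly
```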

Technical Details

Time estimator now uses exponential moving average for better prediction accuracy
Descriptor methods are called dynamically using getattr(self, desc_name)
Automatic descriptor discovery eliminates need to maintain manual descriptor lists
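Dynamic discovery and getattr-based dispatch can be sketched as below. The DescriptorHost class, its two example descriptors, and the convention that every public method other than calculate_descriptors counts as a descriptor are assumptions for illustration; the library's _get_available_descriptors() may use different rules.

```python
import inspect
from typing import Dict, List, Optional


class DescriptorHost:
    """Toy class with two descriptor methods (hypothetical example)."""

    def dipole_moment(self, mol: str) -> float:
        return 1.0  # placeholder value

    def homo_energy(self, mol: str) -> float:
        return -5.0  # placeholder value

    def _get_available_descriptors(self) -> List[str]:
        # Assumed convention: public bound methods are descriptors.
        names = []
        for name, _ in inspect.getmembers(self, predicate=inspect.ismethod):
            if not name.startswith("_") and name != "calculate_descriptors":
                names.append(name)
        return names

    def calculate_descriptors(self, mol: str,
                              descriptors: Optional[List[str]] = None
                              ) -> Dict[str, float]:
        names = descriptors or self._get_available_descriptors()
        # getattr replaces a long if-elif chain over descriptor names.
        return {name: getattr(self, name)(mol) for name in names}


if __name__ == "__main__":
    host = DescriptorHost()
    print(host.calculate_descriptors("CCO"))
```

Adding a new descriptor then only requires defining a new method; no manually maintained list has to be updated.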