Changelog
=========

Version 0.3.4
-------------

Added
~~~~~

* Added ``clear`` CLI command to remove all ORCA files in the working directory (useful for cleaning up files that weren't removed due to errors)
* Added ``purge_cache`` CLI command to remove ORCA cache

Changed
~~~~~~~

* Improved cache system to preserve original file extensions (``.out``, ``.log``, ``.smd.out``) when storing files
* Enhanced NBO stabilization energy parsing with better error messages indicating NBOEXE environment variable requirement
* Updated test for AM1 Mayer bond indices to account for different values compared to DFT methods

Removed
~~~~~~~

* Removed ESP extrema descriptor (``get_esp_extrema``) - not implemented due to limitations in ORCA 6.0.1 for generating ESP cube files directly (would require orca_plot utility integration)

Fixed
~~~~~

* Fixed cache system issue where files were saved with ``.out`` extension regardless of original extension, causing file not found errors
* Fixed NBO stabilization energy parsing to properly detect when NBOEXE environment variable is not set
* Fixed NMR chemical shifts parsing to work correctly with cached files
* Fixed test for AM1 Mayer bond indices to accept different value ranges compared to DFT

Technical Details
~~~~~~~~~~~~~~~~~

* Cache now preserves original file extensions to ensure correct file retrieval
* NBO analysis requires NBOEXE environment variable to be set to point to NBO executable (nbo6.exe or nbo5.exe)
* ESP extrema calculation would require integration with orca_plot utility, which is not currently implemented
* CLI commands ``clear`` and ``purge_cache`` help maintain clean working directories and cache management

Version 0.3.3
-------------

Changed
~~~~~~~

* Major code refactoring: split large ``orca.py`` file (1620 lines) into modular structure:

  * Created ``base.py`` with ``OrcaBase`` class containing common utility methods
  * Created ``calculation.py`` with ``CalculationMixin`` for calculation execution methods
  * Created ``decorators.py`` with ``handle_x_molecule`` decorator
  * Split descriptors into separate modules by category:

    * ``descriptors/electronic.py`` - Electronic property descriptors
    * ``descriptors/energy.py`` - Energy-related descriptors
    * ``descriptors/structural.py`` - Structural property descriptors
    * ``descriptors/topological.py`` - Topological descriptors
    * ``descriptors/misc.py`` - Miscellaneous descriptors

  * Main ``Orca`` class now uses multiple inheritance from mixins

* Improved code organization and maintainability
* Removed redundant comments throughout the codebase

Technical Details
~~~~~~~~~~~~~~~~~

* The new modular structure makes it easier to extend functionality and maintain code
* Descriptors are organized by category for better code navigation
* All functionality remains backward compatible

Version 0.3.0
-------------

Added
~~~~~

* Added new ``ORCABatchProcessing`` class for efficient batch processing of molecular descriptors with pandas compatibility
* Added support for semi-empirical methods (AM1, PM3, PM6, PM7, RM1, MNDO, MNDOD, OM1, OM2, OM3)
* Added ``pre_optimize`` parameter (default: ``True``) for pre-optimizing molecular geometry using MMFF94 force field before ORCA calculations
* Added multiprocessing support for parallel batch processing via ``parallel_mode="multiprocessing"`` parameter
* Added automatic cleanup of all ORCA temporary files (input, output, and all temporary files) after calculations since results are cached
* Added improved error parsing with brief summaries in ``logging.INFO`` and detailed information in ``logging.DEBUG``
* Added ``_pre_optimize_geometry()`` method for MMFF94 geometry optimization
* Added ``_is_semi_empirical()`` method to detect semi-empirical methods

Changed
~~~~~~~

* Refactored batch processing functionality from ``Orca.calculate_descriptors()`` into dedicated ``ORCABatchProcessing`` class
* ``Orca.calculate_descriptors()`` now uses ``ORCABatchProcessing`` internally for backward compatibility
* ``calculate_descriptors()`` now preserves original DataFrame columns (including 'smiles') instead of removing and re-adding them
* Improved molecule hash calculation to include ``pre_optimize`` parameter for proper caching
* Updated molecule hash calculation to exclude ``basis_set`` and ``dispersion_correction`` for semi-empirical methods
* Enhanced input file generation to support semi-empirical methods (no basis set or dispersion correction needed)
* Improved file cleanup to remove all ORCA files (including input and output files) since results are cached

Fixed
~~~~~

* Fixed DataFrame handling in batch processing to preserve all original columns
* Fixed error handling to provide concise error messages in INFO level and detailed information in DEBUG level

Technical Details
~~~~~~~~~~~~~~~~~

* ``ORCABatchProcessing`` supports three parallelization modes: "sequential", "multiprocessing", and "mpirun"
* Pre-optimization uses RDKit's MMFF94 force field for fast geometry optimization before quantum chemical calculations
* All ORCA files are automatically cleaned up after successful calculations, with results stored in cache
* Semi-empirical methods are automatically detected and handled differently from DFT methods
* Batch processing now includes time estimation based on benchmark machine performance

Version 0.2.2
-------------

Added
~~~~~

* Added ``numpy>=1.20.0`` to project dependencies (numpy was used but not declared)
* Added dynamic time estimation updates in batch processing - time estimates are now refined based on actual execution times of previous molecules
* Added ``_get_available_descriptors()`` method to dynamically discover available descriptor methods

Changed
~~~~~~~

* Updated dipole moment parser to prioritize gas-phase values when available (for calculations without solvation)
* Improved time estimation algorithm:

  * Changed scaling exponent from O(N^3.5) to O(N^2.5) for more realistic estimates
  * Uses ``total_time`` from benchmark instead of ``scf_time`` as base unit
  * More realistic optimization step estimation (15-35 steps instead of 10-50)
  * Removed artificial 24-hour time cap

* Refactored ``calculate_descriptors()`` method:

  * Removed redundant code duplication (replaced large if-elif chain with ``getattr``-based method calls)
  * Removed redundant ``all_descriptors`` list - descriptors are now discovered dynamically
  * Removed unnecessary comments
  * Improved code maintainability and readability

Fixed
~~~~~

* Fixed dipole moment parser to correctly extract gas-phase values from ORCA output when available
* Fixed time estimation showing unrealistic values (e.g., 47 hours for 2 molecules) - now provides accurate estimates based on actual benchmark data

Technical Details
~~~~~~~~~~~~~~~~~

* Time estimator now uses exponential moving average for better prediction accuracy
* Descriptor methods are called dynamically using ``getattr(self, desc_name)``
* Automatic descriptor discovery eliminates need to maintain manual descriptor lists
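
The ``getattr``-based dispatch and dynamic descriptor discovery noted for version 0.2.2 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: ``DemoCalculator``, its two placeholder descriptor methods, the ``_NON_DESCRIPTORS`` set, and the rule "every public method is a descriptor" are hypothetical and may differ from how the package actually implements ``_get_available_descriptors()`` and ``calculate_descriptors()``.

```python
import inspect


class DemoCalculator:
    """Hypothetical stand-in for a descriptor class (not the package's ``Orca`` class)."""

    # Assumed convention for this sketch: public methods that are not descriptors.
    _NON_DESCRIPTORS = {"calculate_descriptors"}

    def homo_energy(self, mol):
        return -0.25  # placeholder value, no real calculation is run

    def dipole_moment(self, mol):
        return 1.85  # placeholder value, no real calculation is run

    def _get_available_descriptors(self):
        """Collect the names of public bound methods, skipping known non-descriptors."""
        names = []
        for name, _member in inspect.getmembers(self, predicate=inspect.ismethod):
            if name.startswith("_") or name in self._NON_DESCRIPTORS:
                continue
            names.append(name)
        return sorted(names)

    def calculate_descriptors(self, mol, descriptors=None):
        """Look up each requested descriptor by name instead of using an if-elif chain."""
        available = self._get_available_descriptors()
        requested = available if descriptors is None else descriptors
        results = {}
        for desc_name in requested:
            if desc_name not in available:
                raise ValueError(f"Unknown descriptor: {desc_name}")
            results[desc_name] = getattr(self, desc_name)(mol)
        return results


calc = DemoCalculator()
print(calc._get_available_descriptors())
print(calc.calculate_descriptors("CCO", descriptors=["dipole_moment"]))
```

With this pattern, adding a new public method makes it available as a descriptor without touching any manually maintained list. The trade-off is that every public non-descriptor method has to be excluded explicitly.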