Changelog
=========


Version 0.3.4
-------------

Added
~~~~~

* Added ``clear`` CLI command to remove all ORCA files in the working directory (useful for cleaning up files that weren't removed due to errors)
* Added ``purge_cache`` CLI command to remove ORCA cache

Changed
~~~~~~~

* Improved cache system to preserve original file extensions (`.out`, `.log`, `.smd.out`) when storing files
* Enhanced NBO stabilization energy parsing with better error messages indicating NBOEXE environment variable requirement
* Updated test for AM1 Mayer bond indices to account for different values compared to DFT methods

Removed
~~~~~~~

* Removed ESP extrema descriptor (``get_esp_extrema``) - not implemented due to limitations in ORCA 6.0.1 for generating ESP cube files directly (would require orca_plot utility integration)

Fixed
~~~~~

* Fixed cache system issue where files were saved with `.out` extension regardless of original extension, causing file not found errors
* Fixed NBO stabilization energy parsing to properly detect when NBOEXE environment variable is not set
* Fixed NMR chemical shifts parsing to work correctly with cached files
* Fixed test for AM1 Mayer bond indices to accept different value ranges compared to DFT

Technical Details
~~~~~~~~~~~~~~~~~

* Cache now preserves original file extensions to ensure correct file retrieval
* NBO analysis requires NBOEXE environment variable to be set to point to NBO executable (nbo6.exe or nbo5.exe)
* ESP extrema calculation would require integration with orca_plot utility, which is not currently implemented
* CLI commands ``clear`` and ``purge_cache`` help maintain clean working directories and cache management


Version 0.3.3
-------------

Changed
~~~~~~~

* Major code refactoring: split large ``orca.py`` file (1620 lines) into modular structure:
  * Created ``base.py`` with ``OrcaBase`` class containing common utility methods
  * Created ``calculation.py`` with ``CalculationMixin`` for calculation execution methods
  * Created ``decorators.py`` with ``handle_x_molecule`` decorator
  * Split descriptors into separate modules by category:
    * ``descriptors/electronic.py`` - Electronic property descriptors
    * ``descriptors/energy.py`` - Energy-related descriptors
    * ``descriptors/structural.py`` - Structural property descriptors
    * ``descriptors/topological.py`` - Topological descriptors
    * ``descriptors/misc.py`` - Miscellaneous descriptors
  * Main ``Orca`` class now uses multiple inheritance from mixins
  * Improved code organization and maintainability
  * Removed redundant comments throughout the codebase

Technical Details
~~~~~~~~~~~~~~~~~

* The new modular structure makes it easier to extend functionality and maintain code
* Descriptors are organized by category for better code navigation
* All functionality remains backward compatible


Version 0.3.0
-------------

Added
~~~~~

* Added new ``ORCABatchProcessing`` class for efficient batch processing of molecular descriptors with pandas compatibility
* Added support for semi-empirical methods (AM1, PM3, PM6, PM7, RM1, MNDO, MNDOD, OM1, OM2, OM3)
* Added ``pre_optimize`` parameter (default: ``True``) for pre-optimizing molecular geometry using MMFF94 force field before ORCA calculations
* Added multiprocessing support for parallel batch processing via ``parallel_mode="multiprocessing"`` parameter
* Added automatic cleanup of all ORCA temporary files (input, output, and all temporary files) after calculations since results are cached
* Added improved error parsing with brief summaries in ``logging.INFO`` and detailed information in ``logging.DEBUG``
* Added ``_pre_optimize_geometry()`` method for MMFF94 geometry optimization
* Added ``_is_semi_empirical()`` method to detect semi-empirical methods

Changed
~~~~~~~

* Refactored batch processing functionality from ``Orca.calculate_descriptors()`` into dedicated ``ORCABatchProcessing`` class
* ``Orca.calculate_descriptors()`` now uses ``ORCABatchProcessing`` internally for backward compatibility
* ``calculate_descriptors()`` now preserves original DataFrame columns (including 'smiles') instead of removing and re-adding them
* Improved molecule hash calculation to include ``pre_optimize`` parameter for proper caching
* Updated molecule hash calculation to exclude ``basis_set`` and ``dispersion_correction`` for semi-empirical methods
* Enhanced input file generation to support semi-empirical methods (no basis set or dispersion correction needed)
* Improved file cleanup to remove all ORCA files (including input and output files) since results are cached

Fixed
~~~~~

* Fixed DataFrame handling in batch processing to preserve all original columns
* Fixed error handling to provide concise error messages in INFO level and detailed information in DEBUG level

Technical Details
~~~~~~~~~~~~~~~~~

* ``ORCABatchProcessing`` supports three parallelization modes: "sequential", "multiprocessing", and "mpirun"
* Pre-optimization uses RDKit's MMFF94 force field for fast geometry optimization before quantum chemical calculations
* All ORCA files are automatically cleaned up after successful calculations, with results stored in cache
* Semi-empirical methods are automatically detected and handled differently from DFT methods
* Batch processing now includes time estimation based on benchmark machine performance


Version 0.2.2
--------------

Added
~~~~~

* Added ``numpy>=1.20.0`` to project dependencies (numpy was used but not declared)
* Added dynamic time estimation updates in batch processing - time estimates are now refined based on actual execution times of previous molecules
* Added ``_get_available_descriptors()`` method to dynamically discover available descriptor methods

Changed
~~~~~~~

* Updated dipole moment parser to prioritize gas-phase values when available (for calculations without solvation)
* Improved time estimation algorithm:
  * Changed scaling exponent from O(N^3.5) to O(N^2.5) for more realistic estimates
  * Uses ``total_time`` from benchmark instead of ``scf_time`` as base unit
  * More realistic optimization step estimation (15-35 steps instead of 10-50)
  * Removed artificial 24-hour time cap
* Refactored ``calculate_descriptors()`` method:
  * Removed redundant code duplication (replaced large if-elif chain with ``getattr``-based method calls)
  * Removed redundant ``all_descriptors`` list - descriptors are now discovered dynamically
  * Removed unnecessary comments
  * Improved code maintainability and readability

Fixed
~~~~~

* Fixed dipole moment parser to correctly extract gas-phase values from ORCA output when available
* Fixed time estimation showing unrealistic values (e.g., 47 hours for 2 molecules) - now provides accurate estimates based on actual benchmark data

Technical Details
~~~~~~~~~~~~~~~~~

* Time estimator now uses exponential moving average for better prediction accuracy
* Descriptor methods are called dynamically using ``getattr(self, desc_name)``
* Automatic descriptor discovery eliminates need to maintain manual descriptor lists