Tutorial
========

This tutorial will guide you through using the ORCA Descriptors library both as a Python library and as a command-line tool.

Using as a Python Library
--------------------------

Basic Usage
~~~~~~~~~~~

First, import the necessary classes::

    from orca_descriptors import Orca
    from rdkit.Chem import MolFromSmiles, AddHs

Initialize the ORCA calculator with your preferred settings::

    orca = Orca(
        script_path="orca",
        functional="PBE0",
        basis_set="def2-SVP",
        method_type="Opt",
        dispersion_correction="D3BJ",
        solvation_model="COSMO(Water)",
        n_processors=8,
        pre_optimize=True,  # Enable geometry pre-optimization with MMFF94
    )

Create a molecule from a SMILES string::

    mol = AddHs(MolFromSmiles("C1=CC=CC=C1"))  # Benzene

Calculate descriptors::

    # Energy descriptors
    homo = orca.homo_energy(mol)
    lumo = orca.lumo_energy(mol)
    gap = orca.gap_energy(mol)

    # DFT descriptors
    mu = orca.ch_potential(mol)
    chi = orca.electronegativity(mol)
    eta = orca.abs_hardness(mol)

    # Thermodynamic descriptors
    energy = orca.total_energy(mol)
    gibbs = orca.gibbs_free_energy(mol)

    # Molecular orbital descriptors
    homo_minus_1 = orca.mo_energy(mol, index=-2)  # HOMO-1 energy

    # Charge descriptors
    min_h_charge = orca.get_min_h_charge(mol, method="ESP")  # Minimum H charge

    # Geometric descriptors
    xy_area = orca.xy_shadow(mol)  # XY projection area

    # Reactivity descriptors
    meric = orca.meric(mol)  # Electrophilicity index for carbon

    # Topological descriptors
    t_oo = orca.topological_distance(mol, 'O', 'O')  # Sum of O-O distances
    nrot = orca.num_rotatable_bonds(mol)  # Number of rotatable bonds
    wiener = orca.wiener_index(mol)  # Wiener index

    # Physicochemical descriptors
    logp = orca.m_log_p(mol)  # Octanol/water partition coefficient
    sasa = orca.solvent_accessible_surface_area(mol)  # SASA

    # Autocorrelation descriptors
    mats2v = orca.moran_autocorrelation(mol, lag=2, weight='vdw_volume')
    hats4u = orca.autocorrelation_hats(mol, lag=4, unweighted=True)

Caching
~~~~~~~

The library automatically caches calculation
results. If you request a descriptor for the same molecule with the same parameters, the library reuses the cached result instead of running ORCA again::

    # First calculation - runs ORCA
    homo1 = orca.homo_energy(mol)  # Takes time

    # Second calculation - uses cache
    homo2 = orca.homo_energy(mol)  # Instant

Choosing Functionals and Methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The library supports both DFT methods and semi-empirical methods:

DFT Methods
^^^^^^^^^^^

For high-accuracy calculations, use DFT functionals with basis sets::

    # High-accuracy DFT calculation
    orca_dft = Orca(
        functional="PBE0",             # Hybrid functional
        basis_set="def2-TZVP",         # Triple-zeta basis set
        method_type="Opt",             # Geometry optimization
        dispersion_correction="D3BJ",  # Dispersion correction
        n_processors=8,
    )

Common DFT functionals:

- ``PBE0`` - Hybrid GGA functional, good balance of accuracy and speed
- ``B3LYP`` - Popular hybrid functional
- ``M06-2X`` - Hybrid meta-GGA functional, good for thermochemistry
- ``ωB97X-D`` - Range-separated hybrid with dispersion

Common basis sets:

- ``def2-SVP`` - Small, fast (default)
- ``def2-TZVP`` - Triple-zeta, more accurate
- ``def2-QZVP`` - Quadruple-zeta, very accurate but slow

Semi-Empirical Methods
^^^^^^^^^^^^^^^^^^^^^^

For faster calculations on large molecules, use semi-empirical methods::

    # Fast semi-empirical calculation
    orca_semi = Orca(
        functional="AM1",   # Semi-empirical method
        method_type="SP",   # Single point (no optimization)
        n_processors=1,
        pre_optimize=True,  # Pre-optimize with MMFF94
    )

Supported semi-empirical methods:

- ``AM1`` - Austin Model 1, good for organic molecules
- ``PM3`` - Parametric Method 3, improved over AM1
- ``PM6`` - Parametric Method 6, better for transition metals
- ``PM7`` - Parametric Method 7, improved accuracy
- ``RM1`` - Recife Model 1, optimized for organic compounds

Note: For semi-empirical methods, the ``basis_set`` and ``dispersion_correction`` parameters are automatically ignored.

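
If you prefer to keep your own configuration explicit rather than rely on parameters being ignored, the settings can be assembled in a small helper before they are passed to ``Orca``. The following is only a sketch: ``build_orca_settings`` and ``SEMI_EMPIRICAL_METHODS`` are hypothetical names that are not part of the library, and the set of methods is simply copied from the list above::

    # Hypothetical helper, not part of orca_descriptors.
    # Method names are taken from the "Supported semi-empirical methods" list.
    SEMI_EMPIRICAL_METHODS = {"AM1", "PM3", "PM6", "PM7", "RM1"}


    def build_orca_settings(functional, basis_set="def2-SVP",
                            dispersion_correction="D3BJ", **extra):
        """Return keyword arguments for Orca(), leaving out DFT-only options
        when a semi-empirical method is requested."""
        settings = {"functional": functional, **extra}
        if functional.upper() not in SEMI_EMPIRICAL_METHODS:
            # Basis set and dispersion correction only apply to DFT runs.
            settings["basis_set"] = basis_set
            settings["dispersion_correction"] = dispersion_correction
        return settings


    dft_settings = build_orca_settings("PBE0", basis_set="def2-TZVP",
                                       method_type="Opt")
    semi_settings = build_orca_settings("AM1", method_type="SP",
                                        n_processors=1)

The resulting dictionaries can then be unpacked into the constructor, for example ``Orca(**semi_settings)``.
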

Geometry Pre-Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, the library performs geometry pre-optimization using RDKit's MMFF94 force field before sending the molecule to ORCA. This can significantly speed up ORCA calculations, especially for geometry optimizations::

    # With pre-optimization (default)
    orca = Orca(
        functional="PBE0",
        method_type="Opt",
        pre_optimize=True,  # Default: True
    )

    # Without pre-optimization
    orca = Orca(
        functional="PBE0",
        method_type="Opt",
        pre_optimize=False,
    )

Benefits of pre-optimization:

- Faster ORCA convergence (fewer optimization steps needed)
- More stable calculations (better starting geometry)
- Reduced computational cost

The pre-optimization uses MMFF94, which requires explicit hydrogens. The library automatically adds hydrogens if needed and generates 3D coordinates if not present.

Method Types
~~~~~~~~~~~~

Choose the appropriate calculation type based on your needs::

    # Single point energy calculation (fastest)
    orca_sp = Orca(
        functional="PBE0",
        method_type="SP",  # Single point
    )

    # Geometry optimization (slower, but provides optimized geometry)
    orca_opt = Orca(
        functional="PBE0",
        method_type="Opt",  # Optimization
    )

Note: Some descriptors (like ``molecular_volume``, ``polar_surface_area``, ``solvent_accessible_surface_area``) require optimized geometries and may not work correctly with ``method_type="SP"``.

Using as a Command-Line Tool
-----------------------------

The library can also be used as a command-line utility after installation.

Run Benchmark
~~~~~~~~~~~~~

Before estimating calculation times, run a benchmark::

    orca_descriptors run_benchmark --working_dir ./calculations

Estimate Calculation Time
~~~~~~~~~~~~~~~~~~~~~~~~~

Estimate how long a calculation will take::

    orca_descriptors approximate_time --molecule CCO --method_type Opt

All ORCA parameters are available as CLI arguments.
For example::

    orca_descriptors approximate_time \
        --molecule CCO \
        --functional PBE0 \
        --basis_set def2-TZVP \
        --n_processors 4 \
        --method_type Opt

Example Workflow
----------------

Here's a complete example of calculating descriptors for multiple molecules::

    from orca_descriptors import Orca
    from rdkit.Chem import MolFromSmiles, AddHs

    # Initialize calculator
    orca = Orca(
        working_dir="./calculations",
        functional="PBE0",
        basis_set="def2-SVP",
        method_type="Opt",
        n_processors=4,
    )

    # List of molecules to process
    smiles_list = [
        "C1=CC=CC=C1",  # Benzene
        "CCO",          # Ethanol
        "CC(=O)C",      # Acetone
    ]

    results = []
    for smiles in smiles_list:
        mol = AddHs(MolFromSmiles(smiles))
        results.append({
            "smiles": smiles,
            "homo": orca.homo_energy(mol),
            "lumo": orca.lumo_energy(mol),
            "gap": orca.gap_energy(mol),
            "dipole": orca.dipole_moment(mol),
        })

    # Process results
    for r in results:
        print(f"{r['smiles']}: Gap = {r['gap']:.2f} eV")

Available Descriptors
---------------------

The library provides a comprehensive set of descriptors for QSAR analysis:

Energy Descriptors
~~~~~~~~~~~~~~~~~~

- ``homo_energy(mol)`` - HOMO energy (eV)
- ``lumo_energy(mol)`` - LUMO energy (eV)
- ``gap_energy(mol)`` - HOMO-LUMO gap (eV)
- ``mo_energy(mol, index)`` - Molecular orbital energy by index (eV)
- ``total_energy(mol)`` - Total energy (Hartree)

DFT Descriptors
~~~~~~~~~~~~~~~

- ``ch_potential(mol)`` - Chemical potential (eV)
- ``electronegativity(mol)`` - Electronegativity (eV)
- ``abs_hardness(mol)`` - Absolute hardness (eV)
- ``abs_softness(mol)`` - Absolute softness (1/eV)
- ``frontier_electron_density(mol)`` - Frontier electron density

Charge Descriptors
~~~~~~~~~~~~~~~~~~

- ``get_atom_charges(mol)`` - Mulliken atomic charges
- ``get_min_h_charge(mol, method="ESP")`` - Minimum hydrogen charge

Geometric Descriptors
~~~~~~~~~~~~~~~~~~~~~

- ``xy_shadow(mol)`` - XY projection area (Å²)
- ``molecular_volume(mol)`` - Molecular volume (Å³)
- ``get_bond_lengths(mol, atom1, atom2)`` - Bond lengths (Å)

Reactivity Descriptors
~~~~~~~~~~~~~~~~~~~~~~

- ``meric(mol)`` - Minimum electrophilicity index for carbon (eV)
- ``dipole_moment(mol)`` - Dipole moment (Debye)

Topological Descriptors
~~~~~~~~~~~~~~~~~~~~~~~

- ``topological_distance(mol, atom1, atom2)`` - Sum of topological distances
- ``num_rotatable_bonds(mol)`` - Number of rotatable bonds
- ``wiener_index(mol)`` - Wiener index

Physicochemical Descriptors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``m_log_p(mol)`` - Moriguchi Log P (octanol/water partition coefficient)
- ``polar_surface_area(mol)`` - Polar surface area (Å²)
- ``solvent_accessible_surface_area(mol)`` - SASA (Å²)

Thermodynamic Descriptors
~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``gibbs_free_energy(mol)`` - Gibbs free energy (Hartree)
- ``entropy(mol)`` - Entropy (J/(mol·K))
- ``enthalpy(mol)`` - Enthalpy (Hartree)

Autocorrelation Descriptors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``moran_autocorrelation(mol, lag, weight)`` - Moran autocorrelation
- ``autocorrelation_hats(mol, lag, unweighted)`` - HATS autocorrelation
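
To give a feel for what a Moran autocorrelation measures, here is a minimal, library-independent sketch on a molecular graph. It assumes the common textbook definition: the covariance of an atomic property over all atom pairs at topological distance ``lag``, divided by the variance of that property. It is not the implementation used by ``orca_descriptors``, which may differ in its weighting schemes and normalization; the function names, the bond list and the property values are hypothetical::

    from collections import deque


    def topological_distances(n_atoms, bonds):
        """All-pairs shortest path lengths (in bonds) via breadth-first search."""
        adjacency = [[] for _ in range(n_atoms)]
        for a, b in bonds:
            adjacency[a].append(b)
            adjacency[b].append(a)
        distances = []
        for start in range(n_atoms):
            dist = [None] * n_atoms  # None marks atoms not reached yet
            dist[start] = 0
            queue = deque([start])
            while queue:
                current = queue.popleft()
                for neighbour in adjacency[current]:
                    if dist[neighbour] is None:
                        dist[neighbour] = dist[current] + 1
                        queue.append(neighbour)
            distances.append(dist)
        return distances


    def moran_autocorrelation(weights, bonds, lag):
        """Moran autocorrelation of an atomic property at a given lag."""
        n = len(weights)
        mean = sum(weights) / n
        centred = [w - mean for w in weights]
        variance = sum(c * c for c in centred) / n
        dist = topological_distances(n, bonds)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if dist[i][j] == lag]
        if not pairs or variance == 0:
            return 0.0  # Convention chosen for this sketch
        covariance = sum(centred[i] * centred[j] for i, j in pairs) / len(pairs)
        return covariance / variance


    # Three atoms in a chain (0-1-2) with made-up property values.
    chain_bonds = [(0, 1), (1, 2)]
    chain_weights = [1.0, 2.0, 1.0]
    lag1 = moran_autocorrelation(chain_weights, chain_bonds, lag=1)  # about -1.0
    lag2 = moran_autocorrelation(chain_weights, chain_bonds, lag=2)  # about 0.5

In this toy chain, directly bonded atoms carry opposite deviations from the mean, which gives a negative value at lag 1, while the two terminal atoms share the same value, which gives a positive value at lag 2.
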