Parallelisation

Parallelising by wavenumber (ExoMol)

Some line lists consist of millions or even billions of transitions. The ExoMol database splits these by wavenumber into multiple .trans-files. pyROX can be run on these files in parallel, after which the results are combined into a final output-file.

In this example, we’ll show how to run pyROX in parallel on ExoMol’s H₂S line list. The configuration parameters below are also found in the examples-directory on GitHub.

# Basic information on database and species
database = 'exomol' # Can be ['exomol', 'hitran', 'hitemp', 'kurucz']
species  = 'h2s'    # Species name
mass = 33.987721    # Can be found in *.def.json file

# Input/output-directories
input_data_dir  = './input_data/'
output_data_dir = './'

# Instructions to download from ExoMol database
urls = [
    'https://www.exomol.com/db/H2S/1H2-32S/AYT2/1H2-32S__AYT2.def.json', 
]

# Input-data files
files = dict(
    #transitions = f'{input_data_dir}/1H2-32S__AYT2.trans.bz2',
    transitions = [
        f'{input_data_dir}/1H2-32S__AYT2__{nu_min:05d}-{nu_min+1000:05d}.trans.bz2'
        for nu_min in range(0,35000,1000)
    ], 
    states      = f'{input_data_dir}/1H2-32S__AYT2.states.bz2',
    partition_function = f'{input_data_dir}/1H2-32S__AYT2.pf',
)

import numpy as np
# Pressure and temperature grids
P_grid = 10**np.array([-1.,0.]) # [bar]
T_grid = np.array([1000,2000])   # [K]

# Wavenumber grid
wave_min = 0.3; wave_max = 28.0 # [um]
delta_nu = 0.01 # [cm^-1]
adaptive_nu_grid = True

# Pressure-broadening information
perturber_info = dict(
    H2 = dict(VMR=0.85, gamma=0.07, n=0.5), # gamma = [cm^-1]
    He = dict(VMR=0.15, gamma=0.07, n=0.5),
)

# Line-strength cutoffs
global_cutoff = 1e-45 # [cm^-1 / (molecule cm^-2)]
local_cutoff  = 0.25

# Function with arguments gamma_V [cm^-1], and P [bar]
wing_cutoff = lambda gamma_V, P: 25 if P<=200 else 100 # Gharib-Nezhad et al. (2024)

# Metadata to be stored in pRT3's .h5 file
pRT3_metadata = dict(
    DOI = ['10.1093/mnras/stw1133','10.1016/j.jqsrt.2018.07.012'], # DOI of the data
    mol_name = 'H2S',                    # Using the right capitalisation
    linelist = 'AYT2',                   # Line-list name, used in .h5 filename
    isotopologue_id = {'H2':1, 'S':32},  # Mass number of each element's isotope
)
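
As a quick sanity check of the configuration above, the sketch below rebuilds the list of .trans filenames and converts the wavelength limits to wavenumbers via nu = 10^4 / lambda (with lambda in um). It only does arithmetic on the values in the configuration; how pyROX itself treats files outside the wavenumber grid is not covered here. The names nu_lo, nu_hi and overlapping are illustrative and not part of pyROX.

```python
# Rebuild the list of .trans filenames exactly as in the configuration above
input_data_dir = './input_data/'
starts = list(range(0, 35000, 1000))  # lower edge of each file's window [cm^-1]
transitions = [
    f'{input_data_dir}/1H2-32S__AYT2__{nu_min:05d}-{nu_min+1000:05d}.trans.bz2'
    for nu_min in starts
]

# Convert the wavelength limits [um] to wavenumbers [cm^-1]
wave_min, wave_max = 0.3, 28.0
nu_hi = 1e4 / wave_min  # shortest wavelength gives the largest wavenumber
nu_lo = 1e4 / wave_max

# Indices of the files whose 1000 cm^-1 window overlaps [nu_lo, nu_hi]
# (illustrative check only, not a pyROX feature)
overlapping = [
    i for i, start in enumerate(starts)
    if start < nu_hi and start + 1000 > nu_lo
]

print(len(transitions))                   # number of filenames in the list
print(len(overlapping), overlapping[-1])  # number of overlapping files, last index
```

The comprehension yields 35 filenames, of which 34 (indices 0 to 33) overlap the 0.3 to 28 um grid.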

First, we’ll need to download the data (~0.9 GB). Fortunately, the .def.json-file contains all the information needed to download the .trans-files, so we won’t need to provide each URL manually. Navigate to the examples/exomol_h2s-directory and run the following commands:

cd examples/exomol_h2s
pyROX exomol_h2s.py -d

Note that the transitions key in the files dictionary above is a list with the filenames of the 34 .trans-files. When called as pyROX exomol_h2s.py -c, pyROX loops over these files and runs the calculations sequentially. However, we can provide the --files_range command-line argument to select which files to run by index. The shell-script run_parallel.sh provides a convenient way to run 8 pyROX-calls simultaneously.

#!/bin/bash

N_FILES=34
N_TASKS=8
CONFIG_FILE=exomol_h2s.py

mkdir -p logs  # -p: do not fail if the directory already exists
#echo "Running $N_FILES files in parallel with $N_TASKS tasks at a time"

# Loop over all files
for ((i=0; i<N_FILES; i+=N_TASKS)); do
    # Loop over the tasks (as long as there are files left)
    for ((j=0; j<N_TASKS && i+j<N_FILES; j++)); do
        idx_min=$((i+j))
        idx_max=$((i+j+1))
        #echo "Running file-range $idx_min, $idx_max"
        pyROX $CONFIG_FILE -c -pbar --files_range $idx_min $idx_max > logs/range_${idx_min}_${idx_max}.log 2>&1 &
    done
    wait
done
#echo "All tasks completed."

The output of each pyROX-call is logged in the logs-directory. The script should take ~40 minutes to complete for all files (depending on the number of parallel tasks) and can be executed via:

bash ./run_parallel.sh &
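
The shell-script starts the next batch only after every call in the current batch has finished. As an alternative, the minimal Python sketch below uses a worker pool so that up to N_TASKS calls stay running until all files are done. It uses only the pyROX flags shown above (-c, -pbar, --files_range); the helper names build_commands, run_job and run_all are hypothetical and not part of pyROX.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

N_FILES = 34
N_TASKS = 8
CONFIG_FILE = 'exomol_h2s.py'

def build_commands(n_files=N_FILES, config_file=CONFIG_FILE):
    """Return one (argv, log_path) pair per .trans-file index."""
    jobs = []
    for idx_min in range(n_files):
        idx_max = idx_min + 1
        argv = ['pyROX', config_file, '-c', '-pbar',
                '--files_range', str(idx_min), str(idx_max)]
        jobs.append((argv, Path('logs') / f'range_{idx_min}_{idx_max}.log'))
    return jobs

def run_job(job):
    """Run one pyROX-call, sending stdout and stderr to its log-file."""
    argv, log_path = job
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, 'w') as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode

def run_all(jobs, n_tasks=N_TASKS, runner=run_job):
    """Keep at most n_tasks calls running; return exit codes in job order."""
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return list(pool.map(runner, jobs))

# To launch all calls (requires pyROX on the PATH):
#   exit_codes = run_all(build_commands())
```

Threads are sufficient here because each worker only waits on an external pyROX process; a non-zero entry in the returned exit codes points to the log-file to inspect.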

Note

By default, the temporary output-files are named xsec_<transition_filename>.hdf5, so the parallel calls do not overwrite each other's files.

The temporary output-files can then be combined into a final output-file and plotted with:

pyROX exomol_h2s.py -s -p

Note

If the temporary or final output-files already exist, pyROX prompts you before overwriting them, which can result in an error when running from a shell-script. Use the command-line argument --overwrite (or -o) to overwrite the files without prompting.

Parallelising by PT-grid

In a similar way, pyROX can run multiple pressure-temperature points in parallel, which can be useful for the HITRAN/HITEMP line lists. The examples/hitemp_co-directory provides a configuration file and a shell-script to run pyROX on 4 temperature points in parallel.

#!/bin/bash

CONFIG_FILE=hitemp_co.py

mkdir -p logs  # -p: do not fail if the directory already exists
# Run 4 pyROX-calls in parallel with different temperatures
pyROX $CONFIG_FILE -c -pbar --T_grid 500 -out xsec_T500K.hdf5 > logs/range_T500K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 1000 -out xsec_T1000K.hdf5 > logs/range_T1000K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 2000 -out xsec_T2000K.hdf5 > logs/range_T2000K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 3000 -out xsec_T3000K.hdf5 > logs/range_T3000K.log 2>&1 &
wait

Note

In the above shell-script we provide new basenames for the output-files (--tmp_output_basename or -out) to avoid overwriting the temporary output-files.
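
For a longer temperature grid, the calls can be generated rather than written out by hand. The sketch below builds one command per temperature using only the flags from the shell-script above (-c, -pbar, --T_grid, -out) and checks that every call gets its own output basename. The helper name command_for is hypothetical and not part of pyROX.

```python
import shlex

CONFIG_FILE = 'hitemp_co.py'
temperatures = [500, 1000, 2000, 3000]  # [K]

def command_for(T, config_file=CONFIG_FILE):
    """Build the pyROX-call for one temperature with its own output basename."""
    return ['pyROX', config_file, '-c', '-pbar',
            '--T_grid', str(T), '-out', f'xsec_T{T}K.hdf5']

commands = [command_for(T) for T in temperatures]

# Every call must write to a different temporary output-file
outputs = [cmd[-1] for cmd in commands]
assert len(set(outputs)) == len(outputs)

# Print the commands as they would be typed in a shell
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell-script, followed by the same redirection to a log-file and a trailing & as above.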

Note

HITRAN/HITEMP requires a login to download data, so pyROX will raise an error when it tries to download the 05_HITEMP2019.par.bz2-file. Please download this file manually and move it into the input_data_dir directory.

The partition function and broadening files can be downloaded with the --download (or -d) command-line argument. After downloading the data, the shell-script can be executed and will take ~5 minutes to complete for all temperature points. Finally, the temporary output-files can be combined into a final output-file and plotted.

cd examples/hitemp_co
pyROX hitemp_co.py -d
sh ./run_parallel.sh
pyROX hitemp_co.py -s -p

Similar to --T_grid, the --P_grid command-line argument can be used to specify the pressure points, so pyROX can also be run in parallel on multiple pressure points.
