Parallelisation
Parallelising by wavenumber (ExoMol)
Some line lists consist of millions or even billions of transitions. The ExoMol database splits these by wavenumber into multiple .trans-files. pyROX can process these files in parallel and then combine the results into a final output-file.
In this example, we’ll show how to run pyROX in parallel on ExoMol’s H2S line list. The configuration parameters below are also found in the examples-directory on GitHub.
# Basic information on database and species
database = 'exomol' # Can be ['exomol', 'hitran', 'hitemp', 'kurucz']
species = 'h2s' # Species name
mass = 33.987721 # Can be found in *.def.json file
# Input/output-directories
input_data_dir = './input_data/'
output_data_dir = './'
# Instructions to download from ExoMol database
urls = [
'https://www.exomol.com/db/H2S/1H2-32S/AYT2/1H2-32S__AYT2.def.json',
]
# Input-data files
files = dict(
#transitions = f'{input_data_dir}/1H2-32S__AYT2.trans.bz2',
transitions = [
f'{input_data_dir}/1H2-32S__AYT2__{nu_min:05d}-{nu_min+1000:05d}.trans.bz2'
for nu_min in range(0,35000,1000)
],
states = f'{input_data_dir}/1H2-32S__AYT2.states.bz2',
partition_function = f'{input_data_dir}/1H2-32S__AYT2.pf',
)
import numpy as np
# Pressure and temperature grids
P_grid = 10**np.array([-1.,0.]) # [bar]
T_grid = np.array([1000,2000]) # [K]
# Wavenumber grid
wave_min = 0.3; wave_max = 28.0 # [um]
delta_nu = 0.01 # [cm^-1]
adaptive_nu_grid = True
# Pressure-broadening information
perturber_info = dict(
H2 = dict(VMR=0.85, gamma=0.07, n=0.5), # gamma = [cm^-1]
He = dict(VMR=0.15, gamma=0.07, n=0.5),
)
# Line-strength cutoffs
global_cutoff = 1e-45 # [cm^-1 / (molecule cm^-2)]
local_cutoff = 0.25
# Function with arguments gamma_V [cm^-1], and P [bar]
wing_cutoff = lambda gamma_V, P: 25 if P<=200 else 100 # Gharib-Nezhad et al. (2024)
# Metadata to be stored in pRT3's .h5 file
pRT3_metadata = dict(
DOI = ['10.1093/mnras/stw1133','10.1016/j.jqsrt.2018.07.012'], # DOI of the data
mol_name = 'H2S', # Using the right capitalisation
linelist = 'AYT2', # Line-list name, used in .h5 filename
isotopologue_id = {'H2':1, 'S':32}, # Mass number of each element's isotope
)
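As a quick sanity check on the grids above, the wavelength limits can be converted to wavenumbers to see which 1000 cm^-1 transition chunks overlap the requested range. This is a minimal sketch in plain Python. The helper name overlapping_chunks is hypothetical and not part of pyROX, and pyROX's own file handling may differ.

```python
# Hypothetical helper, not part of pyROX: find which 1000 cm^-1 chunks
# of the transitions list overlap the requested wavelength range.
wave_min, wave_max = 0.3, 28.0  # [um], as in the configuration above

# Wavenumber [cm^-1] = 1e4 / wavelength [um]
nu_lo = 1e4 / wave_max  # lower wavenumber limit
nu_hi = 1e4 / wave_min  # upper wavenumber limit

def overlapping_chunks(nu_lo, nu_hi, starts=range(0, 35000, 1000), width=1000):
    """Return chunk start-wavenumbers whose [start, start+width) interval
    overlaps the closed range [nu_lo, nu_hi]."""
    return [s for s in starts if s <= nu_hi and s + width > nu_lo]

chunks = overlapping_chunks(nu_lo, nu_hi)
print(f"{nu_lo:.1f} - {nu_hi:.1f} cm^-1 overlaps {len(chunks)} chunks")
print(f"first: {chunks[0]:05d}-{chunks[0] + 1000:05d}")
print(f"last:  {chunks[-1]:05d}-{chunks[-1] + 1000:05d}")
```

Under these assumptions, the 0.3 to 28 um grid spans roughly 357 to 33333 cm^-1, which touches the chunks starting at 0 through 33000 cm^-1.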
First, we’ll need to download the data (~0.9GB). Fortunately, the .def.json-file contains all the necessary information to download the .trans-files so we won’t need to provide each URL manually. Navigate to the examples/exomol_h2s-directory and run the following command:
cd examples/exomol_h2s
pyROX exomol_h2s.py -d
Note that the transitions key in the files dictionary above is a list with the filenames of the 34 .trans-files. When called as pyROX exomol_h2s.py -c, pyROX loops over these files and runs the calculations sequentially. However, the --files_range command-line argument specifies which files to process, so separate pyROX calls can each handle a different file. The shell-script run_parallel.sh provides a convenient way to run 8 pyROX calls simultaneously.
#!/bin/bash
N_FILES=34
N_TASKS=8
CONFIG_FILE=exomol_h2s.py
mkdir -p logs  # -p: do not fail if the directory already exists
#echo "Running $N_FILES files in parallel with $N_TASKS tasks at a time"
# Loop over all files
for ((i=0; i<N_FILES; i+=N_TASKS)); do
# Loop over the tasks (as long as there are files left)
for ((j=0; j<N_TASKS && i+j<N_FILES; j++)); do
idx_min=$((i+j))
idx_max=$((i+j+1))
#echo "Running file-range $idx_min, $idx_max"
pyROX $CONFIG_FILE -c -pbar --files_range $idx_min $idx_max > logs/range_${idx_min}_${idx_max}.log 2>&1 &
done
wait
done
#echo "All tasks completed."
The command-outputs will be logged in the logs-directory. The script should take ~40 minutes to complete for all files (depending on the number of parallel tasks) and can be executed via:
bash ./run_parallel.sh &
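The batching logic of run_parallel.sh can be sketched in Python to see which file ranges are launched together. This is an illustration only. The function name batched_file_ranges is hypothetical, and it assumes, as the script's use of idx_min and idx_min+1 suggests, that --files_range takes a start index and an exclusive end index.

```python
def batched_file_ranges(n_files, n_tasks):
    """Group single-file (idx_min, idx_max) ranges into batches of at most n_tasks.

    Assumption: idx_max is exclusive, so (3, 4) selects only file 3.
    """
    batches = []
    for i in range(0, n_files, n_tasks):
        # Stop at n_files so the last batch may be shorter than n_tasks
        batch = [(idx, idx + 1) for idx in range(i, min(i + n_tasks, n_files))]
        batches.append(batch)
    return batches

batches = batched_file_ranges(n_files=34, n_tasks=8)
for n, batch in enumerate(batches):
    print(f"batch {n}: files {batch[0][0]} to {batch[-1][0]} ({len(batch)} tasks)")
```

With 34 files and 8 tasks this gives four full batches and a final batch of two. Because the script calls wait after each batch, a new batch starts only once the slowest task of the previous batch has finished.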
Note
By default, the temporary output-files are named xsec_<transition_filename>.hdf5, so parallel runs on different .trans-files do not overwrite each other's output.
The temporary output-files can then be combined into a final output-file and plotted with:
pyROX exomol_h2s.py -s -p
Note
If the temporary or final output-files already exist, pyROX prompts you before overwriting them, which can result in an error when running from a shell-script. The command-line argument --overwrite (or -o) overwrites the files without prompting.
Parallelising by PT-grid
In a similar way, pyROX can run multiple pressure-temperature points in parallel, which can be useful for the HITRAN/HITEMP line lists. The examples/hitemp_co-directory provides a configuration file and a shell-script to run pyROX on 4 temperature points in parallel.
#!/bin/bash
CONFIG_FILE=hitemp_co.py
mkdir -p logs  # -p: do not fail if the directory already exists
# Run 4 pyROX-calls in parallel with different temperatures
pyROX $CONFIG_FILE -c -pbar --T_grid 500 -out xsec_T500K.hdf5 > logs/range_T500K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 1000 -out xsec_T1000K.hdf5 > logs/range_T1000K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 2000 -out xsec_T2000K.hdf5 > logs/range_T2000K.log 2>&1 &
pyROX $CONFIG_FILE -c -pbar --T_grid 3000 -out xsec_T3000K.hdf5 > logs/range_T3000K.log 2>&1 &
wait
Note
In the above shell-script we provide new basenames for the output-files (--tmp_output_basename or -out) to avoid overwriting the temporary output-files.
Note
Downloading HITRAN/HITEMP data requires a login, so pyROX raises an error when it tries to download the 05_HITEMP2019.par.bz2-file. Please download this file manually and move it into the input_data_dir directory.
The partition function and broadening files can be downloaded with the --download (or -d) command-line argument. After downloading the data, the shell-script can be executed and will take ~5 minutes to complete for all temperature points. Finally, the temporary output-files can be combined into a final output-file and plotted.
cd examples/hitemp_co
pyROX hitemp_co.py -d
sh ./run_parallel.sh
pyROX hitemp_co.py -s -p
Similar to the --T_grid command-line argument, the --P_grid command-line argument can be used to specify the pressure points. In this way, pyROX can also be run in parallel on multiple pressure points.
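By analogy with the temperature script above, one command per pressure point could be assembled as follows. This is a sketch, not a script shipped with pyROX. The function name build_pressure_commands, the output basenames and the log-file names are hypothetical, and it assumes that --P_grid accepts a single value in bar in the same way that --T_grid accepts a single temperature.

```python
import shlex

def build_pressure_commands(config_file, pressures_bar):
    """Build one shell command per pressure point, mirroring the --T_grid script."""
    commands = []
    for P in pressures_bar:
        tag = f"P{P:g}bar"  # hypothetical naming scheme, e.g. 'P0.1bar'
        argv = [
            "pyROX", config_file, "-c", "-pbar",
            "--P_grid", f"{P:g}",        # assumed to take a value in bar
            "-out", f"xsec_{tag}.hdf5",  # unique basename avoids overwriting
        ]
        log = f"logs/range_{tag}.log"    # hypothetical log-file name
        # Redirect output to the log and run in the background, as in the script
        commands.append(f"{shlex.join(argv)} > {log} 2>&1 &")
    return commands

commands = build_pressure_commands("hitemp_co.py", [0.1, 1.0])
for cmd in commands:
    print(cmd)
print("wait")
```

The printed lines can be pasted into a shell-script after the mkdir step. As with the temperature runs, each call needs its own output basename so that the temporary output-files do not overwrite each other.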