Run input configuration#

All the settings for a P4ward run are set in the config.ini file. Every setting has a default value, which is used automatically when the option is missing from config.ini. This page describes all settings. To view an .ini file with all the default values, run p4ward with:

p4ward --write_default

This writes a file called default.ini to the working directory.
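The fallback to default values can be illustrated with Python's standard configparser module. This is only a sketch and not P4ward's actual loading code: the DEFAULTS dictionary is a hand-copied subset of the values documented on this page, load_settings is a hypothetical helper, and the section names are assumed to match the headings below.

```python
import configparser

# Hand-copied subset of the defaults documented on this page (illustrative only).
DEFAULTS = {
    "general": {"overwrite": "false", "num_processors": "8"},
    "protein_prep": {"pdbfixer": "True", "pdbfixer_ph": "7.0"},
}


def load_settings(user_ini_text):
    """Return a parser holding the defaults, overridden by the user's values."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)         # start from the defaults
    parser.read_string(user_ini_text)  # user values replace matching keys
    return parser


# A config.ini that sets a single option; everything else falls back.
user_ini = """
[general]
num_processors = 4
"""

settings = load_settings(user_ini)
print(settings.getint("general", "num_processors"))      # value set by the user
print(settings.getboolean("general", "overwrite"))       # default filled in
print(settings.getfloat("protein_prep", "pdbfixer_ph"))  # default filled in
```

Reading the defaults first and the user file second means that an option only needs to appear in config.ini when it differs from its default.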

Note

Default values are shown within square brackets.

Program Paths#

megadock

obabel

rxdock_root

general#

overwrite: true | [false]: Whether to ignore any records from previous runs of the pipeline and overwrite them with a new run starting from scratch.
receptor: [receptor.pdb]: Path to the receptor protein file, in pdb format.
ligase: [ligase.pdb]: Path to the ligase protein file, in pdb format.
protacs: [protacs.smiles]: Path to the file listing the protacs, in SMILES format.
receptor_ligand: [receptor_ligand.mol2]: Path to the receptor ligand file, in mol2 format.
ligase_ligand: [ligase_ligand.mol2]: Path to the ligase ligand file, in mol2 format.
rdkit_ligands_cleanup: [True] | False: RDKit’s sanitization of small molecules may fail for some structures. This keyword may be used to turn it off.
num_processors: [8]: How many processes to use for the parallel processing steps in the pipeline.

protein_prep#

pdbfixer: [True] | False: Whether to use pdbfixer to fix the structures before running the pipeline and ensure compatibility with all modules. Highly recommended.
pdbfixer_ignore_extremities: [True] | False: When adding missing residues, pdbfixer can be made to ignore protein extremities. This way, it will not attempt to add long sequences if the protein is truncated.
pdbfixer_ph: [7.0]: pH at which hydrogens are added.
minimize: [True] | False: Use openmm to minimize the protein structure before running.
minimize_maxiter: [0]: Maximum number of minimization steps to perform. A value of 0 means the minimization runs until convergence.
minimize_h_only: [True] | False: Minimize only the bonds to hydrogen and keep the rest of the protein restrained. Use this option if minimization is to be used only to optimize the coordinates of the hydrogens previously added.

megadock#

Run Megadock for protein-protein docking of receptor and ligase. The following options regulate its usage:

run_docking: [True] | False: Toggles protein-protein docking with Megadock.
num_predictions: [162000]: Out of all the protein poses generated by megadock, how many will go into the ranked output file written by megadock and parsed by P4ward. P4ward’s default: 162000, Megadock’s default: 2000.
num_rotational_angles: [54000]: How many angles to rotate the ligase protein by. P4ward’s default: 54000, Megadock’s default: 3600.
num_predictions_per_rotation: [3]: How many poses to generate per rotation. P4ward’s default: 3, Megadock’s default: 1.
run_docking_output_file: [megadock.out]: The main megadock output file path, where the docking results are placed.
run_docking_log_file: [megadock_run.log]: The log file which will contain what megadock would normally print to the terminal.
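The P4ward defaults above are consistent with each other: 54000 rotations with 3 poses per rotation give 162000 candidate poses, which equals the default num_predictions. The sketch below shows that arithmetic, assuming the number of candidate poses is simply the product of the two sampling settings. Both function names are hypothetical and are not part of P4ward or Megadock.

```python
def max_megadock_poses(num_rotational_angles, num_predictions_per_rotation):
    """Upper bound on candidate poses, assuming a fixed number of poses per rotation."""
    return num_rotational_angles * num_predictions_per_rotation


def num_predictions_is_reachable(num_predictions, num_rotational_angles,
                                 num_predictions_per_rotation):
    """True if the requested output size does not exceed the poses that can be generated."""
    return num_predictions <= max_megadock_poses(
        num_rotational_angles, num_predictions_per_rotation
    )


# P4ward defaults from this page
print(max_megadock_poses(54000, 3))                    # 162000
print(num_predictions_is_reachable(162000, 54000, 3))  # True
```

If num_rotational_angles or num_predictions_per_rotation is lowered, a check like this indicates whether num_predictions should be lowered with it.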

protein_filter#

ligand_distances: [True] | False: Toggle to filter the protein poses based on the proximity of the ligands after docking. For more information on how this is done, please refer to [XXXXX].

filter_dist_cutoff: [auto]: If ligand_distances is True, then this setting defines the distance cutoff. The option can either be a number or auto to generate a proximity cutoff based on the protac size.
filter_dist_sampling_type: [3D]: In order to calculate the distance between the ligands in the protac structure, P4ward employs RDKit to sample the unbound protac conformations. This setting defines if the conformation sampled should be three or two dimensional. 3D structures generate more restrictive distance values, while 2D structures will often yield the maximum possible distance between the ligands. Options: 3D or 2D.
crl_model_clash: [True] | False: Whether the protein poses should be filtered based on clashes against the CRL model.
clash_threshold: [1.0]: Distance between two atoms (not including hydrogens) that is considered a clash.
clash_count_tol: [10]: How many clashing atoms until the protein pose is considered to be clashing.
accessible_lysines: [True] | False: Whether to use the accessible lysine filter.
lysine_count: [1]: How many accessible lysines a protein pose must have in order to be considered productive.
lys_sasa_cutoff: [2.5]: The SASA value above which a lysine is considered to be at the protein surface and therefore is checked for accessibility.
overlap_dist_cutoff: [5.0]: Also described as “LOCut”. The distance between a potentially occluding atom and the segment between lysine and ubiquitin, below which the atom is considered to be occluding and the lysine is then considered inaccessible.
vhl_ubq_dist_cutoff: [60.0]: Maximum distance between lysine and ubiquitin C-terminus for the VHL model.
crbn_ubq_dist_cutoff: [16.0]: Maximum distance between lysine and ubiquitin C-terminus for the CRBN model.
e3: [vhl] | crbn: Which E3 model to use. Values can be vhl or crbn.
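The geometric tests behind clash_threshold, clash_count_tol and overlap_dist_cutoff can be sketched in plain Python. This is an illustration under stated assumptions, not P4ward's implementation: all function names are hypothetical, coordinates are plain (x, y, z) tuples in angstroms, clashes are counted per pose atom, and reaching clash_count_tol is taken to mean the pose is clashing. Which atoms P4ward actually tests for occlusion is not described here, so the caller chooses them.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (3D coordinates)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len_sq = sum(c * c for c in ab)
    if ab_len_sq == 0.0:
        return math.dist(p, a)  # degenerate segment: a and b coincide
    # Projection of p onto the line through a and b, clamped to the segment
    t = sum(ap[i] * ab[i] for i in range(3)) / ab_len_sq
    t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def lysine_is_occluded(lysine_xyz, ubiquitin_xyz, other_atoms, overlap_dist_cutoff=5.0):
    """True if any atom lies closer than the cutoff to the lysine-ubiquitin segment."""
    return any(
        point_segment_distance(atom, lysine_xyz, ubiquitin_xyz) < overlap_dist_cutoff
        for atom in other_atoms
    )


def pose_is_clashing(pose_atoms, model_atoms, clash_threshold=1.0, clash_count_tol=10):
    """True if at least clash_count_tol pose atoms lie within clash_threshold of a model atom."""
    clashing = sum(
        1 for p in pose_atoms
        if any(math.dist(p, m) < clash_threshold for m in model_atoms)
    )
    return clashing >= clash_count_tol


lysine = (0.0, 0.0, 0.0)
ubiquitin = (10.0, 0.0, 0.0)
print(lysine_is_occluded(lysine, ubiquitin, [(5.0, 2.0, 0.0)]))  # atom 2 A from the segment
print(lysine_is_occluded(lysine, ubiquitin, [(5.0, 8.0, 0.0)]))  # atom 8 A from the segment
```

With the default overlap_dist_cutoff of 5.0, the first atom occludes the lysine and the second does not.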

protein_ranking#

cluster_poses_redundancy: True | [False]: Toggle clustering for redundancy. Please see [XXXXX] for a description of this clustering step.
cluster_poses_trend: [True] | False: Toggle clustering for trend capture of ternary complex modelling results. Please see [XXXXX] for a description of this clustering step.
clustering_cutoff_redund: [3.0]: Distance cutoff for redundancy clustering.
clustering_cutoff_trend: [10.0]: Distance cutoff for trend clustering.
cluster_redund_repr: [centroid] | best: Which cluster component should be the cluster representative. Values can be best, denoting best megadock score in the cluster, or centroid, denoting the cluster centroid. For the centroid option, if the cluster has only two components, the best scoring pose is selected as representative.
top_poses: [162000]: How many poses should be considered top poses for sampling. This value should be the same as (or higher than) num_predictions if the user intends to perform ternary complex rescoring (thus generating p4ward’s final score) and trend clustering. If the user intends to model ternary complexes for N top scoring protein poses, then this value should be adjusted accordingly.
generate_poses: [filtered]: Which set of protein poses should be written to disk in pdb format. If the user intends to perform RXdock rescoring, ternary complex rescoring (thus generating p4ward’s final score) and/or trend clustering, then this should be set to either filtered or all. Values can be none, all, filtered, top.
generate_poses_altlocA: [True] | False: Keep only alternate location A when generating pdb files.
generated_poses_folder: [protein_docking]: Folder name to save the protein pose pdb files to.
rescore_poses: [True] | False: Use P4ward’s final score.
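The choice of cluster representative described for cluster_redund_repr can be sketched as follows. This is a hedged illustration, not P4ward's code: cluster_representative is a hypothetical function, the centroid is assumed to be the member with the smallest mean distance to the other members, and a higher docking score is assumed to be better.

```python
def cluster_representative(members, scores, distances, mode="centroid"):
    """Pick a representative pose for one cluster.

    members:   list of pose ids in the cluster
    scores:    dict mapping pose id -> docking score (assumed: higher is better)
    distances: dict mapping frozenset({a, b}) -> distance between poses a and b
    """
    if mode not in ("centroid", "best"):
        raise ValueError("mode must be 'centroid' or 'best'")

    best = max(members, key=lambda m: scores[m])
    # With two members or fewer no centroid can be singled out, so use the best score
    if mode == "best" or len(members) <= 2:
        return best

    def mean_distance(m):
        others = [o for o in members if o != m]
        return sum(distances[frozenset((m, o))] for o in others) / len(others)

    return min(members, key=mean_distance)


scores = {1: 10.0, 2: 5.0, 3: 7.0}
distances = {
    frozenset((1, 2)): 1.0,
    frozenset((2, 3)): 1.0,
    frozenset((1, 3)): 2.0,
}
print(cluster_representative([1, 2, 3], scores, distances, mode="centroid"))  # pose 2
print(cluster_representative([1, 2, 3], scores, distances, mode="best"))      # pose 1
print(cluster_representative([2, 3], scores, distances, mode="centroid"))     # pose 3
```

Pose 2 sits between the other two and is therefore the centroid, pose 1 has the highest score, and in the two-member cluster the better-scoring pose 3 is returned even in centroid mode.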

protac_sampling#

unbound_protac_num_confs: [10]: How many conformations for the unbound protac should be generated with RDKit.

Linker sampling#

rdkit_sampling: [True] | False: Use rdkit to perform protac sampling.
protac_poses_folder: [protac_sampling]: Name of the folder where the generated protac sdf will reside.
extend_flexible_small_linker: [True] | False: If the linker consists of very few atoms, protac sampling will fail because small deviations on the extremities’ positions will make bonds unfeasible. With this option, if the pipeline detects that the linker is short (see min_linker_length), it will then consider more neighbouring atoms as flexible (see extend_neighbour_number).
extend_neighbour_number: [2]: If extend_flexible_small_linker is turned on, then this flag controls how many neighbouring atoms should become flexible.
min_linker_length: [2]: If the protac’s linker contains up to this many atoms, it is considered too short and can be extended if extend_flexible_small_linker is turned on.
rdkit_number_of_confs: [10]: How many protac linker conformations to generate.
rdkit_pose_rmsd_tolerance: [1.0] (angstroms): Some protac poses cannot be sampled while perfectly retaining the rigid ligands’ positions. This flag controls how much deviation is allowed when this happens.
rdkit_time_tolerance: [300] (seconds): Sometimes rdkit will get stuck for a very long time in a pose only to fail sampling. This flag sets a limit on the time rdkit can spend in the sampling calculation for each pose. If the limit is reached, the pose is considered failed.
extend_top_poses_sampled: [True] | False: Extends how many protein poses are considered top (based on protein_ranking/top_poses) so that the top_poses number of poses have successfully generated protac conformations. For example, if the user determined top_poses to be 10, then the top 10 protein poses will be forwarded to protac sampling. However, a few of these may not be optimal for protac conformation and so would fail at sampling. So the pipeline will try sampling for the 11th pose, 12th and so on, until exactly 10 poses have successfully generated protac conformations.
extend_top_poses_score: [True] | False: When extending top poses sampled, also consider the protein-ligand interaction score by disregarding positive scores. This ensures that clashing models are discarded.
extend_top_poses_energy: True | [False]: When extending top poses sampled, also consider the internal energy calculated by RDKit. Poses where the protac internal energy is higher than the energies of all the unbound poses previously sampled are discarded.
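The behaviour of the three extend_top_poses_* settings can be summarized in a short sketch. It only mirrors the description above and is not P4ward's implementation: sample_top_poses, accept_protac_pose and the try_sample callable are hypothetical names introduced for illustration.

```python
def accept_protac_pose(score, energy, unbound_energies, use_score=True, use_energy=False):
    """Apply the optional score and energy criteria to one sampled protac pose."""
    if use_score and score > 0:
        return False  # positive interaction score: treated as a clashing model
    if use_energy and energy > max(unbound_energies):
        return False  # internal energy above every unbound conformation
    return True


def sample_top_poses(ranked_poses, top_poses, try_sample, extend=True):
    """Walk the ranked protein poses until top_poses of them sample successfully.

    ranked_poses: pose ids ordered from best to worst
    try_sample:   callable returning True if protac sampling succeeded for a pose
    extend:       if False, only the first top_poses poses are ever attempted
    """
    candidates = ranked_poses if extend else ranked_poses[:top_poses]
    successful = []
    for pose in candidates:
        if len(successful) == top_poses:
            break
        if try_sample(pose):
            successful.append(pose)
    return successful


# Toy run: poses 2 and 5 fail sampling, so poses 11 and 12 are pulled in
ranked = list(range(1, 21))
failing = {2, 5}
print(sample_top_poses(ranked, 10, lambda pose: pose not in failing))
print(sample_top_poses(ranked, 10, lambda pose: pose not in failing, extend=False))
```

With extension enabled the toy run returns ten poses, reaching down to the 12th ranked pose. With it disabled only eight of the first ten poses survive.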

Linker ranking#

protac_scoring_folder: [protac_scoring]: Name of the folder where the scored protac sdf files will reside.
rxdock_score: [True] | False: Use RXdock for scoring the protac conformations.
rxdock_minimize: True | [False]: Perform a quick minimization with RXdock before scoring.
rxdock_target_score: [SCORE.INTER] | SCORE: Which score value to capture from RXdock. Values can be SCORE or SCORE.INTER. Please refer to RXdock documentation for detailed description of these values.

Outputs#

plots: [True] | False: Whether to generate analysis plots.
chimerax_view: [True] | False: Whether to generate a ChimeraX script for visualization of the results.
write_crl_complex: [True] | False: Whether to write the full CRL complex models for the final predictions.
crl_cluster_rep_only: [True] | False: If writing CRL models, write only for the cluster representative structures.