Skip to content

Ligand Analysis Tutorial

Sicifus provides tools to analyze ligand binding sites across massive datasets. You can identify binding residues, detect pi-stacking interactions, and visualize atom-level contacts.

1. Binding Residue Distribution

Identify which residues are commonly found near a specific ligand (using CA atoms for speed).

from sicifus import Sicifus

db = Sicifus(db_path="./my_db")

# Analyze binding residues for Glucose (GLC)
db.analyze_ligand_binding("GLC")

2. Pi-Stacking Interactions

Detect aromatic stacking interactions (sandwich, parallel displaced, T-shaped) between protein aromatic residues (PHE, TYR, TRP, HIS) and ligand aromatic rings.

Note: Requires all_atom data. If you ingested before v0.2.0, re-run ingest.

# Detect pi-stacking with ATP
pi_df = db.analyze_pi_stacking("ATP")
# Found 47 interactions: 8 sandwich, 31 parallel displaced, 8 T-shaped
print(pi_df)

# Save the plot
db.analyze_pi_stacking("ATP", output_file="pi_stacking.png")

3. Atom-Level Contacts

Identify specific atomic contacts (e.g., hydrogen bonds) between ligand and protein atoms.

Note: Requires all_atom data.

# Analyze contacts for Glucose (GLC) with a 3.3 Å cutoff (H-bonds)
contacts = db.analyze_ligand_contacts("GLC", distance_cutoff=3.3)
# Found 156 contacts (42 potential H-bonds: N/O/S pairs)
print(contacts)

# Widen the cutoff for more general contacts
contacts_wide = db.analyze_ligand_contacts("GLC", distance_cutoff=4.5)

# Save both the contacts chart and the 2D ligand depiction
db.analyze_ligand_contacts("GLC", output_file="contacts.png",
                            ligand_2d_file="glc_2d.png")

# If you only pass output_file, the 2D depiction auto-saves as {stem}_ligand2d.png
db.analyze_ligand_contacts("GLC", output_file="contacts.png")
# -> also saves contacts_ligand2d.png

4. Binding Pocket Composition

Filter structures based on the amino acid composition of their binding pockets.

import polars as pl

# Get a DataFrame of residue counts in the binding pocket (e.g. within 8 Å)
pockets = db.get_binding_pockets("GLC", distance_cutoff=8.0)
print(pockets)
# shape: (150, 21)
# ┌──────────────┬─────┬─────┬─────┬─────┐
# │ structure_id ┆ ALA ┆ ARG ┆ ... ┆ TRP │
# ╞══════════════╪═════╪═════╪═════╪═════╡
# │ 1abc         ┆ 2   ┆ 0   ┆ ... ┆ 1   │
# │ 2xyz         ┆ 1   ┆ 1   ┆ ... ┆ 0   │
# └──────────────┴─────┴─────┴─────┴─────┘

# Filter: Find all structures that have at least one Tryptophan in the pocket
trp_binders = pockets.filter(pl.col("TRP") > 0)
print(f"Found {trp_binders.height} structures with TRP in the GLC pocket.")

# Visualize the aggregate composition across all structures
db.analyze_binding_pocket("GLC", distance_cutoff=8.0)

2D Visualization (RDKit)

The analyze_ligand_contacts function generates a 2D depiction of the ligand with atoms color-coded by contact count (red = many contacts, blue = few).

This requires RDKit. Install with: pip install sicifus[viz] or pip install rdkit.

If RDKit is not available, the contacts bar chart still works, but the 2D image is skipped.