User guide | Database navigation | Structure, function and target discovery

HANSEN: Contact Information

This guide explains how to search HANSEN, interpret protein pages, use the structural viewers, explore GO annotations, inspect model statistics, and move from database-level summaries to detailed protein-level evidence.

Fast Routes
Best starting point: use the home-page query panel if you know a UniProt accession, ML locus tag, gene, protein name, ligand code or sequence fragment.

1. What HANSEN Contains

HANSEN is an integrated structural and functional resource for the Mycobacterium leprae proteome. It brings together core protein identifiers, curated functional annotations, sequence features, cross-references, homology and de novo structure models, oligomer assemblies, ligand annotations, pocket predictions, AF2Bind binding-site predictions, PAE confidence maps and B-cell epitope propensity information.

  • UniProt and ML locus identifiers
  • Gene and protein annotation
  • AF3 / Boltz / Chai / Boltz2 models
  • Monomers and oligomers
  • Mol* structure viewer
  • pLDDT and PAE
  • Pockets and AF2Bind
  • Ligands and PubChem links
  • GO browser
  • Database statistics

2. Home Page and Search

The home page is the main entry point. Use the query panel to search by identifiers, names, ligands or sequence. The top navigation provides direct access to the query area, database background, statistics dashboard and GO browser.

Use the Query Panel When You Know a Target

Search using a UniProt accession, ML locus tag, gene name, protein name, ligand symbol/name/PubChem CID, or a protein sequence fragment.

Use Global Pages When You Are Exploring

Open the Statistics dashboard for database-wide coverage or the GO Browser to discover groups of proteins by ontology.

About the Resource

About HANSEN and Database Scope

This section has been moved from the home page so that the home page remains focused on search, while the Help page carries the full explanatory guide.

About HANSEN

HANSEN is designed to support translational and mechanistic leprosy research by making M. leprae protein-centric information easy to search, inspect, and reuse. The platform links identifiers, functional evidence, sequence features, and structural models in a form suitable for target triage, biomarker assessment, comparative interpretation, and downstream experimental design.

  • Targeted Retrieval: search M. leprae proteins by UniProt entry, ML locus tag, gene name, protein name, ligand, or full and partial amino-acid sequence.
  • Scientifically Grounded Summaries: protein-centric pages assemble concise but biologically meaningful summaries spanning annotation, structure, localisation, sequence features, and linked evidence relevant to leprosy research.
  • Leprosy-Relevant Annotation: GO terms, pathway context, sequence features, domain architecture, and supporting metadata remain immediately usable for interpretation of M. leprae biology and disease relevance.
  • Structure-Enabled Interpretation: monomeric and oligomeric models, curated sequences, and browseable tables are organised for downstream analysis in target assessment, comparative genomics, and diagnostic assay development.

HANSEN Scope


Use HANSEN to interrogate M. leprae proteins linked to metabolism, cell-envelope biology, host interaction, persistence, biomarker discovery, and structure-guided target prioritisation across the leprosy bacillus proteome.

3. Query Types

| Query type | What to enter | What HANSEN does | Best use |
| --- | --- | --- | --- |
| UniProt | UniProt accession or entry identifier | Opens the matching protein page. | When you know the canonical UniProt entry. |
| ML locus tag | Example: ML0005 | Finds the protein by M. leprae locus tag. | Best for genome/proteome workflows. |
| Gene name | Gene symbol or partial gene text | Uses exact, prefix and partial matching. | When working from annotation tables or literature. |
| Protein name | Full or partial protein description | Finds the closest matching annotated protein name. | When you know the biological function but not the locus tag. |
| Sequence | Full or partial amino-acid sequence | Normalises the sequence and searches exact, containing and local matches. | Useful for fragments, peptides, or copied FASTA sequence. |
| Ligand | Ligand code, name, SMILES-derived annotation, or PubChem CID | Returns proteins linked to that ligand through imported oligomer/template ligand annotations. | Useful for identifying ligand-associated targets or cofactor-binding proteins. |
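
The sequence normalisation mentioned for sequence queries can be pictured with a short sketch. The function names and the exact cleaning rules here are assumptions for illustration, not HANSEN's backend code: the sketch drops FASTA header lines, discards whitespace and digits, upper-cases the residues, and then labels a match as exact or containing.

```python
def normalise_sequence(raw: str) -> str:
    """Return a clean amino-acid string from pasted FASTA or numbered text."""
    residues = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip FASTA header lines, which start with ">".
        if line.startswith(">"):
            continue
        # Keep letters only, which discards spaces, digits and punctuation.
        residues.extend(ch for ch in line if ch.isalpha())
    return "".join(residues).upper()


def match_kind(query: str, stored: str) -> str:
    """Classify a match as 'exact', 'containing' or 'none' (hypothetical labels)."""
    if query and query == stored:
        return "exact"
    if query and query in stored:
        return "containing"
    return "none"
```

Cleaning a query this way before pasting it into the search box avoids the most common cause of empty sequence results, namely stray headers and residue numbering.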

5. Protein Result Page

A protein result page is the central detailed view for one HANSEN entry. It usually begins with a summary panel containing the UniProt entry, ML locus tag, gene name, protein name, organism, length and other core identifiers. Below this, the page is divided into functional, structural and evidence sections.

A. Summary and Identifiers

Use this section to confirm that the correct protein was opened. It contains entry names, gene names, ML locus tags and links to external resources where available.

B. Functional Annotation

Review EC numbers, catalytic activity, cofactors, pathways, binding sites, active sites, protein family, domains, motifs and other curated annotations.

C. Evidence and Cross-References

Check PubMed IDs, DOI IDs, InterPro, KEGG, GeneID, STRING and other cross-references to connect HANSEN annotations with external evidence.

D. Structure and Target-Discovery Panels

Inspect model quality, Mol* structures, PAE maps, pockets, AF2Bind predictions, ligands and epitope predictions to prioritise proteins.

6. Structure Models

The model section lists available structural models for the selected protein. HANSEN can show monomeric and oligomeric models, including models from AF3, Boltz, Chai and Boltz2 depending on data availability.

| Model type | Meaning | How to use it |
| --- | --- | --- |
| Monomer | Single-protein structural model. | Best for domain architecture, local residues, active sites and simple confidence inspection. |
| Homomer | Oligomer made from repeated copies of the same ML protein. | Use to inspect biological assemblies, repeated interfaces and symmetry-related pockets. |
| Heteromer | Complex containing different protein components. | Use to inspect protein-protein interfaces, complex-specific pockets and chain-level behaviour. |
| Template-linked oligomer | Assembly reconstructed or modelled using template evidence. | Use template PDB/assembly metadata to interpret biological relevance. |

Interpretation note: a model is a prediction unless experimental structural evidence is explicitly indicated. Use pLDDT, PAE, pocket evidence and biological annotation together before prioritising a target.

7. Mol* Viewer

The Mol* panel is the interactive 3D structure viewer. Use it to rotate, zoom, inspect chains, focus residues, focus ligands, and view pocket or AF2Bind overlays when available.

  • Load/select a model: choose a structure model from the model controls. Oligomers may take longer to load than monomers.
  • Rotate and zoom: drag to rotate, scroll to zoom, and right-click/secondary-drag to pan depending on your device.
  • Use full-page mode: open the larger Mol* view when detailed inspection is needed.
  • Focus features: use ligand, pocket, AF2Bind or epitope controls to focus the relevant region in 3D.
  • Return to the page: exit full-page mode to continue reviewing tables and annotation cards.

8. pLDDT and PAE Confidence Interpretation

Confidence panels help distinguish reliable local structural regions from lower-confidence or flexible areas. pLDDT is residue-level confidence, while PAE helps interpret the reliability of domain-domain or chain-chain placement.

pLDDT

Use pLDDT to judge local residue confidence. High-confidence residues are more reliable for local pocket, active-site and epitope interpretation. Low-confidence regions may be flexible, disordered or modelled with uncertainty.

PAE

Use PAE to judge relative placement of domains and chains. For oligomers, inspect block patterns across chains to understand interface confidence and assembly reliability.

For large oligomers, load the structure first and then load the PAE map only when needed. PAE files can be large and may slow down initial page rendering.
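
To make the idea of inspecting block patterns across chains concrete, the sketch below averages a PAE matrix over every pair of chains. It assumes the PAE values are already loaded as a square nested list with residues ordered chain by chain, and that each chain has at least one residue. The layout of HANSEN's own PAE files is not described in this guide, so the function name and inputs are hypothetical.

```python
def chain_block_means(pae, chain_lengths):
    """Mean PAE for every (row chain, column chain) block of a square matrix."""
    # Convert per-chain residue counts into (start, end) index ranges.
    bounds = []
    start = 0
    for length in chain_lengths:
        bounds.append((start, start + length))
        start += length
    if start != len(pae):
        raise ValueError("chain lengths do not add up to the matrix size")

    means = []
    for r0, r1 in bounds:
        row = []
        for c0, c1 in bounds:
            values = [pae[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(sum(values) / len(values))
        means.append(row)
    return means
```

Low values on the diagonal blocks with high values off the diagonal suggest well-defined individual chains whose relative placement is uncertain.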

9. Pockets, Hotspots and AF2Bind

HANSEN includes predicted pockets and AF2Bind-style binding-site predictions where available. These panels help prioritise proteins and residues for druggability assessment.

  • Pocket table: lists predicted pockets, scores, tools and residue information where available.
  • Focus pocket: centres Mol* on the selected pocket and overlays the pocket region if residue-level data are available.
  • AF2Bind: highlights predicted binding residues and can be used alongside pocket predictions.
  • Remove/hide overlays: clear surfaces after inspection to return to a clean structure view.

For target prioritisation, favour proteins where predicted pockets overlap with high-confidence structural regions, biologically relevant ligands, catalytic sites, conserved residues or essential functional annotations.
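
One simple way to check whether a pocket sits in a high-confidence region is to compute the fraction of its residues above a pLDDT cutoff. This is a minimal sketch: the function name, the input shapes and the cutoff of 70 (a commonly used "confident" threshold for AlphaFold-style pLDDT) are assumptions, not values taken from HANSEN.

```python
def pocket_confidence(pocket_residues, plddt_by_residue, threshold=70.0):
    """Fraction of pocket residues whose pLDDT is at or above the threshold."""
    if not pocket_residues:
        return 0.0
    confident = sum(
        1
        for residue in pocket_residues
        # Residues without a pLDDT value are counted as low confidence.
        if plddt_by_residue.get(residue, 0.0) >= threshold
    )
    return confident / len(pocket_residues)
```

A pocket with a fraction close to 1 is built mostly from confidently modelled residues; a low fraction is a prompt to inspect the region in Mol* before drawing conclusions.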

10. Ligands

Ligand annotations help connect modelled oligomers and template assemblies with cofactors, ions, substrates or other small molecules. Ligands may appear as CCD-style symbols, names, SMILES-derived entries or PubChem-linked compounds.

  • Search by code: examples include short ligand symbols such as ZN or template ligand codes.
  • Search by name: enter a ligand or compound name if the imported annotation includes it.
  • Search by PubChem CID: use formats such as CID 32051 when PubChem resolution is available.
  • Protein page ligand table: use ligand rows to focus the ligand in Mol* and open PubChem where linked.
If a reconstructed oligomer ligand is missing, import the reconstructed YAML ligands into the database table used by HANSEN. The web search will only see ligands that are present in the ligand annotation source currently used by the backend.
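
The ligand query formats listed above (code, name, PubChem CID) can be told apart with a small heuristic. The sketch below is illustrative only: the function name, the returned labels and the rule that short alphanumeric strings are treated as ligand codes are assumptions, and HANSEN's backend may resolve queries differently.

```python
import re

# Accepts "CID 32051", "cid:32051" or a bare number.
_CID_PATTERN = re.compile(r"^(?:CID\s*:?\s*)?(\d+)$", re.IGNORECASE)


def classify_ligand_query(query: str):
    """Return a (kind, value) pair: 'pubchem_cid', 'code' or 'name'."""
    text = query.strip()
    match = _CID_PATTERN.match(text)
    if match:
        return ("pubchem_cid", match.group(1))
    # Heuristic: very short alphanumeric strings look like ligand codes (e.g. ZN).
    if text.isalnum() and len(text) <= 3:
        return ("code", text.upper())
    return ("name", text)
```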

11. B-cell Epitope Propensity

The B-cell epitope section shows residue-level or region-level epitope propensity where available. Use it to identify exposed, structurally plausible antigenic regions, especially when combined with pLDDT, accessibility and 3D localisation.

  • Review the epitope table for predicted high-scoring residues or regions.
  • Focus predicted residues in Mol* to evaluate their structural context.
  • Compare epitope propensity with pockets, ligands and functional domains when prioritising diagnostic or immunological targets.

12. GO Browser

The GO Browser provides ontology-guided exploration. It uses a sunburst-style visualisation and associated protein tables to help move from broad ontology categories to specific protein sets.

  • Hover: inspect a GO branch and see its context.
  • Mouse wheel: dive inward or move back outward through GO hierarchy levels.
  • Click: lock the selected branch or term.
  • Review matched proteins: use the table below the sunburst to open individual protein pages.
  • Use namespace filters: compare biological process, molecular function and cellular component annotations.

GO browsing is useful when you do not know a specific protein target but want to explore all proteins annotated with a functional process, molecular activity or localisation.

13. Statistics Dashboard

The Statistics page gives a database-wide summary of structural coverage, model confidence, assembly types, oligomeric states, ligand codes, pocket-score classes and epitope evidence.

| Dashboard area | What it tells you | How to use it |
| --- | --- | --- |
| Entry and annotation counts | How many proteins have core annotation, GO terms, EC numbers or 3D information. | Assess annotation completeness. |
| Model coverage | How many proteins have monomer, homomer or heteromer models. | Identify modelling gaps and completed coverage. |
| pLDDT and PAE classes | Confidence distribution across models. | Prioritise high-confidence models for downstream interpretation. |
| Ligands | Most frequent ligand annotations and linked proteins. | Find cofactor-associated or ligand-template-associated proteins. |
| Pockets and epitopes | Distribution of pocket predictions, pocket scores and epitope evidence. | Shortlist targets for druggability or diagnostic antigen exploration. |

14. Example Workflows

  1. Go to Query, or try a direct example such as ML0005, gyrA, or ZN ligand search.
  2. Enter an ML locus tag, for example ML0005.
  3. Open the protein page and confirm the summary identifiers.
  4. Inspect model scores and select the most relevant monomer or oligomer.
  5. Use Mol* to inspect residues, ligands, pockets and chain interfaces.
  6. Load PAE if you need domain or chain-placement confidence.

  1. Use the Ligand query on the home page, or open an example such as ZN.
  2. Enter a ligand symbol, name or PubChem CID.
  3. Review the list of proteins linked to that ligand.
  4. Open each protein and inspect the ligand table and Mol* focus action.
  5. Combine ligand evidence with pockets, confidence and functional annotation.

  1. Open the GO Browser.
  2. Select a namespace or navigate through the sunburst.
  3. Click a GO branch or term to lock the protein set.
  4. Open candidate proteins from the table.
  5. Inspect structures, pockets, ligands and epitope predictions for each candidate.

  1. Start with Statistics to identify proteins with models, pockets and ligand annotations.
  2. Open proteins with high-confidence structures and biologically relevant assemblies.
  3. Prioritise proteins with strong pockets, AF2Bind residues, ligands or conserved functional sites.
  4. Check PAE for domain/interface reliability before interpreting oligomeric pockets.
  5. Use external links and annotations to assess biological relevance.

15. Troubleshooting and Interpretation Tips

| Issue | Likely reason | What to do |
| --- | --- | --- |
| Search gives no protein | The query may not match the stored identifier or annotation text. | Try UniProt, ML locus tag, gene name and partial protein name separately. |
| Sequence search gives no result | The fragment may be too short, absent, or from a different strain/annotation set. | Use a longer sequence fragment and remove spaces, numbers and FASTA headers. |
| Oligomer loads slowly | Large CIF/BCIF and PAE files can be heavy. | Load the structure first; load PAE only when needed. |
| PAE is missing | The PAE JSON may not exist for that selected model or may not be mapped. | Use another model or check whether the corresponding PAE file has been staged. |
| Pocket focus works but no surface appears | Residue lists may be unavailable for that pocket source/model. | Use available residue-level pocket outputs or rebuild pocket summaries with residues. |
| Ligand search misses expected ligands | Reconstructed YAML ligands may not yet be imported into the ligand table. | Run the reconstructed ligand importer and then restart or refresh HANSEN. |

Recommended interpretation: do not prioritise a protein based on one signal alone. Combine functional annotation, biological relevance, model confidence, pocket evidence, ligand evidence, oligomeric context and experimental feasibility.

Target Prioritization

The Target Prioritization page ranks Mycobacterium leprae proteins by integrating ProteomeLM-derived proteome-context information with structural, functional and druggability evidence.

What the Page Shows

The table provides an evidence-weighted shortlist of candidate proteins for target discovery. Each row corresponds to an ML locus/protein and includes a Target Priority Score, priority tier, ProteomeLM contextual signal, pocket/AF2Bind evidence, model-quality evidence, annotation support and a short rationale explaining why the protein was ranked.

How the ProteomeLM Signal Was Generated

Boltz2 monomeric CIF models were used to define the protein set. Protein sequences were matched to HANSEN database records where available and converted into ESM-C embeddings. These embeddings were then passed through ProteomeLM in whole-proteome context. This allows each protein to be interpreted relative to the rest of the M. leprae proteome rather than as an isolated sequence.

Important interpretation: the current Target Prioritization table is not a validated essentiality-probability table. It is an AI-assisted contextual prioritization layer. A true essentiality probability would require a trained ProteomeLM-Ess classifier head or a separately trained supervised essentiality model with validated essential/non-essential labels.

Evidence Layers Used

| Evidence Layer | Role in Prioritization |
| --- | --- |
| ProteomeLM contextual signal | AI-derived whole-proteome contextual evidence from ESM-C embeddings and ProteomeLM inference. |
| Boltz2 model quality | Supports confidence in structure-based interpretation, including pockets and residue-level signals. |
| Pocket evidence | Captures predicted cavities and potential small-molecule binding sites. |
| AF2Bind evidence | Summarises residue-level binding propensity and high-propensity binding regions. |
| Functional annotation | Uses EC, GO, pathway, active-site, binding-site, cofactor and protein-family information. |
| Tractability features | Considers properties such as protein size, enzymatic function and structural druggability support. |

How the Target Priority Score Is Calculated

The final score is a composite 0–100 score combining AI-derived context, druggability, annotation, model quality and tractability:

Target Priority Score = 100 × ( 0.35 × ProteomeLM contextual score + 0.25 × pocket/AF2Bind binding-site score + 0.20 × annotation score + 0.10 × Boltz2 model-quality score + 0.10 × tractability score )
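
As a worked illustration of the formula above, the sketch below computes the composite score from five component scores. It assumes each component is already scaled to the range 0 to 1, which is what the 0–100 result implies given that the weights sum to 1. The dictionary keys and function name are hypothetical, and how HANSEN derives each component is not shown here.

```python
from typing import Mapping

# Weights copied from the Target Priority Score formula; they sum to 1.0.
WEIGHTS = {
    "proteomelm": 0.35,
    "binding_site": 0.25,   # pocket / AF2Bind binding-site score
    "annotation": 0.20,
    "model_quality": 0.10,  # Boltz2 model-quality score
    "tractability": 0.10,
}


def target_priority_score(components: Mapping[str, float]) -> float:
    """Weighted sum of 0-1 component scores, scaled to 0-100."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, component scores of 0.8 (ProteomeLM), 0.6 (binding site), 0.5 (annotation), 0.9 (model quality) and 0.4 (tractability) give 100 × (0.28 + 0.15 + 0.10 + 0.09 + 0.04) = 66.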

Priority Tiers

How to Use the Ranking

The table should be used as a discovery shortlist rather than a final decision. The most compelling targets are those with a high Target Priority Score, strong ProteomeLM contextual signal, confident Boltz2 model, clear pocket or AF2Bind evidence, relevant pathway or enzyme annotation and biological plausibility.

Current Limitation

The current implementation does not yet provide experimentally validated essentiality probabilities. To achieve full essentiality prediction, HANSEN would need either an official trained ProteomeLM-Ess head or a supervised HANSEN essentiality classifier trained on curated essential and non-essential labels from mycobacteria and other relevant datasets, with careful orthology mapping to M. leprae.

HANSEN Essentiality Prediction

This section explains how the HANSEN essentiality-evidence classes are calculated and how they should be interpreted.

Overview

HANSEN essentiality prediction is a conservative evidence-ranking framework for Mycobacterium leprae. It combines transfer learning from Mycobacterium tuberculosis (Mtb), reciprocal-best-hit orthology, ProteomeLM whole-proteome context and HANSEN target-priority evidence.

The output should be interpreted as essentiality evidence, not as native experimental M. leprae knockout evidence. This is important because M. leprae has a reduced genome and its biology may differ from that of Mtb.

Training Labels

The Mtb transfer model is trained using the M. tuberculosis H37Rv genome-wide essentiality calls from DeJesus et al. (PMID: 28096490). In the conservative training setup, ES (essential) and GD (growth-defect) calls are treated as essential-like positives, while NE (non-essential) and GA (growth-advantage) calls are treated as non-essential-like negatives. Ambiguous categories are not treated as hard negatives.
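
The label mapping described above can be written down directly. This is a minimal sketch with hypothetical function names and placeholder gene identifiers: calls outside the four listed categories return no label and are left out of training, which is one way of not treating ambiguous categories as hard negatives.

```python
from typing import Dict, Optional

ESSENTIAL_LIKE = {"ES", "GD"}       # treated as positives
NON_ESSENTIAL_LIKE = {"NE", "GA"}   # treated as negatives


def call_to_label(call: str) -> Optional[int]:
    """Map an essentiality call to 1, 0, or None for ambiguous categories."""
    code = call.strip().upper()
    if code in ESSENTIAL_LIKE:
        return 1
    if code in NON_ESSENTIAL_LIKE:
        return 0
    return None


def build_training_labels(calls: Dict[str, str]) -> Dict[str, int]:
    """Keep only genes with an unambiguous call."""
    labels = {}
    for gene_id, call in calls.items():
        label = call_to_label(call)
        if label is not None:
            labels[gene_id] = label
    return labels
```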

ESM-2 Transfer Classifier

Mtb protein sequences are converted into ESM-2 protein-language-model embeddings. A calibrated classifier is then trained to distinguish essential-like from non-essential-like Mtb genes. The trained model is applied to M. leprae proteins to generate an ESM-2 transfer probability.

The ESM-2 probability is therefore a cross-species transfer-learning prior. It is useful for prioritisation, but it should not be interpreted as direct experimental essentiality in M. leprae.

Mtb Reciprocal-best-hit Anchors

HANSEN uses DIAMOND reciprocal-best-hit mapping to identify confident Mtb ortholog anchors for each M. leprae protein where possible. If a protein has a reciprocal-best-hit Mtb ortholog, the corresponding DeJesus et al. (PMID: 28096490) essentiality call is used as supporting biological evidence.
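
Reciprocal-best-hit mapping can be sketched as two best-hit lookups followed by a mutual check. The example assumes two DIAMOND searches (M. leprae against Mtb, and Mtb against M. leprae) written in the default 12-column tabular format, where columns 1, 2 and 12 hold the query, subject and bit score. The function names are hypothetical, and HANSEN's own pipeline may apply additional identity or coverage filters that are not shown here.

```python
def best_hits(tabular_lines):
    """Return {query: subject} keeping the highest bit score per query."""
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(ml_to_mtb, mtb_to_ml):
    """Keep pairs where each protein is the other's best hit."""
    return {
        ml_id: mtb_id
        for ml_id, mtb_id in ml_to_mtb.items()
        if mtb_to_ml.get(mtb_id) == ml_id
    }
```

Proteins that drop out of the mutual check have no anchor, which corresponds to the "no Mtb anchor" case of the integrated score.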

ProteomeLM Context

HANSEN also incorporates the ProteomeLM contextual percentile generated from whole-proteome ProteomeLM inference. This captures how strongly a protein behaves as a distinctive contextual element within the M. leprae proteome.

Target-priority Evidence

The HANSEN Target Priority Score is used as supporting evidence. This score integrates structural and functional information such as predicted pockets, AF2Bind binding propensity, Boltz2 model quality, annotation, pathway evidence and tractability.

Integrated HANSEN Essentiality Score

The HANSEN integrated score combines the ESM-2 transfer probability with Mtb anchor evidence, ProteomeLM context and HANSEN target-priority evidence.

If an Mtb anchor is available:
HANSEN score = 0.60 × ESM-2 probability + 0.25 × Mtb anchor + 0.10 × ProteomeLM context + 0.05 × Target-priority percentile
If no Mtb anchor is available:
HANSEN score = 0.78 × ESM-2 probability + 0.17 × ProteomeLM context + 0.05 × Target-priority percentile

The Mtb anchor is encoded as 1 for essential-like and 0 for non-essential-like. When no anchor is available, its weight is redistributed toward the ESM-2 probability and ProteomeLM context.
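
The two formulas can be combined into one function that switches on whether an anchor exists. This is a sketch under stated assumptions: all inputs are taken to be on a 0 to 1 scale (percentiles divided by 100), and the function and parameter names are hypothetical.

```python
from typing import Optional


def hansen_essentiality_score(
    esm2_probability: float,
    proteomelm_context: float,
    target_priority_percentile: float,
    mtb_anchor: Optional[int] = None,
) -> float:
    """Integrated score; mtb_anchor is 1, 0, or None when no anchor exists."""
    if mtb_anchor is None:
        # No anchor: its 0.25 weight is shared between ESM-2 and ProteomeLM.
        return (
            0.78 * esm2_probability
            + 0.17 * proteomelm_context
            + 0.05 * target_priority_percentile
        )
    if mtb_anchor not in (0, 1):
        raise ValueError("mtb_anchor must be 0, 1 or None")
    return (
        0.60 * esm2_probability
        + 0.25 * mtb_anchor
        + 0.10 * proteomelm_context
        + 0.05 * target_priority_percentile
    )
```

With an ESM-2 probability of 0.7, ProteomeLM context of 0.5 and target-priority percentile of 0.8, the score is 0.76 with an essential-like anchor, 0.51 with a non-essential-like anchor, and 0.671 with no anchor.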

Conservative Evidence Classes

| Class | Interpretation |
| --- | --- |
| Likely essential | Strong convergent evidence, such as an essential Mtb reciprocal-best-hit anchor, high HANSEN integrated score or high ESM-2 transfer probability. |
| Possible essential / review | Moderate model evidence, high ProteomeLM context, high target-priority evidence or conflicting evidence. These proteins should remain under manual review and should not be excluded from target discovery. |
| Uncertain | Insufficient convergent evidence to confidently assign likely-essential or lower-evidence status. |
| Lower essentiality evidence | Consistently weak model, ProteomeLM, target-priority and anchor evidence. This does not mean experimentally non-essential in M. leprae. |

Recommended Use

Use the Essentiality page as a triage tool. For drug discovery, the strongest candidates are proteins that are Likely essential or Possible essential / review and also have strong HANSEN target-priority evidence, good model confidence and convincing pocket or AF2Bind support.

Important: “Lower essentiality evidence” should not be interpreted as an experimentally proven non-essential call. It means the current HANSEN transfer-learning and supporting evidence layers do not strongly support essentiality.

Contact and Acknowledgements

HANSEN was developed by Dr Sundeep Chaitanya Vedithi in collaboration with the Department of Biochemistry and Department of Medicine, University of Cambridge, and the Science and Technology Facilities Council (STFC).

This work was supported by Hope Rises International, formerly American Leprosy Missions. For enquiries, please visit the Contact page.