HANSEN • Help and Navigation Guide

1. What HANSEN Contains

HANSEN is an integrated structural and functional resource for the Mycobacterium leprae proteome. It brings together core protein identifiers, curated functional annotations, sequence features, cross-references, homology and de novo structure models, oligomer assemblies, ligand annotations, pocket predictions, AF2Bind binding-site predictions, PAE confidence maps and B-cell epitope propensity information.

UniProt and ML locus identifiers
Gene and protein annotation
AF3 / Boltz / Chai / Boltz2 models
Monomers and oligomers
Mol* structure viewer
pLDDT and PAE
Pockets and AF2Bind
Ligands and PubChem links
GO browser
Database statistics

2. Home Page and Search

The home page is the main entry point. Use the query panel to search by identifiers, names, ligands or sequence. The top navigation provides direct access to the query area, database background, statistics dashboard and GO browser.

Use the Query Panel When You Know a Target

Search using a UniProt accession, ML locus tag, gene name, protein name, ligand symbol/name/PubChem CID, or a protein sequence fragment.

Use Global Pages When You Are Exploring

Open the Statistics dashboard for database-wide coverage or the GO Browser to discover groups of proteins by ontology.

About the Resource

About HANSEN and Database Scope

This section has been moved from the home page so that the home page remains focused on search, while the Help page carries the full explanatory guide.

About HANSEN

HANSEN is designed to support translational and mechanistic leprosy research by making M. leprae protein-centric information easy to search, inspect, and reuse. The platform links identifiers, functional evidence, sequence features, and structural models in a form suitable for target triage, biomarker assessment, comparative interpretation, and downstream experimental design.

Targeted Retrieval

Search M. leprae proteins by UniProt entry, ML locus tag, gene name, protein name, ligand, or full and partial amino-acid sequence.

Scientifically Grounded Summaries

Protein-centric pages assemble concise but biologically meaningful summaries spanning annotation, structure, localisation, sequence features, and linked evidence relevant to leprosy research.

Leprosy-Relevant Annotation

GO terms, pathway context, sequence features, domain architecture, and supporting metadata remain immediately usable for interpretation of M. leprae biology and disease relevance.

Structure-Enabled Interpretation

Monomeric and oligomeric models, curated sequences, and browseable tables are organised for downstream analysis in target assessment, comparative genomics, and diagnostic assay development.

HANSEN Scope

Use HANSEN to interrogate M. leprae proteins linked to metabolism, cell-envelope biology, host interaction, persistence, biomarker discovery, and structure-guided target prioritisation across the leprosy bacillus proteome.

Practical Example Links

Use these examples to understand how HANSEN links database-wide views to protein-level interpretation. They are designed as quick entry points for new users.

Start Here: Open the Main Query Panel. Search by UniProt ID, ML locus tag, gene, protein name, ligand or sequence.
Protein Example: Open ML0005 Directly. Example ML-locus route for a detailed protein page with annotations and model panels.
Gene Example: Search by Gene (gyrA). Demonstrates gene-name searching and redirection to the best matching protein page.
Protein-Name Example: Search DNA Gyrase Subunit A. Useful when a user knows the protein function but not the UniProt or ML identifier.
Ligand Example: Find Proteins Linked to ZN. Shows how ligand-linked protein discovery works when ligand annotations are imported.
Database Overview: Open the Statistics Dashboard. Inspect model coverage, ligands, pockets, PAE, pLDDT and epitope summary views.
Ontology Exploration: Open the GO Browser. Navigate biological process, molecular function and cellular component annotations.
Training Workflow: Jump to Example Workflows. Follow practical routes for protein inspection, ligand search and target prioritisation.

3. Query Types

Query type	What to enter	What HANSEN does	Best use
UniProt	UniProt accession or entry identifier	Opens the matching protein page.	When you know the canonical UniProt entry.
ML locus tag	Example: `ML0005`	Finds the protein by M. leprae locus tag.	Best for genome/proteome workflows.
Gene name	Gene symbol or partial gene text	Uses exact, prefix and partial matching.	When working from annotation tables or literature.
Protein name	Full or partial protein description	Finds the closest matching annotated protein name.	When you know the biological function but not the locus tag.
Sequence	Full or partial amino-acid sequence	Normalises the sequence and searches exact, containing and local matches.	Useful for fragments, peptides, or copied FASTA sequence.
Ligand	Ligand code, name, SMILES-derived annotation, or PubChem CID	Returns proteins linked to that ligand through imported oligomer/template ligand annotations.	Useful for identifying ligand-associated targets or cofactor-binding proteins.
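
The sequence normalisation mentioned in the table can be pictured with the short sketch below. It is an illustration only, not HANSEN's backend code: the function name `normalise_sequence` is hypothetical, and the rule of dropping FASTA header lines, whitespace, digits and any other non-letter characters is an assumption based on the troubleshooting advice in this guide.

```python
def normalise_sequence(raw: str) -> str:
    """Return an upper-case, letters-only sequence from pasted text.

    Hypothetical helper: the cleaning rules are assumed, not taken from HANSEN.
    """
    residues = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines such as ">name description"
        if not line or line.startswith(">"):
            continue
        # Keep letters only; this drops spaces, position numbers and punctuation
        residues.extend(ch for ch in line if ch.isalpha())
    return "".join(residues).upper()


# Made-up fragment pasted with a header and position numbers
pasted = ">example header\n  1 mtdttlpp 8\n  9 GSLE\n"
print(normalise_sequence(pasted))
```

Cleaning a query this way before pasting it into the Sequence field avoids the most common cause of empty sequence results.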

5. Protein Result Page

A protein result page is the central detailed view for one HANSEN entry. It usually begins with a summary panel containing the UniProt entry, ML locus tag, gene name, protein name, organism, length and other core identifiers. Below this, the page is divided into functional, structural and evidence sections.

A. Summary and Identifiers

Use this section to confirm that the correct protein was opened. It contains entry names, gene names, ML locus tags and links to external resources where available.

B. Functional Annotation

Review EC numbers, catalytic activity, cofactors, pathways, binding sites, active sites, protein family, domains, motifs and other curated annotations.

C. Evidence and Cross-References

Check PubMed IDs, DOI IDs, InterPro, KEGG, GeneID, STRING and other cross-references to connect HANSEN annotations with external evidence.

D. Structure and Target-Discovery Panels

Inspect model quality, Mol* structures, PAE maps, pockets, AF2Bind predictions, ligands and epitope predictions to prioritise proteins.

6. Structure Models

The model section lists available structural models for the selected protein. HANSEN can show monomeric and oligomeric models, including models from AF3, Boltz, Chai and Boltz2 depending on data availability.

Model type	Meaning	How to use it
Monomer	Single-protein structural model.	Best for domain architecture, local residues, active sites and simple confidence inspection.
Homomer	Oligomer made from repeated copies of the same ML protein.	Use to inspect biological assemblies, repeated interfaces and symmetry-related pockets.
Heteromer	Complex containing different protein components.	Use to inspect protein-protein interfaces, complex-specific pockets and chain-level behaviour.
Template-linked oligomer	Assembly reconstructed or modelled using template evidence.	Use template PDB/assembly metadata to interpret biological relevance.

Interpretation note: a model is a prediction unless experimental structural evidence is explicitly indicated. Use pLDDT, PAE, pocket evidence and biological annotation together before prioritising a target.

7. Mol* Viewer

The Mol* panel is the interactive 3D structure viewer. Use it to rotate, zoom, inspect chains, focus residues, focus ligands, and view pocket or AF2Bind overlays when available.

Load/select a model: choose a structure model from the model controls. Oligomers may take longer to load than monomers.
Rotate and zoom: drag to rotate, scroll to zoom, and right-click/secondary-drag to pan depending on your device.
Use full-page mode: open the larger Mol* view when detailed inspection is needed.
Focus features: use ligand, pocket, AF2Bind or epitope controls to focus the relevant region in 3D.
Return to the page: exit full-page mode to continue reviewing tables and annotation cards.

8. pLDDT and PAE Confidence Interpretation

Confidence panels help distinguish reliable local structural regions from lower-confidence or flexible areas. pLDDT is residue-level confidence, while PAE helps interpret the reliability of domain-domain or chain-chain placement.

pLDDT

Use pLDDT to judge local residue confidence. High-confidence residues are more reliable for local pocket, active-site and epitope interpretation. Low-confidence regions may be flexible, disordered or modelled with uncertainty.
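
As a rough reading aid, the sketch below bins pLDDT values into the bands commonly used for AlphaFold-style models: above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low. Whether HANSEN's panels use exactly these cut-offs, and whether every model reports pLDDT on a 0 to 100 scale, are assumptions; `plddt_band` is a hypothetical helper.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (assumed 0-100 scale) to a conventional confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT is expected on a 0-100 scale")
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


for value in (95.2, 78.0, 55.5, 31.0):
    print(value, plddt_band(value))
```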

PAE

Use PAE to judge relative placement of domains and chains. For oligomers, inspect block patterns across chains to understand interface confidence and assembly reliability.

For large oligomers, load the structure first and then load the PAE map only when needed. PAE files can be large and may slow down initial page rendering.
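
To make the block-pattern idea concrete, the sketch below averages the off-diagonal PAE block between two chains. It assumes the PAE map is already loaded as a square list of lists (in ångströms) with a parallel list giving the chain ID of each residue. That input layout and the name `mean_interchain_pae` are illustrative assumptions, not HANSEN's file format.

```python
from statistics import mean


def mean_interchain_pae(pae, chain_ids, chain_a, chain_b):
    """Average PAE over all residue pairs between two chains, in both directions."""
    idx_a = [i for i, chain in enumerate(chain_ids) if chain == chain_a]
    idx_b = [i for i, chain in enumerate(chain_ids) if chain == chain_b]
    if not idx_a or not idx_b:
        raise ValueError("chain not found in chain_ids")
    # PAE matrices are not symmetric, so include A->B and B->A entries
    values = [pae[i][j] for i in idx_a for j in idx_b]
    values += [pae[j][i] for i in idx_a for j in idx_b]
    return mean(values)


# Toy 4-residue dimer: two residues in chain A, two in chain B
chain_ids = ["A", "A", "B", "B"]
pae = [
    [0.5, 1.0, 12.0, 14.0],
    [1.0, 0.5, 11.0, 13.0],
    [10.0, 12.0, 0.5, 1.0],
    [15.0, 13.0, 1.0, 0.5],
]
print(mean_interchain_pae(pae, chain_ids, "A", "B"))
```

A low inter-chain average relative to the within-chain blocks suggests a more confidently placed interface; a high value suggests the chains' relative placement should be treated with caution.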

9. Pockets, Hotspots and AF2Bind

HANSEN includes predicted pockets and AF2Bind-style binding-site predictions where available. These panels help prioritise proteins and residues for druggability assessment.

Pocket table: lists predicted pockets, scores, tools and residue information where available.
Focus pocket: centres Mol* on the selected pocket and overlays the pocket region if residue-level data are available.
AF2Bind: highlights predicted binding residues and can be used alongside pocket predictions.
Remove/hide overlays: clear surfaces after inspection to return to a clean structure view.

For target prioritisation, favour proteins where predicted pockets overlap with high-confidence structural regions, biologically relevant ligands, catalytic sites, conserved residues or essential functional annotations.
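
One simple way to check whether a pocket sits in a high-confidence region is to compute the fraction of its residues at or above a pLDDT cut-off. The sketch below assumes residue numbers as keys and pLDDT values on a 0 to 100 scale; the cut-off of 70 and the name `confident_fraction` are illustrative choices, not HANSEN settings.

```python
def confident_fraction(pocket_residues, plddt_by_residue, cutoff=70.0):
    """Fraction of pocket residues whose pLDDT is at or above the cut-off.

    Residues with no pLDDT value are counted as not confident.
    """
    if not pocket_residues:
        return 0.0
    confident = sum(
        1 for residue in pocket_residues
        if plddt_by_residue.get(residue, 0.0) >= cutoff
    )
    return confident / len(pocket_residues)


# Toy values: three of the four pocket residues pass the cut-off
plddt = {10: 92.1, 11: 88.4, 12: 45.0, 13: 73.2}
print(confident_fraction([10, 11, 12, 13], plddt))
```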

10. Ligands

Ligand annotations help connect modelled oligomers and template assemblies with cofactors, ions, substrates or other small molecules. Ligands may appear as CCD-style symbols, names, SMILES-derived entries or PubChem-linked compounds.

Search by code: examples include short ligand symbols such as ZN or template ligand codes.
Search by name: enter a ligand or compound name if the imported annotation includes it.
Search by PubChem CID: use formats such as CID 32051 when PubChem resolution is available.
Protein page ligand table: use ligand rows to focus the ligand in Mol* and open PubChem where linked.
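
The three ligand search styles above can be told apart with a few simple rules, sketched here. The rules are assumptions made for illustration: digits with an optional `CID` prefix are read as a PubChem CID, alphanumeric tokens of up to three characters as a ligand code, and anything else as a name. HANSEN's own parser may behave differently, and `classify_ligand_query` is a hypothetical name.

```python
import re

# Optional "CID" prefix, optional colon or spaces, then digits only
_CID_PATTERN = re.compile(r"^(?:CID[\s:]*)?(\d+)$", re.IGNORECASE)


def classify_ligand_query(text: str):
    """Return a (kind, value) pair for a ligand query string (assumed rules)."""
    query = text.strip()
    match = _CID_PATTERN.match(query)
    if match:
        return ("pubchem_cid", match.group(1))
    if query.isalnum() and len(query) <= 3:
        return ("code", query.upper())
    return ("name", query.lower())


for example in ("ZN", "CID 32051", "Zinc ion"):
    print(example, "->", classify_ligand_query(example))
```

Note that under these assumed rules a purely numeric query is always read as a CID, so a numeric ligand code would need a different convention.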

If a reconstructed oligomer ligand is missing, import the reconstructed YAML ligands into the database table used by HANSEN. The web search will only see ligands that are present in the ligand annotation source currently used by the backend.

11. B-cell Epitope Propensity

The B-cell epitope section shows residue-level or region-level epitope propensity where available. Use it to identify exposed, structurally plausible antigenic regions, especially when combined with pLDDT, accessibility and 3D localisation.

Review the epitope table for predicted high-scoring residues or regions.
Focus predicted residues in Mol* to evaluate their structural context.
Compare epitope propensity with pockets, ligands and functional domains when prioritising diagnostic or immunological targets.
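
Moving from residue-level scores to candidate regions can be sketched as grouping consecutive residues that pass a score cut-off. The cut-off, the minimum region length and the name `epitope_regions` are placeholders chosen for illustration; appropriate values depend on the epitope predictor and are not specified in this guide.

```python
def epitope_regions(scores, cutoff, min_length=1):
    """Group consecutive residues scoring at or above the cut-off into regions.

    scores maps residue number to propensity; returns (start, end) tuples.
    """
    hits = sorted(res for res, score in scores.items() if score >= cutoff)
    regions = []
    for res in hits:
        if regions and res == regions[-1][1] + 1:
            regions[-1][1] = res  # extend the current run
        else:
            regions.append([res, res])  # start a new run
    return [(start, end) for start, end in regions if end - start + 1 >= min_length]


# Toy scores: residues 6-8 form one run, residue 12 stands alone
scores = {5: 0.2, 6: 0.8, 7: 0.9, 8: 0.85, 9: 0.1, 12: 0.95}
print(epitope_regions(scores, cutoff=0.7))
print(epitope_regions(scores, cutoff=0.7, min_length=3))
```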

12. GO Browser

The GO Browser provides ontology-guided exploration. It uses a sunburst-style visualisation and associated protein tables to help move from broad ontology categories to specific protein sets.

Hover: inspect a GO branch and see its context.
Mouse wheel: dive inward or move back outward through GO hierarchy levels.
Click: lock the selected branch or term.
Review matched proteins: use the table below the sunburst to open individual protein pages.
Use namespace filters: compare biological process, molecular function and cellular component annotations.

GO browsing is useful when you do not know a specific protein target but want to explore all proteins annotated with a functional process, molecular activity or localisation.
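
Selecting a branch in the sunburst is conceptually a request for every protein annotated to that term or to any term beneath it. The sketch below shows that idea with plain dictionaries. The data layout, the function name `proteins_under_term` and the identifiers are placeholders, not real GO terms or HANSEN records, and the browser's actual implementation is not described in this guide.

```python
def proteins_under_term(term, children, annotations):
    """Collect proteins annotated to a GO term or to any of its descendants."""
    seen = set()
    proteins = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # GO is a graph, so a term can be reached by several paths
        seen.add(current)
        proteins.update(annotations.get(current, ()))
        stack.extend(children.get(current, ()))
    return proteins


# Placeholder hierarchy: the leaf term has two parents
children = {
    "GO:root": ["GO:mid1", "GO:mid2"],
    "GO:mid1": ["GO:leaf"],
    "GO:mid2": ["GO:leaf"],
}
annotations = {
    "GO:mid1": {"protein_1"},
    "GO:mid2": {"protein_3"},
    "GO:leaf": {"protein_2", "protein_3"},
}
print(sorted(proteins_under_term("GO:root", children, annotations)))
```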

13. Statistics Dashboard

The Statistics page gives a database-wide summary of structural coverage, model confidence, assembly types, oligomeric states, ligand codes, pocket-score classes and epitope evidence.

Dashboard area	What it tells you	How to use it
Entry and annotation counts	How many proteins have core annotation, GO terms, EC numbers or 3D information.	Assess annotation completeness.
Model coverage	How many proteins have monomer, homomer or heteromer models.	Identify modelling gaps and completed coverage.
pLDDT and PAE classes	Confidence distribution across models.	Prioritise high-confidence models for downstream interpretation.
Ligands	Most frequent ligand annotations and linked proteins.	Find cofactor-associated or ligand-template-associated proteins.
Pockets and epitopes	Distribution of pocket predictions, pocket scores and epitope evidence.	Shortlist targets for druggability or diagnostic antigen exploration.

14. Example Workflows

Workflow 1: Open a Known ML Protein and Inspect Its Structure

Go to Query, or try a direct example such as ML0005, gyrA, or a ZN ligand search.
Enter an ML locus tag, for example ML0005.
Open the protein page and confirm the summary identifiers.
Inspect model scores and select the most relevant monomer or oligomer.
Use Mol* to inspect residues, ligands, pockets and chain interfaces.
Load PAE if you need domain or chain-placement confidence.

Workflow 2: Find Proteins Associated with a Ligand

Use the Ligand query on the home page, or open an example such as ZN.
Enter a ligand symbol, name or PubChem CID.
Review the list of proteins linked to that ligand.
Open each protein and inspect the ligand table and Mol* focus action.
Combine ligand evidence with pockets, confidence and functional annotation.

Workflow 3: Explore a Functional Class Through GO

Open the GO Browser.
Select a namespace or navigate through the sunburst.
Click a GO branch or term to lock the protein set.
Open candidate proteins from the table.
Inspect structures, pockets, ligands and epitope predictions for each candidate.

Workflow 4: Shortlist Drug-Discovery Candidates

Start with Statistics to identify proteins with models, pockets and ligand annotations.
Open proteins with high-confidence structures and biologically relevant assemblies.
Prioritise proteins with strong pockets, AF2Bind residues, ligands or conserved functional sites.
Check PAE for domain/interface reliability before interpreting oligomeric pockets.
Use external links and annotations to assess biological relevance.

15. Troubleshooting and Interpretation Tips

Issue	Likely reason	What to do
Search gives no protein	The query may not match the stored identifier or annotation text.	Try UniProt, ML locus tag, gene name and partial protein name separately.
Sequence search gives no result	The fragment may be too short, absent, or from a different strain/annotation set.	Use a longer sequence fragment and remove spaces, numbers and FASTA headers.
Oligomer loads slowly	Large CIF/BCIF and PAE files can be heavy.	Load the structure first; load PAE only when needed.
PAE is missing	The PAE JSON may not exist for that selected model or may not be mapped.	Use another model or check whether the corresponding PAE file has been staged.
Pocket focus works but no surface appears	Residue lists may be unavailable for that pocket source/model.	Use available residue-level pocket outputs or rebuild pocket summaries with residues.
Ligand search misses expected ligands	Reconstructed YAML ligands may not yet be imported into the ligand table.	Run the reconstructed ligand importer and then restart or refresh HANSEN.

Recommended interpretation: do not prioritise a protein based on one signal alone. Combine functional annotation, biological relevance, model confidence, pocket evidence, ligand evidence, oligomeric context and experimental feasibility.

Target Prioritization

The Target Prioritization page ranks Mycobacterium leprae proteins by integrating ProteomeLM-derived proteome-context information with structural, functional and druggability evidence.

What the Page Shows

The table provides an evidence-weighted shortlist of candidate proteins for target discovery. Each row corresponds to an ML locus/protein and includes a Target Priority Score, priority tier, ProteomeLM contextual signal, pocket/AF2Bind evidence, model-quality evidence, annotation support and a short rationale explaining why the protein was ranked.

How the ProteomeLM Signal Was Generated

Boltz2 monomeric CIF models were used to define the protein set. Protein sequences were matched to HANSEN database records where available and converted into ESM-C embeddings. These embeddings were then passed through ProteomeLM in whole-proteome context. This allows each protein to be interpreted relative to the rest of the M. leprae proteome rather than as an isolated sequence.

Important interpretation: the current Target Prioritization table is not a validated essentiality-probability table. It is an AI-assisted contextual prioritization layer. A true essentiality probability would require a trained ProteomeLM-Ess classifier head or a separately trained supervised essentiality model with validated essential/non-essential labels.

Evidence Layers Used

Evidence Layer	Role in Prioritization
ProteomeLM contextual signal	AI-derived whole-proteome contextual evidence from ESM-C embeddings and ProteomeLM inference.
Boltz2 model quality	Supports confidence in structure-based interpretation, including pockets and residue-level signals.
Pocket evidence	Captures predicted cavities and potential small-molecule binding sites.
AF2Bind evidence	Summarises residue-level binding propensity and high-propensity binding regions.
Functional annotation	Uses EC, GO, pathway, active-site, binding-site, cofactor and protein-family information.
Tractability features	Considers properties such as protein size, enzymatic function and structural druggability support.

How the Target Priority Score Is Calculated

The final score is a composite 0–100 score combining AI-derived context, druggability, annotation, model quality and tractability:


      Target Priority Score =
      100 × (
      0.35 × ProteomeLM contextual score +
      0.25 × pocket/AF2Bind binding-site score +
      0.20 × annotation score +
      0.10 × Boltz2 model-quality score +
      0.10 × tractability score
      )
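
The weighted sum above can be written out as a short function. This sketch assumes each component score has already been scaled to the range 0 to 1, which the 0 to 100 composite implies but the guide does not state explicitly. The dictionary keys and the name `target_priority_score` are illustrative, and the input values are made up.

```python
# Weights copied from the formula above; they sum to 1.0
WEIGHTS = {
    "proteomelm": 0.35,
    "binding_site": 0.25,   # pocket/AF2Bind binding-site score
    "annotation": 0.20,
    "model_quality": 0.10,  # Boltz2 model-quality score
    "tractability": 0.10,
}


def target_priority_score(components):
    """Combine component scores (assumed 0-1) into a 0-100 composite."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Made-up component scores for illustration
example = {
    "proteomelm": 0.8,
    "binding_site": 0.6,
    "annotation": 0.5,
    "model_quality": 0.9,
    "tractability": 0.7,
}
print(round(target_priority_score(example), 1))
```

With these inputs the contributions are 28 + 15 + 10 + 9 + 7, giving a score of 69 out of 100.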

Priority Tiers

High-priority: strongest combined evidence; suitable for immediate manual inspection.
Strong candidate: good combined evidence; suitable for target shortlisting and review.
Moderate candidate: some supportive evidence, but with missing or weaker components.
Exploratory: lower current evidence; useful for hypothesis generation or later re-analysis.

How to Use the Ranking

The table should be used as a discovery shortlist rather than a final decision. The most compelling targets are those with a high Target Priority Score, strong ProteomeLM contextual signal, confident Boltz2 model, clear pocket or AF2Bind evidence, relevant pathway or enzyme annotation and biological plausibility.

Current Limitation

The current implementation does not yet provide experimentally validated essentiality probabilities. To achieve full essentiality prediction, HANSEN would need either an official trained ProteomeLM-Ess head or a supervised HANSEN essentiality classifier trained on curated essential and non-essential labels from mycobacteria and other relevant datasets, with careful orthology mapping to M. leprae.

HANSEN Essentiality Prediction

This section explains how the HANSEN essentiality-evidence classes are calculated and how they should be interpreted.

Overview

HANSEN essentiality prediction is a conservative evidence-ranking framework for Mycobacterium leprae. It combines transfer learning from Mycobacterium tuberculosis (Mtb), reciprocal-best-hit orthology, ProteomeLM whole-proteome context and HANSEN target-priority evidence.

The output should be interpreted as essentiality evidence, not as native experimental M. leprae knockout evidence. This is important because M. leprae has a reduced genome and its biology may differ from Mtb.

Training Labels

The Mtb transfer model is trained using the M. tuberculosis H37Rv genome-wide essentiality calls from DeJesus et al. (PMID: 28096490). In the conservative training setup, ES (essential) and GD (growth defect) calls are treated as essential-like positives, while NE (non-essential) and GA (growth advantage) calls are treated as non-essential-like negatives. Ambiguous categories are not treated as hard negatives.
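
The label assignment described above amounts to a small lookup, sketched here. Returning `None` for every other category, so that those genes can be left out of training, is an assumption: the guide says only that ambiguous categories are not used as hard negatives. `training_label` is a hypothetical name.

```python
POSITIVE_CALLS = {"ES", "GD"}  # essential-like
NEGATIVE_CALLS = {"NE", "GA"}  # non-essential-like


def training_label(call: str):
    """Map an essentiality call to 1, 0, or None (no hard label assigned)."""
    code = call.strip().upper()
    if code in POSITIVE_CALLS:
        return 1
    if code in NEGATIVE_CALLS:
        return 0
    return None  # assumed handling for ambiguous or unknown categories


for call in ("ES", "GD", "NE", "GA", "Uncertain"):
    print(call, training_label(call))
```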

ESM-2 Transfer Classifier

Mtb protein sequences are converted into ESM-2 protein-language-model embeddings. A calibrated classifier is then trained to distinguish essential-like from non-essential-like Mtb genes. The trained model is applied to M. leprae proteins to generate an ESM-2 transfer probability.

The ESM-2 probability is therefore a cross-species transfer-learning prior. It is useful for prioritisation, but it should not be interpreted as direct experimental essentiality in M. leprae.

Mtb Reciprocal-best-hit Anchors

HANSEN uses DIAMOND reciprocal-best-hit mapping to identify confident Mtb ortholog anchors for each M. leprae protein where possible. If a protein has a reciprocal-best-hit Mtb ortholog, the corresponding DeJesus et al. (PMID: 28096490) essentiality call is used as supporting biological evidence.
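
The reciprocal-best-hit rule itself is simple to state in code: a pair is kept only when each protein is the other's best hit. The sketch below assumes the best hit in each search direction has already been extracted from the DIAMOND output into a dictionary; that parsing step, the function name and the placeholder identifiers are all illustrative, not real orthology assignments.

```python
def reciprocal_best_hits(best_ml_to_mtb, best_mtb_to_ml):
    """Return {ml_protein: mtb_protein} for pairs that are mutual best hits."""
    return {
        ml: mtb
        for ml, mtb in best_ml_to_mtb.items()
        if best_mtb_to_ml.get(mtb) == ml
    }


# Placeholder identifiers only
best_ml_to_mtb = {"ml_a": "mtb_x", "ml_b": "mtb_y", "ml_c": "mtb_x"}
best_mtb_to_ml = {"mtb_x": "ml_a", "mtb_y": "ml_d"}
print(reciprocal_best_hits(best_ml_to_mtb, best_mtb_to_ml))
```

Here only `ml_a` and `mtb_x` choose each other, so the other two M. leprae proteins would be treated as having no Mtb anchor.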

ProteomeLM Context

HANSEN also incorporates the ProteomeLM contextual percentile generated from whole-proteome ProteomeLM inference. This captures how strongly a protein behaves as a distinctive contextual element within the M. leprae proteome.

Target-priority Evidence

The HANSEN Target Priority Score is used as supporting evidence. This score integrates structural and functional information such as predicted pockets, AF2Bind binding propensity, Boltz2 model quality, annotation, pathway evidence and tractability.

Integrated HANSEN Essentiality Score

The HANSEN integrated score combines the ESM-2 transfer probability with Mtb anchor evidence, ProteomeLM context and HANSEN target-priority evidence.


      If an Mtb anchor is available:

      HANSEN score = 0.60 × ESM-2 probability + 0.25 × Mtb anchor + 0.10 × ProteomeLM context + 0.05 × Target-priority percentile


      If no Mtb anchor is available:

      HANSEN score = 0.78 × ESM-2 probability + 0.17 × ProteomeLM context + 0.05 × Target-priority percentile

The Mtb anchor is encoded as 1 for essential-like and 0 for non-essential-like. When no anchor is available, its weight is redistributed toward the ESM-2 probability and ProteomeLM context.
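
The two formulas can be combined into one function that switches on whether an anchor is present. This sketch assumes every input is on a 0 to 1 scale, with percentiles divided by 100; the parameter names and `hansen_essentiality_score` are illustrative, and the example inputs are made up.

```python
def hansen_essentiality_score(esm2_prob, proteomelm_context, target_priority_pct,
                              mtb_anchor=None):
    """Integrated score using the weights given above.

    mtb_anchor is 1 (essential-like), 0 (non-essential-like) or None (no anchor).
    All other inputs are assumed to lie between 0 and 1.
    """
    if mtb_anchor is None:
        return (0.78 * esm2_prob
                + 0.17 * proteomelm_context
                + 0.05 * target_priority_pct)
    if mtb_anchor not in (0, 1):
        raise ValueError("mtb_anchor must be 0, 1 or None")
    return (0.60 * esm2_prob
            + 0.25 * mtb_anchor
            + 0.10 * proteomelm_context
            + 0.05 * target_priority_pct)


# Same made-up protein scored with and without an essential-like anchor
print(round(hansen_essentiality_score(0.8, 0.5, 0.6, mtb_anchor=1), 3))
print(round(hansen_essentiality_score(0.8, 0.5, 0.6), 3))
```

With these inputs the anchored score is 0.48 + 0.25 + 0.05 + 0.03 = 0.81, and the unanchored score is 0.624 + 0.085 + 0.03 = 0.739.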

Conservative Evidence Classes

Class	Interpretation
Likely essential	Strong convergent evidence, such as an essential Mtb reciprocal-best-hit anchor, high HANSEN integrated score or high ESM-2 transfer probability.
Possible essential / review	Moderate model evidence, high ProteomeLM context, high target-priority evidence or conflicting evidence. These proteins should remain under manual review and should not be excluded from target discovery.
Uncertain	Insufficient convergent evidence to confidently assign likely-essential or lower-evidence status.
Lower essentiality evidence	Consistently weak model, ProteomeLM, target-priority and anchor evidence. This does not mean experimentally non-essential in M. leprae.

Recommended Use

Use the Essentiality page as a triage tool. For drug discovery, the strongest candidates are proteins that are Likely essential or Possible essential / review and also have strong HANSEN target-priority evidence, good model confidence and convincing pocket or AF2Bind support.

Important: “Lower essentiality evidence” should not be interpreted as an experimentally proven non-essential call. It means the current HANSEN transfer-learning and supporting evidence layers do not strongly support essentiality.

Area	Tool / Resource	Link	Use Notes
Structure modelling	AlphaFold 3	Repository; Output Terms; Model Parameters Terms	Non-commercial use only; AlphaFold 3 parameters must not be redistributed through HANSEN.
Structure modelling	Boltz-1 / Boltz-2	Boltz Repository	MIT-licensed code and weights according to the upstream repository; cite Boltz reports when used.
Structure modelling	Chai-1	Chai-lab Repository	Apache 2.0 according to the upstream repository; cite Chai-1 technical report when used.
Binding-pocket prediction	AF2Bind / AlphaFold 2-derived workflows	ColabDesign; AlphaFold DB	Verify the exact AF2Bind repository and AlphaFold 2 / ColabDesign terms before redistribution or commercial use.
Binding-pocket prediction	P2Rank	P2Rank Repository	MIT-licensed according to the upstream repository; cite Krivák and Hoksza.
Binding-pocket prediction	fpocket	fpocket Repository	MIT-licensed according to the upstream repository; cite Le Guilloux, Schmidtke and Tufféry.
Epitope mapping	DiscoTope-3.0	DTU DiscoTope-3.0 Server; Academic Download / Licence	Academic / non-commercial use; commercial use requires a separate DTU licence.
Target prioritisation	ProteomeLM / ProteomeLM-Ess	ProteomeLM Repository	Check ProteomeLM and ESM-C terms before redistribution or commercial use.
Gene essentiality	ESM-2	ESM Repository	MIT-licensed according to Meta AI ESM repository; cite Lin et al.
Gene essentiality	DIAMOND	DIAMOND Repository	GPL-3.0. Keep as an external dependency unless GPL obligations are fully addressed.
Machine learning	scikit-learn	scikit-learn Citation	BSD-3-Clause; cite scikit-learn where used.
