core

High-level Drug object that lazily resolves identifiers and fetches data.

class drugs.core.Drug(_pubchem_cid: int | None = None, _chembl_id: str | None = None, _inchikey: str | None = None, synonyms: List[str] = <factory>, metadata: Dict[str, Any] = <factory>)

Bases: object

Represent a drug and lazily translate between PubChem CID, ChEMBL ID, and InChIKey.

Network calls are made on demand, and results are cached on the instance so that repeated lookups do not trigger repeated HTTP requests. The class is intentionally model-agnostic: callers can plug in any embedding function via text_embedding or protein_embedding.
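The lazy, cache-on-instance behavior described above can be sketched as follows. This is a minimal illustration, not the Drug implementation: LazyIdentifier and fake_lookup are hypothetical names, and the lookup is a local stub standing in for an HTTP request.

```python
from typing import Callable, List, Optional


class LazyIdentifier:
    """Hypothetical stand-in showing resolve-once, cache-on-instance behavior."""

    def __init__(self, resolve: Callable[[], Optional[int]], value: Optional[int] = None):
        self._resolve = resolve
        self._value = value
        # A value supplied up front never needs resolving.
        self._resolved = value is not None

    @property
    def value(self) -> Optional[int]:
        # Resolve on first access only; later reads reuse the stored result.
        if not self._resolved:
            self._value = self._resolve()
            self._resolved = True
        return self._value


calls: List[str] = []


def fake_lookup() -> int:
    # Stub for a network call; records each invocation so caching is observable.
    calls.append("lookup")
    return 2244  # placeholder identifier


ident = LazyIdentifier(fake_lookup)
first = ident.value
second = ident.value
```

Reading the property twice invokes the stub once, which is the effect the instance-level cache is meant to have.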

synonyms: List[str]

metadata: Dict[str, Any]

property pubchem_cid: int | None

property chembl_id: str | None

property inchikey: str | None

fetch_pubchem_properties() → Dict[str, Any]

Retrieve and cache core PubChem properties.

Returns: Dictionary of properties for the resolved PubChem CID.
Return type: dict
Raises: ValueError – If no resolvable PubChem CID is available.

fetch_pubchem_text(headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) → Dict[str, Any]

Fetch and cache selected PubChem PUG-View text sections.

Parameters: headings (Iterable[str], optional) – Headings to request. Defaults to PUBCHEM_MINIMAL_STABLE.
Returns: Mapping of heading -> section metadata and extracted strings.
Return type: dict
Raises: ValueError – If no resolvable PubChem CID is available.

fetch_chembl_mechanisms(limit: int = 50) → List[Dict[str, Any]]

Fetch mechanisms of action for the drug’s ChEMBL ID.

Parameters: limit (int, default=50) – Maximum number of mechanism records to fetch.
Returns: Mechanism entries from the ChEMBL API.
Return type: list[dict]
Raises: ValueError – If no resolvable ChEMBL ID is available.

fetch_chembl_bioactivities(*, min_pchembl: float = 5.0, assay_types: Iterable[str] = ('B', 'F'), limit: int = 1000) → List[Dict[str, Any]]

Fetch ChEMBL bioactivity rows filtered by potency and assay type.
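To make the two filters concrete, here is a small sketch of a client-side filter over activity rows. It assumes rows carry the ChEMBL activity fields pchembl_value (a string or null) and assay_type. Whether the library filters on the server or the client, and whether the threshold is inclusive, is not stated above; filter_bioactivities and the sample rows are illustrative only.

```python
from typing import Any, Dict, Iterable, List


def filter_bioactivities(
    rows: Iterable[Dict[str, Any]],
    *,
    min_pchembl: float = 5.0,
    assay_types: Iterable[str] = ("B", "F"),
) -> List[Dict[str, Any]]:
    """Keep rows at or above the potency threshold with an allowed assay type."""
    allowed = set(assay_types)
    kept = []
    for row in rows:
        raw = row.get("pchembl_value")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Missing or non-numeric potency: skip the row.
            continue
        # Inclusive threshold is an assumption made for this sketch.
        if value >= min_pchembl and row.get("assay_type") in allowed:
            kept.append(row)
    return kept


# Hand-written rows for illustration; not real ChEMBL data.
rows = [
    {"pchembl_value": "6.3", "assay_type": "B"},
    {"pchembl_value": "4.1", "assay_type": "B"},
    {"pchembl_value": None, "assay_type": "F"},
    {"pchembl_value": "7.0", "assay_type": "A"},
]
kept = filter_bioactivities(rows)
```

Only the first row passes: the second is below the threshold, the third has no potency value, and the fourth has an assay type outside the default ('B', 'F').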

fetch_target_details(target_chembl_id: str) → Dict[str, Any]

Fetch and cache target details for a ChEMBL target ID.

Parameters: target_chembl_id (str) – ChEMBL target identifier.
Returns: Target detail payload including components and synonyms.
Return type: dict

fetch_drug_interactions(drug_name: str | None = None) → List[Dict[str, Any]]

Fetch drug-drug interactions from RxNav using a best-effort drug name.

Parameters: drug_name (str, optional) – Override the drug name to query. When omitted, the method attempts to use the IUPAC name or a synonym from the PubChem properties.
Returns: Normalized interaction entries containing source, description, and interactants (list of partner drug names).
Return type: list[dict]

target_accessions() → List[str]

Return UniProt accessions for all targets linked to the drug.

target_gene_symbols() → List[str]

Return gene symbols for all targets linked to the drug.

smiles() → str | None

Return canonical SMILES string resolved from PubChem properties.

selfies() → str | None

Convert the molecule to SELFIES representation if possible.

rdkit_mol() → Any

Return an RDKit molecule for the drug’s SMILES, caching the result.

molecular_fingerprint(*, method: str = 'morgan', n_bits: int = 2048, radius: int = 2, use_features: bool = False) → ndarray

Generate a molecular fingerprint for similarity calculations.

similarity_to(other: Drug, *, fingerprint_method: str = 'morgan', similarity_metric: str = 'tanimoto', n_bits: int = 2048, radius: int = 2, use_features: bool = False) → float

Compute structural similarity to another drug using fingerprints.
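For readers unfamiliar with the default metric, the Tanimoto coefficient of two bit-vector fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below works on plain Python sequences so it runs without RDKit. The tanimoto helper is illustrative and is not the library's internal function; returning 0.0 for two all-zero vectors is a convention chosen here.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Convention for this sketch: two empty fingerprints score 0.0.
    return both / either if either else 0.0


# Toy 8-bit fingerprints; real Morgan fingerprints default to 2048 bits here.
fp_a = [1, 1, 0, 1, 0, 0, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0, 0, 0]
score = tanimoto(fp_a, fp_b)
```

In this example two bits are shared and four are set in at least one vector, so the score is 0.5.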

molecular_properties() → Dict[str, Any]

Compute RDKit-derived molecular property panel (QED, TPSA, Lipinski, SA).

text_corpus(headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) → str

Concatenate PubChem text snippets into a Markdown-like corpus.

Parameters: headings (Iterable[str], optional) – Headings to include. Defaults to PUBCHEM_MINIMAL_STABLE.
Returns: Formatted corpus with heading markers and snippets.
Return type: str

text_embedding(embed_fn: Callable[[str], Any], headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) → Any

Compute a text embedding over the PubChem corpus.

Parameters:

embed_fn (Callable[[str], Any]) – User-provided embedding function accepting a text corpus.
headings (Iterable[str], optional) – Headings to include when building the corpus.

Returns:

Embedding output as returned by embed_fn.

Return type:

Any
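Because embed_fn is only required to accept a corpus string, any callable with that shape can be supplied. The sketch below shows a deliberately simple hashing bag-of-words function that needs no model download. toy_embed and its dim parameter are hypothetical; a real pipeline would more likely wrap a sentence-embedding model.

```python
import hashlib
from typing import List


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hash each whitespace token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    # An empty corpus yields the zero vector rather than dividing by zero.
    return [v / norm for v in vec] if norm else vec


vector = toy_embed("aspirin is an analgesic")
```

A function of this shape matches the documented Callable[[str], Any] signature, so it could be passed as embed_fn to text_embedding, which returns whatever the callable returns.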

esm_inputs() → List[str]

Return UniProt accessions to feed into a protein embedding model.

protein_embedding(embed_fn: Callable[[List[str]], Any]) → Any

Compute a protein embedding over target accessions.

Parameters: embed_fn (Callable[[List[str]], Any]) – Embedding function that consumes a list of UniProt accessions.
Returns: Embedding output as returned by embed_fn.
Return type: Any

protein_embedding_cached(embed_fn: Callable[[List[str]], Any], *, path: Path | None = None, load_if_exists: bool = True, force: bool = False) → Any

Compute or load a cached protein embedding.

Parameters:

embed_fn (Callable[[List[str]], Any]) – Embedding function consuming accessions.
path (pathlib.Path, optional) – Custom output path. Defaults to an auto-generated path.
load_if_exists (bool, default=True) – Return the cached embedding if the file already exists.
force (bool, default=False) – Recompute even if the cache exists.

Returns:

Embedding output returned by embed_fn or loaded from disk.

Return type:

Any
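The interaction between load_if_exists and force can be sketched as a small compute-or-load helper. This is an assumption-laden illustration: the on-disk format (JSON here), the rule that force takes precedence over load_if_exists, and the names cached_embedding and fake_embed are choices made for the example, not documented behavior of the library.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def cached_embedding(
    embed_fn: Callable[[List[str]], Any],
    inputs: List[str],
    *,
    path: Path,
    load_if_exists: bool = True,
    force: bool = False,
) -> Any:
    """Load from `path` when allowed, otherwise compute and write the result."""
    if path.exists() and load_if_exists and not force:
        return json.loads(path.read_text(encoding="utf-8"))
    result = embed_fn(inputs)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls: List[List[str]] = []


def fake_embed(accessions: List[str]) -> List[float]:
    # Stub embedding: records the call and returns one number per accession.
    calls.append(list(accessions))
    return [float(len(a)) for a in accessions]


accessions = ["P23219", "P35354"]  # illustrative accession strings
with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "protein_embedding.json"
    first = cached_embedding(fake_embed, accessions, path=cache_path)
    second = cached_embedding(fake_embed, accessions, path=cache_path)
    third = cached_embedding(fake_embed, accessions, path=cache_path, force=True)
```

The second call is served from disk, while the third recomputes because force=True, so the stub runs twice in total.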

text_embedding_cached(embed_fn: Callable[[str], Any], *, headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms'], path: Path | None = None, load_if_exists: bool = True, force: bool = False) → Any

Compute or load a cached text embedding.

Parameters:

embed_fn (Callable[[str], Any]) – Embedding function consuming a corpus string.
headings (Iterable[str], optional) – Headings to include when building the corpus.
path (pathlib.Path, optional) – Custom output path. Defaults to an auto-generated path.
load_if_exists (bool, default=True) – Return the cached embedding if the file already exists.
force (bool, default=False) – Recompute even if the cache exists.

Returns:

Embedding output returned by embed_fn or loaded from disk.

Return type:

Any

map_ids() → Dict[str, str | None]

Return a dictionary with the best-effort resolved identifiers.

classmethod from_pubchem_cid(cid: int) → Drug

classmethod from_chembl_id(chembl_id: str) → Drug

classmethod from_inchikey(inchikey: str) → Drug

classmethod from_batch(identifiers: List[str | int], *, prefetch_properties: bool = False, max_workers: int = 8) → List[Drug]

Create Drug instances from a batch of identifiers using parallel calls.
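One plausible shape for batch construction is a thread pool that maps each identifier through a dispatch step while preserving input order. The sketch below is a guess at that pattern, not the library's code: classify_identifier and its rules for telling a CID from a ChEMBL ID or an InChIKey are hypothetical, and no network call is made.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple, Union


def classify_identifier(identifier: Union[str, int]) -> Tuple[str, Union[str, int]]:
    """Hypothetical dispatch rule: decide which kind of identifier was given."""
    if isinstance(identifier, int):
        return ("pubchem_cid", identifier)
    text = str(identifier).strip()
    if text.isdigit():
        return ("pubchem_cid", int(text))
    if text.upper().startswith("CHEMBL"):
        return ("chembl_id", text.upper())
    # Anything else is treated as an InChIKey in this sketch.
    return ("inchikey", text)


def build_batch(identifiers: List[Union[str, int]], *, max_workers: int = 8):
    # Executor.map returns results in input order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_identifier, identifiers))


batch = build_batch([2244, "CHEMBL25", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"])
```

Threads suit this workload because the real per-identifier work would be I/O-bound HTTP requests, and max_workers caps how many run at once.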

static batch_similarity_matrix(drugs: List[Drug], *, fingerprint_method: str = 'morgan', similarity_metric: str = 'tanimoto', n_bits: int = 2048, radius: int = 2, use_features: bool = False) → ndarray

Compute an all-vs-all similarity matrix for a list of Drug objects.
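An all-vs-all matrix for a symmetric metric only needs the upper triangle computed; the lower triangle is mirrored. The sketch below shows this with fingerprints represented as sets of on-bit indices and nested lists in place of an ndarray. The helper names and toy fingerprints are illustrative, and fixing the diagonal at 1.0 is a convention assumed here.

```python
from typing import List, Set


def tanimoto_sets(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient over sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(fingerprints: List[Set[int]]) -> List[List[float]]:
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # self-similarity, by convention in this sketch
        for j in range(i + 1, n):
            score = tanimoto_sets(fingerprints[i], fingerprints[j])
            # The metric is symmetric, so fill both halves from one computation.
            matrix[i][j] = score
            matrix[j][i] = score
    return matrix


fps = [{1, 4, 9}, {1, 4, 7}, {2, 3}]
matrix = similarity_matrix(fps)
```

For n drugs this performs n(n-1)/2 pairwise comparisons rather than n squared.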

write_drug_markdown(*, headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms'], output_path: Path = PosixPath('drug_report.md'), include_mechanisms: bool = True, include_targets: bool = True) → Path

Fetch drug information and persist a Markdown report.

Parameters:

headings (Iterable[str], optional) – PUG-View headings to include in the text corpus.
output_path (pathlib.Path, default="drug_report.md") – Destination path for the report.
include_mechanisms (bool, default=True) – Include ChEMBL mechanism-of-action entries.
include_targets (bool, default=True) – Include UniProt accessions and gene symbols.

Returns:

Path to the written Markdown file.

Return type:

pathlib.Path

drugs.core.list_pubchem_text_headings(cid: int) → List[str]

List PUG-View headings available for a compound.

Parameters: cid (int) – PubChem compound identifier.
Returns: Unique heading labels in first-seen order.
Return type: list[str]
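The "unique, first-seen order" behavior can be illustrated on a PUG-View-shaped payload, where sections nest under Record and each carries a TOCHeading. The sketch below walks a hand-written sample rather than calling PubChem. unique_headings is an illustrative helper, and the sample is not a real compound record.

```python
from typing import Any, Dict, Iterator, List


def iter_headings(sections: List[Dict[str, Any]]) -> Iterator[str]:
    """Yield TOCHeading values depth-first, parents before children."""
    for section in sections:
        heading = section.get("TOCHeading")
        if heading:
            yield heading
        yield from iter_headings(section.get("Section", []))


def unique_headings(payload: Dict[str, Any]) -> List[str]:
    top = payload.get("Record", {}).get("Section", [])
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(iter_headings(top)))


# Hand-written sample mimicking the nested section layout.
payload = {
    "Record": {
        "Section": [
            {
                "TOCHeading": "Names and Identifiers",
                "Section": [
                    {"TOCHeading": "Record Description"},
                    {"TOCHeading": "Synonyms"},
                ],
            },
            {
                "TOCHeading": "Drug and Medication Information",
                "Section": [{"TOCHeading": "Synonyms"}],
            },
        ]
    }
}
headings = unique_headings(payload)
```

The repeated "Synonyms" heading appears once in the result, at the position where it was first encountered.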