core
High-level Drug object that lazily resolves identifiers and fetches data.
- class drugs.core.Drug(_pubchem_cid: int | None = None, _chembl_id: str | None = None, _inchikey: str | None = None, synonyms: List[str] = <factory>, metadata: Dict[str, Any] = <factory>)
Bases: object
Represent a drug and lazily translate between PubChem CID, ChEMBL ID, and InChIKey.
Network calls are performed on demand, and results are cached on the instance to avoid repeated HTTP calls. The class is intentionally model-agnostic: callers can plug in any embedding function via text_embedding/protein_embedding.
- synonyms: List[str]
- metadata: Dict[str, Any]
- property pubchem_cid: int | None
- property chembl_id: str | None
- property inchikey: str | None
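The lazy, cached resolution described for these identifier properties can be illustrated with a minimal sketch. It is not the library's implementation: `LazyIdentifier`, `fake_lookup`, and the placeholder value are hypothetical names introduced here, and the network call is replaced by a plain function.

```python
from typing import Callable, List, Optional


class LazyIdentifier:
    """Hypothetical stand-in for a lazily resolved, instance-cached identifier."""

    def __init__(self, resolver: Callable[[], Optional[int]], value: Optional[int] = None):
        self._value = value                  # identifier supplied up front, if any
        self._resolver = resolver            # stands in for a network lookup
        self._resolved = value is not None   # skip resolution when already known

    @property
    def value(self) -> Optional[int]:
        # Resolve at most once; later reads reuse the cached result.
        if not self._resolved:
            self._value = self._resolver()
            self._resolved = True
        return self._value


calls: List[str] = []


def fake_lookup() -> Optional[int]:
    calls.append("lookup")  # record each simulated network call
    return 2244             # placeholder identifier for illustration


ident = LazyIdentifier(fake_lookup)
first, second = ident.value, ident.value  # second read hits the cache
```

Reading the property twice triggers the simulated lookup only once, which is the behavior the class description implies for repeated access.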
- fetch_pubchem_properties() -> Dict[str, Any]
Retrieve and cache core PubChem properties.
- Returns:
Dictionary of properties for the resolved PubChem CID.
- Return type:
dict
- Raises:
ValueError – If no resolvable PubChem CID is available.
- fetch_pubchem_text(headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) -> Dict[str, Any]
Fetch and cache selected PubChem PUG-View text sections.
- Parameters:
headings (Iterable[str], optional) – Headings to request. Defaults to PUBCHEM_MINIMAL_STABLE.
- Returns:
Mapping of heading -> section metadata and extracted strings.
- Return type:
dict
- Raises:
ValueError – If no resolvable PubChem CID is available.
- fetch_chembl_mechanisms(limit: int = 50) -> List[Dict[str, Any]]
Fetch mechanisms of action for the drug’s ChEMBL ID.
- Parameters:
limit (int, default=50) – Maximum number of mechanism records to fetch.
- Returns:
Mechanism entries from the ChEMBL API.
- Return type:
list[dict]
- Raises:
ValueError – If no resolvable ChEMBL ID is available.
- fetch_chembl_bioactivities(*, min_pchembl: float = 5.0, assay_types: Iterable[str] = ('B', 'F'), limit: int = 1000) -> List[Dict[str, Any]]
Fetch ChEMBL bioactivity rows filtered by potency and assay type.
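The potency and assay-type filter can be sketched client-side as follows. This is an illustration under assumptions: the source does not say whether the library filters on the server or on the client, and the row keys `pchembl_value` and `assay_type`, the helper `filter_bioactivities`, and the sample rows are all introduced here for the example.

```python
from typing import Any, Dict, Iterable, List


def filter_bioactivities(
    rows: Iterable[Dict[str, Any]],
    *,
    min_pchembl: float = 5.0,
    assay_types: Iterable[str] = ("B", "F"),
    limit: int = 1000,
) -> List[Dict[str, Any]]:
    """Keep rows at or above a pChEMBL threshold with an allowed assay type."""
    allowed = set(assay_types)
    kept: List[Dict[str, Any]] = []
    for row in rows:
        if len(kept) >= limit:
            break  # stop once the requested number of rows is collected
        raw = row.get("pchembl_value")  # assumed key; may be missing or None
        if raw is None:
            continue
        if float(raw) >= min_pchembl and row.get("assay_type") in allowed:
            kept.append(row)
    return kept


# Made-up rows for illustration only.
sample_rows = [
    {"pchembl_value": "6.3", "assay_type": "B"},
    {"pchembl_value": "4.1", "assay_type": "B"},   # below threshold
    {"pchembl_value": None, "assay_type": "F"},    # no potency value
    {"pchembl_value": "7.0", "assay_type": "A"},   # assay type not allowed
    {"pchembl_value": "5.0", "assay_type": "F"},
]
kept_rows = filter_bioactivities(sample_rows)
```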
- fetch_target_details(target_chembl_id: str) -> Dict[str, Any]
Fetch and cache target details for a ChEMBL target ID.
- Parameters:
target_chembl_id (str) – ChEMBL target identifier.
- Returns:
Target detail payload including components and synonyms.
- Return type:
dict
- fetch_drug_interactions(drug_name: str | None = None) -> List[Dict[str, Any]]
Fetch drug-drug interactions from RxNav using a best-effort drug name.
- Parameters:
drug_name (str, optional) – Override the drug name to query. When omitted, the method attempts to use the IUPAC name or synonym information from PubChem properties.
- Returns:
Normalized interaction entries containing source, description, and interactants (list of partner drug names).
- Return type:
list[dict]
- target_accessions() -> List[str]
Return UniProt accessions for all targets linked to the drug.
- target_gene_symbols() -> List[str]
Return gene symbols for all targets linked to the drug.
- smiles() -> str | None
Return canonical SMILES string resolved from PubChem properties.
- selfies() -> str | None
Convert the molecule to SELFIES representation if possible.
- rdkit_mol() -> Any
Return an RDKit molecule for the drug’s SMILES, caching the result.
- molecular_fingerprint(*, method: str = 'morgan', n_bits: int = 2048, radius: int = 2, use_features: bool = False) -> ndarray
Generate a molecular fingerprint for similarity calculations.
- similarity_to(other: Drug, *, fingerprint_method: str = 'morgan', similarity_metric: str = 'tanimoto', n_bits: int = 2048, radius: int = 2, use_features: bool = False) -> float
Compute structural similarity to another drug using fingerprints.
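The default `tanimoto` metric can be illustrated on plain sets of on-bit indices. This is a minimal sketch rather than the library's code path: the fingerprints below are made up, `tanimoto` is a helper defined here, and returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / union


# Made-up on-bit indices standing in for two 2048-bit fingerprints.
fp_a = {1, 5, 9, 42}
fp_b = {5, 9, 42, 100}
score = tanimoto(fp_a, fp_b)  # 3 shared bits out of 5 distinct bits
```

Identical fingerprints score 1.0 and fingerprints with no shared bits score 0.0, so the value can be read directly as a structural overlap fraction.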
- molecular_properties() -> Dict[str, Any]
Compute RDKit-derived molecular property panel (QED, TPSA, Lipinski, SA).
- text_corpus(headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) -> str
Concatenate PubChem text snippets into a markdown-ish corpus.
- Parameters:
headings (Iterable[str], optional) – Headings to include. Defaults to PUBCHEM_MINIMAL_STABLE.
- Returns:
Formatted corpus with heading markers and snippets.
- Return type:
str
- text_embedding(embed_fn: Callable[[str], Any], headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms']) -> Any
Compute a text embedding over the PubChem corpus.
- Parameters:
embed_fn (Callable[[str], Any]) – User-provided embedding function accepting a text corpus.
headings (Iterable[str], optional) – Headings to include when building the corpus.
- Returns:
Embedding output as returned by embed_fn.
- Return type:
Any
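Because `embed_fn` is any callable that takes the corpus string, a toy function is enough to show the expected shape. The sketch below is a hypothetical hashing bag-of-words embedder (`toy_embed` and its `dim` parameter are introduced here) and is not a model the library ships or recommends.

```python
import hashlib
from typing import List


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hash each whitespace token into one of `dim` buckets, then normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # first digest byte picks the bucket
    total = sum(vec)
    # Normalize to bucket frequencies; an empty corpus stays all zeros.
    return [v / total for v in vec] if total else vec


vector = toy_embed("Aspirin is a salicylate drug")
```

Going by the signature above, such a callable would be passed as `drug.text_embedding(toy_embed)`; a real model wrapper only needs to accept a single string in the same way.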
- esm_inputs() -> List[str]
Return UniProt accessions to feed into a protein embedding model.
- protein_embedding(embed_fn: Callable[[List[str]], Any]) -> Any
Compute a protein embedding over target accessions.
- Parameters:
embed_fn (Callable[[List[str]], Any]) – Embedding function that consumes a list of UniProt accessions.
- Returns:
Embedding output as returned by embed_fn.
- Return type:
Any
- protein_embedding_cached(embed_fn: Callable[[List[str]], Any], *, path: Path | None = None, load_if_exists: bool = True, force: bool = False) -> Any
Compute or load a cached protein embedding.
- Parameters:
embed_fn (Callable[[List[str]], Any]) – Embedding function consuming accessions.
path (pathlib.Path, optional) – Custom output path. Defaults to an auto-generated path.
load_if_exists (bool, default=True) – Return the cached embedding if the file already exists.
force (bool, default=False) – Recompute even if the cache exists.
- Returns:
Embedding output returned by embed_fn or loaded from disk.
- Return type:
Any
- text_embedding_cached(embed_fn: Callable[[str], Any], *, headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms'], path: Path | None = None, load_if_exists: bool = True, force: bool = False) -> Any
Compute or load a cached text embedding.
- Parameters:
embed_fn (Callable[[str], Any]) – Embedding function consuming a corpus string.
headings (Iterable[str], optional) – Headings to include when building the corpus.
path (pathlib.Path, optional) – Custom output path. Defaults to an auto-generated path.
load_if_exists (bool, default=True) – Return the cached embedding if the file already exists.
force (bool, default=False) – Recompute even if the cache exists.
- Returns:
Embedding output returned by embed_fn or loaded from disk.
- Return type:
Any
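The interplay of `path`, `load_if_exists`, and `force` in the two cached variants can be sketched as a compute-or-load helper. This is one plausible reading, not the library's code: the on-disk format is not stated in the source, so JSON is an assumption, and `cached_call` and `compute` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def cached_call(
    compute: Callable[[], Any],
    path: Path,
    *,
    load_if_exists: bool = True,
    force: bool = False,
) -> Any:
    """Return the cached value when allowed, otherwise compute and persist it."""
    if path.exists() and load_if_exists and not force:
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")  # JSON is an assumption
    return result


calls: List[int] = []


def compute() -> List[float]:
    calls.append(1)  # count how often the expensive step actually runs
    return [0.1, 0.2, 0.3]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "embedding.json"
    first = cached_call(compute, cache_path)               # computes and writes
    second = cached_call(compute, cache_path)              # loads from disk
    third = cached_call(compute, cache_path, force=True)   # recomputes
```

In this reading `force=True` takes precedence over `load_if_exists=True`, which matches the parameter descriptions but is not confirmed by the source.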
- map_ids() -> Dict[str, str | None]
Return a dictionary with the best-effort resolved identifiers.
- classmethod from_batch(identifiers: List[str | int], *, prefetch_properties: bool = False, max_workers: int = 8) -> List[Drug]
Create Drug instances from a batch of identifiers using parallel calls.
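The `max_workers` parameter suggests a worker pool. A minimal sketch of order-preserving parallel resolution follows, assuming a thread pool, which the source does not confirm; `resolve_batch` and `fake_resolve` are hypothetical, and the resolver does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Union

Identifier = Union[str, int]


def resolve_batch(
    identifiers: List[Identifier],
    resolve: Callable[[Identifier], Dict[str, Any]],
    *,
    max_workers: int = 8,
) -> List[Dict[str, Any]]:
    """Resolve identifiers concurrently; results follow the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(resolve, identifiers))


def fake_resolve(identifier: Identifier) -> Dict[str, Any]:
    # Stand-in for a per-identifier lookup; the classification is illustrative.
    kind = "pubchem_cid" if isinstance(identifier, int) else "other"
    return {"input": identifier, "kind": kind}


results = resolve_batch([2244, "CHEMBL25", 3672], fake_resolve, max_workers=2)
```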
- static batch_similarity_matrix(drugs: List[Drug], *, fingerprint_method: str = 'morgan', similarity_metric: str = 'tanimoto', n_bits: int = 2048, radius: int = 2, use_features: bool = False) -> ndarray
Compute an all-vs-all similarity matrix for a list of Drug objects.
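An all-vs-all matrix of this kind is symmetric, so only the upper triangle needs computing. The sketch below uses nested lists and made-up on-bit sets instead of `ndarray` fingerprints; `tanimoto` and `similarity_matrix` are helpers defined here, and fixing the diagonal at 1.0 is a convention of this example.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by total distinct on-bits (0.0 if both empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(fingerprints: List[Set[int]]) -> List[List[float]]:
    """Symmetric all-vs-all Tanimoto matrix with a unit diagonal."""
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = score  # fill both triangles at once
    return matrix


# Made-up fingerprints for three hypothetical compounds.
matrix = similarity_matrix([{1, 2, 3}, {2, 3, 4}, {7, 8}])
```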
- write_drug_markdown(*, headings: Iterable[str] = ['Record Description', 'Drug and Medication Information', 'Names and Identifiers', 'Synonyms'], output_path: Path = PosixPath('drug_report.md'), include_mechanisms: bool = True, include_targets: bool = True) -> Path
Fetch drug information and persist a Markdown report.
- Parameters:
headings (Iterable[str], optional) – PUG-View headings to include in the text corpus.
output_path (pathlib.Path, default="drug_report.md") – Destination path for the report.
include_mechanisms (bool, default=True) – Include ChEMBL mechanism-of-action entries.
include_targets (bool, default=True) – Include UniProt accessions and gene symbols.
- Returns:
Path to the written Markdown file.
- Return type:
pathlib.Path
- drugs.core.list_pubchem_text_headings(cid: int) -> List[str]
List PUG-View headings available for a compound.
- Parameters:
cid (int) – PubChem compound identifier.
- Returns:
Unique heading labels in first-seen order.
- Return type:
list[str]
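"Unique heading labels in first-seen order" can be produced with an insertion-ordered dict. The sketch below shows only that deduplication step on a made-up list; `unique_in_order` is a hypothetical helper, and how the library actually walks the PUG-View record is not shown in the source.

```python
from typing import Iterable, List


def unique_in_order(labels: Iterable[str]) -> List[str]:
    """Drop repeated labels while keeping the first occurrence of each."""
    # dict preserves insertion order, so keys come out in first-seen order.
    return list(dict.fromkeys(labels))


# Made-up heading sequence with repeats, for illustration only.
raw_headings = ["Names and Identifiers", "Synonyms", "Names and Identifiers", "Record Description", "Synonyms"]
headings = unique_in_order(raw_headings)
```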