Tutorial: from CID to embeddings

This short walkthrough demonstrates how to start from a PubChem CID, fetch core metadata, and compute embeddings.

Prerequisites

pip install -e .

If you want to run the optional embedding helpers, install extras as needed:

# voyage example
pip install langchain-voyageai
# or OpenAI
pip install openai
# or sentence-transformers
pip install sentence-transformers

from drugs import Drug

aspirin = Drug.from_pubchem_cid(2244)
print(aspirin.map_ids())

props = aspirin.fetch_pubchem_properties()
text = aspirin.fetch_pubchem_text()

print(props.get("IUPACName"))
print(list(text))  # headings fetched

mechs = aspirin.fetch_chembl_mechanisms()
print(mechs[:1])

print(aspirin.target_accessions())
print(aspirin.target_gene_symbols())

# Dummy embedding function; replace with your model
vec = aspirin.text_embedding(lambda text: text[:128])
print(vec)

path = aspirin.write_drug_markdown()
print(f"Report written to {path}")

Use drugs.core.list_pubchem_text_headings(cid) to see available headings.
The caching helpers protein_embedding_cached and text_embedding_cached store artifacts under artifacts/embeddings by default.