Tutorial: from CID to embeddings
This short walkthrough starts from a PubChem compound ID (CID). It fetches core metadata, mechanisms, and targets, computes embeddings, and writes a Markdown report.
Prerequisites
From the repository root, install the package in editable mode:
```bash
pip install -e .
```
The embedding helpers are optional. Install only the backend you plan to use:
```bash
# Voyage AI (via LangChain)
pip install langchain-voyageai
# or OpenAI
pip install openai
# or sentence-transformers (local models)
pip install sentence-transformers
```
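Before running the embedding step, you may want to confirm which of these backends is importable in your environment. The sketch below uses only the standard library (`importlib.util.find_spec`), so it does not import the packages themselves. `available_backends` is a hypothetical helper and is not part of `drugs`.

```python
import importlib.util

# pip package name -> top-level import name for the optional backends above
OPTIONAL_BACKENDS = {
    "langchain-voyageai": "langchain_voyageai",
    "openai": "openai",
    "sentence-transformers": "sentence_transformers",
}


def available_backends() -> dict[str, bool]:
    """Return {pip package name: importable?} without importing the packages."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in OPTIONAL_BACKENDS.items()
    }


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name:24} {'installed' if ok else 'missing'}")
```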
Step 1: create a drug object
```python
from drugs import Drug

# 2244 is the PubChem CID for aspirin
aspirin = Drug.from_pubchem_cid(2244)
print(aspirin.map_ids())
```
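CIDs copied from spreadsheets or other tools often arrive as strings such as `"CID2244"`. If so, you could normalize them before calling `Drug.from_pubchem_cid`. `parse_cid` is a hypothetical helper. The tutorial only shows `from_pubchem_cid` being called with an int, so this sketch always returns a positive int.

```python
from __future__ import annotations

import re

# Accepts "2244", "CID2244", "cid: 2244" (case-insensitive, surrounding spaces allowed)
_CID_PATTERN = re.compile(r"\s*(?:CID\s*:?\s*)?(\d+)\s*", re.IGNORECASE)


def parse_cid(value: int | str) -> int:
    """Normalize a PubChem CID given as an int or a string to a positive int."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("a bool is not a PubChem CID")
    if isinstance(value, int):
        cid = value
    else:
        match = _CID_PATTERN.fullmatch(value)
        if match is None:
            raise ValueError(f"not a PubChem CID: {value!r}")
        cid = int(match.group(1))
    if cid <= 0:
        raise ValueError(f"PubChem CIDs are positive integers, got {cid}")
    return cid
```

Usage: `aspirin = Drug.from_pubchem_cid(parse_cid("CID2244"))`.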
Step 2: inspect properties and text
```python
props = aspirin.fetch_pubchem_properties()
text = aspirin.fetch_pubchem_text()
print(props.get("IUPACName"))
print(list(text))  # section headings that were fetched
```
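`fetch_pubchem_properties` presumably wraps PubChem's web services, but the tutorial does not say which endpoint or property list it uses. To cross-check a value such as `IUPACName`, you can query PubChem's PUG REST property endpoint directly. This standalone sketch is not necessarily how `drugs` is implemented, and `property_url` and `fetch_properties` are hypothetical names.

```python
import json
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cid: int, properties: list[str]) -> str:
    """Build a PUG REST URL requesting selected properties for one CID as JSON."""
    names = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid}/property/{names}/JSON"


def fetch_properties(cid: int, properties: list[str], timeout: float = 30.0) -> dict:
    """Fetch the properties over the network and return the record for this CID."""
    with urllib.request.urlopen(property_url(cid, properties), timeout=timeout) as resp:
        payload = json.load(resp)
    # PUG REST wraps results as {"PropertyTable": {"Properties": [ {...} ]}}
    return payload["PropertyTable"]["Properties"][0]
```

Usage (requires network access): `fetch_properties(2244, ["IUPACName", "MolecularFormula"])["IUPACName"]`.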
Step 3: mechanisms and targets
```python
mechs = aspirin.fetch_chembl_mechanisms()
print(mechs[:1])  # first mechanism record
print(aspirin.target_accessions())
print(aspirin.target_gene_symbols())
```
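A drug can have several mechanism records, so it is often easier to read them grouped by target. The sketch below assumes that `fetch_chembl_mechanisms()` returns a list of dicts that use ChEMBL's mechanism field names (`target_chembl_id`, `mechanism_of_action`, `action_type`). The tutorial does not confirm this record shape, and `mechanisms_by_target` is a hypothetical helper.

```python
from collections import defaultdict


def mechanisms_by_target(mechs: list[dict]) -> dict[str, list[str]]:
    """Group unique mechanism descriptions by ChEMBL target ID.

    Assumes ChEMBL-style keys; records without a target ID go under "unknown",
    and records without a description fall back to their action type.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for record in mechs:
        target = record.get("target_chembl_id") or "unknown"
        label = record.get("mechanism_of_action") or record.get("action_type") or "?"
        if label not in grouped[target]:
            grouped[target].append(label)
    return dict(grouped)
```

Usage: `print(mechanisms_by_target(mechs))`.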
Step 4: generate embeddings (optional)
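```python
# Dummy embedding function: returns the character codes of the first 128
# characters as floats. Replace it with a call to your embedding model.
vec = aspirin.text_embedding(lambda s: [float(ord(ch)) for ch in s[:128]])
print(vec)
```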
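For offline testing, you may want a placeholder that behaves more like a real model: deterministic, fixed-length, and numeric. The sketch below uses feature hashing (a swapped-in toy technique, not a learned embedding) and includes a cosine-similarity helper for comparing vectors. It assumes that `text_embedding` calls its argument with a single string and accepts a list of floats. The tutorial does not document the expected return type. `hashed_embedding` and `cosine` are hypothetical names.

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dim: int = 256) -> list[float]:
    """Toy embedding via feature hashing: deterministic, fixed-length, L2-normalized.

    Each lowercase alphanumeric token adds +1 or -1 to one of `dim` buckets.
    hashlib is used instead of the built-in hash(), which is salted per process.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under that assumption, `aspirin.text_embedding(hashed_embedding)` would be a drop-in placeholder. To use a real local model instead, you could write something like `model = SentenceTransformer("all-MiniLM-L6-v2")` and then `aspirin.text_embedding(lambda s: model.encode(s).tolist())`. Hashed vectors only reflect shared words, so they are suited to plumbing tests rather than meaningful similarity scores.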
Step 5: write a markdown report
```python
path = aspirin.write_drug_markdown()
print(f"Report written to {path}")
```
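`write_drug_markdown` returns the report's path, as the print statement shows. To check which sections ended up in the report, the standard-library sketch below lists its Markdown headings. `markdown_headings` is hypothetical. It handles only simple `#`-style headings and skips lines inside backtick code fences.

```python
from __future__ import annotations

import re
from pathlib import Path

FENCE = "`" * 3  # code-fence marker; "#" lines inside fences are not headings
_HEADING = re.compile(r"(#{1,6})\s+(.+?)\s*$")


def markdown_headings(path: str | Path) -> list[tuple[int, str]]:
    """Return (level, title) for each simple "#"-style heading in a Markdown file."""
    headings: list[tuple[int, str]] = []
    in_fence = False
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = _HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

Usage: `for level, title in markdown_headings(path): print("  " * (level - 1) + title)`.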
Tips
Use `drugs.core.list_pubchem_text_headings(cid)` to see the available headings. The caching helpers `protein_embedding_cached` and `text_embedding_cached` store artifacts under `artifacts/embeddings` by default.
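The tutorial does not describe how `text_embedding_cached` keys or stores its artifacts. To illustrate the general pattern (not the library's implementation), a cache can key each vector by a hash of the model name and input text and store it as JSON under a directory such as `artifacts/embeddings`. `cached_embedding` and its one-file-per-key layout are hypothetical. Including the model name in the key keeps vectors from different models from overwriting each other.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_embedding(
    text: str,
    embed: Callable[[str], list[float]],
    model_name: str,
    cache_dir: str | Path = "artifacts/embeddings",
) -> list[float]:
    """Return embed(text), reusing a JSON file cached under cache_dir if present."""
    key = hashlib.sha256(f"{model_name}\n{text}".encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    vector = [float(x) for x in embed(text)]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vector), encoding="utf-8")
    return vector
```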