Tutorial: from CID to embeddings
================================

This short walkthrough demonstrates how to start from a PubChem CID, fetch core
metadata, and compute embeddings.

Prerequisites
-------------

.. code-block:: powershell

	pip install -e .

If you want to run the optional embedding helpers, install extras as needed:

.. code-block:: powershell

	# voyage example
	pip install langchain-voyageai
	# or OpenAI
	pip install openai
	# or sentence-transformers
	pip install sentence-transformers

Step 1: create a drug object
----------------------------

.. code-block:: python

	from drugs import Drug

	aspirin = Drug.from_pubchem_cid(2244)
	print(aspirin.map_ids())

Step 2: inspect properties and text
-----------------------------------

.. code-block:: python

	props = aspirin.fetch_pubchem_properties()
	text = aspirin.fetch_pubchem_text()

	print(props.get("IUPACName"))
	print(list(text))  # headings fetched

Step 3: mechanisms and targets
------------------------------

.. code-block:: python

	mechs = aspirin.fetch_chembl_mechanisms()
	print(mechs[:1])

	print(aspirin.target_accessions())
	print(aspirin.target_gene_symbols())

Step 4: generate embeddings (optional)
--------------------------------------

.. code-block:: python

	# Dummy embedding function; replace with your model
	vec = aspirin.text_embedding(lambda text: text[:128])
	print(vec)

Step 5: write a markdown report
--------------------------------

.. code-block:: python

	path = aspirin.write_drug_markdown()
	print(f"Report written to {path}")

Tips
----

- Use ``drugs.core.list_pubchem_text_headings(cid)`` to see available headings.
- The caching helpers ``protein_embedding_cached`` and ``text_embedding_cached``
  store artifacts under ``artifacts/embeddings`` by default.