
A convenience wrapper that resolves one or more substance identifiers to GSRS UNIIs and then fetches the embedded chemical structure data for each substance. The result is one wide row per input identifier containing both the resolved metadata and the full structure record.

Usage

gsrs_chem_info(
  identifiers,
  type = c("name", "cas", "unii", "inchikey", "smiles"),
  verbose = TRUE,
  delay = 0.5
)

Arguments

identifiers

Character vector of substance identifiers.

type

Character scalar. The identifier type. One of:

"name"

Common or systematic substance name (default).

"cas"

CAS Registry Number (e.g., "50-78-2").

"unii"

FDA UNII / approval ID (e.g., "R16CO5Y76E"). Skips the search step and fetches the structure directly.

"inchikey"

Standard InChIKey (e.g., "BSYNRYMUTXBXSQ-UHFFFAOYSA-N").

"smiles"

SMILES string. Uses an exact structure search to resolve to a UNII before fetching the structure record.

verbose

Logical. If TRUE, emit progress messages. Default TRUE.

delay

Numeric. Seconds to wait between individual API calls. Default 0.5.
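
When type = "cas", it can be worth screening identifiers locally before spending API calls on them. The sketch below checks the standard CAS check digit (the last digit equals the weighted sum of the preceding digits modulo 10, with weights 1, 2, 3, ... counted from the right). The helper is_valid_cas() is hypothetical and not part of this package.

```r
# Hypothetical helper (not part of the package): validate the CAS check digit.
is_valid_cas <- function(x) {
  vapply(x, function(cas) {
    # Expected layout: 2-7 digits, 2 digits, 1 check digit, joined by hyphens.
    if (is.na(cas) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", cas)) {
      return(FALSE)
    }
    digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
    check <- digits[length(digits)]
    body <- rev(digits[-length(digits)])
    # Weight the body digits 1, 2, 3, ... from right to left.
    sum(body * seq_along(body)) %% 10 == check
  }, logical(1), USE.NAMES = FALSE)
}

ids <- c("50-78-2", "15687-27-1", "50-78-3")
valid <- is_valid_cas(ids)
print(data.frame(id = ids, valid = valid))
```

Only the identifiers that pass, ids[valid], would then be handed to gsrs_chem_info(ids[valid], type = "cas").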

Value

A data frame with one row per input identifier and columns:

query

The identifier supplied by the caller.

type

The identifier type supplied by the caller ("name", "cas", "unii", "inchikey", or "smiles").

unii

Resolved UNII / approval ID.

preferred_name

Preferred display name in GSRS.

substance_class

Substance class (e.g., "chemical").

smiles

Canonical SMILES string.

formula

Molecular formula (e.g., "C9H8O4").

mwt

Molecular weight (numeric).

inchi_key

Standard InChIKey.

inchi

Full InChI string.

stereochemistry

Stereochemistry descriptor.

optical_activity

Optical activity descriptor.

charge

Formal charge (integer).

stereo_centers

Number of stereocenters.

defined_stereo

Number of defined stereocenters.

ez_centers

Number of E/Z double-bond stereocenters.

molfile

MDL molfile as a string.

date_retrieved

Date the structure response was received.

Unresolved identifiers and non-chemical substances produce a row in which only query and type are set and every other column is NA. If an error occurs, the function returns NULL and issues a warning.
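
The return contract described above suggests two checks in calling code: one for a NULL result and one for rows whose unii is NA. The following is a minimal sketch that uses a hand-built stand-in for the result, so that it runs without network access. The stand-in includes only a few of the documented columns, and the second query string is made up for illustration.

```r
# Hand-built stand-in for a gsrs_chem_info() result (illustrative values only).
out <- data.frame(
  query = c("aspirin", "not-a-real-substance"),
  type = c("name", "name"),
  unii = c("R16CO5Y76E", NA),
  formula = c("C9H8O4", NA),
  stringsAsFactors = FALSE
)

if (is.null(out)) {
  # The call itself failed; nothing to split.
  resolved <- NULL
  unresolved <- character(0)
} else {
  # Rows with a missing UNII are the unresolved or non-chemical inputs.
  resolved <- out[!is.na(out$unii), , drop = FALSE]
  unresolved <- out$query[is.na(out$unii)]
}

print(resolved)
print(unresolved)
```

Splitting on unii rather than on formula or smiles assumes that a resolved chemical always carries a UNII, which matches the description of the unii column above.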

Examples

# \donttest{
  Sys.sleep(2)
  out <- gsrs_chem_info(c("aspirin", "ibuprofen"), type = "name")
#>  Resolving name: aspirin ...
#>  Resolving name: ibuprofen ...
#>  Fetching structure for UNII: R16CO5Y76E ...
#>  Fetching structure for UNII: RM1CE97Z4N ...
#>  Retrieved chemical info for 2 substance(s).
  if (!is.null(out)) print(out[, c("query", "unii", "formula", "mwt")])
#>       query       unii          formula      mwt
#> 1   aspirin R16CO5Y76E           C9H8O4 180.1578
#> 2 ibuprofen RM1CE97Z4N C13H17O2.Na.2H2O 264.2937

  Sys.sleep(2)
  out_cas <- gsrs_chem_info(c("50-78-2", "15687-27-1"), type = "cas")
#>  Resolving CAS: 50-78-2 ...
#>  Resolving CAS: 15687-27-1 ...
#>  Fetching structure for UNII: R16CO5Y76E ...
#>  Fetching structure for UNII: WK2XYI10QM ...
#>  Retrieved chemical info for 2 substance(s).
  if (!is.null(out_cas)) print(out_cas[, c("query", "unii", "formula", "mwt")])
#>        query       unii  formula      mwt
#> 1    50-78-2 R16CO5Y76E   C9H8O4 180.1578
#> 2 15687-27-1 WK2XYI10QM C13H18O2 206.2813

  Sys.sleep(2)
  out_unii <- gsrs_chem_info("R16CO5Y76E", type = "unii")
#>  Using UNII directly: R16CO5Y76E ...
#>  Fetching structure for UNII: R16CO5Y76E ...
#>  Retrieved chemical info for 1 substance(s).
  if (!is.null(out_unii)) print(out_unii[, c("query", "formula", "mwt")])
#>        query formula      mwt
#> 1 R16CO5Y76E  C9H8O4 180.1578

  Sys.sleep(2)
  out_ik <- gsrs_chem_info("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", type = "inchikey")
#>  Resolving InChIKey: BSYNRYMUTXBXSQ-UHFFFAOYSA-N ...
#>  Fetching structure for UNII: R16CO5Y76E ...
#>  Retrieved chemical info for 1 substance(s).
  if (!is.null(out_ik)) print(out_ik[, c("query", "unii", "formula")])
#>                         query       unii formula
#> 1 BSYNRYMUTXBXSQ-UHFFFAOYSA-N R16CO5Y76E  C9H8O4

  Sys.sleep(2)
  out_smi <- gsrs_chem_info("CC(=O)Oc1ccccc1C(=O)O", type = "smiles")
#>  Resolving SMILES (exact): CC(=O)Oc1ccccc1C(=O)O ...
#>  Fetching structure for UNII: R16CO5Y76E ...
#>  Retrieved chemical info for 1 substance(s).
  if (!is.null(out_smi)) print(out_smi[, c("query", "unii", "formula")])
#>                   query       unii formula
#> 1 CC(=O)Oc1ccccc1C(=O)O R16CO5Y76E  C9H8O4
# }
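
In the output above, the name query "ibuprofen" and the CAS query "15687-27-1" resolve to different UNIIs (RM1CE97Z4N and WK2XYI10QM), and the name-based row reports the formula C13H17O2.Na.2H2O rather than C13H18O2. When both a name and a CAS number are available for the same substances, comparing the two lookups can flag such cases. The sketch below copies the relevant values from the printed output by hand instead of calling the API, and it assumes the two inputs are paired by position.

```r
# Values copied from the printed example output above (no API call is made).
by_name <- data.frame(
  query = c("aspirin", "ibuprofen"),
  unii = c("R16CO5Y76E", "RM1CE97Z4N"),
  stringsAsFactors = FALSE
)
by_cas <- data.frame(
  query = c("50-78-2", "15687-27-1"),
  unii = c("R16CO5Y76E", "WK2XYI10QM"),
  stringsAsFactors = FALSE
)

# Compare the resolved UNIIs position by position.
agree <- by_name$unii == by_cas$unii

# Collect the pairs where the two lookups disagree.
mismatch <- data.frame(
  name = by_name$query[!agree],
  cas = by_cas$query[!agree],
  unii_by_name = by_name$unii[!agree],
  unii_by_cas = by_cas$unii[!agree],
  stringsAsFactors = FALSE
)
print(mismatch)
```

With real results, rows where either unii is NA would need to be handled before the comparison, since NA == "..." yields NA rather than FALSE.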