This function interacts with the CompTox Chemicals Dashboard to download and extract a wide range of chemical data based on user-defined search criteria. It allows for flexible input types and supports downloading various chemical properties, identifiers, and predictive data.
It was inspired by the ECOTOXr::websearch_comptox function.
Usage
extr_comptox(
ids,
download_items = c("DTXCID", "CASRN", "INCHIKEY", "IUPAC_NAME", "SMILES",
"INCHI_STRING", "MS_READY_SMILES", "QSAR_READY_SMILES", "MOLECULAR_FORMULA",
"AVERAGE_MASS", "MONOISOTOPIC_MASS", "QC_LEVEL", "SAFETY_DATA", "EXPOCAST",
"DATA_SOURCES", "TOXVAL_DATA", "NUMBER_OF_PUBMED_ARTICLES", "PUBCHEM_DATA_SOURCES",
"CPDAT_COUNT", "IRIS_LINK", "PPRTV_LINK", "WIKIPEDIA_ARTICLE", "QC_NOTES",
"ABSTRACT_SHIFTER", "TOXPRINT_FINGERPRINT", "ACTOR_REPORT", "SYNONYM_IDENTIFIER",
"RELATED_RELATIONSHIP", "ASSOCIATED_TOXCAST_ASSAYS",
"TOXVAL_DETAILS",
"CHEMICAL_PROPERTIES_DETAILS", "BIOCONCENTRATION_FACTOR_TEST_PRED",
"BOILING_POINT_DEGC_TEST_PRED", "48HR_DAPHNIA_LC50_MOL/L_TEST_PRED",
"DENSITY_G/CM^3_TEST_PRED", "DEVTOX_TEST_PRED",
"96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED", "FLASH_POINT_DEGC_TEST_PRED",
"MELTING_POINT_DEGC_TEST_PRED", "AMES_MUTAGENICITY_TEST_PRED",
"ORAL_RAT_LD50_MOL/KG_TEST_PRED", "SURFACE_TENSION_DYN/CM_TEST_PRED",
"THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED",
"TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED", "VISCOSITY_CP_CP_TEST_PRED",
"VAPOR_PRESSURE_MMHG_TEST_PRED", "WATER_SOLUBILITY_MOL/L_TEST_PRED",
"ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED",
"BIOCONCENTRATION_FACTOR_OPERA_PRED",
"BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED", "BOILING_POINT_DEGC_OPERA_PRED",
"HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED", "OPERA_KM_DAYS_OPERA_PRED",
"OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED",
"SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED",
"OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED", "MELTING_POINT_DEGC_OPERA_PRED",
"OPERA_PKAA_OPERA_PRED", "OPERA_PKAB_OPERA_PRED", "VAPOR_PRESSURE_MMHG_OPERA_PRED",
"WATER_SOLUBILITY_MOL/L_OPERA_PRED",
"EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY", "NHANES",
"TOXCAST_NUMBER_OF_ASSAYS/TOTAL", "TOXCAST_PERCENT_ACTIVE"),
mass_error = 0,
verify_ssl = FALSE,
...
)
Arguments
- ids
A character vector containing the items to be searched within the CompTox Chemicals Dashboard. These can be chemical names, CAS Registry Numbers (CASRN), InChIKeys, or DSSTox substance identifiers (DTXSID).
- download_items
A character vector of items to be downloaded. This includes a comprehensive set of chemical properties, identifiers, predictive data, and other relevant information. By default, all items are downloaded.
- DTXCID
The unique identifier for a chemical in the EPA's CompTox Chemicals Dashboard.
- CASRN
The Chemical Abstracts Service Registry Number, a unique numerical identifier for chemical substances.
- INCHIKEY
The hashed version of the full International Chemical Identifier (InChI) string.
- IUPAC_NAME
The International Union of Pure and Applied Chemistry (IUPAC) name of the chemical.
- SMILES
The Simplified Molecular Input Line Entry System (SMILES) representation of the chemical structure.
- INCHI_STRING
The full International Chemical Identifier (InChI) string.
- MS_READY_SMILES
The SMILES representation of the chemical structure, prepared for mass spectrometry analysis.
- QSAR_READY_SMILES
The SMILES representation of the chemical structure, prepared for quantitative structure-activity relationship (QSAR) modeling.
- MOLECULAR_FORMULA
The chemical formula representing the number and type of atoms in a molecule.
- AVERAGE_MASS
The average mass of the molecule, calculated based on the isotopic distribution of the elements.
- MONOISOTOPIC_MASS
The mass of the molecule calculated using the most abundant isotope of each element.
- QC_LEVEL
The quality control level of the data.
- SAFETY_DATA
Safety information related to the chemical.
- EXPOCAST
Exposure predictions from the EPA's ExpoCast program.
- DATA_SOURCES
Sources of the data provided.
- TOXVAL_DATA
Toxicological values related to the chemical.
- NUMBER_OF_PUBMED_ARTICLES
The number of articles related to the chemical in PubMed.
- PUBCHEM_DATA_SOURCES
Sources of data from PubChem.
- CPDAT_COUNT
The number of entries in the Chemical and Product Categories Database (CPDat).
- IRIS_LINK
Link to the EPA's Integrated Risk Information System (IRIS) entry for the chemical.
- PPRTV_LINK
Link to the EPA's Provisional Peer-Reviewed Toxicity Values (PPRTV) entry for the chemical.
- WIKIPEDIA_ARTICLE
Link to the Wikipedia article for the chemical.
- QC_NOTES
Notes related to the quality control of the data.
- ABSTRACT_SHIFTER
Information for the Abstract Sifter, the Dashboard's literature search tool.
- TOXPRINT_FINGERPRINT
The ToxPrint chemoinformatics fingerprint of the chemical.
- ACTOR_REPORT
The Aggregated Computational Toxicology Resource (ACTOR) report for the chemical.
- SYNONYM_IDENTIFIER
Identifiers for synonyms of the chemical.
- RELATED_RELATIONSHIP
Information on related chemicals.
- ASSOCIATED_TOXCAST_ASSAYS
Assays associated with the chemical in the ToxCast database.
- TOXVAL_DETAILS
Details of toxicological values.
- CHEMICAL_PROPERTIES_DETAILS
Details of the chemical properties.
- BIOCONCENTRATION_FACTOR_TEST_PRED
Bioconcentration factor predicted by the EPA's Toxicity Estimation Software Tool (TEST).
- BOILING_POINT_DEGC_TEST_PRED
Boiling point in degrees Celsius predicted by TEST.
- 48HR_DAPHNIA_LC50_MOL/L_TEST_PRED
48-hour LC50 for Daphnia in mol/L predicted by TEST.
- DENSITY_G/CM^3_TEST_PRED
Density in g/cm³ predicted by TEST.
- DEVTOX_TEST_PRED
Developmental toxicity predicted by TEST.
- 96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED
96-hour LC50 for fathead minnow in mol/L predicted by TEST.
- FLASH_POINT_DEGC_TEST_PRED
Flash point in degrees Celsius predicted by TEST.
- MELTING_POINT_DEGC_TEST_PRED
Melting point in degrees Celsius predicted by TEST.
- AMES_MUTAGENICITY_TEST_PRED
Ames mutagenicity predicted by TEST.
- ORAL_RAT_LD50_MOL/KG_TEST_PRED
Oral LD50 for rats in mol/kg predicted by TEST.
- SURFACE_TENSION_DYN/CM_TEST_PRED
Surface tension in dyn/cm predicted by TEST.
- THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED
Thermal conductivity in mW/(m·K) predicted by TEST.
- TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED
IGC50 for Tetrahymena pyriformis in mol/L predicted by TEST.
- VISCOSITY_CP_CP_TEST_PRED
Viscosity in cP predicted by TEST.
- VAPOR_PRESSURE_MMHG_TEST_PRED
Vapor pressure in mmHg predicted by TEST.
- WATER_SOLUBILITY_MOL/L_TEST_PRED
Water solubility in mol/L predicted by TEST.
- ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED
Predicted atmospheric hydroxylation rate in cm³/molecule*sec from OPERA.
- BIOCONCENTRATION_FACTOR_OPERA_PRED
Predicted bioconcentration factor from OPERA.
- BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED
Predicted biodegradation half-life in days from OPERA.
- BOILING_POINT_DEGC_OPERA_PRED
Predicted boiling point in degrees Celsius from OPERA.
- HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED
Predicted Henry's law constant in atm-m³/mole from OPERA.
- OPERA_KM_DAYS_OPERA_PRED
Predicted fish biotransformation half-life (Km) in days from OPERA.
- OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED
Predicted octanol-air partition coefficient (log Koa) from OPERA.
- SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED
Predicted soil adsorption coefficient (Koc) in L/kg from OPERA.
- OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED
Predicted octanol-water partition coefficient (log P) from OPERA.
- MELTING_POINT_DEGC_OPERA_PRED
Predicted melting point in degrees Celsius from OPERA.
- OPERA_PKAA_OPERA_PRED
Predicted pKa (acidic) from OPERA.
- OPERA_PKAB_OPERA_PRED
Predicted pKa (basic) from OPERA.
- VAPOR_PRESSURE_MMHG_OPERA_PRED
Predicted vapor pressure in mmHg from OPERA.
- WATER_SOLUBILITY_MOL/L_OPERA_PRED
Predicted water solubility in mol/L from OPERA.
- EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY
Predicted median exposure from ExpoCast in mg/kg-bw/day.
- NHANES
National Health and Nutrition Examination Survey data.
- TOXCAST_NUMBER_OF_ASSAYS/TOTAL
Number of assays in ToxCast.
- TOXCAST_PERCENT_ACTIVE
Percentage of active assays in ToxCast.
- mass_error
Numeric value indicating the mass error tolerance for searches involving mass data. Default is 0.
- verify_ssl
Logical value indicating whether SSL certificates should be verified. Default is FALSE. Note that this argument is not used on Linux.
- ...
Additional arguments passed to httr2::req_options(). Note that this argument is not used on Linux.
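Because download_items defaults to the full list, a narrower request can be made by passing only the names of interest. The following is a minimal sketch, assuming that extr_comptox() is already available in the session and that the EPA servers are reachable; the variable name wanted_items is purely illustrative.

```r
# Items of interest; each name must match an entry in the default
# `download_items` vector shown in Usage.
wanted_items <- c("CASRN", "SMILES", "MOLECULAR_FORMULA", "MONOISOTOPIC_MASS")

# Guarded call: it only runs when extr_comptox() is available and the
# session is interactive, because the request needs network access.
if (exists("extr_comptox", mode = "function") && interactive()) {
  res <- extr_comptox(
    ids = c("Aspirin", "50-00-0"),
    download_items = wanted_items
  )
  # Inspect which list elements came back.
  print(names(res))
}
```

Requesting fewer items should reduce the amount of data transferred, although the exact set of list elements returned for a given subset is not documented here.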
Details
Please note that this function, which pulls data from EPA servers, may encounter issues on some Linux systems. This is because those servers do not support secure renegotiation. On Linux systems, the function depends on curl and OpenSSL, whose newer versions reject unsafe legacy renegotiation by default. One workaround is to downgrade to curl v7.78.0 and OpenSSL v1.1.1. However, please be aware that using these older versions might introduce security vulnerabilities. Refer to this gist for instructions on how to downgrade curl and OpenSSL on Ubuntu.
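Before considering a downgrade, it can help to check which platform and library versions R is actually using. The sketch below is one way to do that, assuming the curl R package is installed; it only reports versions and does not test the connection to the EPA servers.

```r
# Detect whether the current session runs on Linux.
is_linux <- identical(Sys.info()[["sysname"]], "Linux")

# When the curl package is installed, report the libcurl version and
# the SSL backend that R is linked against.
if (requireNamespace("curl", quietly = TRUE)) {
  v <- curl::curl_version()
  message("libcurl: ", v$version)
  message("SSL backend: ", v$ssl_version)
}

if (is_linux) {
  message("Running on Linux: the renegotiation issue described above may apply.")
}
```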
Examples
# \donttest{
# Example usage of the function:
extr_comptox(ids = c("Aspirin", "50-00-0"))
#> ℹ Sending request to CompTox...
#> Request succeeded with status code: 202
#> ℹ Getting info from CompTox...
#> Request succeeded with status code: 200
#> $comptox_cover_sheet
#> # A tibble: 4 × 2
#> `Search datestamp` `2024-12-04 14:20:46`
#> <chr> <dbl>
#> 1 Search term count 2
#> 2 Found count 2
#> 3 Not found count 0
#> 4 Duplicate count 0
#>
#> $comptox_main_data
#> # A tibble: 2 × 64
#> INPUT FOUND_BY PREFERRED_NAME DTXCID CASRN INCHIKEY IUPAC_NAME SMILES
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Aspirin Approved Name Aspirin DTXCID5… 50-7… BSYNRYM… 2-(Acetyl… CC(=O…
#> 2 50-00-0 CASRN Formaldehyde DTXCID3… 50-0… WSFSSNU… Formaldeh… C=O
#> # ℹ 56 more variables: INCHI_STRING <chr>, MS_READY_SMILES <chr>,
#> # QSAR_READY_SMILES <chr>, MOLECULAR_FORMULA <chr>, AVERAGE_MASS <dbl>,
#> # MONOISOTOPIC_MASS <dbl>, QC_LEVEL <dbl>, SAFETY_DATA <chr>, EXPOCAST <chr>,
#> # DATA_SOURCES <dbl>, TOXVAL_DATA <chr>, NUMBER_OF_PUBMED_ARTICLES <dbl>,
#> # PUBCHEM_DATA_SOURCES <dbl>, CPDAT_COUNT <dbl>, IRIS_LINK <chr>,
#> # PPRTV_LINK <lgl>, WIKIPEDIA_ARTICLE <chr>, QC_NOTES <chr>,
#> # TOXPRINT_FINGERPRINT <chr>, ACTOR_REPORT <chr>, …
#>
#> $comptox_abstract_sifter
#> # A tibble: 2 × 3
#> DSSTOX_LINK_TO_DASHBOARD PREFERRED_NAME `CHEMICAL/ENTITY_QUERY`
#> <chr> <chr> <chr>
#> 1 DTXSID5020108 Aspirin 50-78-2 OR Aspirin
#> 2 DTXSID7020637 Formaldehyde 50-00-0 OR Formaldehyde
#>
#> $comptox_synonym_identifier
#> # A tibble: 2 × 3
#> SEARCHED_CHEMICAL IDENTIFIER `PC-CODES`
#> <chr> <chr> <chr>
#> 1 Aspirin Synonym data is too big for the Excel cell - Ref… PC-129061
#> 2 Formaldehyde NSC 298885|UN 2209|Formalin 40|Superlysoform|For… PC-043001
#>
#> $comptox_related_relationships
#> # A tibble: 60 × 7
#> INPUT DTXSID PREFERRED_NAME HAS_RELATIONSHIP_WITH RELATED_DTXSID
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Aspirin DTXSID5020108 Aspirin Searched Chemical DTXSID5020108
#> 2 Aspirin DTXSID5020108 Aspirin Predecessor: Component DTXSID0020109
#> 3 Aspirin DTXSID5020108 Aspirin Predecessor: Component DTXSID701336718
#> 4 Aspirin DTXSID5020108 Aspirin Transformation Product DTXSID5021708
#> 5 50-00-0 DTXSID7020637 Formaldehyde Searched Chemical DTXSID7020637
#> 6 50-00-0 DTXSID7020637 Formaldehyde Predecessor: Component DTXSID6029709
#> 7 50-00-0 DTXSID7020637 Formaldehyde Predecessor: Component DTXSID6029757
#> 8 50-00-0 DTXSID7020637 Formaldehyde Predecessor: Component DTXSID60873853
#> 9 50-00-0 DTXSID7020637 Formaldehyde Predecessor: Component DTXSID60905168
#> 10 50-00-0 DTXSID7020637 Formaldehyde Predecessor: Component DTXSID6094144
#> # ℹ 50 more rows
#> # ℹ 2 more variables: RELATED_PREFERRED_NAME <chr>, RELATED_CASRN <chr>
#>
#> $comptox_toxcast_assays_ac50
#> # A tibble: 1,485 × 3
#> INPUT 50-00-0_DTXSID702063…¹ ASPIRIN_DTXSID5020108
#> <chr> <chr> <chr>
#> 1 ACEA_AR_agonist_80hr - 1000000.0
#> 2 ACEA_AR_agonist_AUC_viability - 1000000.0
#> 3 ACEA_AR_antagonist_80hr - 1000000.0
#> 4 ACEA_AR_antagonist_AUC_viability - 1000000.0
#> 5 ACEA_ER_80hr - 1000000.0
#> 6 ACEA_ER_AUC_viability - 1000000.0
#> 7 APR_HepG2_CellCycleArrest_1hr - -
#> 8 APR_HepG2_CellCycleArrest_24hr - 1000000.0
#> 9 APR_HepG2_CellCycleArrest_72hr - 1000000.0
#> 10 APR_HepG2_CellLoss_1hr - -
#> # ℹ 1,475 more rows
#> # ℹ abbreviated name: ¹`50-00-0_DTXSID7020637`
#>
#> $comptox_toxval_details
#> # A tibble: 158 × 63
#> SEARCHED_CHEMICAL DTXSID CASRN NAME SOURCE SUB_SOURCE TOXVAL_TYPE
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Aspirin DTXSID5020108 50-78-2 Aspirin NLM C… - LD50
#> 2 Aspirin DTXSID5020108 50-78-2 Aspirin NLM C… - LD50
#> 3 Aspirin DTXSID5020108 50-78-2 Aspirin NLM C… - LD50
#> 4 Aspirin DTXSID5020108 50-78-2 Aspirin GESTI… - DNEL syste…
#> 5 Aspirin DTXSID5020108 50-78-2 Aspirin DOD M… TLVadj MEG
#> 6 Aspirin DTXSID5020108 50-78-2 Aspirin DOD M… TLV_TWA MEG
#> 7 Aspirin DTXSID5020108 50-78-2 Aspirin DOD M… TLV_TWA MEG
#> 8 Aspirin DTXSID5020108 50-78-2 Aspirin ECHA … Developme… NOAEL
#> 9 Aspirin DTXSID5020108 50-78-2 Aspirin ECHA … Developme… NOAEL
#> 10 Aspirin DTXSID5020108 50-78-2 Aspirin EPA E… EPA ORD LOEL
#> # ℹ 148 more rows
#> # ℹ 56 more variables: TOXVAL_SUBTYPE <chr>, TOXVAL_TYPE_SUPERCATEGORY <chr>,
#> # QUALIFIER <chr>, TOXVAL_NUMERIC <dbl>, TOXVAL_UNITS <chr>,
#> # RISK_ASSESSMENT_CLASS <chr>, STUDY_TYPE <chr>, STUDY_DURATION_CLASS <chr>,
#> # STUDY_DURATION_VALUE <dbl>, STUDY_DURATION_UNITS <chr>,
#> # SPECIES_COMMON <chr>, STRAIN <chr>, LATIN_NAME <chr>,
#> # SPECIES_SUPERCATEGORY <chr>, SEX <chr>, GENERATION <chr>, …
#>
#> $comptox_chemical_properties
#> # A tibble: 101 × 8
#> DTXSID DTXCID TYPE NAME VALUE UNITS SOURCE DESCRIPTION
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 DTXSID5020108 DTXCID50108 predicted Flash … 131.… °C ACD/L… "ACD/Labs …
#> 2 DTXSID5020108 DTXCID50108 experimental Meltin… 138.0 °C Alfa … "Alfa Aesa…
#> 3 DTXSID5020108 DTXCID50108 experimental Meltin… 136.5 °C Alfa … "Alfa Aesa…
#> 4 DTXSID5020108 DTXCID50108 experimental Boilin… 140.0 °C NIOSH "The NIOSH…
#> 5 DTXSID5020108 DTXCID50108 experimental Boilin… 140.0 °C Oxfor… "Until 201…
#> 6 DTXSID5020108 DTXCID50108 experimental Meltin… 122.0 °C MolMa… "MolMall p…
#> 7 DTXSID5020108 DTXCID50108 experimental Meltin… 134.0 °C Tokyo… "Tokyo Che…
#> 8 DTXSID5020108 DTXCID50108 experimental Meltin… 135.0 °C Jean-… "Jean-Clau…
#> 9 DTXSID5020108 DTXCID50108 experimental Meltin… 135.0 °C PhysP… "The PHYSP…
#> 10 DTXSID5020108 DTXCID50108 experimental Meltin… 136.0 °C LKT L… "LKT Labor…
#> # ℹ 91 more rows
#>
# }
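As the printed output suggests, the result is a named list of tibbles, and the ToxCast AC50 columns are stored as character with "-" marking missing values. The sketch below shows one way to summarise such a list and convert an AC50 column to numeric. It runs on a small mock object built from a few values printed above, so no network access is needed; the helper name dash_to_numeric is hypothetical and not part of the package.

```r
# Mock object mimicking the shape of the printed result: a named list of
# data frames holding a small subset of the values shown above.
res <- list(
  comptox_main_data = data.frame(
    INPUT = c("Aspirin", "50-00-0"),
    PREFERRED_NAME = c("Aspirin", "Formaldehyde")
  ),
  comptox_toxcast_assays_ac50 = data.frame(
    INPUT = c("ACEA_ER_80hr", "APR_HepG2_CellLoss_1hr"),
    ASPIRIN_DTXSID5020108 = c("1000000.0", "-")
  )
)

# Number of rows in each element of the list.
n_rows <- vapply(res, nrow, integer(1))

# Convert a character column to numeric, treating "-" as missing.
dash_to_numeric <- function(x) {
  x[x == "-"] <- NA_character_
  as.numeric(x)
}

ac50 <- dash_to_numeric(res$comptox_toxcast_assays_ac50$ASPIRIN_DTXSID5020108)
```

The same two steps should apply to a real result from extr_comptox(), provided the element and column names match those in the output above.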