Prediction of metabolites by MicrobeRX

Loading the modules and data files of MicrobeRX

MicrobeRX requires a database of reaction rules in order to predict. The program can only make predictions based on reaction rules; however, evidences of human and gut microbial biotransoformations have been included to help identify the origin of the metabolites.

Load the DataFiles module to load the evidences and reaction rules.

[1]:

from microberx.DataFiles import load_evidences, load_reaction_rules

import pandas as pd

The reaction rules and evidences are easily accessible as dataframes. Use these files as templates if you want to make predictions based on your own rules and evidences. MicrobeRX’s functionality is dependent on these files.

[2]:

EVIDENCES= load_evidences()
EVIDENCES.head(5)

INFO: Loading evidences...

[2]:

	reaction_id	compartment	origin	name	subsystem	scheme_ids	scheme_names	reversibility	ec	metanetx_reaction	seed_reaction	sbo	rhea	kegg_reaction	pubmed
0	23DHMPO	Cytosol	GutMicrobes	(R)-2,3-Dihydroxy-3-methylpentanoate:NADP+ oxi...	Valine, leucine, and isoleucine metabolism	23dhmp[c] + nadp[c] <=> 3h3mop[c] + h[c] + nad...	(R)-2,3-Dihydroxy-3-methylpentanoate + Nicotin...	True	1.1.1.86	MNXR188296	rxn03435	SBO:0000176	NaN	NaN	NaN
1	26DAPLLAT	Cytosol	GutMicrobes	L,L-diaminopimelate aminotransferase	Lysine metabolism	26dap_LL[c] + akg[c] <=> glu_L[c] + h2o[c] + h...	LL-2,6-Diaminoheptanedioate + 2-Oxoglutarate <...	True	2.6.1.83	MNXR97144	rxn07441	SBO:0000176	NaN	NaN	NaN
2	2AHBUTI	Cytosol	GutMicrobes	(S)-2-Aceto-2-hydroxybutanoate isomerase	Valine, leucine, and isoleucine metabolism	2ahbut[c] <=> 3h3mop[c]	(S)-2-Aceto-2-hydroxybutanoate <=> (R)-3-Hydro...	True	1.1.1.86;5.4.99.3	MNXR191966	rxn03436	SBO:0000176	NaN	NaN	NaN
3	3MOBS	Cytosol	GutMicrobes	3-methyl-2-oxobutanoate synthase	Valine, leucine, and isoleucine metabolism	3c3hmp[c] + coa[c] + h[c] <=> 3mob[c] + accoa[...	3-Carboxy-3-hydroxy-4-methylpentanoate + Coenz...	True	2.3.3.13;4.1.3.12	MNXR188277	rxn00902	SBO:0000176	NaN	NaN	NaN
4	3OAR120	Cytosol	GutMicrobes	3-oxoacyl-[acyl-carrier-protein] reductase (n-...	Fatty acid synthesis	3oddecACP[c] + h[c] + nadph[c] <=> 3hddecACP[c...	3-Oxododecanoyl-[acyl-carrier protein] + proto...	True	1.1.1.100	MNXR152659	rxn05340	SBO:0000176	NaN	NaN	NaN

[3]:

RULES_DATABASE = load_reaction_rules()
RULES_DATABASE.head(5)

INFO: Loading reaction rules...

[3]:

	num_atoms	rule	substrate	substrate_map	product	product_map	reaction_id
0	4	[#6&!R:3]-[#6&!R:6](-[#8&!R:7])-[#6&!R:8]>>[#6...	23dhmp	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[CH:6]([OH...	3h3mop	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[C:6](=[O:...	23DHMPO_LR
1	9	[#6&!R:2]-[#6&!R:3](-[#6&!R:4])(-[#8&!R:5])-[#...	23dhmp	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[CH:6]([OH...	3h3mop	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[C:6](=[O:...	23DHMPO_LR
2	10	[#6&!R:1]-[#6&!R:2]-[#6&!R:3](-[#6&!R:4])(-[#8...	23dhmp	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[CH:6]([OH...	3h3mop	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[C:6](=[O:...	23DHMPO_LR
3	4	[#6&!R:3]-[#6&!R:6](=[#8&!R:7])-[#6&!R:8]>>[#6...	3h3mop	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[C:6](=[O:...	23dhmp	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[CH:6]([OH...	23DHMPO_RL
4	9	[#6&!R:2]-[#6&!R:3](-[#6&!R:4])(-[#8&!R:5])-[#...	3h3mop	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[C:6](=[O:...	23dhmp	[CH3:1][CH2:2][C:3]([CH3:4])([OH:5])[CH:6]([OH...	23DHMPO_RL

Prediction of metabolites

Aside from rules, the query molecule is an important component for prediction. Because the program uses rdkit to handle molecules, the queries can be represented in a variety of formats. For instance, SMILE, SMARTS, InChi, and so on.

Importing the prediction module of MicrobeRX

[4]:

from microberx import MetabolitePredictor

[5]:

from rdkit import Chem

smi="[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)CC(=O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C"
query_name='Prednisone'
query=Chem.MolFromSmiles(smi)
query

[5]:

../_images/tutorials_PredictionMetabolites_11_0.png

Predictions can take from a few seconds to a few minutes, depending on the number of reaction rules used. For clarity, the program will display a progress bar.

[6]:

Predictor=MetabolitePredictor(query,query_name=query_name,biosystem="all",cut_off=0.6)
Predictor.run_prediction()

INFO: Loading evidences...
INFO: Loading reaction rules...

The prediction module’s main output is a dataframe containing detailed information about the rules used and the prediction performed. The main data is a column containing the predicted metabolite structures as SMILES (main product smiles) along with other relevant information. It is recommended that you explore the dataframe to become acquainted with this output.

[7]:

metabolites=Predictor.predicted_metabolites
metabolites.head(5)

[7]:

	main_product_smiles	secondary_products_smiles	similarity_substrates	similarity_products	reacting_atoms_in_query	reaction_id	substrate	product	num_atoms	bigg_reaction	...	compartment	name	subsystem	ec	metanetx_reaction	pubmed
0	CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.867	0.868	[24, 18, 17, 16, 15, 14, 12, 13, 11, 9, 10, 0,...	HSD11B1r_LR	cortsn	crtsl	15	HSD11B1r	...	Endoplasmic_reticulum	11-Beta-Hydroxysteroid Dehydrogenase Type 1	Steroid metabolism	1.1.1.146	MNXR145242	11673786;15466942;7859916
1	CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.867	0.868	[24, 18, 17, 16, 15, 14, 12, 13, 11, 9, 10, 0,...	HSD11B2r_LR	cortsn	crtsl	15	HSD11B2r	...	Endoplasmic_reticulum	11-Beta-Hydroxysteroid Dehydrogenase Type 2	Steroid metabolism	1.1.1.27	MNXR145244	11673786;15466942;7859916
2	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(=O...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.741	0.752	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 0, 1, ...	HMR_1988_LR	17ahprgstrn	11docrtsl	18	HMR_1988	...	Cytosol	Steroid 21-Monooxygenase	Steroid metabolism	1.14.99.10	MNXR102263	3487786;3038528
3	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(=O...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.741	0.752	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 0, 1, ...	P45021A2r_LR	17ahprgstrn	11docrtsl	18	P45021A2r	...	Endoplasmic_reticulum	Steroid 21-Hydroxylase	Steroid metabolism	1.14.99.10	MNXR102263	16541276;18381579;18381580
4	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C(=O)CCC12	O=C([O-])CO.NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])...	0.741	0.705	[10, 9, 11, 12, 14, 15, 16, 17, 18, 24, 0, 1, ...	RE1096M_LR	17ahprgstrn	andrstndn	18	RE1096M	...	Mitochondria	RE1096M	Steroid metabolism	2.4.1.17	MNXR102258	10049998;7578007

5 rows × 24 columns

[ ]:

#metabolites.to_csv('test/prednisone_metabolites.tsv',sep='\t',index=False)

Analysis and Visualization

The ability to predict chemical structures is at the heart of MicrobeRX. As a result, the tool includes a number of fuctions for analyzing and visualizing the predicted metabolites.

The following sequence presents a possible analysis idea. However, the user can use the functions in any order that best suits their needs.

Along with the DataFiles, MicrobeRX includes a number of tools for analyzing and visualizing metabolites and evidences.

Analyzer:
- compute_molecular_descriptors
- compute_isotopic_mass
- search_pubchem
- classify_molecules
Visualizer:
- plot_confidence_scores
- plot_molecular_descriptors
- plot_isotopic_masses
- plot_metabolic_accesibility
- display_molecules
- plot_evidences

[8]:

from microberx.MetaboliteAnalyzer import compute_molecular_descriptors, compute_isotopic_mass, search_pubchem, classify_molecules

from microberx.MetaboliteVisualizer import plot_confidence_scores, plot_molecular_descriptors, plot_boiled_egg, plot_isotopic_masses, plot_metabolic_accesibility, display_molecules, plot_relationships

Metabolic accesibility

This function creates a 2D image of a molecule with the atoms colored according to their metabolic accessibility, which is calculated as the frequency of the atom in the reacting_atoms_in_query column of the data frame. The function returns a matplotlib Figure object that can be displayed or modified.

[9]:

accesibility=plot_metabolic_accesibility(metabolites,molecule=query,atom_map_col='reacting_atoms_in_query',mol_name=query_name,alpha=0.8)

../_images/tutorials_PredictionMetabolites_22_0.png

Confidence score

This function creates a 3D scatter plot of the data frame with the x, y, and z axes representing the similarity of substrates, products, and reacting atoms efficiency respectively. The color of each point indicates the confidence score of the corresponding metabolite id. The function returns an interactive plotly Figure object that can be displayed or modified.

[10]:

plot_confidence_scores(metabolites)

Results manipulation

Because of the number of atoms used in reaction rules during the predictions. Predictions with varying levels of structural confidence can be obtained. The benefit of using data frames is that filtering and selecting results is very simple.

[11]:

best_metabolites=metabolites[metabolites.confidence_score>=1.5]

[12]:

unique_metabolites=best_metabolites.drop_duplicates(subset=['metabolite_id'],ignore_index=True)

Plot relationships

This function generates a Sankey diagram to display the relationships between metabolite annotations in a data frame. This plot is especially interesting for analyzing the relationships between metabolites and evidences.

The use of standardized Recon3D and AGORA2 reactions enables the use of high-quality annotations of the biotransformations included in MicrobeRX. The following example provides an interesting and straightforward analysis of the relationships between predicted metabolites and the enzymes and organisms that produce them.

[15]:

plot_relationships(unique_metabolites,nodes=["reaction_id","origin","compartment","subsystem","name",'metabolite_id'])

Molecular descriptors

This function computes and plots common molecular descriptors for a given data frame using SMILES strings. The descriptors are added as new columns at the end of the dataframe.

Lipinski and Veber columns indicates whether the molecule satisfies the Lipinski’s or Veber’s rules or not for easy identification of orally active drugs.

Moreover, Brenk and PAINS matches are listed for the identification of unwanted reactive substructures or pan-assay interference compounds.

[16]:

descriptors=compute_molecular_descriptors(unique_metabolites,smiles_col='main_product_smiles')
descriptors.head()

[16]:

	main_product_smiles	secondary_products_smiles	similarity_substrates	similarity_products	reacting_atoms_in_query	reaction_id	substrate	product	num_atoms	bigg_reaction	...	MolWt	LogP	NumHAcceptors	NumHDonors	NumRotatableBonds	TPSA	MolFormula	Lipinski	Veber	Brenk
0	CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.867	0.868	[24, 18, 17, 16, 15, 14, 12, 13, 11, 9, 10, 0,...	HSD11B1r_LR	cortsn	crtsl	15	HSD11B1r	...	360.450	1.558	5.0	3.0	2.0	94.83	C21H28O5	True	True	nan
1	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(=O...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.741	0.752	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 0, 1, ...	HMR_1988_LR	17ahprgstrn	11docrtsl	18	HMR_1988	...	374.433	1.084	6.0	3.0	2.0	111.90	C21H26O6	True	True	het-C-het_not_in_ring
2	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C(=O)CCC12	O=C([O-])CO.NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])...	0.741	0.705	[10, 9, 11, 12, 14, 15, 16, 17, 18, 24, 0, 1, ...	RE1096M_LR	17ahprgstrn	andrstndn	18	RE1096M	...	298.382	3.042	3.0	0.0	0.0	51.21	C19H22O3	True	True	nan
3	CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C...	CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1OC(N2CNc3c(...	0.729	0.761	[8, 7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9,...	ACCOACORAT_LR	crtsl	hcsnact	15	ACCOACORAT	...	400.471	2.337	6.0	1.0	3.0	97.74	C23H28O6	True	True	nan
4	CC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C(=O)CC21C	O=O.NC(=O)C1=CN(C2OC(COP(=O)([O-])OP(=O)([O-])...	0.729	0.707	[7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9, 10]	HMR_1990_LR	crtsl	M00603	15	HMR_1990	...	342.435	2.793	4.0	1.0	1.0	71.44	C21H26O4	True	True	nan

5 rows × 34 columns

The function normalizes the molecular descriptors to fit in the range [0, 1] and then plots them as radial lines for each compound. The function also plots the upper and lower limits of the Lipinski’s rule of five as shaded regions in orange and yellow, respectively. The function uses distinct colors for each compound and displays a legend on the right side of the plot.

[17]:

plot_molecular_descriptors(descriptors,names_col='metabolite_id')

Apart from efficacy and toxicity, many compounds have poor pharmacokinetics and bioavailability. Gastrointestinal absorption and brain access are two pharmacokinetic behaviors crucial to estimate at various stages of the drug discovery processes. To this end, the Brain Or IntestinaL EstimateD permeation method (BOILED-Egg) is proposed as an accurate predictive model that works by computing the lipophilicity and polarity of small molecules.

[18]:

plot_boiled_egg(descriptors,names_col='metabolite_id')

Isotopic masses

The function iterates over the rows of the data frame and uses the EmpiricalFormula class from pyOpenMS to create an object for each molecular formula. Then, it generates the isotopic mass distribution. It calculates the sum of the probabilities of all isotopes and stores it in the ‘probability_sum’ column. It also formats the mass and probability of each isotope as a string and stores it in the ‘mass_distribution’ column, separated by semicolons.

[19]:

masses=compute_isotopic_mass(descriptors,molformula_col='MolFormula')
masses.head()

[19]:

	main_product_smiles	secondary_products_smiles	similarity_substrates	similarity_products	reacting_atoms_in_query	reaction_id	substrate	product	num_atoms	bigg_reaction	...	NumHAcceptors	NumHDonors	NumRotatableBonds	TPSA	MolFormula	Lipinski	Veber	Brenk	probability_sum	mass_distribution
0	CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.867	0.868	[24, 18, 17, 16, 15, 14, 12, 13, 11, 9, 10, 0,...	HSD11B1r_LR	cortsn	crtsl	15	HSD11B1r	...	5.0	3.0	2.0	94.83	C21H28O5	True	True	nan	1.0	360.1937:78.5875;361.197:18.2524;362.2004:2.83...
1	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(=O...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.741	0.752	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 0, 1, ...	HMR_1988_LR	17ahprgstrn	11docrtsl	18	HMR_1988	...	6.0	3.0	2.0	111.90	C21H26O6	True	True	het-C-het_not_in_ring	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...
2	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C(=O)CCC12	O=C([O-])CO.NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])...	0.741	0.705	[10, 9, 11, 12, 14, 15, 16, 17, 18, 24, 0, 1, ...	RE1096M_LR	17ahprgstrn	andrstndn	18	RE1096M	...	3.0	0.0	0.0	51.21	C19H22O3	True	True	nan	1.0	298.1569:80.7305;299.1603:16.8865;300.1636:2.1...
3	CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C...	CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1OC(N2CNc3c(...	0.729	0.761	[8, 7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9,...	ACCOACORAT_LR	crtsl	hcsnact	15	ACCOACORAT	...	6.0	1.0	3.0	97.74	C23H28O6	True	True	nan	1.0	400.1886:76.7392;401.1919:19.5123;402.1953:3.3...
4	CC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C(=O)CC21C	O=O.NC(=O)C1=CN(C2OC(COP(=O)([O-])OP(=O)([O-])...	0.729	0.707	[7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9, 10]	HMR_1990_LR	crtsl	M00603	15	HMR_1990	...	4.0	1.0	1.0	71.44	C21H26O4	True	True	nan	1.0	342.1831:78.7921;343.1865:18.2518;344.1898:2.6...

5 rows × 36 columns

This function plots the isotopic mass distribution of a given data frame using plotly.

[20]:

plot_isotopic_masses(masses,names_col='metabolite_id',mass_distribution_col='mass_distribution')

PubChem identifiers

The function iterates over the rows of the data frame and uses the pubchempy library to query the PubChem database for compounds that match the identifier in the specified column and namespace. It extracts the CIDs, SIDs and synonyms of the matching compounds and stores them in the corresponding columns of the data frame.

This function sends an online request to the PubChem server in order to conduct the search. When performing this task, make sure you have an internet connection. Metabolite annotation could be a time-consuming process. It is best not to conduct searches on large numbers of molecules.

[21]:

search_pubchem(masses,entry_col='main_product_smiles')

INFO: 'PUGREST.NotFound'
INFO: 'PUGREST.NotFound'
INFO: 'PUGREST.NotFound'
INFO: 'PUGREST.NotFound'
INFO: 'PUGREST.NotFound'
INFO: 'PUGREST.NotFound'

[21]:

	main_product_smiles	secondary_products_smiles	similarity_substrates	similarity_products	reacting_atoms_in_query	reaction_id	substrate	product	num_atoms	bigg_reaction	...	TPSA	MolFormula	Lipinski	Veber	Brenk	probability_sum	mass_distribution	PubChem_CID	PubChem_SID	PubChem_Synonyms
0	CC12C=CC(=O)C=C1CCC1C2C(O)CC2(C)C1CCC2(O)C(=O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.867	0.868	[24, 18, 17, 16, 15, 14, 12, 13, 11, 9, 10, 0,...	HSD11B1r_LR	cortsn	crtsl	15	HSD11B1r	...	94.83	C21H28O5	True	True	nan	1.0	360.1937:78.5875;361.197:18.2524;362.2004:2.83...	4894	4500855;7996730;8153012	11,17,21-Trihydroxypregna-1,4-diene-3,20-dione...
1	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(=O...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.741	0.752	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 0, 1, ...	HMR_1988_LR	17ahprgstrn	11docrtsl	18	HMR_1988	...	111.90	C21H26O6	True	True	het-C-het_not_in_ring	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...	86210504
2	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C(=O)CCC12	O=C([O-])CO.NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])...	0.741	0.705	[10, 9, 11, 12, 14, 15, 16, 17, 18, 24, 0, 1, ...	RE1096M_LR	17ahprgstrn	andrstndn	18	RE1096M	...	51.21	C19H22O3	True	True	nan	1.0	298.1569:80.7305;299.1603:16.8865;300.1636:2.1...	522664	4479565;8672027;10535559	MLS002694507;CHEMBL1877745;HMS3086E21;AKOS0243...
3	CC(=O)OCC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C...	CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1OC(N2CNc3c(...	0.729	0.761	[8, 7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9,...	ACCOACORAT_LR	crtsl	hcsnact	15	ACCOACORAT	...	97.74	C23H28O6	True	True	nan	1.0	400.1886:76.7392;401.1919:19.5123;402.1953:3.3...	539225	7698251;8675577;39364481	MLS002638169;CHEMBL1715582;DTXSID10871607;HMS3...
4	CC(=O)C1(O)CCC2C3CCC4=CC(=O)C=CC4(C)C3C(=O)CC21C	O=O.NC(=O)C1=CN(C2OC(COP(=O)([O-])OP(=O)([O-])...	0.729	0.707	[7, 5, 6, 3, 4, 2, 1, 0, 15, 14, 12, 11, 9, 10]	HMR_1990_LR	crtsl	M00603	15	HMR_1990	...	71.44	C21H26O4	True	True	nan	1.0	342.1831:78.7921;343.1865:18.2518;344.1898:2.6...	72395525
5	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(OC1OC(...	O=c1ccn(C2OC(COP(=O)([O-])OP(=O)([O-])[O-])C(O...	0.581	0.771	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 3, 4, ...	UGT1A4r_LR	tststerone	tststeroneglc	15	UGT1A4r	...	190.72	C27H33O11-	False	False	nan	1.0	532.195:72.6296;533.1984:21.7896;534.2017:4.79...	None
6	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(O)C=O		0.578	0.596	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 3, 5, ...	DOCTNKI_LR	11docrtstrn	M00041	18	DOCTNKI	...	91.67	C21H26O5	True	True	aldehyde	1.0	358.178:78.6055;359.1814:18.2385;360.1847:2.82...	74066286
7	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C(C(=O)CO)CCC12	O=O.NC(=O)C1=CN(C2OC(COP(=O)([O-])OP(=O)([O-])...	0.698	0.627	[7, 5, 6, 3, 2, 1, 0, 15, 14, 12, 11, 9, 10]	HMR_1991_LR	M00603	M00285	14	HMR_1991	...	71.44	C21H26O4	True	True	nan	1.0	342.1831:78.7921;343.1865:18.2518;344.1898:2.6...	14754645	323233180;384695406;399024121	SCHEMBL20827665
8	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(OS(=O)...	Nc1ncnc2-c1NCN2C1OC(COP(=O)([O-])[O-])C(OP(=O)...	0.581	0.609	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 3, 4, ...	TSTSTERONESULT_LR	tststerone	tststerones	15	TSTSTERONESULT	...	135.04	C21H26O8S	True	True	Sulfonic_acid_2	1.0	438.1348:74.23;439.1382:17.9023;440.1416:6.640...	None
9	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(O)C(O)CO	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.518	0.571	[7, 5, 6, 3, 2, 1, 0, 15, 16, 17, 18, 24, 14, ...	AKR1C1_LR	prgstrn	aprgstrn	17	AKR1C1	...	94.83	C21H28O5	True	True	nan	1.0	360.1937:78.5875;361.197:18.2524;362.2004:2.83...	3382688	4502144;36229691;75263968	AKOS024419641;FT-0670023;17-ALPHA,20-BETA,21-T...
10	CC12C=CC(=O)C=C1CC(O)C1C2C(=O)CC2(C)C1CCC2(O)C...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.781	0.711	[14, 15, 16, 17, 18, 24]	P45011B12m_LR	11docrtsl	crtsl	6	P45011B12m	...	111.90	C21H26O6	True	True	nan	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...	None
11	CC12C=CC(=O)C=C1C(O)CC1C2C(=O)CC2(C)C1CCC2(O)C...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.781	0.710	[24, 18, 17, 16, 15, 14]	P45011B12m_LR	11docrtsl	crtsl	6	P45011B12m	...	111.90	C21H26O6	True	True	nan	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...	None
12	CC12C=CC(=O)C=C1CCC1C2C(=O)C(O)C2(C)C1CCC2(O)C...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.781	0.702	[0, 9, 11, 12, 14, 15]	P45011B12m_LR	11docrtsl	crtsl	6	P45011B12m	...	111.90	C21H26O6	True	True	nan	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...	None
13	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(CO)C1CCC2(O)C(=...	NC(=O)c1ccc[n+](C2OC(COP(=O)([O-])OP(=O)([O-])...	0.617	0.625	[15, 14, 12, 11, 9, 10, 3, 2, 1, 0]	HMR_2010_LR	crtstrn	M00429	10	HMR_2010	...	111.90	C21H26O6	True	True	nan	1.0	374.1729:78.4197;375.1763:18.2252;376.1797:2.9...	None
14	CC12C=CC(=O)C=C1CCC1C2C(=O)CC2(C)C1CCC2(OC1OC(...	O=c1ccn(C2OC(COP(=O)([O-])OP(=O)([O-])[O-])C(O...	0.362	0.673	[24, 18, 17, 16, 15, 14, 12, 11, 9, 10, 3, 4, ...	UGT1A9r_LR	5adtststerone	5adtststeroneglc	15	UGT1A9r	...	187.89	C27H34O11	False	False	nan	1.0	534.2101:72.6213;535.2135:21.7955;536.2168:4.7...	None

15 rows × 39 columns

Molecular classification

Classify molecules based on their SMILES strings.

This function submits a query to the ClassyFire web service and returns a data frame with the classification results. This function sends an online request to the ClassyFire server in order to conduct the search. When performing this task, make sure you have an internet connection.

[27]:

classification=classify_molecules(masses,smiles_col='main_product_smiles',names_col='metabolite_id')
classification

Display molecules and data

This function displays a grid of molecules from a data frame, using different colors to indicate the values of a specified column.

[26]:

display_molecules(masses,columns_to_display=['reaction_id','PubChem_CID',"origin","Lipinski","Veber","Brenk"])#,'kingdom','superclass','class','direct_parent','description'])

[26]:

[ ]: