microberx.MetaboliteVisualizer

This module provides functions for visualizing the metabolites predicted by MicrobeRX.

The module contains the following functions:

plot_molecular_descriptors: Generates an interactive plot of the molecular descriptors of a given data frame using polar coordinates.
plot_boiled_egg: Plots the boiled egg diagram of a given data frame using scatter plot.
plot_isotopic_masses: Generates an interactive plot of the isotopic mass distribution of a given data frame using plotly.
plot_confidence_scores: Creates a 3D scatter plot of the data frame with the x, y, and z axes representing the similarity of substrates, products, and reacting atoms efficiency respectively.
plot_metabolic_accesibility: Creates a 2D image of a molecule with the atoms colored according to their metabolic accessibility.
plot_relationships: Creates a Sankey diagram to visualize the evidence supporting the metabolite annotations in a data frame.
display_molecules: Displays a grid of molecules from a data frame, using different colors to indicate the values of a specified column.

Functions

`plot_molecular_descriptors`(data_frame, names_col)	Plots the molecular descriptors of a given data frame using polar coordinates.
`plot_boiled_egg`(data_frame, names_col)	Plots the boiled egg diagram of a given data frame using scatter plot.
`plot_isotopic_masses`(data_frame, names_col, ...)	Plots the isotopic mass distribution of a given data frame using plotly.
`plot_confidence_scores`(data_frame[, x, y, z, cmap])	Creates a 3D scatter plot of the data frame with the x, y, and z axes representing the similarity of substrates, products, and reacting atoms efficiency respectively.
`plot_metabolic_accesibility`(data_frame, molecule[, ...])	Creates a 2D image of a molecule with the atoms colored according to their metabolic accessibility.
`display_molecules`(data_frame[, legends_col, ...])	Displays a grid of molecules from a data frame, using different colors to indicate the values of a specified column.
`plot_relationships`(data_frame[, nodes])	Creates a Sankey diagram to visualize the evidence supporting the metabolite annotations in a data frame.

Module Contents

microberx.MetaboliteVisualizer.plot_molecular_descriptors(data_frame, names_col)[source]

Plots the molecular descriptors of a given data frame using polar coordinates.

Parameters:

data_frame (pd.DataFrame) – A pandas data frame that contains the molecular descriptors as columns, with one row per compound.
names_col (str) – A string that specifies the name of the column that contains the compound names.

Returns:

Figure –

A plotly figure object that shows the polar plot of the molecular descriptors. The plot has the following features:

The radial axis represents the normalized value of each molecular descriptor, ranging from 0 to 1.
The angular axis represents the different molecular descriptors, such as MolWt, LogP, NumHAcceptors, etc.
Each compound is plotted as a radial line with a distinct color and a marker at each descriptor value.
The upper and lower limits of Lipinski’s rule of five are plotted as shaded regions in orange and yellow, respectively. The rule of five states that most drug-like molecules have a molecular weight below 500, a LogP no greater than 5, no more than 10 hydrogen bond acceptors, and no more than 5 hydrogen bond donors.
A legend is displayed on the right side of the plot, showing the name and color of each compound.

Return type:

plotly.graph_objects.Figure
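
The two ideas behind the polar plot, scaling each descriptor to the 0 to 1 range and comparing it with the rule-of-five limits, can be sketched in plain Python. This is a minimal illustration, not the MicrobeRX implementation: the min-max scaling, the helper names, the `NumHDonors` key, and the descriptor values are all assumptions made for the example.

```python
# Rule-of-five upper limits. MolWt, LogP and NumHAcceptors are the descriptor
# names mentioned in the text; NumHDonors is an assumed column name.
RULE_OF_FIVE = {
    "MolWt": 500.0,
    "LogP": 5.0,
    "NumHAcceptors": 10,
    "NumHDonors": 5,
}


def rule_of_five_violations(descriptors):
    """Return the names of the descriptors that exceed their limit."""
    return [
        name
        for name, limit in RULE_OF_FIVE.items()
        if descriptors.get(name, 0) > limit
    ]


def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumption here."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread, so map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical descriptor values, for illustration only.
small = {"MolWt": 180.2, "LogP": 1.3, "NumHAcceptors": 3, "NumHDonors": 1}
large = {"MolWt": 720.9, "LogP": 6.2, "NumHAcceptors": 12, "NumHDonors": 4}

print(rule_of_five_violations(small))
print(rule_of_five_violations(large))
print(min_max_normalize([small["MolWt"], large["MolWt"]]))
```

A compound with no violations stays inside the shaded limits on every axis that the rule covers.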

microberx.MetaboliteVisualizer.plot_boiled_egg(data_frame, names_col)[source]

Plots the boiled egg diagram of a given data frame using scatter plot.

Parameters:

data_frame (pd.DataFrame) – A pandas data frame that contains the TPSA and LogP values as columns, with one row per compound.
names_col (str) – A string that specifies the name of the column that contains the compound names.

Returns:

Figure –

A plotly figure object that shows the scatter plot of the TPSA and LogP values. The plot has the following features:

The x-axis represents the topological polar surface area (TPSA) of each compound, ranging from 0 to 142.
The y-axis represents the octanol-water partition coefficient (LogP) of each compound, ranging from -2.3 to 6.8.
Each compound is plotted as a red dot, with its name shown on hover.
The human intestinal absorption (HIA) and blood-brain barrier (BBB) regions are plotted as white and orange circles, respectively. The HIA region indicates the compounds that are likely to be absorbed by the human intestine, while the BBB region indicates the compounds that are likely to cross the blood-brain barrier.

Return type:

plotly.graph_objects.Figure

microberx.MetaboliteVisualizer.plot_isotopic_masses(data_frame, names_col, mass_distribution_col)[source]

Plots the isotopic mass distribution of a given data frame using plotly.

Parameters:

data_frame (pd.DataFrame) – A pandas data frame that contains the isotopic mass distribution as a column of strings, where each string has the format ‘mass:probability;mass:probability;…’
names_col (str) – A string that specifies the name of the column that contains the compound names.
mass_distribution_col (str) – A string that specifies the name of the column that contains the isotopic mass distribution.

Returns:

Figure –

A plotly figure object that shows the bar plot of the isotopic mass distribution for each compound. The plot has the following features:

The x-axis represents the mass values of the isotopes, rounded to four decimal places.
The y-axis represents the probability values of the isotopes, multiplied by 100 and rounded to four decimal places.
Each compound is plotted as a group of bars with a distinct color and a label at the top of each bar.
A legend is displayed on the right side of the plot, showing the name and color of each compound.

Return type:

plotly.graph_objects.Figure
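
The ‘mass:probability;mass:probability;…’ string format described above can be parsed with a few lines of plain Python. The sketch below is an illustration rather than the library's own parser: the function name `parse_mass_distribution` and the sample string are hypothetical. It applies the same rounding as the axis description (masses to four decimal places, probabilities multiplied by 100 and rounded to four decimal places).

```python
def parse_mass_distribution(text):
    """Turn 'mass:probability;mass:probability' into (mass, percent) pairs."""
    peaks = []
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            # Tolerate a trailing ';' or an empty string.
            continue
        mass, probability = entry.split(":")
        peaks.append(
            (round(float(mass), 4), round(float(probability) * 100, 4))
        )
    return peaks


# Made-up distribution string, for illustration only.
example = "180.0634:0.9226;181.0668:0.0702;"
for mass, percent in parse_mass_distribution(example):
    print(mass, percent)
```

Each resulting pair corresponds to one bar: the mass is the x position and the percentage is the bar height.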

microberx.MetaboliteVisualizer.plot_confidence_scores(data_frame, x='similarity_substrates', y='similarity_products', z='reacting_atoms_efficiency', cmap='RdYlGn')[source]

Creates a 3D scatter plot of the data frame with the x, y, and z axes representing the similarity of substrates, products, and reacting atoms efficiency respectively.

Parameters:

data_frame (pd.DataFrame) – The data frame containing the columns ‘similarity_substrates’, ‘similarity_products’, ‘reacting_atoms_efficiency’, ‘confidence_score’, and ‘metabolite_id’.
x (str, optional) – The name of the column to use as the x-axis. Defaults to ‘similarity_substrates’.
y (str, optional) – The name of the column to use as the y-axis. Defaults to ‘similarity_products’.
z (str, optional) – The name of the column to use as the z-axis. Defaults to ‘reacting_atoms_efficiency’.
cmap (str, optional) – The name of the color map to use for the color scale. Defaults to ‘RdYlGn’.

Returns:

The 3D scatter plot figure. The figure has the following features:

The x-axis represents the similarity of substrates, ranging from 0 to 1.
The y-axis represents the similarity of products, ranging from 0 to 1.
The z-axis represents the reacting atoms efficiency, ranging from 0 to 1.
The color of each point indicates the confidence score of the corresponding metabolite id, ranging from 0 to 1. A color bar is displayed on the right side of the plot.
The hover text of each point shows the metabolite id and the values of x, y, z, and color.
The title of the plot shows the names of the columns used for x, y, z, and color.

Return type:

plotly.graph_objects.Figure
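
Because the plot relies on five named columns whose numeric values are expected to lie between 0 and 1, it can be useful to check the input before plotting. The following is a minimal sketch of such a check on rows represented as plain dictionaries. The helper `check_confidence_rows` is hypothetical and is not part of MicrobeRX.

```python
# Column names taken from the data_frame parameter description.
REQUIRED_COLUMNS = (
    "similarity_substrates",
    "similarity_products",
    "reacting_atoms_efficiency",
    "confidence_score",
    "metabolite_id",
)


def check_confidence_rows(rows, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        for col in required:
            if col == "metabolite_id":
                continue  # identifier column, not a score
            if not 0.0 <= row[col] <= 1.0:
                problems.append(f"row {i}: {col}={row[col]} outside [0, 1]")
    return problems


# Hypothetical rows, for illustration only.
rows = [
    {"similarity_substrates": 0.8, "similarity_products": 0.7,
     "reacting_atoms_efficiency": 0.9, "confidence_score": 0.5,
     "metabolite_id": "M1"},
    {"similarity_substrates": 0.8, "similarity_products": 1.4,
     "reacting_atoms_efficiency": 0.9, "confidence_score": 0.5,
     "metabolite_id": "M2"},
]
print(check_confidence_rows(rows))
```

With a pandas data frame, the same rows could be obtained with `data_frame.to_dict("records")`.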

microberx.MetaboliteVisualizer.plot_metabolic_accesibility(data_frame, molecule, atom_map_col='reacting_atoms_in_query', mol_name='Query', alpha=0.5, cmap='RdYlGn_r')[source]

Creates a 2D image of a molecule with the atoms colored according to their metabolic accessibility.

Parameters:

data_frame (pd.DataFrame) – The data frame containing the column with the atom map information.
molecule (Chem.Mol) – The molecule object to be drawn.
atom_map_col (str, optional) – The name of the column with the atom map information. The column should contain lists of integers representing the atom indices. Defaults to ‘reacting_atoms_in_query’.
mol_name (str, optional) – The name of the molecule to be displayed on the image. Defaults to ‘Query’.
alpha (float, optional) – The transparency level of the atom colors, ranging from 0 to 1. Defaults to 0.5.
cmap (str, optional) – The name of the color map to use for the color scale. Defaults to ‘RdYlGn_r’.

Returns:

The 2D image figure. The figure has the following features:

The molecule is drawn in a 2D projection with the atom symbols and bond types shown.
The atoms are colored according to their metabolic accessibility, which is calculated as the frequency of the atom in the atom map column of the data frame. The color scale ranges from red (low accessibility) to green (high accessibility).
A color bar is displayed on the right side of the image, showing the values of the metabolic accessibility.
The name of the molecule is displayed on the top left corner of the image.

Return type:

matplotlib.figure.Figure
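
The accessibility value described above, the frequency of each atom in the atom map column, can be sketched as a simple count over lists of atom indices. This is an illustration only: the function name `atom_accessibility` is hypothetical, and dividing by the number of rows is an assumed normalization that the library may not use.

```python
from collections import Counter


def atom_accessibility(atom_maps):
    """Fraction of rows in which each atom index appears.

    atom_maps is an iterable of lists of atom indices, one list per row of
    the atom map column. Normalizing by the row count is an assumption.
    """
    atom_maps = [list(atoms) for atoms in atom_maps]
    if not atom_maps:
        return {}
    # set() ensures an atom listed twice in one row is counted once.
    counts = Counter(idx for atoms in atom_maps for idx in set(atoms))
    total = len(atom_maps)
    return {idx: counts[idx] / total for idx in sorted(counts)}


# Hypothetical atom maps for three predicted reactions.
print(atom_accessibility([[0, 1], [1, 2], [1]]))
```

Atom 1 appears in every row of the example, so it receives the highest value; each value would then be mapped to a color through the chosen colormap.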

microberx.MetaboliteVisualizer.display_molecules(data_frame, legends_col='metabolite_id', smiles_col='main_product_smiles', scale_from_column='confidence_score', columns_to_display=['reaction_id'], cmap='RdYlGn')[source]

Displays a grid of molecules from a data frame, using different colors to indicate the values of a specified column.

Parameters:

data_frame (pd.DataFrame) – A pandas data frame containing the molecular data.
legends_col (str, optional) – The name of the column to use as the legend for each molecule. Default is ‘metabolite_id’.
smiles_col (str, optional) – The name of the column containing the SMILES strings for each molecule. Default is ‘main_product_smiles’.
scale_from_column (str, optional) – The name of the column to use for scaling the colors of the molecules. Default is ‘confidence_score’.
columns_to_display (list, optional) – A list of column names to display as tooltips when hovering over the molecules. Default is [‘reaction_id’].
cmap (str, optional) – The name of the matplotlib colormap to use for coloring the molecules. Default is ‘RdYlGn’.

Returns:

A mols2grid display object that shows the grid of molecules with legends, colors and tooltips. The display object has the following features:

Each molecule is drawn in a 2D projection with the atom symbols and bond types shown.
The legend of each molecule is displayed below the image, using the value from the legends_col column.
The color of each molecule is determined by the value from the scale_from_column column, using the cmap colormap. A color bar is displayed on the top right corner of the grid, showing the range of values.
The tooltip of each molecule is displayed when hovering over the image, showing the values from the columns_to_display list.
The grid can be filtered, sorted and searched by using the widgets on the top left corner of the grid.

Return type:

mols2grid.display

microberx.MetaboliteVisualizer.plot_relationships(data_frame, nodes=['reaction_id', 'metabolite_id'])[source]

Creates a Sankey diagram to visualize the evidence supporting the metabolite annotations in a data frame.

Parameters:

data_frame (pd.DataFrame) – A pandas data frame that contains the metabolite annotations and the evidence supporting them.
nodes (list, optional) – A list of column names that represent the nodes of the Sankey diagram. The default value is [‘reaction_id’, ‘metabolite_id’].

Returns:

A plotly figure object that contains the Sankey diagram. The diagram has the following features:

The nodes are arranged horizontally from left to right, corresponding to the order of the columns in the nodes list.
The links are drawn as curved lines connecting the nodes, representing the flow of evidence from one node to another.
The width of each link is proportional to the number of pieces of evidence for that pair of nodes.
The color of each link is determined by the color of the source node, using a distinct color for each node.
The label of each node is displayed on top of the node, using the value from the corresponding column in the data frame.
The tooltip of each link shows the source and target node names and the number of pieces of evidence for that link.
The title of the diagram shows the names of the columns used for the nodes.

Return type:

plotly.graph_objects.Figure
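
The link widths described above come down to counting how often each pair of values occurs in adjacent node columns. The sketch below shows one way to derive node labels and link counts from rows held as plain dictionaries. The helper `sankey_links` and the sample identifiers are hypothetical, and this is not necessarily how MicrobeRX builds its diagram.

```python
from collections import Counter


def sankey_links(rows, nodes=("reaction_id", "metabolite_id")):
    """Build node labels and (source, target, value) lists for a Sankey plot."""
    labels = []   # node labels, in order of first appearance
    index = {}    # (column, value) -> position in labels
    counts = Counter()
    for row in rows:
        # Link each node column to the next one, left to right.
        for left, right in zip(nodes, nodes[1:]):
            for col in (left, right):
                key = (col, row[col])
                if key not in index:
                    index[key] = len(labels)
                    labels.append(str(row[col]))
            counts[(index[(left, row[left])], index[(right, row[right])])] += 1
    source = [s for s, _ in counts]
    target = [t for _, t in counts]
    value = [counts[pair] for pair in counts]
    return labels, source, target, value


# Hypothetical annotations, for illustration only.
rows = [
    {"reaction_id": "R1", "metabolite_id": "M1"},
    {"reaction_id": "R1", "metabolite_id": "M2"},
    {"reaction_id": "R2", "metabolite_id": "M1"},
    {"reaction_id": "R1", "metabolite_id": "M1"},
]
print(sankey_links(rows))
```

The four lists have the shape that plotly's `go.Sankey` trace expects: `labels` for `node=dict(label=...)` and the other three for `link=dict(source=..., target=..., value=...)`.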