OmicsIntegrator

Loading the modules to work with omics data of MicrobeRX

Importing the omics integration module of MicrobeRX

[1]:
from microberx.OmicsIntegrator import (
    plot_species_sunburst,
    fetch_batch_sequences,
    get_interpro,
    plot_interpro_results,
    run_multi_sequence_aligment,
    plot_similarity_matrix,
    plot_aligment_chart,
)

import pandas as pd

WARNING:

OmicsIntegrator is designed for focused analyses of selected organisms and sequences rather than large-scale studies. Running it entirely locally would require more than 35 GB of data, and requesting many analyses at once can lead to a heavy workload and long processing times.

Additionally, the majority of the analysis is carried out using online services for accessibility.

Identification of organisms involved in the biotransformations

For each prediction, MicrobeRX returns the biotransformation used to make it. Each reaction_id can be mapped to the organisms whose GEMs (genome-scale metabolic models) contain that reaction, and these organisms can be visualized for a single biotransformation or for a series of biotransformations.

plot_species_sunburst returns two objects. The figure (F in the cell below) is interactive and displays the taxonomic levels of the species.

[2]:
sources=['HMR_1951','D4OR','ACCOACORAT']

sequence_ids, F = plot_species_sunburst(sources=sources)

F
INFO: Loading microbes data...
INFO: Loading microbes reactions...

The other output (sequence_ids) is a dataframe containing the Entrez identifiers of each organism's enzymes.

[3]:
sequence_ids.head(10)
[3]:
                                                       HMR_1951  D4OR  ACCOACORAT
Actinobacillus_pleuropneumoniae_S8                     NaN       NaN   WP_005605423.1
Actinobacillus_pleuropneumoniae_serovar_10_str_D13039  NaN       NaN   WP_005605423.1
Actinobacillus_pleuropneumoniae_serovar_11_str_56153   NaN       NaN   WP_005605423.1
Actinobacillus_pleuropneumoniae_serovar_12_str_1096    NaN       NaN   WP_005605423.1
Actinobacillus_pleuropneumoniae_serovar_13_str_N273    NaN       NaN   WP_005605423.1
Actinobacillus_pleuropneumoniae_serovar_1_str_4074     NaN       NaN   WP_005598772.1
Actinobacillus_pleuropneumoniae_serovar_2_str_4226     NaN       NaN   WP_005602179.1
Actinobacillus_pleuropneumoniae_serovar_2_str_S1536    NaN       NaN   WP_005602179.1
Actinobacillus_pleuropneumoniae_serovar_3_str_JL03     NaN       NaN   WP_005602179.1
Actinobacillus_pleuropneumoniae_serovar_4_str_M62      NaN       NaN   WP_005605423.1
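
Because the dataframe holds one row per organism and one column per reaction, a quick summary of how many organisms carry each reaction can be obtained with plain pandas. The sketch below uses a small toy dataframe as a stand-in for sequence_ids; the organism names and the identifier WP_000000001.1 are placeholders invented for illustration, not MicrobeRX output.

```python
import pandas as pd

# Toy stand-in for the sequence_ids dataframe (organisms x reactions).
# Organism names and WP_000000001.1 are illustrative placeholders.
toy_ids = pd.DataFrame(
    {
        "HMR_1951": [None, "WP_000000001.1", None],
        "D4OR": [None, None, None],
        "ACCOACORAT": ["WP_005605423.1", "WP_005598772.1", "WP_005605423.1"],
    },
    index=["organism_A", "organism_B", "organism_C"],
)

# Number of organisms with an annotated enzyme for each reaction.
organisms_per_reaction = toy_ids.notna().sum()

# Number of distinct protein identifiers for each reaction.
unique_ids_per_reaction = toy_ids.nunique()

print(organisms_per_reaction)
print(unique_ids_per_reaction)
```

Comparing the two counts gives a rough idea of how many organisms share the same protein identifier for a reaction.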

We advise sampling protein sequences because the number of sequences and organisms involved in a biotransformation can be large. The cell below shows a basic way to collect 30 random identifiers.

[4]:
ACCOACORAT=sequence_ids.drop_duplicates(subset=['ACCOACORAT'])[['ACCOACORAT']]
sample=ACCOACORAT.sample(n=30)
sample
[4]:
ACCOACORAT
Brevibacillus_brevis_FJAT_0809_GLX WP_016742856.1
Helicobacter_pylori_UM299 WP_015645908.1
Clostridium_botulinum_Ba4_str_657 WP_003360499.1
Helicobacter_pylori_Hp_H_43 WP_000886341.1
Helicobacter_pylori_CPY3281 WP_000886313.1
Staphylococcus_warneri_L37603 WP_002451204.1
Helicobacter_pylori_HLJHP271 WP_017283306.1
Listeria_monocytogenes_08_5578 WP_012951129.1
Escherichia_coli_B185 WP_001277565.1
Streptococcus_agalactiae_2603V_R NP_687240.1
Mycobacterium_avium_104 WP_011724562.1
Streptococcus_parasanguinis_SK236 WP_003013745.1
Morganella_morganii_SC01 WP_004240115.1
Citrobacter_freundii_str_ballerup_7851_39 WP_016154965.1
Streptococcus_parauberis_KCTC_11980BP WP_003104753.1
Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953 WP_005898109.1
Enterobacter_cloacae_subsp_dissolvens_SDM WP_014830172.1
Campylobacter_jejuni_subsp_jejuni_1854 WP_002928246.1
Helicobacter_pylori_Hp_A_14 WP_000886353.1
Helicobacter_pylori_UM037 WP_015643831.1
Streptococcus_mutans_11A1 WP_002274416.1
Helicobacter_pylori_8A3 WP_000886326.1
Bacteroides_salyersiae_CL02T12C01 WP_007478489.1
Escherichia_coli_DEC7A WP_001277567.1
Streptococcus_vestibularis_ATCC_49124 WP_003092303.1
Actinobacillus_pleuropneumoniae_serovar_1_str_4074 WP_005598772.1
Staphylococcus_lugdunensis_ACS_027_V_Sch2 WP_002459950.1
Gemella_haemolysans_M341 WP_003147389.1
Campylobacter_coli_202_04 WP_002823200.1
Vibrio_parahaemolyticus_AN_5034 WP_005456939.1
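
The sampling step above can be made more defensive. In pandas, drop_duplicates treats missing values as equal and keeps one of them, so a row without an identifier may end up in the sample, and sample raises an error when n exceeds the number of rows. The sketch below, on a toy dataframe with placeholder identifiers, drops missing values first, caps n, and fixes random_state so the same sample is drawn on every run. It is a suggestion under these assumptions, not part of MicrobeRX.

```python
import pandas as pd

# Toy stand-in for sequence_ids; identifiers are illustrative placeholders.
toy_ids = pd.DataFrame(
    {"ACCOACORAT": ["WP_A.1", "WP_A.1", None, "WP_B.1", "WP_C.1", None]},
    index=[f"organism_{i}" for i in range(6)],
)

# Drop organisms without an identifier, then keep one organism per identifier.
unique_ids = toy_ids[["ACCOACORAT"]].dropna().drop_duplicates(subset=["ACCOACORAT"])

# Never request more rows than are available, and fix the seed for repeatability.
n = min(2, len(unique_ids))
toy_sample = unique_ids.sample(n=n, random_state=0)

print(toy_sample)
```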

Retrieving sequences from online databases

The fetch_batch_sequences function maps and retrieves protein sequences through the NCBI Entrez request system. MicrobeRX uses BioPython for these requests and returns the sequences as SeqRecord objects, ready for further manipulation.

[5]:
ACCOACORAT_sequences=fetch_batch_sequences(entries=sample.ACCOACORAT,sequence_ids=sample.index,email='my_mail@mail.com')
ACCOACORAT_sequences
[5]:
[SeqRecord(seq=Seq('MLAQMRDDIHAVFERDPAARSTLEVVMTYSGLHAIWGHRIAHRLWKAELCTLAR...SVD'), id='Brevibacillus_brevis_FJAT_0809_GLX', name='WP_016742856.1', description='WP_016742856.1 MULTISPECIES: serine O-acetyltransferase [Bacillales]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_UM299', name='WP_015645908.1', description='WP_015645908.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MKNPFKVLIYDLKNAKEKDPAARNILEVFILYPFIHALIGYRIAHLFYKAHLFF...MII'), id='Clostridium_botulinum_Ba4_str_657', name='WP_003360499.1', description='WP_003360499.1 serine O-acetyltransferase EpsC [Clostridium botulinum]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_Hp_H_43', name='WP_000886341.1', description='WP_000886341.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKKGFYFIARA...KDR'), id='Helicobacter_pylori_CPY3281', name='WP_000886313.1', description='WP_000886313.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLRRMRDDIKMVFEQDPAARSTIEVVTTYAGLHAVWSHLIAHKLYNNQRYVAAR...YII'), id='Staphylococcus_warneri_L37603', name='WP_002451204.1', description='WP_002451204.1 MULTISPECIES: serine O-acetyltransferase [Bacteria]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKKGFYFIARA...KDR'), id='Helicobacter_pylori_HLJHP271', name='WP_017283306.1', description='WP_017283306.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MPTRLKEDIATIIKNDPATKSFFDAFLTNPGLHALWWHRMANFFYRHKMVLFGK...EKE'), id='Listeria_monocytogenes_08_5578', name='WP_012951129.1', description='WP_012951129.1 serine O-acetyltransferase EpsC [Listeria monocytogenes]', dbxrefs=[]),
 SeqRecord(seq=Seq('MSCEELEIVWNNIKAEARTLADCEPMLASFYHATLLKHENLGSALSYMLANKLS...DGI'), id='Escherichia_coli_B185', name='WP_001277565.1', description='WP_001277565.1 MULTISPECIES: serine O-acetyltransferase [Enterobacteriaceae]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGWWKESIAIVKEQDPAARSSLEVILTYPGIKALAAHRLSHFLWNHNFKLLARM...SKL'), id='Streptococcus_agalactiae_2603V_R', name='NP_687240.1', description='NP_687240.1 serine O-acetyltransferase [Streptococcus agalactiae 2603V/R]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLAAIRRDIRAARERDPAAPTTLQVIFAYPGVHAIWGHRVNHWLWRRGARLAAR...FSI'), id='Mycobacterium_avium_104', name='WP_011724562.1', description='WP_011724562.1 MULTISPECIES: serine O-acetyltransferase [Mycobacterium avium complex (MAC)]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGWWRETIDIVKENDPAARTSLEVLLTYPGVKALAAHRVSHFLWNHGFKLLARM...SSL'), id='Streptococcus_parasanguinis_SK236', name='WP_003013745.1', description='WP_003013745.1 MULTISPECIES: serine O-acetyltransferase [Streptococcus]', dbxrefs=[]),
 SeqRecord(seq=Seq('MSYEELEEVWSYIKQEARDFADCEPMLASFFHATLLKHENLGSALSFMLANKLA...NDI'), id='Morganella_morganii_SC01', name='WP_004240115.1', description='WP_004240115.1 serine O-acetyltransferase [Morganella morganii]', dbxrefs=[]),
 SeqRecord(seq=Seq('MPCEELDIVWKNIKAEARALAECEPMLASFYHATLLKHENLGSALSYMLANKLA...DGI'), id='Citrobacter_freundii_str_ballerup_7851_39', name='WP_016154965.1', description='WP_016154965.1 MULTISPECIES: serine O-acetyltransferase [Citrobacter]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGWWKKSIDIIKEKDPAARSSLEIILTYPGLKALAAHQLSHWMWQKNLKLLARM...SKL'), id='Streptococcus_parauberis_KCTC_11980BP', name='WP_003104753.1', description='WP_003104753.1 serine O-acetyltransferase [Streptococcus parauberis]', dbxrefs=[]),
 SeqRecord(seq=Seq('MNIFSWFKDEFLNIQKKDPAIKSKLEIILYPSLHAIIYHKLAHFLYKCKLFFLA...VKN'), id='Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953', name='WP_005898109.1', description='WP_005898109.1 MULTISPECIES: serine O-acetyltransferase EpsC [Fusobacterium]', dbxrefs=[]),
 SeqRecord(seq=Seq('MPCEELDIVWNNIKAEARALADCEPMLASFYHATLLKHENLGSALSYMLANKLA...DGI'), id='Enterobacter_cloacae_subsp_dissolvens_SDM', name='WP_014830172.1', description='WP_014830172.1 MULTISPECIES: serine O-acetyltransferase [Enterobacter]', dbxrefs=[]),
 SeqRecord(seq=Seq('MNFWGIIKEDFSQPKVQDPAFNSCIELFFNYPGVWAVVNYRFAHFFYIRNFKRI...SQK'), id='Campylobacter_jejuni_subsp_jejuni_1854', name='WP_002928246.1', description='WP_002928246.1 serine O-acetyltransferase [Campylobacter jejuni]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRRFYFIARA...KDR'), id='Helicobacter_pylori_Hp_A_14', name='WP_000886353.1', description='WP_000886353.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRRFYFIARA...KDR'), id='Helicobacter_pylori_UM037', name='WP_015643831.1', description='WP_015643831.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGWWKETIDIVKEKDPAARTALEVLLTYPGVKALAAHCLSHFLWTHHCKLLARM...SGL'), id='Streptococcus_mutans_11A1', name='WP_002274416.1', description='WP_002274416.1 serine O-acetyltransferase [Streptococcus mutans]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_8A3', name='WP_000886326.1', description='WP_000886326.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
 SeqRecord(seq=Seq('MKDIAIYGFGGFGREIACVINAINQATPTWNFIGYFDDGHSVGEANKYGRVLGN...RIQ'), id='Bacteroides_salyersiae_CL02T12C01', name='WP_007478489.1', description='WP_007478489.1 acetyltransferase [Bacteroides salyersiae]', dbxrefs=[]),
 SeqRecord(seq=Seq('MSCEELEIVWNNIKAEARTLADCEPMLASFYHATLLKHENLGSALSYMLANKLS...DGI'), id='Escherichia_coli_DEC7A', name='WP_001277567.1', description='WP_001277567.1 serine O-acetyltransferase [Escherichia coli]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGWWKESIDIVKKNDPAARTSLEVLLTYPGLKALAAHRISHFLWRHHCRLLARM...SRL'), id='Streptococcus_vestibularis_ATCC_49124', name='WP_003092303.1', description='WP_003092303.1 MULTISPECIES: serine O-acetyltransferase [Streptococcus]', dbxrefs=[]),
 SeqRecord(seq=Seq('MNESELNQIWKNIREEAEELVDNEPMLASFFHATILKHSNLGGSLSYILANKLA...DGI'), id='Actinobacillus_pleuropneumoniae_serovar_1_str_4074', name='WP_005598772.1', description='WP_005598772.1 serine O-acetyltransferase [Actinobacillus pleuropneumoniae]', dbxrefs=[]),
 SeqRecord(seq=Seq('MLRRMRDDVKMVFEQDPAARTTFEVITTYAGLHAVWSHLIAHKLYNKKRYVLAR...YII'), id='Staphylococcus_lugdunensis_ACS_027_V_Sch2', name='WP_002459950.1', description='WP_002459950.1 MULTISPECIES: serine O-acetyltransferase [Staphylococcus]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGYFENLNYNLNRVLKDDPAAESKLMIYLTYPHIKALNYHYFSHKLYKKGWHTM...EDK'), id='Gemella_haemolysans_M341', name='WP_003147389.1', description='WP_003147389.1 serine O-acetyltransferase [Gemella haemolysans]', dbxrefs=[]),
 SeqRecord(seq=Seq('MGFFGIIKEDFSQPKVQDPAYNSCIELFFNYPGVWAVVNYRFAHFFYTKNFKRT...DIK'), id='Campylobacter_coli_202_04', name='WP_002823200.1', description='WP_002823200.1 serine O-acetyltransferase [Campylobacter coli]', dbxrefs=[]),
 SeqRecord(seq=Seq('MKHCEKQKVWNKIVSEAREMSEQEPMLASFYHATIIKHESLCAALSYILANKLN...DGI'), id='Vibrio_parahaemolyticus_AN_5034', name='WP_005456939.1', description='WP_005456939.1 serine O-acetyltransferase [Vibrio parahaemolyticus]', dbxrefs=[])]
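
For readers who want to see what a batched Entrez request can look like in plain BioPython, here is a minimal sketch. It is not the implementation behind fetch_batch_sequences: the helper names chunked and fetch_protein_fasta and the batch size of 100 are assumptions made for illustration. Only the batching helper runs in the example; the fetch function needs network access and BioPython, and is therefore defined but not called.

```python
from itertools import islice


def chunked(identifiers, size):
    """Yield successive lists of at most `size` identifiers."""
    iterator = iter(identifiers)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fetch_protein_fasta(identifiers, email, batch_size=100):
    """Fetch protein records from NCBI in batches (requires network access)."""
    # Imported here so the batching helper above works without BioPython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address with each request
    records = []
    for batch in chunked(identifiers, batch_size):
        handle = Entrez.efetch(
            db="protein", id=",".join(batch), rettype="fasta", retmode="text"
        )
        records.extend(SeqIO.parse(handle, "fasta"))
        handle.close()
    return records


# Only the batching step is exercised here; no request is sent.
batches = list(chunked(["WP_016742856.1", "WP_015645908.1", "WP_003360499.1"], 2))
print(batches)
```

Splitting the identifiers into batches keeps each request small, which helps when many sequences are involved.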

Sequences can easily be saved in different formats using BioPython.

[6]:
from Bio import SeqIO

SeqIO.write(ACCOACORAT_sequences,handle='ACCOACORAT.faa',format="fasta")
[6]:
30
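
SeqIO.write returns the number of records written (30 above). A quick, dependency-free way to double-check a FASTA file is to count its header lines. The sketch below does this on a small temporary file; the function name count_fasta_records and the toy sequences are illustrative assumptions.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Toy file with two short placeholder sequences.
toy_fasta = ">seq_1 toy record\nMLAQMRDD\nIHAVFERD\n>seq_2 toy record\nMLDLSYSL\n"
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.faa")
    with open(toy_path, "w") as handle:
        handle.write(toy_fasta)
    n_records = count_fasta_records(toy_path)

print(n_records)
```

Applied to ACCOACORAT.faa, the same function should agree with the count reported by SeqIO.write.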

Multiple Sequence Alignment (MSA)

A common analysis is to compare the similarity of all sequences using ClustalW. The function run_multi_sequence_aligment performs the MSA, saves the alignment results and a phylogenetic tree, and returns the similarity matrix, which can be used for further analysis.

[7]:
similarity_matrix=run_multi_sequence_aligment('ACCOACORAT.faa',input_format='fasta',output_aligment_format='fasta')
similarity_matrix
MSA and Phylogenetic tree have been saven in ACCOACORAT.faa directory
[7]:
Brevibacillus_brevis_FJAT_0809_GLX Helicobacter_pylori_UM299 Clostridium_botulinum_Ba4_str_657 Helicobacter_pylori_Hp_H_43 Helicobacter_pylori_CPY3281 Staphylococcus_warneri_L37603 Helicobacter_pylori_HLJHP271 Listeria_monocytogenes_08_5578 Escherichia_coli_B185 Streptococcus_agalactiae_2603V_R ... Streptococcus_mutans_11A1 Helicobacter_pylori_8A3 Bacteroides_salyersiae_CL02T12C01 Escherichia_coli_DEC7A Streptococcus_vestibularis_ATCC_49124 Actinobacillus_pleuropneumoniae_serovar_1_str_4074 Staphylococcus_lugdunensis_ACS_027_V_Sch2 Gemella_haemolysans_M341 Campylobacter_coli_202_04 Vibrio_parahaemolyticus_AN_5034
Brevibacillus_brevis_FJAT_0809_GLX 59.0 100.0 57.0 58.0 59.0 62.0 59.0 46.0 34.0 44.0 ... 41.0 59.0 8.0 34.0 41.0 33.0 61.0 49.0 44.0 33.0
Helicobacter_pylori_UM299 100.0 59.0 63.0 97.0 98.0 55.0 98.0 53.0 42.0 50.0 ... 48.0 99.0 17.0 42.0 49.0 40.0 52.0 54.0 45.0 40.0
Clostridium_botulinum_Ba4_str_657 63.0 57.0 100.0 63.0 63.0 51.0 63.0 47.0 33.0 49.0 ... 45.0 64.0 16.0 33.0 47.0 33.0 51.0 53.0 47.0 34.0
Helicobacter_pylori_Hp_H_43 97.0 58.0 63.0 100.0 97.0 54.0 97.0 53.0 42.0 49.0 ... 46.0 97.0 18.0 42.0 47.0 40.0 52.0 53.0 45.0 40.0
Helicobacter_pylori_CPY3281 98.0 59.0 63.0 97.0 100.0 54.0 98.0 52.0 41.0 49.0 ... 47.0 97.0 17.0 41.0 48.0 40.0 52.0 55.0 46.0 39.0
Staphylococcus_warneri_L37603 55.0 62.0 51.0 54.0 54.0 100.0 55.0 45.0 33.0 43.0 ... 39.0 55.0 9.0 33.0 38.0 33.0 91.0 48.0 43.0 31.0
Helicobacter_pylori_HLJHP271 98.0 59.0 63.0 97.0 98.0 55.0 100.0 52.0 42.0 50.0 ... 47.0 97.0 22.0 42.0 49.0 40.0 53.0 54.0 46.0 40.0
Listeria_monocytogenes_08_5578 53.0 46.0 47.0 53.0 52.0 45.0 52.0 100.0 34.0 47.0 ... 46.0 53.0 16.0 34.0 48.0 31.0 45.0 44.0 42.0 35.0
Escherichia_coli_B185 42.0 34.0 33.0 42.0 41.0 33.0 42.0 34.0 100.0 36.0 ... 33.0 42.0 11.0 98.0 34.0 71.0 33.0 33.0 33.0 72.0
Streptococcus_agalactiae_2603V_R 50.0 44.0 49.0 49.0 49.0 43.0 50.0 47.0 36.0 100.0 ... 78.0 50.0 9.0 36.0 81.0 33.0 43.0 44.0 41.0 35.0
Mycobacterium_avium_104 52.0 48.0 46.0 52.0 50.0 45.0 51.0 41.0 32.0 39.0 ... 39.0 52.0 11.0 32.0 42.0 34.0 45.0 47.0 38.0 31.0
Streptococcus_parasanguinis_SK236 50.0 42.0 46.0 49.0 49.0 39.0 50.0 49.0 35.0 76.0 ... 83.0 50.0 11.0 35.0 81.0 33.0 40.0 44.0 37.0 33.0
Morganella_morganii_SC01 40.0 33.0 37.0 40.0 40.0 34.0 41.0 36.0 82.0 38.0 ... 35.0 40.0 13.0 82.0 36.0 68.0 35.0 35.0 33.0 70.0
Citrobacter_freundii_str_ballerup_7851_39 42.0 34.0 33.0 42.0 41.0 32.0 42.0 34.0 95.0 36.0 ... 33.0 42.0 11.0 95.0 33.0 72.0 33.0 33.0 33.0 72.0
Streptococcus_parauberis_KCTC_11980BP 49.0 44.0 49.0 47.0 49.0 42.0 49.0 47.0 34.0 84.0 ... 75.0 49.0 11.0 34.0 77.0 32.0 42.0 45.0 39.0 34.0
Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953 53.0 48.0 57.0 52.0 52.0 46.0 52.0 48.0 34.0 47.0 ... 43.0 53.0 18.0 34.0 44.0 38.0 47.0 50.0 48.0 30.0
Enterobacter_cloacae_subsp_dissolvens_SDM 43.0 35.0 34.0 43.0 42.0 33.0 43.0 34.0 96.0 36.0 ... 33.0 43.0 11.0 95.0 34.0 71.0 33.0 34.0 34.0 71.0
Campylobacter_jejuni_subsp_jejuni_1854 46.0 38.0 46.0 46.0 46.0 39.0 46.0 42.0 31.0 42.0 ... 39.0 47.0 11.0 31.0 35.0 33.0 40.0 41.0 86.0 32.0
Helicobacter_pylori_Hp_A_14 97.0 58.0 63.0 99.0 96.0 54.0 96.0 53.0 41.0 49.0 ... 46.0 96.0 19.0 41.0 47.0 40.0 52.0 53.0 45.0 39.0
Helicobacter_pylori_UM037 97.0 59.0 64.0 98.0 97.0 54.0 97.0 52.0 42.0 49.0 ... 47.0 97.0 18.0 42.0 48.0 40.0 52.0 53.0 46.0 40.0
Streptococcus_mutans_11A1 48.0 41.0 45.0 46.0 47.0 39.0 47.0 46.0 33.0 78.0 ... 100.0 48.0 11.0 33.0 82.0 32.0 40.0 43.0 40.0 32.0
Helicobacter_pylori_8A3 99.0 59.0 64.0 97.0 97.0 55.0 97.0 53.0 42.0 50.0 ... 48.0 100.0 17.0 42.0 49.0 40.0 52.0 54.0 46.0 40.0
Bacteroides_salyersiae_CL02T12C01 17.0 8.0 16.0 18.0 17.0 9.0 22.0 16.0 11.0 9.0 ... 11.0 17.0 100.0 11.0 11.0 13.0 12.0 18.0 12.0 14.0
Escherichia_coli_DEC7A 42.0 34.0 33.0 42.0 41.0 33.0 42.0 34.0 98.0 36.0 ... 33.0 42.0 11.0 100.0 34.0 71.0 33.0 33.0 33.0 72.0
Streptococcus_vestibularis_ATCC_49124 49.0 41.0 47.0 47.0 48.0 38.0 49.0 48.0 34.0 81.0 ... 82.0 49.0 11.0 34.0 100.0 32.0 39.0 44.0 36.0 32.0
Actinobacillus_pleuropneumoniae_serovar_1_str_4074 40.0 33.0 33.0 40.0 40.0 33.0 40.0 31.0 71.0 33.0 ... 32.0 40.0 13.0 71.0 32.0 100.0 33.0 34.0 35.0 63.0
Staphylococcus_lugdunensis_ACS_027_V_Sch2 52.0 61.0 51.0 52.0 52.0 91.0 53.0 45.0 33.0 43.0 ... 40.0 52.0 12.0 33.0 39.0 33.0 100.0 48.0 43.0 32.0
Gemella_haemolysans_M341 54.0 49.0 53.0 53.0 55.0 48.0 54.0 44.0 33.0 44.0 ... 43.0 54.0 18.0 33.0 44.0 34.0 48.0 100.0 43.0 34.0
Campylobacter_coli_202_04 45.0 44.0 47.0 45.0 46.0 43.0 46.0 42.0 33.0 41.0 ... 40.0 46.0 12.0 33.0 36.0 35.0 43.0 43.0 100.0 34.0
Vibrio_parahaemolyticus_AN_5034 40.0 33.0 34.0 40.0 39.0 31.0 40.0 35.0 72.0 35.0 ... 32.0 40.0 14.0 72.0 32.0 63.0 32.0 34.0 34.0 100.0

30 rows × 30 columns
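
To make the meaning of such a matrix concrete, the sketch below computes a pairwise percent identity from already aligned sequences. This is one common definition (identical residues over columns where neither sequence has a gap); the values returned by run_multi_sequence_aligment may be computed differently, so treat this as an illustration only. The function name percent_identity and the toy alignment are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: three short, equally long rows with '-' marking gaps.
toy_alignment = {
    "seq_1": "MLDLSYSL-ERV",
    "seq_2": "MLDLSYSLQERV",
    "seq_3": "MGWW-KESIAIV",
}
names = list(toy_alignment)

# Nested dict acting as a small similarity matrix.
toy_matrix = {
    a: {b: percent_identity(toy_alignment[a], toy_alignment[b]) for b in names}
    for a in names
}

for name in names:
    print(name, toy_matrix[name])
```

With this definition the matrix is symmetric and every sequence is 100% identical to itself.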

The similarity matrix may be visualized interactively using the plot_similarity_matrix function. We also introduced a method that allows users to visualize sequence homology at a defined threshold.

[8]:
plot_similarity_matrix(similarity_matrix)
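
Besides plotting, the similarity matrix is an ordinary dataframe, so pairs of sequences above a chosen similarity can be listed directly. The sketch below uses a toy 3 × 3 matrix and a threshold of 90; both the sequence names and the threshold are illustrative assumptions, and the code is independent of the MicrobeRX plotting functions.

```python
import pandas as pd

# Toy symmetric similarity matrix (percent values), standing in for similarity_matrix.
names = ["seq_A", "seq_B", "seq_C"]
toy_similarity = pd.DataFrame(
    [[100.0, 97.0, 42.0], [97.0, 100.0, 41.0], [42.0, 41.0, 100.0]],
    index=names,
    columns=names,
)

threshold = 90.0

# Visit each unordered pair once (upper triangle, diagonal excluded).
pairs = [
    (a, b, toy_similarity.loc[a, b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if toy_similarity.loc[a, b] >= threshold
]

print(pairs)
```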

Protein families using InterPro

MicrobeRX includes the ability to perform automated family and domain annotation of protein sequences for user-selected proteins.

WARNING: For large numbers of protein sequences or whole genomes, it is better and faster to run the analysis locally using InterProScan (https://www.ebi.ac.uk/interpro/).

[10]:
selected_protein=ACCOACORAT_sequences[0]
selected_protein
[10]:
SeqRecord(seq=Seq('MLAQMRDDIHAVFERDPAARSTLEVVMTYSGLHAIWGHRIAHRLWKAELCTLAR...SVD'), id='Brevibacillus_brevis_FJAT_0809_GLX', name='WP_016742856.1', description='WP_016742856.1 MULTISPECIES: serine O-acetyltransferase [Bacillales]', dbxrefs=[])
[11]:
interpro=get_interpro(sequence_id=selected_protein.id,sequence=str(selected_protein.seq),email='my_mail@mail.com')
interpro
Job ID: iprscan5-R20231124-154123-0249-84797931-p1m
Job is queued, please wait...
Job is running, please wait...
Job is running, please wait...
Job is running, please wait...
[11]:
accesion token sequence_length analysis signature_accession signature_description start_location stop_location score status date interpro_accession interpro_description go_annotations pathways
0 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 FunFam G3DSA:1.10.3130.10:FF:000002 Serine acetyltransferase 1 66 6.4E-33 T 24-11-2023 - - - -
1 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 Coils Coil Coil 192 212 - T 24-11-2023 - - - -
2 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 Pfam PF00132 Bacterial transferase hexapeptide (six repeats) 117 150 9.8E-6 T 24-11-2023 IPR001451 Hexapeptide repeat - MetaCyc:PWY-2229|MetaCyc:PWY-241|MetaCyc:PWY-2...
3 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 PANTHER PTHR42811 SERINE ACETYLTRANSFERASE 3 181 8.4E-74 T 24-11-2023 - - GO:0005829(PANTHER)|GO:0009001(PANTHER) -
4 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 FunFam G3DSA:2.160.10.10:FF:000007 Serine acetyltransferase 67 181 4.2E-53 T 24-11-2023 - - - -
5 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 PIRSF PIRSF000441 CysE 1 218 2.0E-84 T 24-11-2023 IPR005881 Serine O-acetyltransferase GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4...
6 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 CDD cd03354 LbH_SAT 64 164 1.73593E-56 T 24-11-2023 IPR045304 Serine acetyltransferase, LbH domain - MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4...
7 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 SUPERFAMILY SSF51161 Trimeric LpxA-like enzymes 4 168 9.84E-55 T 24-11-2023 IPR011004 Trimeric LpxA-like superfamily - MetaCyc:PWY-2229|MetaCyc:PWY-241|MetaCyc:PWY-2...
8 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 NCBIfam NF041874 serine O-acetyltransferase EpsC 5 170 4.9E-71 T 24-11-2023 - - - -
9 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 Gene3D G3DSA:2.160.10.10 Hexapeptide repeat proteins 67 185 4.2E-37 T 24-11-2023 - - - -
10 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 NCBIfam TIGR01172 serine O-acetyltransferase 6 166 6.3E-73 T 24-11-2023 IPR005881 Serine O-acetyltransferase GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4...
11 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 Pfam PF06426 Serine acetyltransferase, N-terminal 3 35 0.041 T 24-11-2023 IPR010493 Serine acetyltransferase, N-terminal GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... MetaCyc:PWY-6936|MetaCyc:PWY-7274|MetaCyc:PWY-...
12 Brevibacillus_brevis_FJAT_0809_GLX 8c3e2ed087ff5d386d16ad5757c07175 221 Gene3D G3DSA:1.10.3130.10 serine acetyltransferase, domain 1 1 66 4.2E-24 T 24-11-2023 IPR042122 Serine acetyltransferase, N-terminal domain su... - MetaCyc:PWY-6936|MetaCyc:PWY-7274|MetaCyc:PWY-...
[12]:
f=plot_interpro_results(interpro_results=interpro,compact=True)
f
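
The results table can also be summarized with pandas. The sketch below keeps only the signatures integrated into an InterPro entry and lists which member-database analyses support each entry. It assumes that "-" marks a missing value, as in the output shown above, and uses a few rows copied from that output as a toy dataframe.

```python
import pandas as pd

# A few rows copied from the results above; only the columns needed here.
toy_interpro = pd.DataFrame(
    {
        "analysis": ["Coils", "Pfam", "PANTHER", "PIRSF", "NCBIfam"],
        "signature_accession": ["Coil", "PF00132", "PTHR42811", "PIRSF000441", "TIGR01172"],
        "interpro_accession": ["-", "IPR001451", "-", "IPR005881", "IPR005881"],
    }
)

# Keep signatures that are integrated into an InterPro entry ("-" means none).
integrated = toy_interpro[toy_interpro["interpro_accession"] != "-"]

# For each InterPro entry, list the analyses that support it.
summary = integrated.groupby("interpro_accession")["analysis"].agg(list).to_dict()

print(summary)
```

Entries supported by several independent analyses, such as IPR005881 here, are usually the most informative starting point.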