OmicsIntegrator
Loading the modules for working with omics data in MicrobeRX
Importing the omics integration module of MicrobeRX
[1]:
from microberx.OmicsIntegrator import plot_species_sunburst, fetch_batch_sequences, get_interpro, plot_interpro_results, run_multi_sequence_aligment, plot_similarity_matrix, plot_aligment_chart
import pandas as pd
WARNING:
The OmicsIntegrator is designed for focused analyses of selected organisms and sequences rather than large-scale analyses. Running it fully locally would require more than 35 GB of data, and requesting many analyses at once can create a significant workload and long processing times.
For accessibility, most of the analyses are carried out through online services.
Identification of organisms involved in the biotransformations
For each prediction, MicrobeRX returns information on the biotransformation used to make it. Each reaction_id can be mapped to the organisms whose genome-scale metabolic models (GEMs) contain that reaction, and these organisms can be visualized individually or for a series of biotransformations.
plot_species_sunburst returns two outputs. F is an interactive sunburst figure displaying the taxonomic levels of the organisms involved.
[2]:
sources=['HMR_1951','D4OR','ACCOACORAT']
sequence_ids, F = plot_species_sunburst(sources=sources)
F
INFO: Loading microbes data...
INFO: Loading microbes reactions...
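The other output, sequence_ids, is a dataframe with one row per organism and one column per reaction_id, containing the Entrez identifiers of each organism's enzymes (NaN where no identifier is available).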
[3]:
sequence_ids.head(10)
[3]:
| | HMR_1951 | D4OR | ACCOACORAT |
|---|---|---|---|
| Actinobacillus_pleuropneumoniae_S8 | NaN | NaN | WP_005605423.1 |
| Actinobacillus_pleuropneumoniae_serovar_10_str_D13039 | NaN | NaN | WP_005605423.1 |
| Actinobacillus_pleuropneumoniae_serovar_11_str_56153 | NaN | NaN | WP_005605423.1 |
| Actinobacillus_pleuropneumoniae_serovar_12_str_1096 | NaN | NaN | WP_005605423.1 |
| Actinobacillus_pleuropneumoniae_serovar_13_str_N273 | NaN | NaN | WP_005605423.1 |
| Actinobacillus_pleuropneumoniae_serovar_1_str_4074 | NaN | NaN | WP_005598772.1 |
| Actinobacillus_pleuropneumoniae_serovar_2_str_4226 | NaN | NaN | WP_005602179.1 |
| Actinobacillus_pleuropneumoniae_serovar_2_str_S1536 | NaN | NaN | WP_005602179.1 |
| Actinobacillus_pleuropneumoniae_serovar_3_str_JL03 | NaN | NaN | WP_005602179.1 |
| Actinobacillus_pleuropneumoniae_serovar_4_str_M62 | NaN | NaN | WP_005605423.1 |
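Because many strains share the same protein (several Actinobacillus strains above carry WP_005605423.1), it can help to check how many organisms and how many distinct proteins each reaction maps to before sampling. The sketch below assumes sequence_ids has organisms as rows, reaction IDs as columns and NaN for missing identifiers; summarize_reaction_coverage is a hypothetical helper, not part of MicrobeRX, and the toy table only copies a few rows shown above.

```python
import pandas as pd


def summarize_reaction_coverage(sequence_ids: pd.DataFrame) -> pd.DataFrame:
    """Per-reaction counts for a table shaped like `sequence_ids` (hypothetical helper).

    Rows are organisms, columns are reaction IDs, cells hold protein IDs or NaN.
    """
    return pd.DataFrame({
        # Organisms that have a protein ID for the reaction
        "n_organisms": sequence_ids.notna().sum(),
        # Distinct protein IDs per reaction (nunique ignores NaN/None)
        "n_unique_proteins": sequence_ids.nunique(),
    })


# Toy table mimicking the layout above
toy = pd.DataFrame(
    {
        "HMR_1951": [None, None, None],
        "D4OR": [None, None, None],
        "ACCOACORAT": ["WP_005605423.1", "WP_005605423.1", "WP_005598772.1"],
    },
    index=[
        "Actinobacillus_pleuropneumoniae_S8",
        "Actinobacillus_pleuropneumoniae_serovar_10_str_D13039",
        "Actinobacillus_pleuropneumoniae_serovar_1_str_4074",
    ],
)
print(summarize_reaction_coverage(toy))
```

On the real output, summarize_reaction_coverage(sequence_ids) would give a quick idea of how much redundancy a plain random sample has to deal with.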
We advise sampling protein sequences, because the number of sequences and organisms involved in a biotransformation can be large. The following cell shows a basic way to collect 30 random identifiers.
[4]:
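# Keep one row per distinct ACCOACORAT protein ID, skipping organisms without one
ACCOACORAT=sequence_ids[['ACCOACORAT']].dropna().drop_duplicates(subset=['ACCOACORAT'])
# Draw 30 random identifiers (pass random_state=<int> to make the sample reproducible)
sample=ACCOACORAT.sample(n=30)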
sample
[4]:
| | ACCOACORAT |
|---|---|
| Brevibacillus_brevis_FJAT_0809_GLX | WP_016742856.1 |
| Helicobacter_pylori_UM299 | WP_015645908.1 |
| Clostridium_botulinum_Ba4_str_657 | WP_003360499.1 |
| Helicobacter_pylori_Hp_H_43 | WP_000886341.1 |
| Helicobacter_pylori_CPY3281 | WP_000886313.1 |
| Staphylococcus_warneri_L37603 | WP_002451204.1 |
| Helicobacter_pylori_HLJHP271 | WP_017283306.1 |
| Listeria_monocytogenes_08_5578 | WP_012951129.1 |
| Escherichia_coli_B185 | WP_001277565.1 |
| Streptococcus_agalactiae_2603V_R | NP_687240.1 |
| Mycobacterium_avium_104 | WP_011724562.1 |
| Streptococcus_parasanguinis_SK236 | WP_003013745.1 |
| Morganella_morganii_SC01 | WP_004240115.1 |
| Citrobacter_freundii_str_ballerup_7851_39 | WP_016154965.1 |
| Streptococcus_parauberis_KCTC_11980BP | WP_003104753.1 |
| Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953 | WP_005898109.1 |
| Enterobacter_cloacae_subsp_dissolvens_SDM | WP_014830172.1 |
| Campylobacter_jejuni_subsp_jejuni_1854 | WP_002928246.1 |
| Helicobacter_pylori_Hp_A_14 | WP_000886353.1 |
| Helicobacter_pylori_UM037 | WP_015643831.1 |
| Streptococcus_mutans_11A1 | WP_002274416.1 |
| Helicobacter_pylori_8A3 | WP_000886326.1 |
| Bacteroides_salyersiae_CL02T12C01 | WP_007478489.1 |
| Escherichia_coli_DEC7A | WP_001277567.1 |
| Streptococcus_vestibularis_ATCC_49124 | WP_003092303.1 |
| Actinobacillus_pleuropneumoniae_serovar_1_str_4074 | WP_005598772.1 |
| Staphylococcus_lugdunensis_ACS_027_V_Sch2 | WP_002459950.1 |
| Gemella_haemolysans_M341 | WP_003147389.1 |
| Campylobacter_coli_202_04 | WP_002823200.1 |
| Vibrio_parahaemolyticus_AN_5034 | WP_005456939.1 |
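Seven of the 30 identifiers above come from Helicobacter pylori strains, so a plain random sample can over-represent species with many sequenced strains. One alternative, sketched here as a hypothetical helper rather than MicrobeRX functionality, is to sample a fixed number of distinct proteins per genus. It assumes organism names start with the genus followed by an underscore, as in the tables above; the last toy row is an invented organism without an identifier.

```python
import pandas as pd


def sample_per_genus(ids: pd.DataFrame, column: str, n_per_genus: int = 1,
                     random_state: int = 0) -> pd.DataFrame:
    """Draw up to `n_per_genus` distinct protein IDs per genus (hypothetical helper).

    Assumes index labels start with the genus followed by '_',
    e.g. 'Helicobacter_pylori_UM299'.
    """
    # One row per distinct protein ID, skipping organisms without one
    unique = ids[[column]].dropna().drop_duplicates(subset=[column])
    # Shuffle once so that head() below picks random members of each genus
    shuffled = unique.sample(frac=1, random_state=random_state)
    genus = shuffled.index.str.split("_").str[0].to_numpy()
    return shuffled.groupby(genus).head(n_per_genus)


toy = pd.DataFrame(
    {"ACCOACORAT": ["WP_015645908.1", "WP_000886341.1", "WP_000886313.1",
                    "WP_001277565.1", "WP_001277567.1", None]},
    index=["Helicobacter_pylori_UM299", "Helicobacter_pylori_Hp_H_43",
           "Helicobacter_pylori_CPY3281", "Escherichia_coli_B185",
           "Escherichia_coli_DEC7A", "Hypothetical_organism_without_id"],
)
print(sample_per_genus(toy, "ACCOACORAT"))
```

Sampling by genus trades randomness across all strains for broader taxonomic coverage; which is preferable depends on the question being asked.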
Retrieving sequences from online databases
The fetch_batch_sequences function maps identifiers to protein sequences and retrieves them through the NCBI Entrez system. MicrobeRX uses Biopython for this step and returns the sequences as SeqRecord objects, ready for further manipulation.
[5]:
# NCBI asks Entrez users to provide a valid email address; replace the placeholder with your own
ACCOACORAT_sequences=fetch_batch_sequences(entries=sample.ACCOACORAT,sequence_ids=sample.index,email='my_mail@mail.com')
ACCOACORAT_sequences
[5]:
[SeqRecord(seq=Seq('MLAQMRDDIHAVFERDPAARSTLEVVMTYSGLHAIWGHRIAHRLWKAELCTLAR...SVD'), id='Brevibacillus_brevis_FJAT_0809_GLX', name='WP_016742856.1', description='WP_016742856.1 MULTISPECIES: serine O-acetyltransferase [Bacillales]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_UM299', name='WP_015645908.1', description='WP_015645908.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MKNPFKVLIYDLKNAKEKDPAARNILEVFILYPFIHALIGYRIAHLFYKAHLFF...MII'), id='Clostridium_botulinum_Ba4_str_657', name='WP_003360499.1', description='WP_003360499.1 serine O-acetyltransferase EpsC [Clostridium botulinum]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_Hp_H_43', name='WP_000886341.1', description='WP_000886341.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKKGFYFIARA...KDR'), id='Helicobacter_pylori_CPY3281', name='WP_000886313.1', description='WP_000886313.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MLRRMRDDIKMVFEQDPAARSTIEVVTTYAGLHAVWSHLIAHKLYNNQRYVAAR...YII'), id='Staphylococcus_warneri_L37603', name='WP_002451204.1', description='WP_002451204.1 MULTISPECIES: serine O-acetyltransferase [Bacteria]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKKGFYFIARA...KDR'), id='Helicobacter_pylori_HLJHP271', name='WP_017283306.1', description='WP_017283306.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MPTRLKEDIATIIKNDPATKSFFDAFLTNPGLHALWWHRMANFFYRHKMVLFGK...EKE'), id='Listeria_monocytogenes_08_5578', name='WP_012951129.1', description='WP_012951129.1 serine O-acetyltransferase EpsC [Listeria monocytogenes]', dbxrefs=[]),
SeqRecord(seq=Seq('MSCEELEIVWNNIKAEARTLADCEPMLASFYHATLLKHENLGSALSYMLANKLS...DGI'), id='Escherichia_coli_B185', name='WP_001277565.1', description='WP_001277565.1 MULTISPECIES: serine O-acetyltransferase [Enterobacteriaceae]', dbxrefs=[]),
SeqRecord(seq=Seq('MGWWKESIAIVKEQDPAARSSLEVILTYPGIKALAAHRLSHFLWNHNFKLLARM...SKL'), id='Streptococcus_agalactiae_2603V_R', name='NP_687240.1', description='NP_687240.1 serine O-acetyltransferase [Streptococcus agalactiae 2603V/R]', dbxrefs=[]),
SeqRecord(seq=Seq('MLAAIRRDIRAARERDPAAPTTLQVIFAYPGVHAIWGHRVNHWLWRRGARLAAR...FSI'), id='Mycobacterium_avium_104', name='WP_011724562.1', description='WP_011724562.1 MULTISPECIES: serine O-acetyltransferase [Mycobacterium avium complex (MAC)]', dbxrefs=[]),
SeqRecord(seq=Seq('MGWWRETIDIVKENDPAARTSLEVLLTYPGVKALAAHRVSHFLWNHGFKLLARM...SSL'), id='Streptococcus_parasanguinis_SK236', name='WP_003013745.1', description='WP_003013745.1 MULTISPECIES: serine O-acetyltransferase [Streptococcus]', dbxrefs=[]),
SeqRecord(seq=Seq('MSYEELEEVWSYIKQEARDFADCEPMLASFFHATLLKHENLGSALSFMLANKLA...NDI'), id='Morganella_morganii_SC01', name='WP_004240115.1', description='WP_004240115.1 serine O-acetyltransferase [Morganella morganii]', dbxrefs=[]),
SeqRecord(seq=Seq('MPCEELDIVWKNIKAEARALAECEPMLASFYHATLLKHENLGSALSYMLANKLA...DGI'), id='Citrobacter_freundii_str_ballerup_7851_39', name='WP_016154965.1', description='WP_016154965.1 MULTISPECIES: serine O-acetyltransferase [Citrobacter]', dbxrefs=[]),
SeqRecord(seq=Seq('MGWWKKSIDIIKEKDPAARSSLEIILTYPGLKALAAHQLSHWMWQKNLKLLARM...SKL'), id='Streptococcus_parauberis_KCTC_11980BP', name='WP_003104753.1', description='WP_003104753.1 serine O-acetyltransferase [Streptococcus parauberis]', dbxrefs=[]),
SeqRecord(seq=Seq('MNIFSWFKDEFLNIQKKDPAIKSKLEIILYPSLHAIIYHKLAHFLYKCKLFFLA...VKN'), id='Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953', name='WP_005898109.1', description='WP_005898109.1 MULTISPECIES: serine O-acetyltransferase EpsC [Fusobacterium]', dbxrefs=[]),
SeqRecord(seq=Seq('MPCEELDIVWNNIKAEARALADCEPMLASFYHATLLKHENLGSALSYMLANKLA...DGI'), id='Enterobacter_cloacae_subsp_dissolvens_SDM', name='WP_014830172.1', description='WP_014830172.1 MULTISPECIES: serine O-acetyltransferase [Enterobacter]', dbxrefs=[]),
SeqRecord(seq=Seq('MNFWGIIKEDFSQPKVQDPAFNSCIELFFNYPGVWAVVNYRFAHFFYIRNFKRI...SQK'), id='Campylobacter_jejuni_subsp_jejuni_1854', name='WP_002928246.1', description='WP_002928246.1 serine O-acetyltransferase [Campylobacter jejuni]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRRFYFIARA...KDR'), id='Helicobacter_pylori_Hp_A_14', name='WP_000886353.1', description='WP_000886353.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRRFYFIARA...KDR'), id='Helicobacter_pylori_UM037', name='WP_015643831.1', description='WP_015643831.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MGWWKETIDIVKEKDPAARTALEVLLTYPGVKALAAHCLSHFLWTHHCKLLARM...SGL'), id='Streptococcus_mutans_11A1', name='WP_002274416.1', description='WP_002274416.1 serine O-acetyltransferase [Streptococcus mutans]', dbxrefs=[]),
SeqRecord(seq=Seq('MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKRGFYFIARA...KDR'), id='Helicobacter_pylori_8A3', name='WP_000886326.1', description='WP_000886326.1 serine O-acetyltransferase [Helicobacter pylori]', dbxrefs=[]),
SeqRecord(seq=Seq('MKDIAIYGFGGFGREIACVINAINQATPTWNFIGYFDDGHSVGEANKYGRVLGN...RIQ'), id='Bacteroides_salyersiae_CL02T12C01', name='WP_007478489.1', description='WP_007478489.1 acetyltransferase [Bacteroides salyersiae]', dbxrefs=[]),
SeqRecord(seq=Seq('MSCEELEIVWNNIKAEARTLADCEPMLASFYHATLLKHENLGSALSYMLANKLS...DGI'), id='Escherichia_coli_DEC7A', name='WP_001277567.1', description='WP_001277567.1 serine O-acetyltransferase [Escherichia coli]', dbxrefs=[]),
SeqRecord(seq=Seq('MGWWKESIDIVKKNDPAARTSLEVLLTYPGLKALAAHRISHFLWRHHCRLLARM...SRL'), id='Streptococcus_vestibularis_ATCC_49124', name='WP_003092303.1', description='WP_003092303.1 MULTISPECIES: serine O-acetyltransferase [Streptococcus]', dbxrefs=[]),
SeqRecord(seq=Seq('MNESELNQIWKNIREEAEELVDNEPMLASFFHATILKHSNLGGSLSYILANKLA...DGI'), id='Actinobacillus_pleuropneumoniae_serovar_1_str_4074', name='WP_005598772.1', description='WP_005598772.1 serine O-acetyltransferase [Actinobacillus pleuropneumoniae]', dbxrefs=[]),
SeqRecord(seq=Seq('MLRRMRDDVKMVFEQDPAARTTFEVITTYAGLHAVWSHLIAHKLYNKKRYVLAR...YII'), id='Staphylococcus_lugdunensis_ACS_027_V_Sch2', name='WP_002459950.1', description='WP_002459950.1 MULTISPECIES: serine O-acetyltransferase [Staphylococcus]', dbxrefs=[]),
SeqRecord(seq=Seq('MGYFENLNYNLNRVLKDDPAAESKLMIYLTYPHIKALNYHYFSHKLYKKGWHTM...EDK'), id='Gemella_haemolysans_M341', name='WP_003147389.1', description='WP_003147389.1 serine O-acetyltransferase [Gemella haemolysans]', dbxrefs=[]),
SeqRecord(seq=Seq('MGFFGIIKEDFSQPKVQDPAYNSCIELFFNYPGVWAVVNYRFAHFFYTKNFKRT...DIK'), id='Campylobacter_coli_202_04', name='WP_002823200.1', description='WP_002823200.1 serine O-acetyltransferase [Campylobacter coli]', dbxrefs=[]),
SeqRecord(seq=Seq('MKHCEKQKVWNKIVSEAREMSEQEPMLASFYHATIIKHESLCAALSYILANKLN...DGI'), id='Vibrio_parahaemolyticus_AN_5034', name='WP_005456939.1', description='WP_005456939.1 serine O-acetyltransferase [Vibrio parahaemolyticus]', dbxrefs=[])]
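For readers who want to fetch sequences outside MicrobeRX, a comparable retrieval can be sketched with Biopython's Entrez module directly. This is not necessarily how fetch_batch_sequences works internally: the helper names and the batch size of 100 are assumptions, and here the record IDs are the NCBI accessions rather than organism names.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` identifiers."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_protein_fasta(ids: Iterable[str], email: str, batch_size: int = 100):
    """Fetch protein records from NCBI in batches (hypothetical helper; needs network)."""
    # Imported lazily so that batched() can be used without Biopython installed
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks clients to identify themselves
    records = []
    for batch in batched(ids, batch_size):
        with Entrez.efetch(db="protein", id=",".join(batch),
                           rettype="fasta", retmode="text") as handle:
            # Consume the parser before the handle is closed
            records.extend(SeqIO.parse(handle, "fasta"))
    return records


# Example (requires network access and Biopython):
# records = fetch_protein_fasta(sample.ACCOACORAT, email="my_mail@mail.com")
```

Batching keeps each request small, which is generally kinder to the NCBI servers than one request per identifier.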
Sequences can easily be saved in different formats using Biopython's SeqIO module; SeqIO.write returns the number of records written.
[6]:
from Bio import SeqIO
SeqIO.write(ACCOACORAT_sequences,handle='ACCOACORAT.faa',format="fasta")
[6]:
30
Multiple Sequence Alignment (MSA)
A common analysis is to compare the similarity of all sequences using ClustalW. The run_multi_sequence_aligment function performs the MSA, saves the alignment results and a phylogenetic tree, and returns a similarity matrix that can be used for further analysis.
[7]:
similarity_matrix=run_multi_sequence_aligment('ACCOACORAT.faa',input_format='fasta',output_aligment_format='fasta')
similarity_matrix
MSA and Phylogenetic tree have been saven in ACCOACORAT.faa directory
[7]:
| | Brevibacillus_brevis_FJAT_0809_GLX | Helicobacter_pylori_UM299 | Clostridium_botulinum_Ba4_str_657 | Helicobacter_pylori_Hp_H_43 | Helicobacter_pylori_CPY3281 | Staphylococcus_warneri_L37603 | Helicobacter_pylori_HLJHP271 | Listeria_monocytogenes_08_5578 | Escherichia_coli_B185 | Streptococcus_agalactiae_2603V_R | ... | Streptococcus_mutans_11A1 | Helicobacter_pylori_8A3 | Bacteroides_salyersiae_CL02T12C01 | Escherichia_coli_DEC7A | Streptococcus_vestibularis_ATCC_49124 | Actinobacillus_pleuropneumoniae_serovar_1_str_4074 | Staphylococcus_lugdunensis_ACS_027_V_Sch2 | Gemella_haemolysans_M341 | Campylobacter_coli_202_04 | Vibrio_parahaemolyticus_AN_5034 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brevibacillus_brevis_FJAT_0809_GLX | 59.0 | 100.0 | 57.0 | 58.0 | 59.0 | 62.0 | 59.0 | 46.0 | 34.0 | 44.0 | ... | 41.0 | 59.0 | 8.0 | 34.0 | 41.0 | 33.0 | 61.0 | 49.0 | 44.0 | 33.0 |
| Helicobacter_pylori_UM299 | 100.0 | 59.0 | 63.0 | 97.0 | 98.0 | 55.0 | 98.0 | 53.0 | 42.0 | 50.0 | ... | 48.0 | 99.0 | 17.0 | 42.0 | 49.0 | 40.0 | 52.0 | 54.0 | 45.0 | 40.0 |
| Clostridium_botulinum_Ba4_str_657 | 63.0 | 57.0 | 100.0 | 63.0 | 63.0 | 51.0 | 63.0 | 47.0 | 33.0 | 49.0 | ... | 45.0 | 64.0 | 16.0 | 33.0 | 47.0 | 33.0 | 51.0 | 53.0 | 47.0 | 34.0 |
| Helicobacter_pylori_Hp_H_43 | 97.0 | 58.0 | 63.0 | 100.0 | 97.0 | 54.0 | 97.0 | 53.0 | 42.0 | 49.0 | ... | 46.0 | 97.0 | 18.0 | 42.0 | 47.0 | 40.0 | 52.0 | 53.0 | 45.0 | 40.0 |
| Helicobacter_pylori_CPY3281 | 98.0 | 59.0 | 63.0 | 97.0 | 100.0 | 54.0 | 98.0 | 52.0 | 41.0 | 49.0 | ... | 47.0 | 97.0 | 17.0 | 41.0 | 48.0 | 40.0 | 52.0 | 55.0 | 46.0 | 39.0 |
| Staphylococcus_warneri_L37603 | 55.0 | 62.0 | 51.0 | 54.0 | 54.0 | 100.0 | 55.0 | 45.0 | 33.0 | 43.0 | ... | 39.0 | 55.0 | 9.0 | 33.0 | 38.0 | 33.0 | 91.0 | 48.0 | 43.0 | 31.0 |
| Helicobacter_pylori_HLJHP271 | 98.0 | 59.0 | 63.0 | 97.0 | 98.0 | 55.0 | 100.0 | 52.0 | 42.0 | 50.0 | ... | 47.0 | 97.0 | 22.0 | 42.0 | 49.0 | 40.0 | 53.0 | 54.0 | 46.0 | 40.0 |
| Listeria_monocytogenes_08_5578 | 53.0 | 46.0 | 47.0 | 53.0 | 52.0 | 45.0 | 52.0 | 100.0 | 34.0 | 47.0 | ... | 46.0 | 53.0 | 16.0 | 34.0 | 48.0 | 31.0 | 45.0 | 44.0 | 42.0 | 35.0 |
| Escherichia_coli_B185 | 42.0 | 34.0 | 33.0 | 42.0 | 41.0 | 33.0 | 42.0 | 34.0 | 100.0 | 36.0 | ... | 33.0 | 42.0 | 11.0 | 98.0 | 34.0 | 71.0 | 33.0 | 33.0 | 33.0 | 72.0 |
| Streptococcus_agalactiae_2603V_R | 50.0 | 44.0 | 49.0 | 49.0 | 49.0 | 43.0 | 50.0 | 47.0 | 36.0 | 100.0 | ... | 78.0 | 50.0 | 9.0 | 36.0 | 81.0 | 33.0 | 43.0 | 44.0 | 41.0 | 35.0 |
| Mycobacterium_avium_104 | 52.0 | 48.0 | 46.0 | 52.0 | 50.0 | 45.0 | 51.0 | 41.0 | 32.0 | 39.0 | ... | 39.0 | 52.0 | 11.0 | 32.0 | 42.0 | 34.0 | 45.0 | 47.0 | 38.0 | 31.0 |
| Streptococcus_parasanguinis_SK236 | 50.0 | 42.0 | 46.0 | 49.0 | 49.0 | 39.0 | 50.0 | 49.0 | 35.0 | 76.0 | ... | 83.0 | 50.0 | 11.0 | 35.0 | 81.0 | 33.0 | 40.0 | 44.0 | 37.0 | 33.0 |
| Morganella_morganii_SC01 | 40.0 | 33.0 | 37.0 | 40.0 | 40.0 | 34.0 | 41.0 | 36.0 | 82.0 | 38.0 | ... | 35.0 | 40.0 | 13.0 | 82.0 | 36.0 | 68.0 | 35.0 | 35.0 | 33.0 | 70.0 |
| Citrobacter_freundii_str_ballerup_7851_39 | 42.0 | 34.0 | 33.0 | 42.0 | 41.0 | 32.0 | 42.0 | 34.0 | 95.0 | 36.0 | ... | 33.0 | 42.0 | 11.0 | 95.0 | 33.0 | 72.0 | 33.0 | 33.0 | 33.0 | 72.0 |
| Streptococcus_parauberis_KCTC_11980BP | 49.0 | 44.0 | 49.0 | 47.0 | 49.0 | 42.0 | 49.0 | 47.0 | 34.0 | 84.0 | ... | 75.0 | 49.0 | 11.0 | 34.0 | 77.0 | 32.0 | 42.0 | 45.0 | 39.0 | 34.0 |
| Fusobacterium_nucleatum_subsp_polymorphum_ATCC_10953 | 53.0 | 48.0 | 57.0 | 52.0 | 52.0 | 46.0 | 52.0 | 48.0 | 34.0 | 47.0 | ... | 43.0 | 53.0 | 18.0 | 34.0 | 44.0 | 38.0 | 47.0 | 50.0 | 48.0 | 30.0 |
| Enterobacter_cloacae_subsp_dissolvens_SDM | 43.0 | 35.0 | 34.0 | 43.0 | 42.0 | 33.0 | 43.0 | 34.0 | 96.0 | 36.0 | ... | 33.0 | 43.0 | 11.0 | 95.0 | 34.0 | 71.0 | 33.0 | 34.0 | 34.0 | 71.0 |
| Campylobacter_jejuni_subsp_jejuni_1854 | 46.0 | 38.0 | 46.0 | 46.0 | 46.0 | 39.0 | 46.0 | 42.0 | 31.0 | 42.0 | ... | 39.0 | 47.0 | 11.0 | 31.0 | 35.0 | 33.0 | 40.0 | 41.0 | 86.0 | 32.0 |
| Helicobacter_pylori_Hp_A_14 | 97.0 | 58.0 | 63.0 | 99.0 | 96.0 | 54.0 | 96.0 | 53.0 | 41.0 | 49.0 | ... | 46.0 | 96.0 | 19.0 | 41.0 | 47.0 | 40.0 | 52.0 | 53.0 | 45.0 | 39.0 |
| Helicobacter_pylori_UM037 | 97.0 | 59.0 | 64.0 | 98.0 | 97.0 | 54.0 | 97.0 | 52.0 | 42.0 | 49.0 | ... | 47.0 | 97.0 | 18.0 | 42.0 | 48.0 | 40.0 | 52.0 | 53.0 | 46.0 | 40.0 |
| Streptococcus_mutans_11A1 | 48.0 | 41.0 | 45.0 | 46.0 | 47.0 | 39.0 | 47.0 | 46.0 | 33.0 | 78.0 | ... | 100.0 | 48.0 | 11.0 | 33.0 | 82.0 | 32.0 | 40.0 | 43.0 | 40.0 | 32.0 |
| Helicobacter_pylori_8A3 | 99.0 | 59.0 | 64.0 | 97.0 | 97.0 | 55.0 | 97.0 | 53.0 | 42.0 | 50.0 | ... | 48.0 | 100.0 | 17.0 | 42.0 | 49.0 | 40.0 | 52.0 | 54.0 | 46.0 | 40.0 |
| Bacteroides_salyersiae_CL02T12C01 | 17.0 | 8.0 | 16.0 | 18.0 | 17.0 | 9.0 | 22.0 | 16.0 | 11.0 | 9.0 | ... | 11.0 | 17.0 | 100.0 | 11.0 | 11.0 | 13.0 | 12.0 | 18.0 | 12.0 | 14.0 |
| Escherichia_coli_DEC7A | 42.0 | 34.0 | 33.0 | 42.0 | 41.0 | 33.0 | 42.0 | 34.0 | 98.0 | 36.0 | ... | 33.0 | 42.0 | 11.0 | 100.0 | 34.0 | 71.0 | 33.0 | 33.0 | 33.0 | 72.0 |
| Streptococcus_vestibularis_ATCC_49124 | 49.0 | 41.0 | 47.0 | 47.0 | 48.0 | 38.0 | 49.0 | 48.0 | 34.0 | 81.0 | ... | 82.0 | 49.0 | 11.0 | 34.0 | 100.0 | 32.0 | 39.0 | 44.0 | 36.0 | 32.0 |
| Actinobacillus_pleuropneumoniae_serovar_1_str_4074 | 40.0 | 33.0 | 33.0 | 40.0 | 40.0 | 33.0 | 40.0 | 31.0 | 71.0 | 33.0 | ... | 32.0 | 40.0 | 13.0 | 71.0 | 32.0 | 100.0 | 33.0 | 34.0 | 35.0 | 63.0 |
| Staphylococcus_lugdunensis_ACS_027_V_Sch2 | 52.0 | 61.0 | 51.0 | 52.0 | 52.0 | 91.0 | 53.0 | 45.0 | 33.0 | 43.0 | ... | 40.0 | 52.0 | 12.0 | 33.0 | 39.0 | 33.0 | 100.0 | 48.0 | 43.0 | 32.0 |
| Gemella_haemolysans_M341 | 54.0 | 49.0 | 53.0 | 53.0 | 55.0 | 48.0 | 54.0 | 44.0 | 33.0 | 44.0 | ... | 43.0 | 54.0 | 18.0 | 33.0 | 44.0 | 34.0 | 48.0 | 100.0 | 43.0 | 34.0 |
| Campylobacter_coli_202_04 | 45.0 | 44.0 | 47.0 | 45.0 | 46.0 | 43.0 | 46.0 | 42.0 | 33.0 | 41.0 | ... | 40.0 | 46.0 | 12.0 | 33.0 | 36.0 | 35.0 | 43.0 | 43.0 | 100.0 | 34.0 |
| Vibrio_parahaemolyticus_AN_5034 | 40.0 | 33.0 | 34.0 | 40.0 | 39.0 | 31.0 | 40.0 | 35.0 | 72.0 | 35.0 | ... | 32.0 | 40.0 | 14.0 | 72.0 | 32.0 | 63.0 | 32.0 | 34.0 | 34.0 | 100.0 |
30 rows × 30 columns
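The matrix reports pairwise similarity as percentages, but the exact definition used by run_multi_sequence_aligment is not stated here. One common definition of percent identity counts identical residues over alignment columns where neither sequence has a gap; a minimal sketch of it follows. Other tools divide by the full alignment length or by the shorter sequence, which gives different numbers. The example uses the visible prefixes of two H. pylori sequences fetched above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two rows taken from the same alignment.

    Identical residues are counted over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Prefixes of Helicobacter_pylori_UM299 and Helicobacter_pylori_Hp_H_43 (no gaps needed)
a = "MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCHRLAHALHKRGFYFIARA"
b = "MLDLSYSLERVLQEDPAARNKWEVLLLYPGIHALLCYRLAHALHKRGFYFIARA"
print(round(percent_identity(a, b), 1))  # the prefixes differ at one of 54 positions
```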
The similarity matrix can be visualized interactively with the plot_similarity_matrix function. We also provide a method that lets users visualize sequence homology at a defined threshold.
[8]:
plot_similarity_matrix(similarity_matrix)
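plot_similarity_matrix handles the threshold visually. To get the same idea as a list, a hypothetical helper can group sequences that are linked, directly or through other sequences, by similarity at or above a threshold (single-linkage clustering, i.e. connected components). It assumes the matrix has matching row and column labels. In case the matrix is not exactly symmetric, the sketch keeps the larger of the two values for each pair, which is a choice made here, not MicrobeRX behavior. The toy values are copied from the E. coli and Vibrio rows above.

```python
import numpy as np
import pandas as pd


def homology_groups(similarity: pd.DataFrame, threshold: float = 90.0) -> list:
    """Connected components of the graph 'similarity >= threshold' (hypothetical helper)."""
    names = list(similarity.index)
    # Reorder columns to match rows, then symmetrize with the element-wise max
    m = similarity.loc[names, names].to_numpy(dtype=float)
    sym = np.maximum(m, m.T)
    seen, groups = set(), []
    for start in range(len(names)):
        if start in seen:
            continue
        # Depth-first traversal over pairs at or above the threshold
        stack, members = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            members.append(names[i])
            for j in np.flatnonzero(sym[i] >= threshold):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(members))
    return groups


names = ["Escherichia_coli_B185", "Escherichia_coli_DEC7A", "Vibrio_parahaemolyticus_AN_5034"]
toy = pd.DataFrame([[100, 98, 72], [98, 100, 72], [72, 72, 100]], index=names, columns=names)
print(homology_groups(toy, threshold=90.0))
```

Because single linkage chains sequences together, a low threshold can merge distant sequences through intermediates; comparing several thresholds is usually informative.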
Alignment chart and sequence logo
This interactive chart shows the differences between sequences at single-amino-acid resolution, along with the degree of sequence conservation and the consensus sequence, allowing a thorough characterization of the protein sequences.
[9]:
plot_aligment_chart('ACCOACORAT.fasta')
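The conservation track and consensus shown in the chart can be illustrated with a simple frequency-based sketch. This is an assumption about how such quantities are commonly computed, not a description of plot_aligment_chart: conservation here is the fraction of sequences carrying the most common non-gap residue in each column, whereas sequence logos often use information content in bits instead. The toy alignment is invented for illustration.

```python
from collections import Counter


def consensus_and_conservation(aligned, gap="-"):
    """Return the consensus string and per-column conservation in [0, 1].

    Conservation is the fraction of sequences carrying the most common
    non-gap residue; all-gap columns get the gap symbol and 0.0.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus, conservation = [], []
    for column in zip(*aligned):
        counts = Counter(r for r in column if r != gap)
        if not counts:  # column made only of gaps
            consensus.append(gap)
            conservation.append(0.0)
            continue
        residue, n = counts.most_common(1)[0]
        consensus.append(residue)
        conservation.append(n / len(column))
    return "".join(consensus), conservation


cons, scores = consensus_and_conservation(["MLDLS", "MLDLT", "ML-LS"])
print(cons, [round(s, 2) for s in scores])
```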
Protein families using InterPro
MicrobeRX can perform automated family and domain annotation of user-selected protein sequences with InterPro.
WARNING: For large numbers of protein sequences or whole genomes, it is better and faster to run the analysis locally using [InterProScan](https://www.ebi.ac.uk/interpro/).
[10]:
selected_protein=ACCOACORAT_sequences[0]
selected_protein
[10]:
SeqRecord(seq=Seq('MLAQMRDDIHAVFERDPAARSTLEVVMTYSGLHAIWGHRIAHRLWKAELCTLAR...SVD'), id='Brevibacillus_brevis_FJAT_0809_GLX', name='WP_016742856.1', description='WP_016742856.1 MULTISPECIES: serine O-acetyltransferase [Bacillales]', dbxrefs=[])
[11]:
interpro=get_interpro(sequence_id=selected_protein.id,sequence=str(selected_protein.seq),email='my_mail@mail.com')
interpro
Job ID: iprscan5-R20231124-154123-0249-84797931-p1m
Job is queued, please wait...
Job is running, please wait...
Job is running, please wait...
Job is running, please wait...
[11]:
| | accesion | token | sequence_length | analysis | signature_accession | signature_description | start_location | stop_location | score | status | date | interpro_accession | interpro_description | go_annotations | pathways |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | FunFam | G3DSA:1.10.3130.10:FF:000002 | Serine acetyltransferase | 1 | 66 | 6.4E-33 | T | 24-11-2023 | - | - | - | - |
| 1 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | Coils | Coil | Coil | 192 | 212 | - | T | 24-11-2023 | - | - | - | - |
| 2 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | Pfam | PF00132 | Bacterial transferase hexapeptide (six repeats) | 117 | 150 | 9.8E-6 | T | 24-11-2023 | IPR001451 | Hexapeptide repeat | - | MetaCyc:PWY-2229|MetaCyc:PWY-241|MetaCyc:PWY-2... |
| 3 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | PANTHER | PTHR42811 | SERINE ACETYLTRANSFERASE | 3 | 181 | 8.4E-74 | T | 24-11-2023 | - | - | GO:0005829(PANTHER)|GO:0009001(PANTHER) | - |
| 4 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | FunFam | G3DSA:2.160.10.10:FF:000007 | Serine acetyltransferase | 67 | 181 | 4.2E-53 | T | 24-11-2023 | - | - | - | - |
| 5 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | PIRSF | PIRSF000441 | CysE | 1 | 218 | 2.0E-84 | T | 24-11-2023 | IPR005881 | Serine O-acetyltransferase | GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... | MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4... |
| 6 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | CDD | cd03354 | LbH_SAT | 64 | 164 | 1.73593E-56 | T | 24-11-2023 | IPR045304 | Serine acetyltransferase, LbH domain | - | MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4... |
| 7 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | SUPERFAMILY | SSF51161 | Trimeric LpxA-like enzymes | 4 | 168 | 9.84E-55 | T | 24-11-2023 | IPR011004 | Trimeric LpxA-like superfamily | - | MetaCyc:PWY-2229|MetaCyc:PWY-241|MetaCyc:PWY-2... |
| 8 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | NCBIfam | NF041874 | serine O-acetyltransferase EpsC | 5 | 170 | 4.9E-71 | T | 24-11-2023 | - | - | - | - |
| 9 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | Gene3D | G3DSA:2.160.10.10 | Hexapeptide repeat proteins | 67 | 185 | 4.2E-37 | T | 24-11-2023 | - | - | - | - |
| 10 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | NCBIfam | TIGR01172 | serine O-acetyltransferase | 6 | 166 | 6.3E-73 | T | 24-11-2023 | IPR005881 | Serine O-acetyltransferase | GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... | MetaCyc:PWY-3602|MetaCyc:PWY-361|MetaCyc:PWY-4... |
| 11 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | Pfam | PF06426 | Serine acetyltransferase, N-terminal | 3 | 35 | 0.041 | T | 24-11-2023 | IPR010493 | Serine acetyltransferase, N-terminal | GO:0005737(InterPro)|GO:0006535(InterPro)|GO:0... | MetaCyc:PWY-6936|MetaCyc:PWY-7274|MetaCyc:PWY-... |
| 12 | Brevibacillus_brevis_FJAT_0809_GLX | 8c3e2ed087ff5d386d16ad5757c07175 | 221 | Gene3D | G3DSA:1.10.3130.10 | serine acetyltransferase, domain 1 | 1 | 66 | 4.2E-24 | T | 24-11-2023 | IPR042122 | Serine acetyltransferase, N-terminal domain su... | - | MetaCyc:PWY-6936|MetaCyc:PWY-7274|MetaCyc:PWY-... |
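The interpro table has one row per signature hit: several rows can point to the same InterPro entry (IPR005881 appears twice above), and some hits have no InterPro entry ("-"). A minimal pandas sketch for keeping only hits integrated into InterPro and ordering them along the sequence follows; it assumes the column names shown above and "-" for missing values, and interpro_domains is a hypothetical helper. The toy rows are copied from the table.

```python
import pandas as pd


def interpro_domains(results: pd.DataFrame) -> pd.DataFrame:
    """Keep hits that map to an InterPro entry and sort them by position."""
    hits = results[results["interpro_accession"] != "-"]
    cols = ["interpro_accession", "interpro_description", "analysis",
            "start_location", "stop_location"]
    return (
        hits[cols]
        .astype({"start_location": int, "stop_location": int})
        .sort_values(["start_location", "stop_location"])
        .reset_index(drop=True)
    )


toy = pd.DataFrame({
    "analysis": ["FunFam", "Pfam", "PIRSF"],
    "interpro_accession": ["-", "IPR001451", "IPR005881"],
    "interpro_description": ["-", "Hexapeptide repeat", "Serine O-acetyltransferase"],
    "start_location": [1, 117, 1],
    "stop_location": [66, 150, 218],
})
print(interpro_domains(toy))
```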
[12]:
f=plot_interpro_results(interpro_results=interpro,compact=True)
f