Title Funkcionalna specifičnost za proteinske familije
Title (english) Functional specificity for protein families
Author Nino Šimić
Mentor Pavle Goldstein
Committee member Pavle Goldstein (committee chair)
Committee member Hrvoje Šikić
Committee member Ivana Šain Glibić
Committee member Vanja Wagner
Granter University of Zagreb, Faculty of Science (Department of Mathematics), Zagreb
Defense date and country 2025-02-26, Croatia
Scientific / art field, discipline and subdiscipline NATURAL SCIENCES / Mathematics / Probability Theory and Statistics
Abstract U ovom diplomskom radu promatrana su poravnanja pet proteinskih familija od kojih je svaka podijeljena u dvije podfamilije. Za svaku familiju, cilj je bio pronaći one pozicije poravnanja koje su najvažnije u klasifikaciji proteina te familije u njene podfamilije, specijalizirane za različite funkcije. Proučavane proteinske familije uključuju: acil transferaze (tj. AT-domene), familiju malatnih i laktatnih dehidrogenaza (MDH/LDH), ciklaze, kinaze, te ketoreduktaze (tj. KR-domene). Statistička analiza provedena nad nizovima iz poravnanja moguća je jer se svakoj aminokiselini (i praznini) u poravnanju pridružio petdimenzionalni numerički vektor. Definirana je razdvajajuća (split) S-statistika koja sumira omjere intergrupne i intragrupne varijabilnosti po svakoj koordinati aminokiselinskog vektora. Podacima se dodao šum dobiven iz poznate prosječne distribucije svih aminokiselina. Po vrijednostima S-statistike rangirane su pozicije za svako od 5 poravnanja, dok je distribucija S-statistike procijenjena nekom F-distribucijom. U većini slučajeva F-distribuiranost S-statistike nismo mogli odbaciti KS testom, pa su izdvojene statistički značajne pozicije za svaku familiju, na razinama značajnosti od 1, 5 ili 10 %. Prikazani su i t-SNE grafovi koji vizualiziraju originalne proteine iz poravnanja, koristeći samo 10 najznačajnijih pozicija tog poravnanja. Iz tih ilustrativnih grafova moglo se uočiti da, za svaku familiju, pripadne podfamilije tvore međusobno odvojene klastere, uz jako malo ili nimalo pogrešnih klasifikacija proteina. Konačno, usporedilo se rangiranje pozicija s rangiranjima u nekim sličnim prošlim istraživanjima. Dobivene značajne pozicije u ovom radu potencijalno daju vrijednu informaciju za buduća eksperimentalna biološka istraživanja, posebno u vidu mogućih mutacija enzima baš na tim pozicijama s ciljem postizanja drugačije, preferabilnije funkcije enzima.
Abstract (english) In this thesis, alignments of five protein families were studied, each family being divided into two subfamilies. The goal was to find, for each protein family, the alignment positions that are most important for separating that family into its subfamilies, which are specialized for different functions. The protein families studied are: acyl transferases (AT-domains), a family of malate and lactate dehydrogenases (MDH/LDH), cyclases, kinases, and ketoreductases (KR-domains). Statistical analysis of the aligned sequences is possible because each amino acid (and gap) in the alignment was assigned a five-dimensional numeric vector. A split statistic (S-statistic) was defined, which sums the ratios of between-group to within-group variability over the coordinates of the amino-acid vector. Noise drawn from the known average distribution of all amino acids was added to the data. For each of the 5 alignments, positions were ranked by their S-statistic values, and the distribution of the S-statistic was approximated by an F-distribution. In most cases the Kolmogorov-Smirnov (KS) test could not reject the hypothesis that the S-statistic is F-distributed, so statistically significant positions were selected for each family at significance levels of 1 %, 5 % or 10 %. t-SNE plots are also shown, visualizing the original proteins of each alignment using only their residues at the ten most important positions of that alignment. These illustrative plots show that, for each family, the corresponding subfamilies form mutually separated clusters, with very few or no misclassified proteins. Finally, the ranking of positions was compared with rankings from similar previous studies. The significant positions found in this thesis may provide valuable information for future experimental biological research, in particular for possible enzyme mutations at exactly these positions, aimed at achieving a different, more desirable enzyme function.
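The ranking step described in the abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the 5-dimensional amino-acid encoding uses hypothetical placeholder values (`ENCODING`), each per-coordinate ratio is taken to be the one-way ANOVA mean-square ratio (between-group over within-group), and a small hypothetical constant `eps` is used in place of the noise step so that a position fully conserved inside both subfamilies does not cause a division by zero. All function names are hypothetical.

```python
from statistics import fmean

# HYPOTHETICAL 5-dimensional encoding with placeholder values, for illustration only;
# it is NOT the encoding used in the thesis. Only a few residues and the gap '-' are listed.
ENCODING = {
    "A": (0.1, -0.2, 0.0, 0.3, 0.5),
    "D": (-1.0, 0.4, 0.2, -0.5, 0.1),
    "K": (1.2, -0.6, 0.3, 0.2, -0.4),
    "L": (0.9, 0.7, -0.8, 0.1, 0.0),
    "-": (0.0, 0.0, 0.0, 0.0, 0.0),
}
DIM = 5


def column(seqs, pos):
    """Encode the residues of all aligned sequences at one (0-based) alignment position."""
    return [ENCODING[seq[pos]] for seq in seqs]


def split_statistic(group_a, group_b, eps=1e-6):
    """Sum over the DIM coordinates of between-group / within-group mean squares.

    Each per-coordinate ratio is the one-way ANOVA F ratio (an assumption about the
    exact normalization). `eps` is a hypothetical stabilizer used here instead of
    the noise step, so a position conserved inside both groups does not divide by zero.
    """
    groups = [group_a, group_b]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    s = 0.0
    for c in range(DIM):
        values = [[vec[c] for vec in g] for g in groups]
        means = [fmean(g) for g in values]
        grand_mean = fmean([x for g in values for x in g])
        ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(values, means))
        ss_within = sum((x - m) ** 2 for g, m in zip(values, means) for x in g)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n_total - k)
        s += ms_between / (ms_within + eps)
    return s


def rank_positions(subfamily_a, subfamily_b):
    """Return (position, S) pairs sorted from most to least discriminating."""
    length = len(subfamily_a[0])
    scores = [
        (pos, split_statistic(column(subfamily_a, pos), column(subfamily_b, pos)))
        for pos in range(length)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, with the toy subfamilies `["AKL", "AKD", "LKA"]` and `["ADL", "ADA", "LDD"]`, `rank_positions` puts position 1 (always K in the first group, always D in the second) first, while positions 0 and 2, whose residue composition is identical in both groups, get an S-value of 0. The sketch stops at the ranking; the F-distribution fit, the KS test and the t-SNE visualization are not reproduced.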
Keywords
funkcionalna specifičnost
proteinsko poravnanje
intergrupna varijabilnost
intragrupna varijabilnost
F-distribucija
Keywords (english)
functional specificity
protein alignment
between-group variability
within-group variability
F-distribution
Language Croatian
URN:NBN urn:nbn:hr:217:410442
Study programme Title: Mathematical Statistics; Study programme type: university; Study level: graduate; Academic / professional title: sveučilišni magistar matematike (University Master of Mathematics)
Type of resource Text
File origin Born digital
Access conditions Open access
Created on 2025-02-10 19:16:05