Funkcionalna specifičnost za proteinske familije

Šimić, Nino



Master's thesis


2025. urn:nbn:hr:217:410442


University of Zagreb
Faculty of Science
Department of Mathematics

Cite this document

APA 6th Edition

Šimić, N. (2025). Funkcionalna specifičnost za proteinske familije (Master's thesis). Zagreb: University of Zagreb, Faculty of Science. Retrieved from https://urn.nsk.hr/urn:nbn:hr:217:410442

MLA 8th Edition

Šimić, Nino. "Funkcionalna specifičnost za proteinske familije." Master's thesis, University of Zagreb, Faculty of Science, 2025. https://urn.nsk.hr/urn:nbn:hr:217:410442

Chicago 17th Edition

Šimić, Nino. "Funkcionalna specifičnost za proteinske familije." Master's thesis, University of Zagreb, Faculty of Science, 2025. https://urn.nsk.hr/urn:nbn:hr:217:410442

Harvard

Šimić, N. (2025). 'Funkcionalna specifičnost za proteinske familije', Master's thesis, University of Zagreb, Faculty of Science, accessed 19 March 2025, https://urn.nsk.hr/urn:nbn:hr:217:410442

Vancouver

Šimić N. Funkcionalna specifičnost za proteinske familije [Master's thesis]. Zagreb: University of Zagreb, Faculty of Science; 2025 [cited 2025 Mar 19]. Available from: https://urn.nsk.hr/urn:nbn:hr:217:410442

IEEE

N. Šimić, "Funkcionalna specifičnost za proteinske familije," Master's thesis, University of Zagreb, Faculty of Science, Zagreb, 2025. Available at: https://urn.nsk.hr/urn:nbn:hr:217:410442

Cite this item: https://urn.nsk.hr/urn:nbn:hr:217:410442


Metadata

Title	Funkcionalna specifičnost za proteinske familije
Title (english)	Functional specificity for protein families
Author	Nino Šimić
Mentor	Pavle Goldstein (mentor)
Committee member	Pavle Goldstein (committee chair)
Committee member	Hrvoje Šikić (committee member)
Committee member	Ivana Šain Glibić (committee member)
Committee member	Vanja Wagner (committee member)
Granter	University of Zagreb Faculty of Science (Department of Mathematics) Zagreb
Defense date and country	2025-02-26, Croatia
Scientific / art field, discipline and subdiscipline	NATURAL SCIENCES / Mathematics / Probability Theory and Statistics
Abstract (English)	In this thesis, alignments of five protein families were studied, each family being split into two subfamilies. For each family, the goal was to find the alignment positions that matter most for separating the family into its subfamilies, which are specialized for different functions. The families studied are acyltransferases (AT domains), a family of malate and lactate dehydrogenases (MDH/LDH), cyclases, kinases, and ketoreductases (KR domains). Statistical analysis of the aligned sequences is possible because each amino acid (and the gap) in the alignment was assigned a five-dimensional numerical vector. A split statistic (S-statistic) was defined, which sums the ratios of between-group to within-group variability over the coordinates of the amino acid vector. Noise drawn from the known average distribution of all amino acids was added to the data. For each of the five alignments, positions were ranked by their S-statistic values, and the distribution of the S-statistic was approximated by an F-distribution. In most cases, the Kolmogorov-Smirnov (KS) test could not reject the hypothesis that the S-statistic follows this F-distribution, so statistically significant positions were selected for each family at significance levels of 1, 5 or 10 %. t-SNE plots are also shown, visualizing the original proteins of each alignment using only their residues at the ten most important positions of that alignment. These illustrative plots show that, for each family, the subfamilies form mutually separated clusters with very few or no misclassified proteins. Finally, the position rankings were compared with those from similar previous studies.
The significant positions found in this thesis potentially provide valuable information for future experimental biological research, especially regarding possible enzyme mutations at exactly those positions, aimed at achieving a different, more desirable enzyme function.
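The S-statistic ranking and the F-distribution check described in the abstract can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the five-dimensional encoding `ENCODING` is a hypothetical placeholder (fixed pseudo-random vectors), the S-statistic is read as a one-way ANOVA mean-square ratio per coordinate summed over the five coordinates, the floor `eps` and all function names are our own, and the noise step is omitted because its exact form is not described here. Note also that a KS test against an F-distribution whose parameters were fitted to the same data gives an optimistic p-value.

```python
import numpy as np
from scipy import stats

# 20 standard amino acids plus the gap symbol "-".
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

# HYPOTHETICAL placeholder encoding: a fixed pseudo-random 5-D vector per symbol.
# Substitute the actual five-dimensional amino-acid descriptors used in the thesis.
_rng = np.random.default_rng(0)
ENCODING = {symbol: _rng.normal(size=5) for symbol in ALPHABET}


def encode_column(column):
    """Map the residues of one alignment column to an (n_sequences, 5) array."""
    return np.array([ENCODING[symbol] for symbol in column])


def split_statistic(x, labels, eps=1e-9):
    """Sum over coordinates of between-group / within-group mean squares.

    Assumed reading of the S-statistic: a one-way ANOVA ratio per coordinate,
    summed over the 5 coordinates. `eps` (an assumption) floors the within-group
    term so that groups with zero within-group variance do not divide by zero.
    """
    labels = np.asarray(labels)
    groups = [x[labels == g] for g in np.unique(labels)]
    n, k = len(x), len(groups)
    grand_mean = x.mean(axis=0)
    ss_between = sum(len(g) * (g.mean(axis=0) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = np.maximum(ss_within / (n - k), eps)
    return float(np.sum(ms_between / ms_within))


def rank_positions(alignment, labels):
    """Return the S value of every column and the column indices by decreasing S."""
    n_columns = len(alignment[0])
    s_values = np.array([
        split_statistic(encode_column([seq[j] for seq in alignment]), labels)
        for j in range(n_columns)
    ])
    return s_values, np.argsort(-s_values)


def fit_f_and_ks(s_values):
    """Fit an F distribution (location fixed at 0) and test the fit with KS."""
    params = stats.f.fit(s_values, floc=0)  # (dfn, dfd, loc, scale)
    result = stats.kstest(s_values, "f", args=params)
    return params, result.pvalue


def significant_positions(s_values, params, alpha=0.05):
    """Columns whose upper-tail p-value under the fitted F is below alpha."""
    p_values = stats.f.sf(s_values, *params)
    return np.flatnonzero(p_values < alpha)


# Toy usage: a hypothetical 4-sequence alignment split into two subfamilies.
toy_alignment = ["AGKL", "AGRL", "WGKL", "WGRL"]
toy_labels = [0, 0, 1, 1]
toy_s, toy_ranking = rank_positions(toy_alignment, toy_labels)
```

On the toy alignment, column 0 (A in one subfamily, W in the other) has zero within-group variance and ranks first, while the conserved columns and the column with identical composition in both subfamilies score close to 0. On a real alignment, `fit_f_and_ks` would be applied to the S values of all columns, and `significant_positions` would then select columns at the chosen level, e.g. `alpha=0.01`, `0.05` or `0.10`.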
Language	Croatian
URN:NBN	urn:nbn:hr:217:410442
Study programme	Title: Mathematical Statistics; Study programme type: university; Study level: graduate; Academic / professional title: sveučilišni magistar matematike (University Master of Mathematics)
Type of resource	Text
File origin	Born digital
Access conditions	Open access
Created on	2025-02-10 19:16:05