Title Algoritmi za kompresiju podataka
Author Luka Valenta
Mentor Saša Singer (mentor)
Committee member Saša Singer (predsjednik povjerenstva)
Committee member Miljenko Marušić (član povjerenstva)
Committee member Goran Igaly (član povjerenstva)
Committee member Zvonimir Bujanović (član povjerenstva)
Granter University of Zagreb Faculty of Science (Department of Mathematics) Zagreb
Defense date and country 2019-09-23, Croatia
Scientific / art field, discipline and subdiscipline NATURAL SCIENCES Mathematics
Abstract Kompresija podataka je proces koji smanjuje veličinu podataka, tako da uklanja suvišne dijelove. Svaki zadatak kompresije podataka sastoji se od dva dijela, algoritma za kompresiju podataka i algoritma za dekompresiju podataka. U ovom radu dane su Shannonove definicije entropije i vlastite informacije poruke. One predstavljaju ključne pojmove teorije informacija, u sklopu koje su razvijeni algoritmi za kompresiju podataka. Opisano je prefiksno kodiranje te Huffmanov koder, kao najvažniji prefiksni koder. Opisan je i aritmetički koder. Oba kodera su često korištena u mnogim algoritmima za kompresiju podataka te opisujemo njihovu ulogu u kodiranju duljine javljanja, kodiranju pomakom na početak i predviđanju djelomičnim podudaranjem, kao i pripadne algoritme. Posljednje dvije skupine algoritama za kompresiju koje razmatramo u ovom radu su Lempel-Ziv algoritmi i Burrows-Wheelerov algoritam. Lempel-Ziv algoritmi, tijekom kompresije, grade rječnik dosad viđenih nizova znakova i koriste te informacije kako bi kodirali zadani niz poruka. Burrows-Wheelerov algoritam transformira ulazne podatke Burrows-Wheelerovom transformacijom te na transformiranim podacima primjenjuje kodiranje pomakom na početak, a zatim Huffmanov ili aritmetički koder. Za kraj, uspoređujemo dva konkretna formata za spremanje slika, PNG i JPEG format. Usporedbom slika u oba formata na slikama iz Raise-1k skupa slika slikanih kamerom te računalno generiranim slikama iz Lego brick images skupa slika, nameće se zaključak kako je JPEG uistinu bolji format za spremanje slika slikanih kamerom, jer kao lossy algoritam može postići daleko bolje omjere kompresije, a razlika u kvaliteti slike u odnosu na lossless PNG format je mala. S druge strane, za računalno generirane slike, kao što su slike iz Lego brick images, PNG format je očito bolji, jer kod JPEG formata dolazi do jasno vidljivog pada u kvaliteti slika, a razlika u omjerima kompresije nije velika, kao kod slika slikanih kamerom.
Abstract (english) Data compression is a process that reduces the size of the data by removing redundant parts. Each data compression task consists of two parts, an encoding algorithm and a decoding algorithm. This thesis provides Shannon’s definitions of entropy and self-information of a message. They represent the key concepts of information theory within which data compression algorithms have been developed. Prefix coding is described and so is the Huffman Coder, as the most important prefix coder. An Arithmetic Coder is also described. Both encoders are often used in many data compression algorithms, and we describe their role in Run-Length Coding, Move-To-Front Coding, and Partial Match Prediction, as well as the algorithms themselves. The last two sets of compression algorithms we consider in this thesis are the LempelZiv algorithms and the Burrows-Wheeler algorithm. During compression, the Lempel-Ziv algorithms build a dictionary of character strings seen so far, and use this information to encode a given string of messages. The Burrows-Wheeler algorithm transforms input data by using the Burrows-Wheeler transform, and applies the Huffman or arithmetic encoder to the transformed data. In the end, we compare two specific image saving formats, PNG and JPEG. By comparing images in both formats on images from the Raise-1k dataset created with a digital camera, and computer-generated images from the Lego brick images dataset, we conclude that JPEG is indeed a better format for storing camera images, as it can achieve far better compression ratios, because it is a lossy algorithm, and the difference in image quality compared to the lossless PNG format is small. On the other hand, for computer-generated images, such as images from the Lego brick dataset, the PNG format is clearly better, because JPEG format results in a visibly noticeable drop in image quality, and the difference in compression ratios is not as large as for the camera images.
Keywords
kompresija podataka
teorija informacija
koderi
PNG
JPEG
Keywords (english)
data compression
information theory
coder
PNG
JPEG
Language croatian
URN:NBN urn:nbn:hr:217:739345
Study programme Title: Computer Science and Mathematics Study programme type: university Study level: graduate Academic / professional title: magistar/magistra računarstva i matematike (magistar/magistra računarstva i matematike)
Type of resource Text
File origin Born digital
Access conditions Open access
Terms of use
Created on 2020-06-26 09:21:51