I Simposio de Postgrado 2023. Ingeniería, ciencias e innovación

I SIMPOSIO 2023 A STUDY ON REPETITIVENESS MEASURES FOR STRINGS ABSTRACT Nowadays, high volumes of repetitive data are being generated every second. Traditional compressors based on Shannon’s entropy fail to exploit the repetitiveness of data as a source of compressibility. Because of this limitation, repetition-aware compressors like the Lempel-Ziv parsing, the run-length encoding of the Burrows-Wheeler transform, and grammar-based compression via heuristics like Re-Pair have been devised. From a theoretical point of view, the size of a compressed representation of a text defines a measure of compressibility. In the case of repetition-aware compressors, these values are, in fact, measures of repetitiveness. Recently, measures of repetitiveness without a compression scheme have been designed, depending only on the combinatorial properties of strings. Repetitiveness measures based on compressors are essential because they help us study their properties in a more simple, elegant, and abstract way. On the other hand, measures based on the string’s combinatorial properties help us understand what we must exploit when designing compressors or indexes to achieve some desired amount of space or functionality. The goal of this thesis is to deepen our understanding of repetitivenessmeasures.Wewant to establish asymptotic bounds between them. We also want to assess their sensitivity to simple edit operations such as insertion, deletion, or substitution, as well as more complex ones like reverse and morphism application. Finally, we plan to introduce and explore other measures that focus on more specific aspects of repetitiveness. Overall, this researchwill enhanceourknowledgeof repetitiveness measures and provide insights into their behavior under various scenarios. The findings will be useful in designing more efficient compressors and indexes that can handle repetitive texts. 1 Departamento de Ciencia de la Computación (DCC), Universidad de Chile. 2 Centre for Biotechnology and Bioengineering (CeBiB). *Email: crurbina@dcc.uchile.cl Cristian Urbina 1,2*

RkJQdWJsaXNoZXIy Mzc3MTg=