Published: 17th August 2011
DOI: 10.4204/EPTCS.63
ISSN: 2075-2180

EPTCS 63

Proceedings 8th International Conference
Words 2011
Prague, Czech Republic, 12-16th September 2011

Edited by: Petr Ambrož, Štěpán Holub and Zuzana Masáková

Preface
Invited Presentation: Interactions between Digital Geometry and Combinatorics on Words Srečko Brlek	1
Invited Presentation: Infinite permutations vs. infinite words Anna E. Frid	13
Invited Presentation: Form and Function in Language: A Praguian View Eva Hajičová	20
Invited Presentation: Combinatorics on words in information security: Unavoidable regularities in the construction of multicollision attacks on iterated hash functions Juha Kortelainen	22
Invited Presentation: Pattern avoidance and HDOL words Pascal Ochem	30
Invited Presentation: Circular words and applications Benoît Rittaud and Laurent Vivier	31
Finite-Repetition threshold for infinite ternary words Golnaz Badkobeh and Maxime Crochemore	37
Uniformly balanced words with linear complexity and prescribed letter frequencies Valérie Berthé and Sébastien Labbé	44
Pattern 1^j0^i avoiding binary words Stefano Bilotta, Elisa Pergola and Renzo Pinzani	53
Pattern Avoidability with Involution Bastian Bischoff and Dirk Nowotka	65
Recurrent Partial Words Francine Blanchet-Sadri, Aleksandar Chakarov, Lucas Manuelli, Jarett Schwartz and Slater Stich	71
Monoids and Maximal Codes Fabio Burderi	83
Bounded Parikh Automata Michaël Cadilhac, Alain Finkel and Pierre McKenzie	93
From Regular to Strictly Locally Testable Languages Stefano Crespi Reghizzi and Pierluigi San Pietro	103
On the commutative equivalence of bounded context-free and regular languages Flavio D'Alessandro and Benedetto Intrigila	112
Substitutions over infinite alphabet generating (−β)-integers Daniel Dombek	115
Dynamical generalizations of the Lagrange spectrum Sébastien Ferenczi	122
A Classification of Trapezoidal Words Gabriele Fici	129
On Pansiot Words Avoiding 3-Repetitions Irina A. Gorbunova and Arseny M. Shur	138
A new proof for the decidability of D0L ultimate periodicity Vesa Halava, Tero Harju and Tomi Kärki	147
The complexity of tangent words Thierry Monteil	152
Unambiguous 1-Uniform Morphisms Hossein Nevisi and Daniel Reidenbach	158
Constructing Premaximal Binary Cube-free Words of Any Level Elena A. Petrova and Arseny M. Shur	168
Abelian returns in Sturmian words Svetlana Puzynina and Luca Q. Zamboni	179
Fife's Theorem for (7/3)-Powers Narad Rampersad, Jeffrey Shallit and Arseny Shur	189
Information theory: Sources, Dirichlet series, and realistic analyses of data structures Mathieu Roux and Brigitte Vallée	199
Systems of Word Equations and Polynomials: a New Approach Aleksi Saarela	215
Word posets, with applications to Coxeter groups Matthew J. Samuel	226
The Critical Exponent is Computable for Automatic Sequences Jeffrey Shallit	231
Optimizing Properties of Balanced Words Nikita Sidorov	240
On the Delone property of (−β)-integers Wolfgang Steiner	247
Permutation complexity of the fixed points of some uniform binary morphisms Alexander Valyuzhenich	257
Permutation Complexity Related to the Letter Doubling Map Steven Widmer	265

Preface

This volume contains proceedings of the Eighth International Conference WORDS 2011. It was held in Prague, Czech Republic, from 12th to 16th September 2011 as a joint undertaking of the Czech Technical University and the Charles University.

The conference is a biannual meeting devoted to the mathematical theory of words, i.e., finite or infinite sequences of symbols over a finite alphabet. Since the first meeting in 1997 in Rouen, France, WORDS became the main event in the field. Words arise as a natural object in many areas and therefore the conference is also open to applications, be it in computer science, biology, linguistics or physics, while the emphasis remains on combinatorial, algebraic and algorithmic points of view.

Program committee selected 27 contributions out of 40 submissions to be presented at the conference. Each paper was reviewed by two referees. The reader will find here revised versions of the accepted contributions as well as summaries of six invited lectures.

We thank authors for their interest in WORDS, program committee members and other reviewers (listed below) for their effort, and the EPTCS editors, in particular Wolfgang Thomas and the Editor-in-Chief Rob van Glabbeek, for accepting the publication of the proceedings and for their helpful approach.

The event was financially supported by the Faculty of Nuclear Sciences and Physical Engineering of the Czech Technical University, by the Faculty of Mathematics and Physics of the Charles University, by the Doppler Institute for Mathematical Physics and Applied Mathematics, grant LC06002, and by the Czech Science Foundation, grant GAČR 201/09/0584.

Program Committee:

Pierre Arnoux, Marseille
Julien Cassaigne, Marseille
James Currie, Winnipeg
Clelia de Felice, Salerno
Christiane Frougny, Paris
Štěpán Holub, Prague (co-chair)
Juhani Karhumäki, Turku
Jean Neraud, Rouen
Dirk Nowotka, Stuttgart
Edita Pelantová, Prague (co-chair)
Laurent Vuillon, Le Bourget-du-Lac

Organizing Committee:

Petr Ambrož
Lubomíra Balková
Daniel Dombek
Štěpán Holub
Karel Klouda
Milan Krbálek
Zuzana Masáková
Edita Pelantová
Štěpán Starosta

Additional referees:

Flavio Alessandro, Jean-Paul Allouche, Lubomíra Balková, Cyril Banderier, Thierry Boush, Arturo Carpi, Sebastien Ferenczi, Giuditta Franco, Anna Frid, Roberto Grossi, Vít Jelínek, Raphaël Jungers, Steffen Kopecki, Eric Laugerotte, Jürn Laun, Thierry Lecroq, Jean-Gabriel Luque, Pavel Martyugin, Mike Müller, David Ralston, Antonio Restivo, Taizo Sadahiro, Marinella Sciortino, Carla Selmi, Štěpán Starosta, Serghei Verlan, Rosalba Zizza

Form and Function in Language: A Praguian View

Eva Hajičová (ÚFAL, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic)

There are two attributes by which the Prague Linguistic School is generally characterized: 'structural' and 'functional'. While 'structural' is a common denominator of several linguistic trends that originated in the first decades of the 20th century following Ferdinard de Saussure's pioneering linguistic approach, the term 'functional' was used by de Saussure only quite occasionally and it is supposed to be a distinctive feature of the writings of the Prague scholars, who at the same time as they recognized the necessity to describe and explain the collection of language phenomena as a structured whole rather than as a mechanical agglomeration, they emphasized that this structured whole — language — should be understood as a functioning means of communication.

In our contribution, we would like to characterize the Prague functionalism by pointing out some basic features of this approach as they are reflected in the Prague School writings both from the pre-war period as well as in the analyses of those linguists belonging to what may be called "second generation" of the Prague School. We focus on three issues which we consider to be of a more general interest:

(i) The notion of function in limguistic studies: The difficulty of an adequate characterization of the use of the term function and its derivatives consists in the fact that the very term function is homonymous and its use varies also in different scientific domains (Nagel 1961 distinguishes six various meanings of the term function in modern science). The ambiguity of this term can be demonstrated also on the different understanding of function by the Prague School. It was mainly Roman Jakobson who treated the concept of function in linguistics in the general theoretical framework of finalism or teleology. In his understanding, the claim "A phenomenon x is a means for the realization of an end F" is equivalent to the claim "A phenomenon x has a function f", that is to say that to have a function f is equivalent to "to serve as a means for the end (purpose) F". The teleological approach was reflected by Jakobson in his principle of 'therapeutic changes' in the development of language: language system always tends to a certain balance and the distortion of this balance initiates changes — which removes this insufficiency — but evokes imbalance in some other parts of the system and the process of therapeutic changes continues ad infinitum.

(ii) The articulation of the form — function relation into levels: The need for a systematic and integrated description of the relation of functions and forms has led to conceive the core of language system as consisting of levels: the units of the lower level may be looked upon as representations of the units levels can be understood as representations (forms) of the units of the adjacent higher levels, which in turn are viewed as functions of the adjacent lower levels units. From the methodological standpoint, there are two possible ways how to proceed: either to adopt the speaker's point of view and to proceed from function to form; i.e., from common needs of communication, or to take an opposite point of view and to proceed from form to function. A far-reaching significance for the understanding of the relations between units of adjacent levels is the notion of asymmetrical dualism introduced by S. Karcevskij (1929). The main idea consists in the recognition that the sign and its meaning (or rather function; Karcevskij uses the French term signification) do not cover in all their points the same field: the same sign has several functions (e.g. the morpheme -a in Czech may have the function of Nominative singular of feminine nouns, as in \v{z}en-a, or Nominative plural of neuter nouns as in m\v{e}st-a, and so on), and the same function can be expressed by several signs (e.g. the function of Nominative singular of feminine nouns can be expressed by the morpheme -a, or -e, or zero). There is always a certain tension between signifiant and signifi\'e and the asymmetrical dualism of the structure of the sign makes it possible for language to develop.

(iii) The communicative role of language: Language is regarded as a functioning system, adapted to its communicative role, diversified in more or less different social and local varieties, open to possible "variation", to change; this is reflected in the system itself: the system has a stable (solid central) core and peripheral domains (irregularities, obsolete or recent marginal phenomena) need not be in complete accordance with the laws and tendencies governing the central core. One of the important contributions of Prague scholars is their systematic attention paid to the information structure of sentences, i.e., in brief, to the distinction between the topic of the sentence("what is the sentence 'about'") and its focus ("what the sentence says about the topic"). Praguian scholars were the first to emphasize that this distinction has a semantic relevance.

In the full version of the paper, the above mentioned aspects will be discussed both from the historical perspective and the present day tenets of modern Prague theoretical linguistics and they will be illustrated by rich empirical material.

Pattern avoidance and HDOL words

Pascal Ochem (Laboratoire de Recherche en Informatique, Université Paris-Sud)

A pattern is a finite word of variables. A word w avoids pattern P if for any substitution φ of the variables of P with non-empty words, φ(P) is not a factor of w. Let Σ_i denote the i-letter alphabet {0,1,…,i−1}. The avoidability index λ(P) of P is the smallest integer k such that there exists an infinite word w over Σ_k avoiding P.

An infinite word w is said to be HDOL if it is the image by a morphism h of the fixed point of a morphism m, that is, w=h(m^ω(0)). Cassaigne conjectures that for every avoidable pattern P, there exists an HDOL word over Σ_λ(P) avoiding P.

Lower bounds on λ(P) are quite easy to obtain by backtracking, and Cassaigne's conjecture is supported by the fact that most of the proofs of upper bounds consist in finding a suitable HDOL word over Σ_λ(P) and showing that it avoids P.

For a given pattern, there can exist many HDOL words. For example in the case of squares, we have that λ(AA)=3 and the fixed point m^ω(0) of any ternary square-free morphism m avoids squares. On the other hand, it is known that λ(ABWACXBAYCAZCB)=4 and essentially the only infinite word over Σ₄ avoiding ABWACXBAYCAZCB is the fixed point of 0→01, 1→21, 2→03, 3→23.

Thue proved that ternary square-free words avoiding 010 and 212 that are bi-prolongable are exactly the factors of the fixed point t of 0→012, 1→02, 2→1. More recently, we proved that the only infinite binary word avoiding AABBCABBA and the factors 0011 and 1100 is essentially the image of t by the morphism

0→	0010110111011101001,
1→	00101101101001,
2→	00010.

In this talk, we consider the possibility that for every avoidable pattern P, there exists a finite set S of forbidden patterns and factors, containg P, such that the words over Σ_λ(P) are essentially the factors of an HDOL word. This is a strong version of Cassaigne's conjecture. I will give many examples of such HDOL words characterized by forbidden patterns and factors, as well as related open problems. We will also discuss the factor complexity of words avoiding patterns.

On the commutative equivalence of bounded context-free and regular languages

Flavio D'Alessandro (Dipartimento di Matematica 'G. Castelnuovo', Università di Roma 'La Sapienza')
Benedetto Intrigila (Dipartimento di Matematica, Università di Roma 'Tor Vergata')

In this paper, we study an algebraic and combinatorial problem concerning bounded context-free languages. A strictly related notion is that of sparse language: a language L is termed sparse if its counting function, that is, the function f_L that maps every integer n≥0 into the number f_L(n) of words of L of length n, is polynomially upper bounded. Sparse languages play a meaningful role both in Computer Science and in Mathematics and have been widely investigated in the past [3, 4, 8, 10, 11, 12, 13, 15, 17]. The interest in this class of languages is due to the fact that, in the context-free case, it coincides with the one of bounded languages [11, 15]. A language L is termed bounded if there exist n words u₁, …, u_n such that L ⊆ u₁^*⋯ u_n^*.

The starting point of this investigation is the following result proved in [3]: for every sparse context-free language L₁, there exists a regular language L₂ such that f_L₁ = f_L₂. Therefore, the counting function of a sparse context-free language is always rational. This result is remarkable since, as it is well known [7], the counting function of a context-free language may be transcendental.

A conceptual limit of the above mentioned construction is that the regular language L₂ is on a different alphabet from the one of L₁. Therefore it is natural to ask whether this limitation can be removed. Actually an even more general question can be asked. For this purpose, some preliminary notions are needed. Let A = {a₁, …, a_t} be an alphabet of t letters and let ψ: A^*→ N^t be the corresponding Parikh morphism. Given two languages L₁ and L₂ over the alphabet A, we say that L₁ is commutatively equivalent to L₂ if there exists a bijection f: L₁→ L₂ from L₁ onto L₂ such that, for every u∈ L₁, ψ(u) = ψ(f(u)). This notion is very important in the theory of variable-length codes since it is involved in the celebrated Schützenberger conjecture about the commutative equivalence of a maximal finite code with a prefix one (see e.g. [1]).

Now the general problem above can be formulated as follows: given a bounded context-free language L₁, does there exist a regular language L₂ which is commutatively equivalent to L₁?

In the sequel, for short, we refer to it as CE (Commutative Equivalence) Problem. The CE Problem was conjectured to have an affirmative solution by Stefano Varricchio and told to the authors during their past collaboration. In this paper, we prove the conjecture. More precisely, we prove the following statement.

Theorem 1 Every bounded context-free language L₁ is commutatively equivalent to a regular language L₂. Moreover the language L₂ can be effectively constructed starting from an effective presentation of L₁.

Observe that the CE problem can be seen as a kind of counting problem in the class of bounded context-free languages. Despite the fact that such class has been widely investigated in the past, CE problem appears to require some new techniques. Indeed, bounded context-free languages can be inherently ambiguous; in addition, if u₁^*⋯ u_n^*, u_i∈ A⁺, is the set that contains the bounded context-free language, then, in general, u₁^*⋯ u_n^* is ambiguous as product of languages of A^*. Such ambiguities, which are of different nature, interfere making the study of the problem a non trivial task.

Now we would like to give a broader picture about the relationships between the main result of this paper and some classical theorems on bounded context-free languages. The first result that deserves to be mentioned is a well-known theorem by Parikh [16]. For this purpose, let us first introduce a notion. Given two languages L₁ and L₂ over the alphabet A, we say that L₁ is Parikh equivalent to L₂ if ψ(L₁) = ψ(L₂). The theorem by Parikh states that, given a context-free language L₁, its image ψ(L₁) under the Parikh map is a semi-linear set of N^t. As a straighforward consequence of Parikh theorem, one has that there exists a regular language L₂ which is Parikh equivalent to L₁.

It is worth noticing that the property of Parikh equivalence by no means implies the property of commutative equivalence. Indeed, let A = {a, b} and let L₁ = (ab)^* ∪ (ba)^* and L₂ = (ab)^*. One has that ψ(L₁) = ψ(L₂) = (1,1)^⊕ (the symbol ⊕ denotes the Kleene star operation in the monoid N²) so that L₁ is Parikh equivalent to L₂. On the other hand, one immediately checks that L₁ cannot be commutatively equivalent to L₂.

Another theorem that is central in this context has been proved by Ginsburg (see [8], Theorem 5.4.2). For this purpose, let us first introduce a notion. Let L⊆ u₁^*⋯ u_n^* be a bounded language where, for every i=1, …, n, u_i is a word over the alphabet A. Let φ: Nⁿ → u₁^*⋯ u_n^* be the map defined as: for every tuple (l₁, …, l_n) ∈ Nⁿ, φ(l₁, …, l_n) = u₁^l₁⋯u_n^l_n. Ginsburg proved that L is context-free if and only if φ⁻¹(L) is a finite union of linear sets, each having a stratified set of periods. Roughly speaking, a stratified set of periods corresponds to a system of well-formed parentheses. However, Ginsburg theorem is of no help to study counting problems and, in particular, our problem, because of the ambiguity of the representation of such languages. Indeed, let A = {a, b,c} be a three letter-alphabet and let the language L = {aⁱb^jc^k | i,j,k ∈ N, i=j or j=k } [2]. Since L is inherently ambiguous, by [8] Theorem 6.2.1, L cannot be represented unambiguously as a finite disjoint union of a stratified set of periods. In this context, another important recent result that gives a characterization of bounded context-free languages in terms of finite unions of Dyck loops has been proven in [12]. However, neither this latter result can be used to deal with our problem because of the ambiguity of the representation of such languages as a finite union of Dyck loops.

Due to the space limitation, we are not able to report our proof of Theorem 1. Therefore, we limit ourselves to give an idea of the main techniques involved in the proof. A basic tool that has been used to study our problem is the celebrated theorem by Eilenberg and Schützenberger that provides an algebraic characterization of semi-linear sets in terms of semi-simple sets ([6,14). The proof is essentially divided in two steps. The first deals with the case where the bounded context-free language is described, via the Ginsburg map φ, by a linear set. In this case, the proof makes use of a technique of combinatorics on words concerning variable-length codes and of a new technique (inspired to our work [5]) of geometrical nature for the decomposition of linear sets. The second step deals with the general case, that is, when the language is described, via the Ginsburg map φ, by a semi-linear set. In this step, together with the mathematical tools mentioned above, some arguments of elementary number theoretical nature are used to get our main theorem (see e.g. [9]).

References

[1] J. Berstel, D. Perrin, and C. Reutenauer (2009): Codes and Automata, Encyclopedia of Mathematics and its Applications No. 129, Cambridge University Press, Cambridge.

[2] N. Chomsky, and M.-P. Schützenberger (1963): The Algebraic Theory of Context-free Languages, in P. Braffort and D. Hirschberg (eds.), "Computer Programming and Formal Systems", pp. 118–161, North Holland Publishing Company, Amsterdam, doi:10.1016/S0049-237X(08)72023-8.

[3] F. D'Alessandro, B. Intrigila, and S. Varricchio (2006): On the structure of the counting function of context-free languages, Theoret. Comput. Sci. 356, 104–117, doi:10.1016/j.tcs.2006.01.037.

[4] F. D'Alessandro, B. Intrigila, and S. Varricchio (2009): The Parikh counting functions of sparse context-free languages are quasi-polynomials, Theoret. comput. sci., 410, 5158–5181, doi:10.1016/j.tcs.2009.09.006.

[5] F. D'Alessandro, B. Intrigila, and S. Varricchio (2009): On some counting problems for semi-linear sets, to appear in Theoret. Comput. Sci, Available at http://arxiv.org/abs/0907.3005.

[6] S. Eilenberg, and M.-P. Schützenberger (1969): Rational sets in commutative monoids, J. of Algebra, 13, 173–191, doi:10.1016/0021-8693(69)90070-2.

[7] P. Flajolet (1987): Analytic models and ambiguity of context-free languages, Theoret. Comput. Sci. 49, 283–309, doi:10.1016/0304-3975(87)90011-9.

[8] S. Ginsburg (1966): The mathematical theory of context-free languages, Mc Graw- Hill, New York.

[9] G. H. Hardy, and E. M. Wright (1979): An introduction to the theory of numbers, Oxford University Press.

[10] J. Honkala (1998): Decision problems concerning thinness and slenderness of formal languages, Acta Informatica 35, 625–636, doi:10.1007/s002360050134.

[11] O. Ibarra, and B. Ravikumar (1986): On sparseness, ambiguity and other decision problems for acceptors and transducers, LNCS, vol. 210, pp. 171–179, Springer-Verlag, Berlin.

[12] L. Ilie, G. Rozenberg, and A. Salomaa (2000): A characterization of poly-slender context-free languages, Theor. Inform. Appl. 34, 77–86, doi:10.1051/ita:2000100.

[13] R. Incitti (2001): The growth function of context-free languages, Theoret. Comput. Sci. 255, 601–605, doi:10.1016/S0304-3975(00)00152-3.

[14] R. Ito, Every semi-linear set is a finite union of disjoint linear sets (1969): Journal of Computer and System Sciences 3, 295–342.

[15] M. Latteux and G. Thierrin (1984): On bounded context-free languages, Elektron. Informationsverarb. Kybernet. 20, 3–8.

[16] R. J. Parikh (1966): On context-free languages, Journal of the Association for Computing Machinery, 13, 570–581, doi:10.1145/321356.321364.

[17] D. Raz (1997): Length considerations in context-free languages, Theoret. Comput. Sci. 183, 21–32, doi:10.1016/S0304-3975(96)00308-8.