## Languages, grammar and automaton:

- Types of grammars
- Regular languages and regular expressions
- Deterministic finite automaton
- Non-deterministic finite automaton
- Non-deterministic finite automaton with epsilon transition
- Pushdown automaton

## How to study Automaton:

- Thompson’s automaton
- Glushkov’s automaton
- Brüggemann-Klein optimization
- Optimization of Chang-Paige
- Antimirov’s automaton

## Methods on Automaton:

- Regular expression calculation by McNaughton and Yamada
- Regular expression calculation by Brzozowski and McCluskey
- Regular expression calculation by Conway
- Regular expression calculation by a system of equations
- NFA determination towards DFA
- e-NFA determination towards DFA
- Minimization of a deterministic automaton
- Union, intersection and complementary

## Parsing:

- LR
- LL
- SLR
- LALR

In language theory, the set of elementary entities is called the alphabet. A combination of elementary entities is called a word. A set of words is called a language and is described by a grammar. From a grammar, we can build an effective procedure (called automaton) to decide if a word is part of the language.

There are different classes of languages, corresponding to different classes of grammars and automata. Among the simplest to study, the class of regular languages corresponds to regular grammars and finite automata. This grammar class is typically used to describe home automation without cycles.

A more global class than regular languages is the class of out-of-context languages, corresponding to context-free grammars and psuhdown automata. This grammar class, more powerful than the class of regular grammars, is typically used for home automation with known cycles.

The theory of languages is useful to:

- describe an automatic behavior (robotics, home automation, building automation)
- understand all of a language and its uses (compiler, DNA, machine learning)
- minimize an automatic process (integrated software, integrated map for aerospace or automotive, etc.)

## Alphabet

An alphabet, noted A, is a non-empty finite set of symbols (letters, signs, numbers, etc.)

A word, defined on an alphabet A, is a finite sequence of symbols / elements of A.

Some properties on words:

- The length of a word u defined on an alphabet A, written |u| is the number of symbols (his cardinal) that make up u.
- |u|
_{j}counts the number of occurrences of the symbol j in the word u. - Le mot vide, noté ε, est défini sur tous les alphabets et est le mot de longueur 0.
- L’ensemble de tous les mots formés à partir de l’alphabet A (resp. de tous les mots non vides) est noté A
^{*}(resp. A^{+}). - La concaténation de deux mots u et v, notée, u.v ou uv, est le mot formé en faisant suivre les symboles de u par les symboles de v. Une puissance n d’un mot est ce mot concaténé n fois.
- si un ou des symboles est à la puissance + alors cela signifie qu’il existe un ou plus symbole de ce type dans le mot (ex. (ab)
^{+}=ab…) - si un ou des symboles est à la puissance * alors cela signifie qu’il existe zéro ou plus symbole de ce type dans le mot

Let two words u and v be defined on an alphabet A:

- u is a prefix of v if and only if there exists a (potentially empty) word w such that uw = v.
- u is a suffix of v if and only if there exists a (potentially empty) word w such that wu = v.
- u is a factor of v if and only if there are two words (potentially empty) w
_{1}and w_{2}such that w_{1}uw_{2}=v. - a non-empty word u is said primitive if the equation u=v
^{i}hasn’t solution for i>1. - two words x = uv and y = vu deduced from each other by exchange of prefix and suffix are said conjugated.

## Languages

A language, defined on an alphabet A, is a set of words defined on A. A language is a subset of A^{*}.

- The union contains all the words that are contained in L
_{1}or in L_{2}. - The intersection is the language that contains all the words contained in both L
_{1}and L_{2}. - The complement of a language is the language containing all the words that are not in that language.
- Difference between two languages is the language containing all the words of L
_{1}that are not in L_{2}.

In addition to the set operations on the languages, the product or concatenation of two languages and defined by the language containing all the words formed by the concatenation of a word of L_{1} following by a word of L_{2}. The product of languages is associative but not commutative.

The power of a language is a successive concatenation of this language L^{n}=L.L^{n-1}. Power 0 forms the compound language of the empty word.

_{1}= {00, 11} and L

_{2}= {0, 1, 01} be two languages on {0,1}.

_{1}.L

_{2}= {000, 001, 0001, 110, 111, 1101}.

## Grammar

A language can be described by a number of rules: the grammar indicates how to construct sentences belonging to the language (functioning in production); the grammar also makes it possible to decide whether a given sentence belongs to the language or not (recognition operation).

A grammar is a quadruplet G = (T, N, S, R) as:

- T is the terminal vocabulary, that is to say the alphabet on which the language is defined.
- N is the non-terminal vocabulary, that is, the set of symbols that do not appear in the generated words, but are used during the generation. A non-terminal symbol designates a syntactic category.
- R is a set of rules called rewriting or production of the form: u1 → u2, with u1 ∈ (N ∪ T)
^{+}and u2 ∈ (N ∪ )^{*}. Remember that + means at least one element, and * means 0 or more. The intuitive meaning of these rules is that the non-empty sequence of terminal or non-terminal symbols u1 can be replaced eventually with the eventual empty of terminal or non-terminal symbols u2. - S ∈ N is the starting symbol or axiom. It is from this non-terminal symbol that we will begin the generation of words by means of the rules of grammar.

When several grammar rules have the same left-hand shape, one

can factor these different rules by separating the straight parts by vertical lines. For example, the set of rules S → ab, S → aSb, S → c can be written S → ab | aSb | c.

The language defined, or generated, by a grammar is the set of words that can be obtained from the starting symbol by applying the rules of the grammar. More formally, we introduce the notions of derivation between forms, this consists in replacing a non-terminal symbol by a sequence of terminal and non-terminal symbols according to the grammar rules.

For more information on language-grammar operations, please refer to the links table at the top of this course.