Glushkov algorithm

The algorithm constructs a nondeterministic automaton accepting the set of words described by a rational expression. The number of states of the automaton is proportional to the size of the expression.

The main steps of the algorithm starting from a regular expression E are as follows:

  1. Transform the expression E into a new expression E’ where each symbol appears at most once
  2. Construct an automaton A’ accepting all the words described by E’
  3. Transform A’ into an automaton A accepting all the words described by E by replacing the sustituted symbols by the first ones.

Step 1 : unique symbol

The first step of the algorithm is to separate all the symbols in the expression. Each occurrence of the same symbol is numbered (or rename all the symbols). The expression E = (ab + b) * becomes, for example, E’ = (ab0 + b1)*.

Let’s take a longer example: E = (a + b)*(abb + ε). After renaming we have the following expression: (x1 + x2)*(x3x4x5 + ε).

We must now define the group First containing the symbols that can start a word. The group Last containing the symbols that can finish the word. And a group Next countaining the symbols that can be suffix of another symbol.

In your example:

  • First = {x1,x2,x3}
  • Last= {x1,x2,x5}
  • Next :

automate1

 

Step 2 : building the automaton

Let E’ be a regular expression where all the symbols are different. An automaton A’ is constructed as follows.

  • The set of states is Q = {i} ∪ {a: appears in E’}
  • The initial state is the state i
  • The final states are Last(E’) to which we add i if ε ∈ E’
  • The transitions are (i, a, a) if a ∈ First(E’) or (a, b, b) if (a, b) ∈ Next(E’)

For (x1 + x2)*(x3x4x5 + ε), state i is linked to states  x1,x2,x3.

The Next table are use for (a,b,b) expressions. For example for the first raw: (x1,x1,x1), (x1,x2,x2), (x1,x3,x3).

Then, we add terminal sates {x1,x2,x5}:

langage31

Then, the unique symbols are substituted to the real ones