Please send questions to
st10@humboldt.edu.
assumed order of precedence in regular expressions:
* highest to lowest: star (*), then concatenation, then union (+)
* and you SHOULD use parentheses as needed to get
the regular expression you really want;
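the precedence rule can be checked with Python's re module. This is only a sketch: Python writes union as | rather than +, and the patterns and the helper name matches are made up for illustration, not taken from the board examples.

```python
import re

# Python's re module writes union as | (not +), but the precedence is
# the same: star binds tightest, then concatenation, then union.
unparenthesized = re.compile(r"ab*|c")
explicit = re.compile(r"(a(b*))|c")   # how the precedence rules group it
different = re.compile(r"(ab)*|c")    # parentheses change the language

def matches(pattern, s):
    """Return True if the WHOLE string s is matched by pattern."""
    return pattern.fullmatch(s) is not None

for s in ["a", "abbb", "c", "abab", ""]:
    print(repr(s),
          matches(unparenthesized, s),
          matches(explicit, s),
          matches(different, s))
```

the first two columns always agree; the third differs on "a", "abbb", "abab", and the empty string, which is why the parentheses matter.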
a few more regular expression examples...
(on the board)
* RE's are used for...
* lexical analyzers -- they do lexical analysis,
the first phase of compiling,
turning characters into tokens (lexical units
such as identifiers, operators, keywords,
numbers, etc.)
* grep and other utilities (awk, etc.)
* RE's describe the regular languages
(finite state automata accept the regular languages)
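to make the lexical-analyzer use concrete, here is a minimal tokenizer sketch built from regular expressions. The token categories, the two keywords, and the names TOKEN_SPEC, MASTER, and tokenize are all hypothetical, chosen for a tiny made-up language; a real lexer for a real language would need many more rules.

```python
import re

# Hypothetical token categories for a tiny made-up language.
# Order matters: KEYWORD is tried before IDENTIFIER.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"\s+"),
]

# One big RE: the union of all token REs, each in a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of characters into a list of (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":          # whitespace is discarded
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("while x1 < 42"))
```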
CONTEXT-FREE GRAMMARS (CFG's)
* context-free grammars describe context-free languages (CFL's)
* CFL's are a superset of the regular languages
(every regular language is a CFL, but not vice versa)
* in particular (Sipser, p. 91)
"such grammars can describe certain features
that have a recursive structure which makes them
useful in a variety of applications"
* CFL's are ALSO useful in:
* studying human/natural language
* defining programming languages
* formalizing the notion of parsing
* simplifying translation of programming
languages
* other string-processing applications
so -- what IS a context-free grammar?
informally:
* a finite set of variables (also called
nonterminals or syntactic categories),
each of which represents a language
* the languages represented by the variables
are described recursively in terms of each
other and primitive symbols called terminals
* the rules relating variables are called
productions (sometimes also called
substitution rules)
* strictly speaking, one variable
is designated as the start variable
(usually, but not always,
written as the left-hand side (LHS) of
the topmost production)
EXAMPLE of a CFG
S -> 0A1
A -> 1A0
A -> B
B -> 00
* the above is a set of 4 productions
* see how S, A, B are on the LHS of some rule?
...those are the variables, or nonterminals,
of this CFG
* see how 0, 1 are never on the LHS of any
rule? ...those are the terminals of this
CFG
* S is the start nonterminal
* it is also commonly accepted that productions
with the same LHS can be combined with | for "or":
S -> 0A1
A -> 1A0 | B
B -> 00
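one possible way to hold this example CFG in a program is sketched below. The dictionary layout and the names GRAMMAR, START, productions, variables, and terminals are assumptions made for illustration. Each dictionary key is a variable, and each list entry is one alternative (one side of a |).

```python
# The example CFG: one entry per variable, one list item per alternative RHS.
GRAMMAR = {
    "S": ["0A1"],
    "A": ["1A0", "B"],
    "B": ["00"],
}
START = "S"

def productions(grammar):
    """Expand the | shorthand back into individual (LHS, RHS) productions."""
    return [(lhs, rhs) for lhs, alternatives in grammar.items()
            for rhs in alternatives]

def variables(grammar):
    """Variables are exactly the symbols that appear on some LHS."""
    return set(grammar)

def terminals(grammar):
    """Terminals are the RHS symbols that never appear on a LHS."""
    return {sym for _, rhs in productions(grammar)
            for sym in rhs if sym not in grammar}

print(len(productions(GRAMMAR)), "productions")
print("variables:", sorted(variables(GRAMMAR)))
print("terminals:", sorted(terminals(GRAMMAR)))
```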
You can write derivations based on a CFG.
Here's how:
start with the start variable
at EACH step in the derivation,
substitute for ONE variable
the right-hand-side (RHS) of one of
its production rules
S => 0A1
=> 0B1
=> 0001
...when you run out of variables to substitute
for, you have generated a string in the language
generated by that CFG
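the derivation steps above can be replayed mechanically. In this sketch the function derive and its list-of-(variable, RHS) argument are hypothetical conventions, and it always substitutes for the leftmost occurrence of the chosen variable (in this grammar each form contains at most one variable anyway).

```python
# The example CFG from these notes, one list item per alternative RHS.
GRAMMAR = {
    "S": ["0A1"],
    "A": ["1A0", "B"],
    "B": ["00"],
}

def derive(grammar, start, steps):
    """Apply one production per step and return every intermediate form.

    steps is a list of (variable, rhs) pairs; each pair must be a
    production of the grammar, and the variable must occur in the form.
    """
    form = start
    forms = [form]
    for variable, rhs in steps:
        if rhs not in grammar.get(variable, []):
            raise ValueError(f"{variable} -> {rhs} is not a production")
        if variable not in form:
            raise ValueError(f"{variable} does not occur in {form}")
        form = form.replace(variable, rhs, 1)   # substitute for ONE occurrence
        forms.append(form)
    return forms

steps = [("S", "0A1"), ("A", "B"), ("B", "00")]
print(" => ".join(derive(GRAMMAR, "S", steps)))
# S => 0A1 => 0B1 => 0001
```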
S => 0A1
=> 01A01
=> 01B01
=> 010001
... yes, there IS an infinite number of strings
in the language generated by this CFG!
(A -> 1A0 can be applied any number of times
before switching to A -> B)
L(G) is the language generated by a CFG G
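to get a feel for L(G) for the example CFG, here is a sketch that lists every string in the language up to a chosen length. The function generate and the length cutoff are assumptions for illustration. The pruning is only safe because no production in this grammar has an empty RHS, so a form never gets shorter.

```python
from collections import deque

# The example CFG from these notes, one list item per alternative RHS.
GRAMMAR = {
    "S": ["0A1"],
    "A": ["1A0", "B"],
    "B": ["00"],
}

def generate(grammar, start, max_len):
    """Return all strings of length <= max_len derivable from start.

    Assumes no production has an empty RHS, so forms never shrink and
    any form longer than max_len can be discarded.
    """
    seen = {start}
    queue = deque([start])
    results = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost variable, if any.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add(form)        # no variables left: a string in L(G)
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))

print(generate(GRAMMAR, "S", 8))
# ['0001', '010001', '01100001']
```

each string has the shape 0, then n 1's, then n+2 0's, then 1 (n = 0, 1, 2 above); raising the cutoff keeps producing longer strings of the same shape.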
* more on CFG's, and onward to BNF, on Thursday!