Please send questions to st10@humboldt.edu .

assumed order of precedence in regular expressions:

*   star (*) binds tightest, then concatenation, then union (+)

*   and you SHOULD use parentheses as needed to get
    the regular expression you really want
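
A quick way to check this precedence order is with Python's re
module. This is only a sketch, and it rests on one assumption:
re writes union as | where these notes write +. The order itself
(star, then concatenation, then union) is the same. The helper
name matches is made up for this example.

```python
import re

def matches(pattern, s):
    """Return True if the WHOLE string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# ab*|c is read as (a(b*))|c:
# star binds tightest, then concatenation, then union
implicit = "ab*|c"
explicit = "(a(b*))|c"

# parentheses change the meaning: (ab)* repeats the pair "ab"
grouped = "(ab)*|c"

for s in ["a", "abbb", "c", "abab", "ac"]:
    print(s, matches(implicit, s), matches(explicit, s), matches(grouped, s))
```

The first two columns always agree, while "abab" is matched only
by the parenthesized version.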

a few more regular expression examples...
(on the board)

*   RE's are used in...
    *   lexical analyzers -- these do lexical analysis,
        the first phase of compiling,
        turning characters into tokens (lexical units
        such as identifiers, operators, keywords,
        numbers, etc.)

    *   grep and other utilities (awk, etc.)

*   RE's describe the regular languages
    (finite state automata accept the regular languages)
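
To make the lexical-analysis use concrete, here is a minimal
sketch of a tokenizer driven by regular expressions, using
Python's re module. The token classes, the keyword list, and the
names TOKEN_SPEC, MASTER, and tokenize are all hypothetical,
chosen just for this illustration; a real compiler's lexer would
have many more classes.

```python
import re

# Hypothetical token classes for a tiny language; each is described by an RE.
# Order matters: KEYWORD is tried before IDENTIFIER.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"\s+"),
]

# One big union of named groups, one group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":          # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("while x1 < 42"))
# [('KEYWORD', 'while'), ('IDENTIFIER', 'x1'), ('OPERATOR', '<'), ('NUMBER', '42')]
```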

CONTEXT-FREE GRAMMARS (CFG's)

*   context-free grammars describe context-free languages (CFL's)

*   CFL's are a superset of the regular languages
    (every regular language is a CFL, but not vice versa)

*   in particular (Sipser, p. 91)
    "such grammars can describe certain features
    that have a recursive structure which makes them
    useful in a variety of applications"

*   CFG's are ALSO useful in:
    *   studying human/natural language
    *   defining programming languages
    *   formalizing the notion of parsing
    *   simplifying translation of programming
        languages
    *   other string-processing applications

so -- what IS a context-free grammar?
   informally:
   *   a finite set of variables (also called
       nonterminals or syntactic categories),
       each of which represents a language

   *   the languages represented by the variables
       are described recursively in terms of each
       other and primitive symbols called terminals

   *   the rules relating variables are called
       productions (sometimes also called
       substitution rules)

   *   strictly speaking, one variable
       is designated as the start variable
       (usually, but not always,
       written as the left-hand side (LHS) of
       the topmost production)

EXAMPLE of a CFG

S -> 0A1
A -> 1A0
A -> B
B -> 00

*   the above is a set of 4 productions 

*   see how S, A, B are on the LHS of some rule?
    ...those are the variables, or nonterminals,
    of this CFG

*   see how 0, 1 are never on the LHS of any
    rule? ...those are the terminals of this
    CFG

*   S is the start nonterminal

it is also commonly accepted that productions
with the same LHS can be combined with | for "or":

S -> 0A1
A -> 1A0 | B
B -> 00
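
One possible way to hold this grammar in a program is sketched
below in Python. The dictionary layout and the names GRAMMAR,
START, variables, terminals, and production_count are
assumptions made for this example, not a standard
representation. It recovers the variables and terminals the same
way as described above: by looking at what appears on a LHS.

```python
# The example grammar: each variable maps to the list of its
# alternatives (the RHS strings that were separated by |).
GRAMMAR = {
    "S": ["0A1"],
    "A": ["1A0", "B"],
    "B": ["00"],
}
START = "S"

# Variables are exactly the symbols on the LHS of some production.
variables = set(GRAMMAR)

# Terminals are the remaining symbols that appear on some RHS.
terminals = {sym for alts in GRAMMAR.values() for rhs in alts for sym in rhs} - variables

# Counting each alternative separately gives back the 4 productions.
production_count = sum(len(alts) for alts in GRAMMAR.values())

print(sorted(variables), sorted(terminals), production_count)
# ['A', 'B', 'S'] ['0', '1'] 4
```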

You can write derivations based on a CFG.
Here's how:
start with the start variable

at EACH step in the derivation,
    substitute for ONE variable
    the right-hand-side (RHS) of one of
    its production rules

S => 0A1
  => 0B1
  => 0001

...when you run out of variables to substitute
   for, you have generated a string in the language
   generated by that CFG
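
The derivation procedure can be sketched in Python as follows.
This is an illustration under assumptions: the grammar is stored
as a dictionary, every symbol is a single character, and the
function names step and derive are made up here. Each step
substitutes for ONE variable, and the production used must
belong to the grammar.

```python
GRAMMAR = {"S": ["0A1"], "A": ["1A0", "B"], "B": ["00"]}

def step(form, variable, rhs):
    """Replace the leftmost occurrence of `variable` in `form` with `rhs`."""
    if rhs not in GRAMMAR.get(variable, []):
        raise ValueError(f"{variable} -> {rhs} is not a production")
    if variable not in form:
        raise ValueError(f"{variable} does not occur in {form}")
    return form.replace(variable, rhs, 1)

def derive(start, choices):
    """Apply a sequence of (variable, rhs) choices; return every form along the way."""
    forms = [start]
    for variable, rhs in choices:
        forms.append(step(forms[-1], variable, rhs))
    return forms

print(" => ".join(derive("S", [("S", "0A1"), ("A", "B"), ("B", "00")])))
# S => 0A1 => 0B1 => 0001
```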

S => 0A1
  => 01A01
  => 01B01
  => 010001

... yes, there IS an infinite number of strings
    in the language generated by this CFG!
L(G) is the language generated by a CFG G
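
To see the first few strings of L(G) for the example grammar,
here is a small Python sketch that explores derivations
breadth-first, always substituting for the leftmost variable.
The function name first_strings and the dictionary layout are
assumptions for this example. Since L(G) is infinite, the search
stops after a requested number of strings.

```python
from collections import deque

GRAMMAR = {"S": ["0A1"], "A": ["1A0", "B"], "B": ["00"]}

def first_strings(start, count):
    """Return the first `count` strings found by breadth-first leftmost derivation."""
    found = []
    queue = deque([start])
    while queue and len(found) < count:
        form = queue.popleft()
        # Find the position of the leftmost variable, if there is one.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            found.append(form)   # no variables left: this string is in L(G)
            continue
        # Try every production for that variable.
        for rhs in GRAMMAR[form[index]]:
            queue.append(form[:index] + rhs + form[index + 1:])
    return found

print(first_strings("S", 3))
# ['0001', '010001', '01100001']
```

Each extra use of A -> 1A0 adds one more 1 before the middle 00
and one more 0 after it, which is why the list never ends.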

*   more on CFG's, and onward to BNF, on Thursday!