CS 279 - Week 9 Lecture 1 - 2022-10-17
TODAY WE WILL
* announcements/reminders
* ASIDE: one of several ways to read all the lines in a file
* ASIDE: another place to use REs: the =~ operator
* continuing Linux/UNIX REs:
* a few more BRE options
* a few Extended RE (ERE) options
* prep for next class
* Should be working on Homework 5!
* Current Reading:
* LDP Bash Beginners' Guide - Chapter 4 - Regular Expressions
* 2021 course text: Section II - Chapter 19 - Sections 19.1, 19.2
* has SOME of the BRE material
=====
ASIDE: one of several ways to read all the lines in a given file
====
* Bash has an odd while loop version that can
read all the lines from a file:
while read desired_line_variable
do
... $desired_line_variable ...
done < desired_file_name
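* note: with a single variable, read strips leading and trailing
blanks from each line (it splits its input using $IFS), and it
treats backslashes as escape characters;
writing IFS= read -r desired_line_variable instead keeps each
line exactly as it appears in the file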
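* a small sketch of the loop above, to see the blank-stripping in action;
the file contents and the variable names (demo_file, plain_out, exact_out)
are made up for this illustration:

```shell
# Hypothetical sample file, created only for this demonstration.
demo_file=$(mktemp)
printf '   indented line   \nplain line\n' > "$demo_file"

# Plain read: leading and trailing blanks are stripped from each line.
plain_out=""
while read line
do
    plain_out="${plain_out}[${line}]"
done < "$demo_file"

# IFS= read -r: each line is kept exactly as written in the file.
exact_out=""
while IFS= read -r line
do
    exact_out="${exact_out}[${line}]"
done < "$demo_file"

echo "plain read:   $plain_out"
echo "IFS= read -r: $exact_out"
rm -f "$demo_file"
```

The brackets are only there to make the blanks visible in the output.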
=====
ASIDE: another place to use REs: the =~ operator
=====
* FUN FACT: you can use an RE in an if-statement's test
by using the =~ operator!
* NOTES:
* this is a test expression that needs to be in [[ ]]
(not single [ ] )
* syntax:
[[ given_string =~ desired_re ]]
* this will be true if given_string matches
the given desired_re
* do NOT put quotes around the desired_re here
(a quoted pattern is compared as a literal string, not as an RE)
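* a short sketch of =~ in an if-statement; the pattern and the candidate
strings are invented for illustration, and storing the RE in a variable
(course_re here) is just one common way to keep it unquoted:

```shell
# Requires bash: [[ ]] and =~ are not available in plain POSIX sh.
# Hypothetical pattern: "cs" followed by exactly three digits.
course_re='^cs[0-9][0-9][0-9]$'

matched=""
for candidate in cs279 cs27 math101
do
    # $course_re is deliberately NOT quoted on the right of =~
    if [[ $candidate =~ $course_re ]]
    then
        matched="$matched $candidate"
    fi
done
echo "matched:$matched"

# Quoting the right-hand side makes it a literal string, not an RE:
if [[ "abc" =~ a.c ]];   then unquoted_hit=yes; else unquoted_hit=no; fi
if [[ "abc" =~ "a.c" ]]; then quoted_hit=yes;   else quoted_hit=no;   fi
echo "unquoted a.c matches abc: $unquoted_hit"
echo "quoted   a.c matches abc: $quoted_hit"
```

Bash treats the pattern on the right of =~ as an extended RE (ERE).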
=====
a few more BRE options
=====
subexpressions!
=====
* these answer the question: what if you want to find
a pattern with a REPEATED bit within it?
* you can enclose a portion of an RE between the markers
\( \)
* this construct -- the \( and pattern and \)
is called a SUBEXPRESSION
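* LATER in the enclosing pattern you can match the SAME TEXT
that the subexpression matched
by writing a BACKREFERENCE \n where n is a digit
between 1 and 9 (\1 refers to the first subexpression,
\2 to the second, and so on)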
* examples:
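g\([a-z]*\)&\1
grep "g\([a-z]*\)&\1" play.txt
(& is an ordinary character in a BRE, so it needs no backslash;
\& is not defined by POSIX, and recent GNU grep versions
warn about the stray backslash)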
* matches:
g&
gnat&nat
oooogoober&ooberahhhh
* will NOT match:
ga&b
nat&nat
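* to try the subexpression example, something like the following sketch
could be used; the temporary file stands in for play.txt, and the & is
written without a backslash because it is an ordinary character in a BRE:

```shell
# Hypothetical stand-in for play.txt, holding the five sample lines above.
play_file=$(mktemp)
printf '%s\n' 'g&' 'gnat&nat' 'oooogoober&ooberahhhh' 'ga&b' 'nat&nat' > "$play_file"

# \( \) captures the lowercase run after g; \1 must repeat it after the &.
matches=$(grep 'g\([a-z]*\)&\1' "$play_file")
match_count=$(grep -c 'g\([a-z]*\)&\1' "$play_file")

echo "$matches"
echo "matching lines: $match_count"
rm -f "$play_file"
```

Single quotes are used around the pattern so the shell passes the backslashes to grep unchanged.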
====
interval expressions
====
* these are good for when you want to match a specific
number of repetitions (not just "0 or more", as with *)
* they can express 0-or-1 as well, using \{0,1\}
* You can follow a single character,
or an RE denoting a single character,
by one of the following forms, called an INTERVAL expression:
\{m\} \{m,\} \{m,n\}
* here, m and n must be NON-NEGATIVE integers LESS THAN 256,
with m no greater than n, and with no blanks inside the \{ \}
* If S is the set containing EITHER the single character
OR the characters that match the RE,
\{m\} - denotes EXACTLY m occurrences of characters belonging
to S
[0-9]\{2\} - matches a sequence of exactly 2 digits
\{m,\} - denotes AT LEAST m occurrences of characters belonging
to S
[0-9]\{2,\} - matches a sequence of 2 OR MORE digits
\{m,n\} - denotes BETWEEN m and n occurrences (inclusive)
of characters belonging to S
[0-9]\{2,4\} - matches a sequence of 2, 3, or 4 digits
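* a sketch comparing the three interval forms with grep; the sample lines
are invented, and the ^ and $ anchors are added so that each pattern has
to account for the whole line:

```shell
# Hypothetical sample data: one letter followed by 1, 2, 3, or 5 digits.
sample=$(mktemp)
printf '%s\n' 'a7' 'b42' 'c123' 'd12345' > "$sample"

# Exactly 2 digits after the letter: only b42.
exactly_two=$(grep -c '^[a-z][0-9]\{2\}$' "$sample")

# 2 or more digits: b42, c123, d12345.
two_or_more=$(grep -c '^[a-z][0-9]\{2,\}$' "$sample")

# Between 2 and 4 digits: b42, c123.
two_to_four=$(grep -c '^[a-z][0-9]\{2,4\}$' "$sample")

# Without anchors, any line CONTAINING 2 digits in a row matches,
# so longer runs of digits are counted too.
unanchored=$(grep -c '[0-9]\{2\}' "$sample")

echo "exactly 2: $exactly_two"
echo "2 or more: $two_or_more"
echo "2 to 4:    $two_to_four"
echo "unanchored \{2\}: $unanchored"
rm -f "$sample"
```

The last count is a reminder that grep reports a line when the pattern matches anywhere within it.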
=====
EREs - EXTENDED regular expressions
=====
* note: these are extensions on the BRE syntax --
they do not work everywhere that BREs work,
so beware!
* for example, to use them with grep,
you can use grep -E
(egrep is an older equivalent; recent GNU grep versions
warn that it is obsolescent)
* an ERE follows the rules for a BRE with the following
ADDITIONS and CHANGES:
* two REs separated by a | match an occurrence of
EITHER of them (that is, this acts like or)
* UNESCAPED parentheses (plain ( and ), with no backslash)
are used for GROUPING subexpressions --
to match any of: catfish catfight dogfish dogfight
I can use the ERE: (cat|dog)(fish|fight)
for example:
egrep "(cat|dog)(fish|fight)" play.txt
grep -E "(cat|dog)(fish|fight)" play.txt
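* a sketch of the grouping example on some made-up lines; the temporary
file is a hypothetical stand-in for play.txt:

```shell
# Hypothetical stand-in for play.txt.
play=$(mktemp)
printf '%s\n' 'catfish' 'dogfight' 'catdog' 'fishfight' 'a dogfish swam by' > "$play"

# (cat|dog) must be followed immediately by (fish|fight).
grouped=$(grep -E '(cat|dog)(fish|fight)' "$play")
grouped_count=$(grep -cE '(cat|dog)(fish|fight)' "$play")

echo "$grouped"
echo "matching lines: $grouped_count"
rm -f "$play"
```

Here catdog and fishfight are not reported, since neither has one of cat or dog directly followed by one of fish or fight.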
* e+ matches 1 or more occurrences of an ERE e
where e must be either a parenthesized subexpression
OR an ERE that always matches exactly one character
[A-Z][0-9]+ -- matches strings with an uppercase
character followed by 1 or more digits
* e? matches zero or one occurrences of the ERE e
[abc]?[0-9] - matches zero or one of a or b or c
followed by exactly one digit
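* a sketch of + and ? with grep -E; the sample lines are invented, and
the second pattern is anchored with ^ and $ (an addition for this
illustration) so the effect of the optional [abc] is visible:

```shell
# Hypothetical sample lines for the + example.
plus_file=$(mktemp)
printf '%s\n' 'A1' 'B2024' 'C' 'x9' 'Room Z' > "$plus_file"

# Uppercase letter followed by 1 or more digits: A1 and B2024.
plus_count=$(grep -cE '[A-Z][0-9]+' "$plus_file")

# Hypothetical sample lines for the ? example.
opt_file=$(mktemp)
printf '%s\n' 'a1' '7' 'd4' 'ab3' > "$opt_file"

# Whole line is an optional a, b, or c, then exactly one digit: a1 and 7.
opt_count=$(grep -cE '^[abc]?[0-9]$' "$opt_file")

echo "lines matching [A-Z][0-9]+:    $plus_count"
echo "lines matching ^[abc]?[0-9]\$: $opt_count"
rm -f "$plus_file" "$opt_file"
```

Without the anchors, [abc]?[0-9] would match every line that contains a digit, because the [abc] part is allowed to match nothing.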