CS 279 - Week 9 Lecture 1 - 2022-10-17

TODAY WE WILL
* announcements/reminders
* ASIDE: one of several ways to read all the lines in a file
* ASIDE: another place to use REs: the =~ operator
* continuing Linux/UNIX REs:
    * a few more BRE options
    * a few Extended RE (ERE) options
* prep for next class

* Should be working on Homework 5!

* Current Reading:
    * LDP Bash Beginners' Guide - Chapter 4 - Regular Expressions
    * 2021 course text: Section II - Chapter 19 - Sections 19.1, 19.2
        * has SOME of the BRE material

===== ASIDE: one of several ways to read all the lines in a given file =====

* Bash has an odd while loop version that can read all the lines
  from a file:

    while read desired_line_variable
    do
        ... $desired_line_variable ...
    done < desired_file_name

* a plain read also strips leading and trailing blanks from each line,
  because it splits its input on the characters in $IFS
    * writing IFS= read -r desired_line_variable instead keeps each
      line exactly as it appears in the file

===== ASIDE: another place to use REs: the =~ operator =====

* FUN FACT: you can use an RE in an if-statement's test by using
  the =~ operator!

* NOTES:
    * this is a test expression that needs to be in [[ ]]
      (not single [ ] )
    * syntax:

        [[ given_string =~ desired_re ]]

    * this will be true if given_string matches the given desired_re
    * bash treats desired_re as an Extended RE (ERE)
    * do NOT put quotes around the desired_re here
      (a quoted RE is matched as a literal string, not as a pattern)

===== a few more BRE options =====

==== subexpressions! ====

* these answer the question: what if you want to find a pattern with
  a REPEATED bit within it?

* you can enclose a portion of an RE between the markers \( \)
    * this construct -- the \( and pattern and \) -- is called
      a SUBEXPRESSION
    * LATER in the enclosing pattern you can match the SAME text again
      by writing a BACKREFERENCE \n where n is a digit between 1 and 9
      (\1 repeats whatever the first subexpression matched, and so on)

* examples:

    g\([a-z]*\)\&\1

    grep "g\([a-z]*\)\&\1" play.txt

    * matches:
        g&
        gnat&nat
        oooogoober&ooberahhhh

    * will NOT match:
        ga&b
        nat&nat

==== interval expressions ====

* these are good for when you want to match a definite number of things
  (not just 0, 1, or many...)
    * they cover the 0-or-1 case also, written as \{0,1\}
* You can follow a single character, or an RE denoting a single
  character, by one of the following forms, called an INTERVAL
  expression:

    \{m\}
    \{m,\}
    \{m,n\}

    * here, m and n must be NON-NEGATIVE integers LESS THAN 256
    * there is NO space after the comma

* If S is the set containing EITHER the single character OR the
  characters that match the RE,

    \{m\} - denotes EXACTLY m occurrences of characters belonging to S

        [0-9]\{2\} - matches a sequence of exactly 2 digits

    \{m,\} - denotes AT LEAST m occurrences of characters belonging to S

        [0-9]\{2,\} - matches a sequence of 2 OR MORE digits

    \{m,n\} - denotes BETWEEN m and n occurrences (inclusive)
              of characters belonging to S

        [0-9]\{2,4\} - matches a sequence of 2, 3, or 4 digits

===== EREs - EXTENDED regular expressions =====

* note: these are extensions on the BRE syntax -- they do not work
  everywhere that BREs work, so beware!
    * for example, to use them with grep, you can use egrep or grep -E

* an ERE follows the rules for a BRE with the following ADDITIONS
  and CHANGES:

    * two REs separated by a | match an occurrence of EITHER of them
      (that is, this acts like or)

    * UNQUOTED parentheses (plain parentheses, with no backslashes)
      are used for GROUPING subexpressions

      for example, to match any of:

          catfish catfight dogfish dogfight

      I can use the ERE:

          (cat|dog)(fish|fight)

      for example:

          egrep "(cat|dog)(fish|fight)" play.txt
          grep -E "(cat|dog)(fish|fight)" play.txt

    * e+ - matches 1 or more occurrences of an ERE e,
      where e must be either a parenthesized subexpression
      OR an ERE that always matches exactly one character

          [A-Z][0-9]+ - matches strings with an uppercase character
                        followed by 1 or more digits

    * e? - matches zero or one occurrences of the ERE e

          [abc]?[0-9] - matches zero or one of a or b or c
                        followed by exactly one digit
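
* TRY IT: a minimal sketch that puts today's pieces side by side: the while/read loop, the =~ test, a BRE backreference, a BRE interval expression, and the ERE additions.
    * the sample file contents and every variable name below are made up for illustration (they are NOT the contents of play.txt)
    * it assumes bash, mktemp, and a grep that supports -E

```bash
#!/usr/bin/env bash
# Sketch only: sample data and variable names are assumptions for illustration.

# Use the C locale so [A-Z] means exactly the ASCII uppercase letters.
export LC_ALL=C

# Build a throwaway sample file (hypothetical contents, one item per line).
sample_file=$(mktemp)
printf '%s\n' 'catfish' 'dogfight' 'catnap' 'gnat&nat' 'ga&b' \
              'A7' 'b5' 'room 1234' > "$sample_file"

# BRE subexpression + backreference: \1 must repeat exactly what
# \([a-z]*\) matched. A plain & is an ordinary character in a BRE,
# so it is written here without a backslash.
bre_backref=$(grep 'g\([a-z]*\)&\1' "$sample_file")      # -> gnat&nat

# BRE interval expression: count the lines containing a run of 2 to 4 digits.
bre_interval=$(grep -c '[0-9]\{2,4\}' "$sample_file")    # -> 1

# ERE alternation with grouping (plain parentheses, no backslashes).
ere_group=$(grep -E '(cat|dog)(fish|fight)' "$sample_file")  # -> catfish, dogfight

# ERE +: an uppercase letter followed by 1 or more digits.
ere_plus=$(grep -E '[A-Z][0-9]+' "$sample_file")         # -> A7

# ERE ?: an optional a, b, or c followed by one digit, anchored to the
# whole line so that only such lines are selected.
ere_optional=$(grep -E '^[abc]?[0-9]$' "$sample_file")   # -> b5

# while/read loop plus the =~ operator: count the lines that match an ERE.
# The RE is kept in a variable and used UNQUOTED on the right of =~.
re='^(cat|dog)(fish|fight)$'
re_matches=0
while IFS= read -r line
do
    if [[ $line =~ $re ]]
    then
        re_matches=$((re_matches + 1))
    fi
done < "$sample_file"                                    # re_matches -> 2

rm -f "$sample_file"

echo "backreference match:   $bre_backref"
echo "lines with 2-4 digits: $bre_interval"
echo "grouping matches:      $ere_group"
echo "plus match:            $ere_plus"
echo "optional match:        $ere_optional"
echo "lines matching via =~: $re_matches"
```

    * notice that grep selects any line CONTAINING a match, so the ? example adds ^ and $ to require the whole line to match
    * notice also that ga&b is rejected by the backreference pattern because the text after the & does not repeat the text captured before it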