CS 279 - Week 8 Lecture 2 - 2022-10-11

TODAY WE WILL:
*   announcements/reminders
*   a bit more on grep
*   start discussion of UNIX/Linux BRE (basic regular expressions)
*   prep for next class

===== a bit more on grep =====

*   DEFAULT behavior:
    *   if you don't provide any of the options -n, -c, -l, or -q and
        provide just ONE file, grep's output is JUST the matching lines
    *   if you provide more than one file, each matching line in grep's
        output is preceded by the pathname of its file, as it was
        specified on the command line

*   no surprise, grep has LOTS of options -- here are just a few:
    *   -n   precede each matching line by its line number within its file
             (the file name is added only when more than one file is given)
    *   -c   only show a count of the matching lines
             (you'll get 0's for files with no matches!)
    *   -l   only show the names of the files containing matching strings
             (nice for building a file list for a for loop!)
    *   -s   suppress error messages for nonexistent or unreadable files
    *   -q   run quietly -- don't write ANYTHING to standard output (!!),
             but exit with exit status 0 if any input lines are selected
             (so, testable after the fact)
    *   -v   select the lines that DON'T match!
    *   -i   ignore the case of letters in making comparisons

===== Basic Regular Expressions (BREs) (UNIX/Linux style) =====

*   regular expression: defines a pattern of text to be matched

*   several UNIX/Linux utilities expect you to specify patterns
    as regular expressions (REs)
    *   grep, sed, ed, several others

*   2 basic categories:

    Basic Regular Expressions (BREs)
    *   understood by "older" UNIX programs (such as ed, grep, sed)

    Extended Regular Expressions (EREs)
    *   an extension of REs recognized by egrep (same as grep -E)

===== BREs =====

*   first: in general, any non-special character in a BRE
    matches that character in the text

    grep oink *.txt    # find lines with o then i then n then k within them

*   SOME special characters are special ANYWHERE they appear in a pattern
        .
        * [ \

*   SOME special characters are ONLY special under particular conditions
    *   ^ is special only if it appears at the BEGINNING of a pattern
    *   $ is special only if it appears at the END of a pattern
    *   the character that terminates a pattern is special throughout
        that pattern

*   you CAN remove a special character's meaning -- and just match that
    character -- by escaping it with a backslash

    'cheap at $9.98'     # but the . is special! How to match . specifically?
    'cheap at $9\.98'    # now can match . specifically
                         # (and the $ is only special if it is at
                         # the END of a pattern)

*   \ - escapes the special meaning of the character following it
        IF it is special

        what if the following character is not special?
        ... yikes, backslash behavior is UNDEFINED in that case,
        so please try to avoid that in BREs!

*   . - the dot matches any single non-null character
        (so, the BRE version of globbing's ?)

        IN CLASS, this BRE needed to be in double-quotes to work as
        expected (within grep, at least); unquoted, the shell removes
        the backslash before grep ever sees it:

        grep "o\.n" animals.txt    # to JUST match o, then dot, then n

*   ^ - this character at the BEGINNING of the OUTERMOST RE
        matches the BEGINNING of a line
        (anywhere else, ^ matches ^)

        ...that is, we want lines that START with some pattern

*   $ - this character at the END of an OUTERMOST RE
        matches the END of a line
        (anywhere else, $ matches $)

        ...that is, we want lines that END with some pattern

*   * - * has a slightly DIFFERENT MEANING in REs than in globbing!

        in REs, * goes with the character preceding it --
        matches 0 or more instances of THAT character

        grep ab*c *.txt    # matches lines with an a
                           # followed by 0 or more bs
                           # followed by a c

    *   can also follow a set of characters in square brackets

        [moxie]  - matches one of m or o or x or i or e
        [moxie]* - matches 0 or more m's o's x's i's or e's
                   in ANY combination

        we tried:
        grep "^m[moxie]*$" moxie-play.t

        ...and a line:
        mmmmmmmoxmxxeeie
        ...DID match it

    NOTE: beware of a pattern that is JUST one character followed by *
        a*    ...that will match ANY line!!!!
              (asks for a line with 0 or more a's)

        Do you really want 1 or more?
        aa*   will match a line with 1 or more lowercase a's

*   [set] - a set of characters in square brackets
            matches any single character from that set

    ...my reference for this called this a BRACKET EXPRESSION

    *   ranges are allowed
        [c1-c2]    matches any one of the set of characters
                   in the range c1 to c2, inclusive

    *   there are some special classes similar to those we saw in
        globbing, written as [[:desired_class:]]

        [[:lower:]] - matches one lowercase letter

*   NOTE: the matching mechanism for BREs in UNIX/Linux is clever enough
    to consider the whole line when testing for a match --

    ^a.*b.c$    # line starts with a
                # then 0 or more of any character
                # then b
                # then any one character
                # then c at the end of the line

    axybbcc     will match this;

    general rule: when a BRE can be matched in more than one way,
    the longest possible matching sequence will be used
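
===== a small script to try these out =====

*   to experiment with the options and BREs above, here is a minimal
    sketch, NOT something we ran in class. The scratch files demo.txt
    and other.txt, their contents, and all of the variable names are
    made up for illustration. It assumes mktemp is available and
    forces the C locale so that [[:lower:]] behaves predictably.

```shell
#!/bin/sh
# Sketch only: the scratch files and their contents are invented for this demo.
export LC_ALL=C                     # assumption: C locale, for predictable classes
tmpdir=$(mktemp -d)                 # assumption: mktemp exists (it is not in POSIX)
trap 'rm -rf "$tmpdir"' EXIT        # remove the scratch directory when done
demo="$tmpdir/demo.txt"
other="$tmpdir/other.txt"

# ten one-line test cases, one per line of demo.txt
printf '%s\n' oink OINK o.n own ac abbbc mmmoxie 'moxie!' axybbcc \
    'cheap at $9.98' > "$demo"
printf '%s\n' moo > "$other"

# ----- options -----
plain_count=$(grep -c oink "$demo")               # -c: count of matching lines
nocase_count=$(grep -ci oink "$demo")             # -i: also counts OINK
non_o_count=$(grep -vc o "$demo")                 # -v: lines WITHOUT a lowercase o
files_with_oink=$(grep -l oink "$demo" "$other")  # -l: just the file names
if grep -q oink "$demo"; then found=yes; else found=no; fi   # -q: exit status only

# ----- BREs -----
dot_any=$(grep -c 'o.n' "$demo")                  # . is special: oink, o.n, own
dot_literal=$(grep -n 'o\.n' "$demo")             # \. is a real dot; -n adds line number
price=$(grep -c 'cheap at $9\.98' "$demo")        # $ is ordinary when not at the end
star_count=$(grep -c 'ab*c' "$demo")              # a, 0 or more b's, c: ac, abbbc
moxie_match=$(grep '^m[moxie]*$' "$demo")         # whole line drawn from the set
any_line=$(grep -c 'a*' "$demo")                  # 0 or more a's: EVERY line
one_or_more_a=$(grep -c 'aa*' "$demo")            # at least one a
lower_only=$(grep -c '^[[:lower:]]*$' "$demo")    # lines of only lowercase letters
whole_line=$(grep -c '^a.*b.c$' "$demo")          # abbbc and axybbcc

echo "oink: $plain_count exact, $nocase_count ignoring case, found=$found"
echo "lines without o: $non_o_count"
echo "o.n matches $dot_any lines; o\\.n matches: $dot_literal"
echo "price lines: $price; ab*c lines: $star_count"
echo "^m[moxie]*\$ matched: $moxie_match"
echo "a* matches $any_line lines, aa* matches $one_or_more_a"
echo "all-lowercase lines: $lower_only; ^a.*b.c\$ lines: $whole_line"
```

*   the comparison worth noticing: a* "matches" all 10 lines of
    demo.txt, while aa* matches only the 4 lines that really contain
    an a; likewise o.n matches 3 lines but o\.n matches only line 3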