CS 279 - Week 8 Lecture 1 - 2022-10-10
TODAY WE WILL:
* announcements
* follow-up: a few more examples of UNIX/Linux signals
* more fun with patterns, part 1:
more file globbing options
* more fun with patterns, part 2:
start our discussion of the UNIX/Linux implementation
of Regular Expressions (REs)
* prep for next class
=====
follow-up: *some* EXAMPLES of Linux/UNIX signals:
=====
* signal: a message from the OS to a process;
one web site called it a software interrupt;
* example: when the user types ^Z, the Linux/UNIX operating system
sends a signal (SIGTSTP) to the foreground process to suspend
(stop) it; the bg command can then resume it in the background
* in UNIX/Linux, each signal has a numerical code and a
name
code  name     meaning
----  -------  -------
* 1 SIGHUP terminal hangup
* 2 SIGINT terminal interrupt
* 3 SIGQUIT terminal quit (with a memory dump in a file core)
* 9 SIGKILL process killed
* notice that numerical code of 9!
* when you run the command:
kill -9 <processid-or-job-num>
...you are asking for SIGKILL
(a job number is written with a leading %, as in %1)
* 13 SIGPIPE broken pipe (writing when the reader has terminated)
* 14 SIGALRM alarm clock interrupt
* 15 SIGTERM software termination
* 23 SIGCONT continue job if stopped
* the fg command sends this signal, for example
* 24 SIGSTOP noninteractive stop signal
* stops the job -- suspends its activity but
does not terminate it; it can be resumed later
* 26 SIGTTIN read attempted by a background job
* 27 SIGTTOU write attempted by a background job
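* by default, these signals stop a job that
is running in the background
* note: the numerical codes above 15 are NOT the same on every
system; on most Linux systems, for example, SIGCONT is 18,
SIGSTOP is 19, SIGTTIN is 21, and SIGTTOU is 22;
kill -l lists the codes on your system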
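A quick way to check the code/name pairs above on your own system. This is a minimal sketch; it assumes a shell whose kill accepts -l followed by a number (bash's built-in kill does). The variable names are just for illustration.

```shell
# kill -l NUMBER prints the signal's name without the SIG prefix
name_1=$(kill -l 1)
name_9=$(kill -l 9)
name_15=$(kill -l 15)
echo "1=$name_1 9=$name_9 15=$name_15"

# kill -l with no number lists every signal this system knows about
kill -l
```

The low-numbered codes (1, 2, 3, 9, 13, 14, 15) should agree with the table; the job-control codes are the ones most likely to differ.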
* programs such as kill normally use SIGTERM to kill
another process --
* receiving process can catch it and choose to
continue;
* BUT a SIGKILL cannot be caught or ignored
(and that's kill with a -9 option) --
the operating system itself ends that process
(although the kill may not take effect right away: a process
stuck in an uninterruptible wait is not killed until that wait ends)
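The difference between SIGTERM and SIGKILL can be tried out with a small experiment. This is only a sketch, assuming bash is installed; the child script, the variable names, and the sleep lengths are made up for the demonstration.

```shell
# start a child process that catches SIGTERM and chooses to continue
bash -c 'trap "echo child: caught SIGTERM, continuing" TERM
         while :; do sleep 0.2; done' &
child=$!

sleep 1                      # give the child time to set up its trap
kill "$child"                # no option: kill sends SIGTERM (15)
sleep 1

# kill -0 sends no signal; it only tests whether the process still exists
if kill -0 "$child" 2>/dev/null
then survived_term=yes
else survived_term=no
fi

kill -9 "$child"             # SIGKILL: the child gets no chance to catch it
wait "$child" 2>/dev/null
status=$?                    # bash reports 128 + signal number for a killed child
echo "survived SIGTERM: $survived_term, status after kill -9: $status"
```

The trap command is how a shell script catches a signal; a trap for KILL has no effect, which is the point of kill -9.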
=====
more fun with patterns, part 1:
more GLOBBING options!
=====
* fun facts:
* some shells call file globbing by different
names:
* pathname expansion
* filename expansion
* globbing is NOT built into the UNIX file mechanism;
it is recognized by various shells!
* the *shell* expands the file globbing pattern
into a space-separated list of matching filenames
BEFORE the command is run; the command itself never
sees the pattern, only the resulting filenames
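To see that the shell, not the command, does the expanding, here is a small sketch. It assumes bash and works in a scratch directory made with mktemp; the file names are invented for the example.

```shell
tmp=$(mktemp -d)             # scratch directory, so only our own files are seen
cd "$tmp" || exit 1
touch a.txt b.txt c.log

set -- *.txt                 # the shell expands *.txt BEFORE set runs
expanded_count=$#
expanded_args="$*"

set -- '*.txt'               # quoting the pattern turns the expansion off
quoted_count=$#
quoted_arg=$1

echo "unquoted: $expanded_count arguments ($expanded_args)"
echo "quoted:   $quoted_count argument ($quoted_arg)"

cd / && rm -rf "$tmp"        # clean up the scratch directory
```

The set -- command just stores its arguments, which makes it easy to count what the shell actually handed over.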
* you already know:
* - matches ZERO OR MORE characters (any characters)
Here are a few more:
? - matches EXACTLY ONE character (any character)
[ ... ] - matches a SINGLE character from those listed within the [ ]
if you prefer:
[cset] - this matches any single character in cset
* [moxie] - matches a single m or o or x or i or e
[0123456789] - matches a single digit
*.[ch]* - here, the file's suffix must start with c or h
(nice for matching C++ source code files...!)
* ranges are also supported in [ ]!
[0-9] same as [0123456789]
[a-z] same as [abcdefghijklmnopqrstuvwxyz]
...etc.
* these ranges ARE inclusive!
[d-f] DOES match d or e or f
* there are also a number of predefined sets (!!)
* within the usual [ ],
you put another [, then a :, then the set name,
then another :, then another ]
[[:digit:]] - same as [0-9] and [0123456789]
[[:alpha:]] - will match the set of [a-zA-Z] PLUS
any other characters considered letters
in your locale
similarly, [[:upper:]] and [[:lower:]]
[[:space:]] - matches characters such as space, newline,
and more
[[:blank:]] - matches just space and tab
[[:cntrl:]] - matches control characters
* within [ ], can use an ! as the FIRST character to indicate
you want to match something that ISN'T one of those
* starts with an uppercase letter,
ends with anything BUT a digit
[A-Z]*[![:digit:]]
* adding after class: ! does not HAVE to be before
a predefined set; these also work:
[A-Z]*[!0123456789]
[A-Z]*[!0-9]
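These bracket patterns can be tried without creating any files, because the shell's case statement uses the same pattern syntax. A minimal sketch, assuming bash; glob_match and try are made-up helper names, not standard commands. One difference to keep in mind: case compares plain strings, so the special filename rules for / and a leading . do not apply here.

```shell
# glob_match PATTERN STRING succeeds when the WHOLE string matches the pattern
glob_match() {
    case "$2" in
        $1) return 0 ;;      # unquoted $1 is treated as a pattern
        *)  return 1 ;;
    esac
}

# try PATTERN STRING prints the result in the style of these notes
try() {
    if glob_match "$1" "$2"
    then echo "$1   $2   MATCHES"
    else echo "$1   $2   nope"
    fi
}

try '?' 'a'
try '?' 'ab'
try '[moxie]' 'x'
try '[d-f]' 'e'
try '[d-f]' 'g'
try '[[:digit:]]' '5'
try '[A-Z]*[![:digit:]]' 'Gz'
try '[A-Z]*[!0-9]' 'G7'
```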
* NOTE the following LIMITATIONS on these patterns:
* a / in the actual pathname MUST be matched
by an explicit / in the pattern (NOT a wildcard)
* a . in the actual pathname that comes at the
beginning or follows a / must similarly
be matched by an explicit . in the pattern
* playing around with this a bit:
gn*.l - gnu.l MATCHES
gneiss.l MATCHES
gn.l MATCHES (* matches 0 or more)
gnu nope!
gn/x.l nope! (have to explicitly include / to
match it)
~/.[[:alpha:]]* - ~/.login MATCHES
~/..login nope! ([[:alpha:]] must match a letter,
and the second . is not a letter)
~/.mailrc MATCHES
~/login nope! (no . after ~/)
*/doit* - one/doit MATCHES
two/doit.c MATCHES (. must be explicitly
included at beginning
or after a /, other locations
CAN be matched with wildcards)
three/doit.cpp MATCHES
doit nope (no / )
[A-Z]*[![:digit:]] - Gz MATCHES
X nope (after [A-Z] matches the X, the final [ ]
still has to match ONE more character)
*.[acAC] - file.a MATCHES
.a nope! <-- a leading . can only be matched by
a pattern starting with
an explicit .
ls .[acAC]
...WOULD list .a in its output
file nope
stuff.ac no -- [acAC] matches exactly 1 character
in the set, stuff.ac has TWO
characters after its .
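Some of the cases above can be checked against real files. This is a sketch only, assuming bash with its default settings (no dotglob) and a scratch directory from mktemp; the variable names are invented for the example.

```shell
tmp=$(mktemp -d)             # scratch directory for the experiment
cd "$tmp" || exit 1
mkdir gn
touch gnu.l gneiss.l gn.l gnu gn/x.l     # candidates for gn*.l
touch file.a .a file stuff.ac            # candidates for *.[acAC]

set -- gn*.l
gn_count=$#                  # gnu.l, gneiss.l, gn.l; not gnu, not gn/x.l

set -- *.[acAC]
star_matches="$*"            # file.a only: the * will not match the leading . of .a

set -- .[acAC]
dot_matches="$*"             # the explicit . in the pattern lets .a match

echo "gn*.l matched $gn_count files"
echo "*.[acAC] matched: $star_matches"
echo ".[acAC] matched: $dot_matches"

cd / && rm -rf "$tmp"        # clean up
```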
=====
more fun with patterns, part 2
UNIX/Linux regular expressions
BUT FIRST: more on the grep command
=====
* you'd like even more expressive power?
...that you can use in other contexts besides
representing lists of filenames?
* UNIX/Linux has it! Supports:
BRE - Basic Regular Expression
ERE - Extended Regular Expression
* we'll start with BREs in the context of the grep
command
grep - the name comes from the ed editor command g/re/p:
globally search for a regular expression and print
the matching lines
* has options! WITHOUT options,
grep pattern filelist
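...returns the lines of files in the filelist
that include the pattern; when the filelist names MORE than
one file, each line is shown as the filename, a colon, then
the entire line that includes the pattern
(with just one file, only the line itself is shown)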
grep "oink" *.txt
...return all lines including oink within files
ending in .txt
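A small sketch of what grep prints with one file versus several. It assumes bash and a scratch directory from mktemp; the file names and their contents are made up for the example.

```shell
tmp=$(mktemp -d)             # scratch directory with two small text files
cd "$tmp" || exit 1
printf 'the pigs say oink\nthe cows say moo\n' > farm.txt
printf 'oink oink\nquack\n' > noises.txt

one_file=$(grep "oink" farm.txt)     # one file: just the matching line
many_files=$(grep "oink" *.txt)      # several files: filename:line

echo "--- one file ---"
echo "$one_file"
echo "--- several files ---"
echo "$many_files"

cd / && rm -rf "$tmp"        # clean up
```

Note that the shell expands *.txt into the two filenames before grep ever runs, just as in the globbing section.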
* just a TASTE of some grep options:
* -l (before the pattern)
returns JUST the names of files with that pattern
grep -l "oink" *
* -E - lets grep accept EREs as its pattern
* -c - only show a count of matching lines
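The three options above, tried on two tiny files. Again only a sketch, assuming bash and a scratch directory from mktemp; the file names, contents, and variable names are invented for the example.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'oink oink\nmoo\n' > farm.txt
printf 'quack\n' > pond.txt

names_only=$(grep -l "oink" *)               # -l: just the names of matching files
line_count=$(grep -c "oink" farm.txt)        # -c: counts matching LINES
ere_count=$(grep -c -E "oink|moo" farm.txt)  # -E: | means "or" in an ERE

echo "files containing oink: $names_only"
echo "lines containing oink in farm.txt: $line_count"
echo "lines containing oink or moo in farm.txt: $ere_count"

cd / && rm -rf "$tmp"        # clean up
```

The first line of farm.txt contains oink twice, but -c counts it once, because it counts lines rather than occurrences.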