Lab 4: Text filtering editors
Copy the following files from /etc to your home directory:
$ cp /etc/fstab ~
$ cp /etc/passwd ~
Part 1: Grep
The grep command searches for lines matching a pattern and prints the matching lines to output.
- View all occurrences of “systemd” in the passwd file.
$ grep "systemd" passwd
Sample output:
systemd-network:x:100:102:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
systemd-resolve:x:101:103:systemd Resolver,,,:/run/systemd:/usr/sbin/nologin
systemd-timesync:x:102:104:systemd Time Synchronization,,,:/run/systemd:/usr/sbin/nologin
systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin
- Show the line numbers of the matches.
$ grep -n "systemd" passwd
- Invert the match to show lines without “systemd”. This is done with the -v option.
$ grep -v "systemd" passwd
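The -n and -v options can also be tried on a small inline sample instead of the real passwd file. The sketch below is illustrative only: the three passwd-style lines are made up, and the variable names sample, numbered and inverted are hypothetical.

```shell
# Three made-up passwd-style lines (not taken from a real system).
sample='root:x:0:0:root:/root:/bin/bash
systemd-network:x:100:102:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -n prefixes each matching line with its line number and a colon.
numbered=$(printf '%s\n' "$sample" | grep -n "systemd")
echo "$numbered"

# -v keeps only the lines that do NOT contain the pattern.
inverted=$(printf '%s\n' "$sample" | grep -v "systemd")
echo "$inverted"
```

Here the first command should report the match on line 2, and the second should print the root and daemon lines only.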
In some cases it is also useful to print the lines before or after a match.
- Print 5 lines after the match.
$ grep -A 5 "systemd" passwd
- Print 3 lines before the match.
$ grep -B 3 "systemd" passwd
- Use the -C option to print 5 lines before and after a match.
$ grep -C 5 "systemd" passwd
- Specify the -P option to use PCRE (Perl Compatible Regular Expressions).
$ grep -P "(systemd|root)" passwd
- Class activity: Save the following lines to a file regextest.txt and try to match all the fields.
03/22 08:51:06 INFO :...read_physical_netif: index #0, interface VLINK1 has address 129.1.1.1, ifidx 0
03/22 08:51:06 ERROR :...read_physical_netif: index #4, interface CTCD0 has address 9.67.116.98, ifidx 4
Regex cheat sheet: https://quickref.me/grep
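As a possible starting point for the class activity, the sketch below matches only the first three fields (date, time and log level) and leaves the rest of the line to you. It assumes GNU grep, since -P is not available in every grep implementation; the variable names lines and matched are hypothetical.

```shell
# The two lines from the class activity, fed in directly.
lines='03/22 08:51:06 INFO :...read_physical_netif: index #0, interface VLINK1 has address 129.1.1.1, ifidx 0
03/22 08:51:06 ERROR :...read_physical_netif: index #4, interface CTCD0 has address 9.67.116.98, ifidx 4'

# \d{2}/\d{2}        date as MM/DD
# \d{2}(:\d{2}){2}   time as HH:MM:SS
# (INFO|ERROR)       log level
# -o prints only the part of each line that matched.
matched=$(printf '%s\n' "$lines" | grep -oP '^\d{2}/\d{2} \d{2}(:\d{2}){2} (INFO|ERROR)')
echo "$matched"
```

Using -o while building a regex makes it easy to see exactly how much of each line the pattern covers so far.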
Part 2: AWK
AWK is a language designed for text processing and typically used as a data extraction and reporting tool. It can be used like sed and grep to filter data with additional capabilities. It is a standard feature of most Unix-like operating systems.
- awk can be used like grep. The syntax is shown below.
$ awk '/systemd/{print $0}' passwd
- We can use the gsub function to substitute all occurrences of “systemd”.
$ awk '{gsub(/systemd/, "NEWSYSTEMD")}{print}' passwd
- Add a header and a footer to the output.
$ awk 'BEGIN {print "PASSWD FILE\n--------------"} {print} END {print "--------------\nEND OF PASSWD FILE"}' passwd
- We can specify delimiters to separate fields in a string. In the example below, we use : as the delimiter.
$ awk -F ":" '{print $1, $6, $7}' passwd
- Numeric comparison is possible with awk. The example below prints the lines whose third field (the UID) is greater than 100.
$ awk -F ":" '{ if ($3 > 100) {print $0} }' passwd
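The awk features above (-F, numeric comparison, BEGIN and END) can be combined in one program. The following is a minimal sketch on made-up data, not part of the lab tasks; the sample records, the threshold of 1000 and the variable names sample, report and count are assumptions chosen for illustration.

```shell
# Made-up passwd-style records: name:password:UID:GID:comment:home:shell
sample='root:x:0:0:root:/root:/bin/bash
systemd-network:x:100:102:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/zsh'

# A condition written before the braces acts as a filter, so the
# pattern-action form below behaves like an if statement inside the braces.
report=$(printf '%s\n' "$sample" | awk -F ":" '
    BEGIN      { print "USER SHELL" }        # runs once, before any input
    $3 >= 1000 { print $1, $7; count++ }     # runs for each matching line
    END        { print "total:", count }     # runs once, after all input
')
echo "$report"
```

With this input the report should list alice and bob with their shells, followed by a total of 2.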
Part 3: SED
The sed command (short for stream editor) performs editing operations on text coming from standard input or a file. It can be used like grep, but it offers more functionality.
-
By default, sed prints every line of the input, whether or not it matches. With the p command, each matching line is therefore printed twice.
$ sed '/systemd/p' passwd
The pattern we are searching for is enclosed in /.../. In this case, we are searching for “systemd”.
The enclosed pattern is followed by a p command so that sed will print the line to standard output.
-
Now, let’s use sed like grep. To print only the lines that match, we add the -n option.
$ sed -n '/systemd/p' passwd
-
Sed can substitute a matched pattern with another string before the output is displayed. It follows the structure s/pattern/replacement/. Combined with -n, the trailing p flag prints only the lines where a substitution was made.
In the output, replace “systemd” with “NEWSYSTEMD”:
$ sed -n 's/systemd/NEWSYSTEMD/p' passwd
-
To output the entire content of the file with “systemd” replaced (for example, to redirect it to another file), remove the -n option and the p flag. Analyse the output from:
$ sed 's/systemd/NEWSYSTEMD/' passwd
-
We can restrict sed to perform its operation on a specific line number. In the example below, we restrict sed to line 1.
$ sed '1 s/root/NOTROOT/' passwd
-
We can specify a range of line numbers.
$ sed '2,4 s/bin/NOBIN/g' passwd
g stands for global, which means that all matching occurrences in the line would be replaced. By default, sed will replace only the first occurrence in the line.
-
A range can also start at a line number and end at a pattern. The example below prints from line 5 up to and including the first later line that matches “systemd”; no further lines are printed after that.
$ sed -n '5,/systemd/p' passwd
-
Replace every occurrence of “sda” with “hda” (s/regexp/replacement/g), but only on lines that contain “efi” (/regexp/), in the file fstab.
$ sed '/efi/ s/sda/hda/g' fstab
-
Replace every “:” with “;” on lines that contain “root” in the file passwd.
$ sed '/root/ s/:/;/g' passwd
-
Create a file called unique.txt with the following content:
$ vi unique.txt
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.
-
Delete lines 2 and 3 from the file unique.txt.
$ sed '2,3 d' unique.txt
-
Delete all lines that start with “This”.
$ sed '/^This/ d' unique.txt
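Several of the sed behaviours described in this part can be checked side by side on a small inline sample. The sketch below is illustrative only: the four passwd-style lines are made up, and the variable names (sample, with_p, first_only, all_matches, remaining) are hypothetical.

```shell
# Made-up passwd-style lines for illustration.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:2:2:bin:/bin:/usr/sbin/nologin
systemd-network:x:100:102:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Without -n, p prints the matching line in addition to the default output,
# so the systemd line appears twice (5 lines out for 4 lines in).
with_p=$(printf '%s\n' "$sample" | sed '/systemd/p' | grep -c '')
echo "lines printed without -n: $with_p"

# Without g only the first "bin" on line 2 changes; with g all of them do.
first_only=$(printf '%s\n' "$sample" | sed -n '2 s/bin/NOBIN/p')
all_matches=$(printf '%s\n' "$sample" | sed -n '2 s/bin/NOBIN/gp')
echo "$first_only"
echo "$all_matches"

# A range from line 2 to the next line matching /systemd/, deleted with d.
remaining=$(printf '%s\n' "$sample" | sed '2,/systemd/ d')
echo "$remaining"
```

Note that the global replacement also changes the “bin” inside “sbin”, because sed matches text, not whole words. None of these commands modify the input itself; sed only writes the edited text to standard output.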
Questions to answer
Save the following lines to a file server-data.log.
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.410.15.0/24'
2022/09/18 13:25:34 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:35 wazuh-remoted: WARNING: Remote syslog not parsed from: '10.110.18.0/24'
2022/09/18 13:25:35 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
Log1 2022/09/18 13:25:35 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:35 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24' END
2022/09/18 13:25:35 wazuh-remoted: ACTION: none INFO: Remote syslog allowed from: '10.110.15.0/24'
The following tasks are to be completed with either grep, sed, or awk.
All actions are to be performed on server-data.log
- View only error and warning messages in server-data.log. Show how you can do this with grep and awk.
- View every line except lines with informational messages.
- Count how many error messages are in the log.
- Hide the IP addresses. Replace all IP addresses with xxx.xxx.xxx.xxx/xx and save the output to a file newlog.log. Show the output.
This simulates a scenario where you want to send your logs to a third-party and you need to hide some information in the log messages.
- Write a single regular expression to match the following lines in server-data.log. Show the full command and regex used.
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:34 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:35 wazuh-remoted: WARNING: Remote syslog not parsed from: '10.110.18.0/24'
2022/09/18 13:25:35 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
Try to be as strict as possible when matching. Identify all the fields in the logs, find the common patterns in them and match as much as you can. Your regex should validate data where necessary.
For example, using the wildcard . to match huge portions of the lines reduces the quality of the regex.
Of course, you can use wildcards. Just don’t use them excessively.
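To illustrate the difference between a loose and a strict pattern, the sketch below validates a made-up HH:MM value rather than the lab's log lines. The inputs and the variable names inputs, loose and strict are hypothetical, and the strict pattern is only one possible way to express the range check.

```shell
# Made-up inputs: one valid 24-hour time and two that only look like one.
inputs='13:25
25:61
1x:2y'

# Loose: "." accepts any character, so all three lines match.
loose=$(printf '%s\n' "$inputs" | grep -cE '^..:..$')

# Strict: hours 00-23 and minutes 00-59, so only the first line matches.
strict=$(printf '%s\n' "$inputs" | grep -cE '^([01][0-9]|2[0-3]):[0-5][0-9]$')

echo "loose matches: $loose, strict matches: $strict"
```

The strict version is longer, but it rejects values that have the right shape and the wrong content, which is what validating data means here.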
Bonus
- Consider the following log:
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:474)
at com.databricks.backend.daemon.data.client.DBFSV2.initialize(DatabricksFileSystemV2.scala:64)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.initialize(DatabricksFileSystem.scala:222)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
Write a sed one-liner that will show stack trace lines in the following fashion:
Exception occured inside method `org.apache.hadoop.fs.FileSystem$Cache.getInternal` from file `FileSystem.java` on line `2703`. The file was written in `java`.
Called method org.apache.hadoop.fs.FileSystem$Cache.getInternal which calls line 2703 of file FileSystem.java. The file is written in java.
HINT: sed capture groups are extra useful here
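As a reminder of how capture groups work in sed, here is a minimal sketch on unrelated, made-up key=value input. It assumes a sed that supports -E for extended regular expressions (both GNU and BSD sed do); the variable names pairs and swapped are hypothetical.

```shell
# Made-up input in the form "key=value".
pairs='host=example.org
port=8080'

# -E enables extended regular expressions, so groups are written ( ... ).
# \1 is the text captured by the first group, \2 by the second.
swapped=$(printf '%s\n' "$pairs" | sed -E 's/^([a-z]+)=(.*)$/\2 is the value of \1/')
echo "$swapped"
```

The same idea scales to more groups: each parenthesised part of the pattern can be reused in the replacement, in any order, through its number.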