Lab 4: Text filtering editors

Copy the following files from /etc to your home directory:

$ cp /etc/fstab ~
$ cp /etc/passwd ~

Part 1: Grep

The grep command searches its input for lines matching a pattern and prints the matching lines to standard output.

In some cases it is also useful to print the lines before or after a match; the -A, -B, and -C options do this.
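The commands below are a minimal sketch of these grep features. They run against a small hypothetical passwd-style file created in a temporary directory (the file name and its four accounts are invented for illustration), so they do not depend on the contents of your own ~/passwd.

```shell
# Hypothetical sample data in a temporary directory (removed on exit).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/passwd.sample"
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh' > "$sample"

# Print only the lines that contain "bash".
grep 'bash' "$sample"

# -n prefixes each matching line with its line number.
grep -n 'nologin' "$sample"

# -A 1 also prints one line after each match, -B 1 one line before,
# and -C 1 one line on both sides.
grep -A 1 'daemon' "$sample"
grep -B 1 'alice' "$sample"

# -E enables extended regular expressions, here an alternation
# anchored to the end of the line.
grep -E '/(bash|zsh)$' "$sample"
```

The same options work on ~/passwd and ~/fstab copied earlier; only the patterns need to change.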

Regex cheat sheet: https://quickref.me/grep

Part 2: AWK

AWK is a language designed for text processing and is typically used as a data extraction and reporting tool. Like sed and grep, it can filter data, and it adds capabilities such as field splitting, variables, and arithmetic. It is a standard feature of most Unix-like operating systems.
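As a rough illustration of filtering and reporting with awk, the sketch below works on a hypothetical passwd-style file (the file and its accounts are made up for this example). It assumes fields are separated by colons, as in /etc/passwd.

```shell
# Hypothetical sample data in a temporary directory (removed on exit).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/passwd.sample"
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh' > "$sample"

# -F sets the field separator; $1 is the first field, $7 the seventh.
awk -F: '{ print $1, $7 }' "$sample"

# A regex pattern before the action filters lines, much like grep.
awk -F: '/bash$/ { print $1 }' "$sample"

# Fields can be compared numerically: accounts with a UID of 1000 or more.
awk -F: '$3 >= 1000 { print $1 " has UID " $3 }' "$sample"

# The END block runs after all input is read; NR is the number of lines read.
awk -F: '$7 == "/bin/bash" { n++ } END { print n + 0 " of " NR " accounts use bash" }' "$sample"
```

A program has the form pattern { action }. Omitting the pattern applies the action to every line, and omitting the action prints the matching lines.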

Part 3: SED

The sed command (short for stream editor) performs editing operations on text coming from standard input or from a file. It can be used like grep, but it offers more functionality, such as substituting and deleting text.
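The following sketch shows a few common sed operations on a hypothetical passwd-style file (invented for this example). It assumes a sed that supports the -E option for extended regular expressions, which GNU and BSD sed both do. Each command writes its result to standard output and leaves the file itself unchanged.

```shell
# Hypothetical sample data in a temporary directory (removed on exit).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/passwd.sample"
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh' > "$sample"

# Used like grep: -n suppresses automatic printing, p prints matching lines.
sed -n '/bash$/p' "$sample"

# Substitute text; any delimiter may follow s, here # avoids escaping slashes.
sed 's#/bin/bash#/bin/sh#' "$sample"

# Delete every line that matches a pattern.
sed '/nologin/d' "$sample"

# Capture groups: \1 and \2 refer to the first and second parenthesised parts.
sed -E 's/^([^:]+):[^:]*:([0-9]+):.*/user=\1 uid=\2/' "$sample"
```

Redirecting the output (for example with > newfile) is the simplest way to keep the edited text.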

Questions to answer

Save the following lines to a file named server-data.log.

2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.410.15.0/24'
2022/09/18 13:25:34 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:35 wazuh-remoted: WARNING: Remote syslog not parsed from: '10.110.18.0/24'
2022/09/18 13:25:35 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
Log1 2022/09/18 13:25:35 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
2022/09/18 13:25:35 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24' END
2022/09/18 13:25:35 wazuh-remoted: ACTION: none INFO: Remote syslog allowed from: '10.110.15.0/24'

The following tasks are to be completed with grep, sed, or awk.
All actions are to be performed on server-data.log.

  1. View only error and warning messages in server-data.log. Show how you can do this with grep and awk.
  2. View every line except lines with informational messages.
  3. Count how many error messages are in the log.
  4. Hide the IP addresses. Replace all IP addresses with xxx.xxx.xxx.xxx/xx and save the output to a file newlog.log. Show the output.

    This simulates a scenario where you want to send your logs to a third party and need to hide some information in the log messages.

  5. Write a single regular expression to match the following lines in server-data.log. Show the full command and regex used.
    2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
    2022/09/18 13:25:34 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
    2022/09/18 13:25:34 wazuh-remoted: INFO: Remote syslog allowed from: '10.110.15.0/24'
    2022/09/18 13:25:35 wazuh-remoted: WARNING: Remote syslog not parsed from: '10.110.18.0/24'
    2022/09/18 13:25:35 wazuh-remoted: ERROR: Remote syslog blocked from: '10.110.18.0/24'
    
    Try to be as strict as possible when matching. Identify all the fields in the logs, find the common patterns in them and match as much as you can. Your regex should validate data where necessary.
    For example, using the wildcard . to match huge portions of the lines reduces the quality of the regex.

    Of course, you can use wildcards. Just don’t use them excessively.
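
To illustrate the difference between loose and strict matching, here is a small sketch on a single field only. The three time strings are hypothetical test input, not lines from server-data.log, and the patterns are one possible way to validate a time of day rather than a required solution.

```shell
# Hypothetical input: three candidate time stamps; only the first is valid.
times=$(printf '%s\n' '13:25:34' '25:61:99' 'ab:cd:ef')

# Loose: any two characters between colons, so all three lines match.
printf '%s\n' "$times" | grep -E '^..:..:..$'

# Strict: hours 00-23, minutes and seconds 00-59, so only 13:25:34 matches.
printf '%s\n' "$times" | grep -E '^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$'
```

The strict pattern is longer, but it rejects values that merely look like a time, which is the kind of validation the task asks for.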

Bonus

  1. Consider the following log:
    at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:474)
    at com.databricks.backend.daemon.data.client.DBFSV2.initialize(DatabricksFileSystemV2.scala:64)
    at com.databricks.backend.daemon.data.client.DatabricksFileSystem.initialize(DatabricksFileSystem.scala:222)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    
    Write a sed one-liner that rewrites each stack trace line in the following fashion:
    Exception occurred inside method `org.apache.hadoop.fs.FileSystem$Cache.getInternal` from file `FileSystem.java` on line `2703`. The file was written in `java`.
    
    Called method org.apache.hadoop.fs.FileSystem$Cache.getInternal which calls line 2703 of file FileSystem.java. The file is written in java.
    
    HINT: sed capture groups are extra useful here