# Part 2: Reading & Searching - Master Text Processing

Learn to view, search, and process text files using the Linux command line.

In Part 1, you learned to navigate and create files. Now let's learn to read them and search through them like a pro.
## Viewing File Contents

### cat - Concatenate and Display

The simplest way to view a file. Dumps the entire content to your screen.

```bash
cat file.txt                            # Display file
cat file1.txt file2.txt                 # Display multiple files
cat file.txt > newfile.txt              # Redirect output to new file
cat file1.txt file2.txt > combined.txt  # Combine files
```

**Exercise 2.1:**

```bash
cd ~/workshop-practice
echo "Hello, Linux!" > greeting.txt
cat greeting.txt
```

**When to use:** Small files (< 100 lines). For large files, use `less`.

### less - View Large Files

A "pager" that lets you scroll through files without loading everything into memory.

```bash
less large_file.txt
```

Navigation inside `less`:

- `SPACE` or `f` - Next page
- `b` - Previous page
- `/searchterm` - Search forward
- `?searchterm` - Search backward
- `n` - Next search result
- `N` - Previous search result
- `g` - Go to start
- `G` - Go to end
- `q` - Quit

**Exercise 2.2:**

```bash
# Create a large file
seq 1 1000 > numbers.txt
less numbers.txt
# Try: Search for "500" by typing /500
# Press 'q' to quit
```

### head - First Lines

View the beginning of a file.

```bash
head file.txt       # First 10 lines (default)
head -n 20 file.txt # First 20 lines
head -5 file.txt    # First 5 lines
```

**Exercise 2.3:**

```bash
head -20 numbers.txt
head -3 numbers.txt
```

### tail - Last Lines

View the end of a file. Super useful for log files!

```bash
tail file.txt       # Last 10 lines (default)
tail -n 50 file.txt # Last 50 lines
tail -f logfile.log # Follow mode (live updates!)
```

**Exercise 2.4:**

```bash
tail -20 numbers.txt
tail -5 numbers.txt
```

**Real-world use case:** `tail -f /var/log/nginx/access.log` to watch web server requests in real-time.

### wc - Word Count

Count lines, words, and characters.

```bash
wc file.txt     # Shows: lines words characters
wc -l file.txt # Just line count
wc -w file.txt # Just word count
wc -c file.txt  # Just character count
```

**Exercise 2.5:**

```bash
wc numbers.txt
wc -l numbers.txt
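# A small extra (not from the original exercise): wc also counts
# whatever a pipe feeds it, e.g. how many entries ls prints
ls | wc -l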
echo "Hello World" | wc -w
```

## Searching Through Text
### grep - Global Regular Expression Print

The most powerful search tool. Find lines matching a pattern.

Basic usage:

```bash
grep "search_term" file.txt
grep "error" /var/log/syslog
```

Important flags:

```bash
grep -i "error" file.txt    # Case-insensitive
grep -n "error" file.txt    # Show line numbers
grep -v "error" file.txt    # Invert (show NON-matching lines)
grep -r "TODO" .            # Recursive (search in all files)
grep -c "error" file.txt    # Count matches
grep -A 3 "error" file.txt  # Show 3 lines After match
grep -B 3 "error" file.txt  # Show 3 lines Before match
grep -C 3 "error" file.txt  # Show 3 lines of Context
```

**Exercise 2.6:**

```bash
cd ~/workshop-practice
# Create a sample log file
cat > app.log << EOF
2024-01-15 10:23:45 INFO Server started
2024-01-15 10:24:12 ERROR Failed to connect to database
2024-01-15 10:24:15 WARNING Retrying connection
2024-01-15 10:24:20 INFO Connected successfully
2024-01-15 10:25:01 ERROR Timeout on request
2024-01-15 10:25:05 INFO Request completed
EOF
# Now search:
grep "ERROR" app.log
grep -i "error" app.log # Try different case
grep -n "ERROR" app.log # With line numbers
grep -v "ERROR" app.log # Everything except errors
grep -c "INFO" app.log # Count INFO messages
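# A small extra (uses the app.log created above): -E enables
# extended regex, so one pattern can match several log levels
grep -E "ERROR|WARNING" app.log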
grep -A 1 "ERROR" app.log   # Show line after each error
```

**Real-world use case:** `grep -r "api_key" .` to find hardcoded secrets in your codebase.

### find - Find Files

Search for files by name, type, size, etc.

```bash
find . -name "*.txt"            # Find all .txt files
find . -name "*.txt" -type f # Only files (not directories)
find . -name "test*" # Files starting with "test"
find . -iname "readme.md" # Case-insensitive name
find . -type d -name "src" # Find directories named "src"
find . -size +10M # Files larger than 10MB
find . -mtime -7                # Modified in last 7 days
```

**Exercise 2.7:**

```bash
cd ~/workshop-practice
# Create some test files
mkdir -p test1 test2 backup
touch test1/notes.txt test2/data.txt backup/old.txt
touch README.md readme.txt
# Now find:
find . -name "*.txt"
find . -iname "readme*"
find . -type d
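# A small extra using the directories created above: combine
# -type and -name to match only the test* directories
find . -type d -name "test*"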
find . -name "*test*"
```

Combining find + grep:

```bash
find . -name "*.py" -exec grep "TODO" {} \;
# Translation: Find all .py files and search for "TODO" in them
```

## The Power of Pipes (`|`)

Pipes connect commands. The output of one becomes the input of the next.

```bash
command1 | command2 | command3
```

Example workflows:

```bash
# Count how many .txt files exist
find . -name "*.txt" | wc -l
# Find the 5 largest files
ls -lh | sort -k5 -h | tail -5
# Search logs and count errors
cat app.log | grep "ERROR" | wc -l
# Get unique error messages
grep "ERROR" app.log | cut -d' ' -f5- | sort | uniq
```

**Exercise 2.8:**

```bash
cd ~/workshop-practice
# How many lines in numbers.txt contain "5"?
grep "5" numbers.txt | wc -l
# Show me just the ERROR lines from app.log, sorted
grep "ERROR" app.log | sort
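# A sketch of a classic pipeline (assumes app.log from the grep
# exercise): count how often each log level appears, most common first
grep -oE "INFO|ERROR|WARNING" app.log | sort | uniq -c | sort -nr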
# Find all .txt files and show their line counts
find . -name "*.txt" -exec wc -l {} \;
```

## Redirection

### Output Redirection

```bash
command > file.txt # Overwrite file
command >> file.txt # Append to file
command 2> error.log # Redirect errors only
command &> all.log      # Redirect stdout AND stderr
```

**Exercise 2.9:**

```bash
cd ~/workshop-practice
# Create a file
echo "First line" > output.txt
cat output.txt
# Append to it
echo "Second line" >> output.txt
cat output.txt
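# A quick sketch of error redirection: ls fails on a missing path,
# and 2> captures that error message instead of printing it
ls /nonexistent_path 2> errors.log
cat errors.log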
# Redirect command output
ls -la > directory_listing.txt
cat directory_listing.txt
```

### Input Redirection

```bash
command < input.txt # Use file as input
wc -l < file.txt        # Count lines from file
```

## Advanced Text Processing

### sort - Sort Lines

```bash
sort file.txt       # Alphabetical sort
sort -r file.txt # Reverse sort
sort -n file.txt # Numerical sort
sort -u file.txt # Unique (remove duplicates)
sort -k2 file.txt   # Sort by 2nd column
```

**Exercise 2.10:**

```bash
cat > fruits.txt << EOF
banana
apple
cherry
apple
banana
EOF
sort fruits.txt
sort -u fruits.txt # Remove duplicates
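# A quick sketch of -n: without it, sort compares character by
# character, so "10" comes before "2"
printf '10\n2\n1\n' | sort      # alphabetical: 1, 10, 2
printf '10\n2\n1\n' | sort -n   # numeric: 1, 2, 10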
sort -r fruits.txt  # Reverse
```

### uniq - Remove Duplicates

Note: File must be sorted first!

```bash
sort file.txt | uniq        # Remove duplicates
sort file.txt | uniq -c # Count occurrences
sort file.txt | uniq -d     # Show only duplicates
```

**Exercise 2.11:**

```bash
sort fruits.txt | uniq
sort fruits.txt | uniq -c
```

### cut - Extract Columns

```bash
cut -d' ' -f1 file.txt      # Extract 1st field (space delimiter)
cut -d',' -f2,3 file.txt # Extract 2nd and 3rd fields (CSV)
cut -c1-10 file.txt         # Extract characters 1-10
```

**Exercise 2.12:**

```bash
cat > data.csv << EOF
John,25,Engineer
Jane,30,Designer
Bob,28,Developer
EOF
cut -d',' -f1 data.csv # Just names
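# A small extra: pipe the extracted column onward, e.g. ages in
# numeric order
cut -d',' -f2 data.csv | sort -n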
cut -d',' -f2,3 data.csv    # Age and job
```

### tr - Translate Characters

```bash
tr 'a-z' 'A-Z'      # Lowercase to uppercase
tr -d ' ' # Delete spaces
tr -s ' '           # Squeeze repeated spaces
```

**Exercise 2.13:**

```bash
echo "hello world" | tr 'a-z' 'A-Z'
echo "hello world" | tr -s ' '
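# A small extra: tr can also translate into newlines, splitting a
# comma-separated list onto separate lines
echo "a,b,c" | tr ',' '\n'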
echo "hello world 123" | tr -d '0-9'
```

## Hands-On Challenge

Ready to practice? Complete the hands-on challenge in the workshop repository.

**Challenge:** Analyze real web server logs to extract insights using grep, pipes, and text processing tools.

Clone the repository and give it a try:

```bash
git clone https://github.com/bakayu/linux-tutorial.git
cd linux-tutorial/02-reading-searching
./setup.sh
cat README.md
```

## Quick Reference Card

```bash
# View Files
cat file.txt # Dump entire file
less file.txt # Page through file
head -20 file.txt # First 20 lines
tail -20 file.txt # Last 20 lines
tail -f log.txt # Follow live updates
# Search
grep "pattern" file # Find matching lines
grep -i "pattern" file # Case-insensitive
grep -r "pattern" . # Recursive search
grep -n "pattern" file # Show line numbers
find . -name "*.txt" # Find files by name
# Process Text
wc -l file.txt # Count lines
sort file.txt # Sort lines
uniq file.txt # Remove duplicates
cut -d',' -f1 file.txt # Extract column
tr 'a-z' 'A-Z' # Transform characters
# Pipes & Redirection
cmd1 | cmd2 # Pipe output
cmd > file # Write to file
cmd >> file # Append to file
cmd < file              # Input from file
```
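As a closing sketch, the tools in this card combine into one pipeline. This example (using the same sample log format as the grep exercise) reports how many times each log level appears:

```shell
# Recreate the sample log from the grep exercise
cat > app.log << EOF
2024-01-15 10:23:45 INFO Server started
2024-01-15 10:24:12 ERROR Failed to connect to database
2024-01-15 10:24:15 WARNING Retrying connection
2024-01-15 10:24:20 INFO Connected successfully
2024-01-15 10:25:01 ERROR Timeout on request
2024-01-15 10:25:05 INFO Request completed
EOF

# Extract the log-level column, sort it, count duplicates,
# then sort the counts from most to least frequent
cut -d' ' -f3 app.log | sort | uniq -c | sort -nr
```

This is the same shape as the error-counting workflow in the pipes section; only the column extraction differs.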