If you deal with data, at some point you will need to transform and modify your inputs. Sometimes you might get data that is tab- or pipe-delimited and you will need it as a .csv. Other times, you will get a wonky data source with 12 leading whitespace characters. SED and AWK are great tools for making the transformation of inputs less of a headache. I mostly use SED to replace text, and AWK to format text.
Stream editing with SED
SED is a stream editor: it takes input from a file or a pipe and writes the result to standard output. Many new SED users spend all their time making changes to a file, only to look at it later and see no evidence of changes. It’s important to remember that you must either use -i to make changes “in place”, or use > to redirect the output to a new file (>> appends to an existing file instead).
As a good first example, let’s replace every occurrence of “or” in Hamlet’s “to be or not to be” speech with “maybe”. To do that, we will use ‘s’ for substitute in SED.
joe$ head -1 2b.txt
To be, or not to be, that is the question:
joe$ sed -e 's/or/maybe/' 2b.txt | head -1
To be, maybe not to be, that is the question:
What if we wanted to change “be” to “exist”? Since “be” appears on the line more than once, and ‘s’ replaces only the first match on each line, we need to add a flag: ‘g’ for global replacement.
joe$ sed -e 's/be/exist/g' 2b.txt | head -1
To exist, or not to exist, that is the question:
What if we wanted to replace comma characters with pipes? You can use the same kind of command as above. Note that without the ‘g’ flag, only the first comma on each line is replaced; add ‘g’ to replace them all. Add the -i option to edit the file in place. Since we are using -i, there will be no output, but we can use head on the file to see the changes.
joe$ sed -i -e 's/,/|/' 2b.txt
joe$ head 2b.txt
To be| or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune|
Or to take Arms against a Sea of troubles|
And by opposing end them: to die| to sleep
No more; and by a sleep| to say we end
the heart-ache| and the thousand natural shocks
that Flesh is heir to? 'Tis a consummation
devoutly to be wished. To die| to sleep,
To sleep| perchance to Dream; aye, there's the rub
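The output above still contains commas because, without the ‘g’ flag, only the first match on each line is replaced. Here is a minimal sketch of a more cautious variant. It assumes a scratch file named sample.txt (a hypothetical stand-in for 2b.txt, as are the two output file names). It replaces every comma, strips leading whitespace, and writes each result to a new file with > so the input is never modified.

```shell
# Create a small scratch file (sample.txt is a hypothetical name for this sketch).
printf '%s\n' 'To die, to sleep,' '   To sleep, perchance to Dream;' > sample.txt

# Replace every comma with a pipe (g flag) and redirect the result to a new file.
sed -e 's/,/|/g' sample.txt > sample_piped.txt

# Strip leading whitespace, again leaving the input file unchanged.
sed -e 's/^[[:space:]]*//' sample_piped.txt > sample_clean.txt

# Prints:
#   To die| to sleep|
#   To sleep| perchance to Dream;
cat sample_clean.txt
```

Writing to a new file rather than using -i keeps the original around for comparison, which can be handy while you are still tuning the expression.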
Manipulating text with AWK
AWK is a text manipulation tool that can be used to modify, slice, dice and chop text. It is great for transforming text and printing it in the format you want. Let’s say you want to change the format of this state population .csv:
state,pop_est_2014
Your challenge is to print each row in the form “State Name: stateName | State Population: statePop”. By default, AWK splits fields on whitespace, which would break state names that contain spaces, so we tell AWK to use a comma as the field separator with -F. You’ll also notice the file has a header, which we don’t want to print. To skip it, we use a RegEx matching capital letters, [A-Z]: the header is all lowercase, while every state name starts with a capital. Finally, we print using $1 and $2 (the two fields in the .csv).
joe$ awk -F, ' /[A-Z]/ { print "State Name: " $1 " | State Population: " $2 } ' census-state-populations.csv
State Name: Alabama | State Population: 4849377
State Name: Alaska | State Population: 736732
State Name: Arizona | State Population: 6731484
State Name: Arkansas | State Population: 2966369
State Name: California | State Population: 38802500
State Name: Colorado | State Population: 5355866
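The [A-Z] pattern works here only because the header happens to be all lowercase. As a sketch of an alternative, the example below skips the header by record number instead. It assumes a small file named states.csv (a hypothetical stand-in for the census file, with two rows copied from the output above) and writes to a hypothetical states_formatted.txt.

```shell
# Build a tiny sample file (states.csv is a hypothetical name for this sketch).
printf '%s\n' 'state,pop_est_2014' 'Alabama,4849377' 'Alaska,736732' > states.csv

# -F, sets the field separator to a comma.
# NR > 1 skips the first record (the header), whatever its contents.
awk -F, 'NR > 1 { printf "State Name: %s | State Population: %s\n", $1, $2 }' states.csv > states_formatted.txt

# Prints:
#   State Name: Alabama | State Population: 4849377
#   State Name: Alaska | State Population: 736732
cat states_formatted.txt
```

NR is AWK's built-in count of records read so far, so this approach should keep working even if a header contains capital letters.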
This is just the tip of the iceberg! Want to learn more about AWK? Check out this manual from GNU: https://www.gnu.org/software/gawk/manual/gawk.html.
I hope this helps spark your interest in leveraging SED and AWK.