
Linux Hack of the Week #7: SED and AWK

If you deal with data, at some point you will need to transform and modify your inputs. Sometimes, you might get data that is tab or pipe delimited and you will need it as a .csv. Other times, you will get a wonky data source with 12 leading white spaces. SED and AWK are great tools for making the transformation of inputs less of a headache. I mostly use SED to replace text, and AWK to format text.
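Here is a minimal sketch of two of the chores mentioned above: stripping leading whitespace and turning pipe-delimited rows into comma-separated ones. The sample rows are piped in with printf rather than read from a real file. The first row has 12 leading spaces to mimic the "wonky" source.

```shell
# Sample input: pipe-delimited, with 12 leading spaces on the first row.
printf '            Alabama|4849377\nAlaska|736732\n' |
  # 1st expression: delete any run of leading whitespace.
  # 2nd expression: replace every pipe with a comma (g = all matches on the line).
  sed -e 's/^[[:space:]]*//' -e 's/|/,/g'
# Alabama,4849377
# Alaska,736732
```

In a basic sed regex a plain `|` is a literal character, so it needs no escaping here.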

Stream editing with SED

SED is a stream editor: it reads input from a file or a pipe and writes the edited result to standard output. By default it never touches the original file. That is why many new SED users make changes to a file, only to look at it later and find no evidence of them. To keep your edits, either use -i to change the file “in place”, or redirect the output with > to write a new file.
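The difference between printing and saving can be sketched with a throwaway file. The file names demo.txt and demo-new.txt are made up for this example.

```shell
# Create a small throwaway file to experiment on (hypothetical name).
printf 'To be, or not to be\n' > demo.txt

# Without -i, sed only prints the edited text; demo.txt is left unchanged.
sed -e 's/or/maybe/' demo.txt
cat demo.txt            # still: To be, or not to be

# Redirect with > to save the edited text to a new file instead.
sed -e 's/or/maybe/' demo.txt > demo-new.txt
cat demo-new.txt        # To be, maybe not to be
```

Avoid redirecting back onto the input file (`sed ... demo.txt > demo.txt`). The shell truncates demo.txt before sed reads it, so you end up with an empty file. Also note that `>>` appends to an existing file rather than creating a fresh one, so rerunning a `>>` command duplicates the output.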

As a good first example, let’s replace the first “or” on each line of Hamlet’s “To be, or not to be” speech with “maybe”. To do that, we will use SED’s ‘s’ (substitute) command.

joe$ head -1 2b.txt

To be, or not to be, that is the question:

joe$ sed -e 's/or/maybe/' 2b.txt | head -1

To be, maybe not to be, that is the question:
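One caveat is that `s/or/maybe/` matches the letters o-r anywhere, including inside words. It is also case-sensitive. The sketch below uses two lines taken from the speech. The second command shows one simple way to target only the standalone word: include the surrounding spaces in the pattern. That approach assumes the word is not at the very start or end of a line.

```shell
# "or" inside "fortune" is matched; the capitalized "Or" is not.
printf 'arrows of outrageous fortune,\nOr to take Arms\n' | sed -e 's/or/maybe/'
# arrows of outrageous fmaybetune,
# Or to take Arms

# Including the spaces in the pattern limits the match to the word " or ".
printf 'To be, or not to be, fortune\n' | sed -e 's/ or / maybe /'
# To be, maybe not to be, fortune
```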

 

What if we wanted to change “be” to “exist”? By default, ‘s’ only replaces the first match on each line. Since “be” appears on the line more than once, we need to add the ‘g’ (global) flag to replace every occurrence.

joe$ sed -e 's/be/exist/g' 2b.txt | head -1

To exist, or not to exist, that is the question:
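To make the effect of the flag concrete, here is the same first line run through three variants of the substitution. The numeric flag is part of standard sed and replaces only the nth match.

```shell
line='To be, or not to be, that is the question:'

# No flag: only the first "be" on the line is replaced.
printf '%s\n' "$line" | sed -e 's/be/exist/'
# To exist, or not to be, that is the question:

# g flag: every "be" on the line is replaced.
printf '%s\n' "$line" | sed -e 's/be/exist/g'
# To exist, or not to exist, that is the question:

# Numeric flag: only the 2nd "be" on the line is replaced.
printf '%s\n' "$line" | sed -e 's/be/exist/2'
# To be, or not to exist, that is the question:
```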

 

What if we wanted to replace all comma characters with pipes? We can use the same form of command as above, with the ‘g’ flag so that every comma on a line is replaced. Add the -i option to edit the file in place. Because -i writes the changes back to the file, there is no output, but we can use head on the file to see the changes.

joe$ sed -i -e 's/,/|/g' 2b.txt

joe$ head 2b.txt

To be| or not to be| that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune|
Or to take Arms against a Sea of troubles|
And by opposing end them: to die| to sleep
No more; and by a sleep| to say we end
the heart-ache| and the thousand natural shocks
that Flesh is heir to? 'Tis a consummation
devoutly to be wished. To die| to sleep|
To sleep| perchance to Dream; aye| there's the rub
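Because -i overwrites the file, it can be worth keeping a backup. The sketch below attaches a suffix directly to the option (`-i.bak`), a form accepted by both GNU sed and BSD sed (as on macOS). With BSD sed, a bare `-i` treats the next argument as the backup suffix, so `sed -i -e ...` leaves behind a backup file whose name ends in `-e`. The file name speech.txt is hypothetical.

```shell
# Throwaway copy of two lines from the speech (hypothetical file name).
printf 'To be, or not to be, that is the question:\nThe slings and arrows of outrageous fortune,\n' > speech.txt

# Edit speech.txt in place; the original is kept as speech.txt.bak.
sed -i.bak -e 's/,/|/g' speech.txt

cat speech.txt
# To be| or not to be| that is the question:
# The slings and arrows of outrageous fortune|
cat speech.txt.bak      # the untouched original, commas intact
```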


Manipulating text with AWK

AWK is a text manipulation tool that can be used to modify, slice, dice and chop text. It is great for transforming text and printing it in the format you want. Let’s say you want to change the format of this state population .csv:

https://raw.githubusercontent.com/BuzzFeedNews/2015-11-refugees-in-the-united-states/master/data/census-state-populations.csv

state,pop_est_2014
Alabama,4849377
Alaska,736732
Arizona,6731484
Arkansas,2966369
California,38802500
Colorado,5355866

Your challenge is to print each row in the format “State Name: <state> | State Population: <population>”. By default, AWK splits fields on whitespace, which would break state names that contain spaces and would not split on the commas at all. So we tell AWK to use a comma as the field separator with -F,. Next, the file has a header row that we don’t want to print. Because the header is all lowercase, we only act on lines that match the regular expression [A-Z] (any capital letter). Finally, we print the two fields of the .csv using $1 and $2.

joe$ awk -F, ' /[A-Z]/ { print "State Name: " $1 " | State Population: " $2  } ' census-state-populations.csv

State Name: Alabama | State Population: 4849377
State Name: Alaska | State Population: 736732
State Name: Arizona | State Population: 6731484
State Name: Arkansas | State Population: 2966369
State Name: California | State Population: 38802500
State Name: Colorado | State Population: 5355866
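Two points from the explanation can be checked directly. First, the field-separator problem: the "New Hampshire" row below uses a placeholder value (POP), not real data. Second, an alternative way to skip the header is to use AWK's built-in record counter NR instead of the [A-Z] regex, so the filter no longer depends on the header being lowercase. The sketch also swaps print for printf, which keeps the output format in a single string. The remaining sample rows come from the data shown above.

```shell
# Default field separator (whitespace): $1 is only "New".
printf 'New Hampshire,POP\n' | awk '{ print $1 }'
# New

# With -F, the comma separates fields, so $1 is the full state name.
printf 'New Hampshire,POP\n' | awk -F, '{ print $1 }'
# New Hampshire

# NR is the current line number, so NR > 1 skips exactly the header line.
printf 'state,pop_est_2014\nAlabama,4849377\nAlaska,736732\n' |
  awk -F, 'NR > 1 { printf "State Name: %s | State Population: %s\n", $1, $2 }'
# State Name: Alabama | State Population: 4849377
# State Name: Alaska | State Population: 736732
```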

 

This is just the tip of the iceberg! Want to learn more about AWK? Check out this manual from GNU: https://www.gnu.org/software/gawk/manual/gawk.html.

I hope this helps spark your interest in leveraging SED and AWK.


