<!DOCTYPE html><html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type" /><meta content="width=device-width, initial-scale=1" name="viewport" /><!--replace-start-0--><!--replace-start-5--><!--replace-start-8--><title>Awk - My Zettelkasten</title><!--replace-end-8--><!--replace-end-5--><!--replace-end-0--><link href="https://cdn.jsdelivr.net/npm/fomantic-ui@2.8.7/dist/semantic.min.css" rel="stylesheet" /><link href="https://fonts.googleapis.com/css?family=Merriweather|Libre+Franklin|Roboto+Mono&display=swap" rel="stylesheet" /><!--replace-start-1--><!--replace-start-4--><!--replace-start-7--><link href="https://raw.githubusercontent.com/srid/neuron/master/assets/neuron.svg" rel="icon" /><meta content="Awk is a programming language designed for text processing and data extraction. It was created in the 1970s and remains widely used today for tasks such as filtering and transforming text data, generating reports, and performing basic calculations. Awk is known for its simplicity and versatility, ma" name="description" /><meta content="Awk" property="og:title" /><meta content="My Zettelkasten" property="og:site_name" /><meta content="article" property="og:type" /><meta content="Awk" property="neuron:zettel-id" /><meta content="Awk" property="neuron:zettel-slug" /><meta content="awk" property="neuron:zettel-tag" /><meta content="shell" property="neuron:zettel-tag" /><script type="application/ld+json">[]</script><style type="text/css">body{background-color:#eeeeee !important;font-family:"Libre Franklin", serif !important}body .ui.container{font-family:"Libre Franklin", serif !important}body h1, h2, h3, h4, h5, h6, .ui.header, .headerFont{font-family:"Merriweather", sans-serif !important}body code, pre, tt, .monoFont{font-family:"Roboto Mono","SFMono-Regular","Menlo","Monaco","Consolas","Liberation Mono","Courier New", monospace !important}body div.z-index p.info{color:#808080}body div.z-index ul{list-style-type:square;padding-left:1.5em}body 
div.z-index .uplinks{margin-left:0.29999em}body .zettel-content h1#title-h1{background-color:rgba(33,133,208,0.1)}body nav.bottomPane{background-color:rgba(33,133,208,2.0e-2)}body div#footnotes{border-top-color:#2185d0}body p{line-height:150%}body img{max-width:100%}body .deemphasized{font-size:0.94999em}body .deemphasized:hover{opacity:1}body .deemphasized:not(:hover){opacity:0.69999}body .deemphasized:not(:hover) a{color:#808080 !important}body div.container.universe{padding-top:1em}body div.zettel-view ul{padding-left:1.5em;list-style-type:square}body div.zettel-view .pandoc .highlight{background-color:#ffff00}body div.zettel-view .pandoc .ui.disabled.fitted.checkbox{margin-right:0.29999em;vertical-align:middle}body div.zettel-view .zettel-content .metadata{margin-top:1em}body div.zettel-view .zettel-content .metadata div.date{text-align:center;color:#808080}body div.zettel-view .zettel-content h1{padding-top:0.2em;padding-bottom:0.2em;text-align:center}body div.zettel-view .zettel-content h2{border-bottom:solid 1px #4682b4;margin-bottom:0.5em}body div.zettel-view .zettel-content h3{margin:0px 0px 0.4em 0px}body div.zettel-view .zettel-content h4{opacity:0.8}body div.zettel-view .zettel-content div#footnotes{margin-top:4em;border-top-style:groove;border-top-width:2px;font-size:0.9em}body div.zettel-view .zettel-content div#footnotes ol > li > p:only-of-type{display:inline;margin-right:0.5em}body div.zettel-view .zettel-content aside.footnote-inline{width:30%;padding-left:15px;margin-left:15px;float:right;background-color:#d3d3d3}body div.zettel-view .zettel-content .overflows{overflow:auto}body div.zettel-view .zettel-content code{margin:auto auto auto auto;font-size:100%}body div.zettel-view .zettel-content p code, li code, ol code{padding:0.2em 0.2em 0.2em 0.2em;background-color:#f5f2f0}body div.zettel-view .zettel-content pre{overflow:auto}body div.zettel-view .zettel-content dl dt{font-weight:bold}body div.zettel-view .zettel-content 
blockquote{background-color:#f9f9f9;border-left:solid 10px #cccccc;margin:1.5em 0px 1.5em 0px;padding:0.5em 10px 0.5em 10px}body div.zettel-view .zettel-content.raw{background-color:#dddddd}body .ui.label.zettel-tag{color:#000000}body .ui.label.zettel-tag a{color:#000000}body nav.bottomPane ul.backlinks > li{padding-bottom:0.4em;list-style-type:disc}body nav.bottomPane ul.context-list > li{list-style-type:lower-roman}body .footer-version img{-webkit-filter:grayscale(100%);-moz-filter:grayscale(100%);-ms-filter:grayscale(100%);-o-filter:grayscale(100%);filter:grayscale(100%)}body .footer-version img:hover{-webkit-filter:grayscale(0%);-moz-filter:grayscale(0%);-ms-filter:grayscale(0%);-o-filter:grayscale(0%);filter:grayscale(0%)}body .footer-version, .footer-version a, .footer-version a:visited{color:#808080}body .footer-version a{font-weight:bold}body .footer-version{margin-top:1em !important;font-size:0.69999em}@media only screen and (max-width: 768px){body div#zettel-container{margin-left:0.4em !important;margin-right:0.4em !important}}body span.zettel-link-container span.zettel-link a{color:#2185d0;font-weight:bold;text-decoration:none}body span.zettel-link-container span.zettel-link a:hover{background-color:rgba(33,133,208,0.1)}body span.zettel-link-container span.extra{color:auto}body span.zettel-link-container.errors{border:solid 1px #ff0000}body span.zettel-link-container.errors span.zettel-link a:hover{text-decoration:none !important;cursor:not-allowed}body [data-tooltip]:after{font-size:0.69999em}body div.tag-tree div.node{font-weight:bold}body div.tag-tree div.node a.inactive{color:#555555}body .tree.flipped{-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}body .tree{overflow:auto}body .tree ul.root{padding-top:0px;margin-top:0px}body .tree ul{position:relative;padding:1em 0px 0px 0px;white-space:nowrap;margin:0px auto 0px auto;text-align:center}body .tree 
ul::after{content:"";display:table;clear:both}body .tree ul:last-child{padding-bottom:0.1em}body .tree li{display:inline-block;vertical-align:top;text-align:center;list-style-type:none;position:relative;padding:1em 0.5em 0em 0.5em}body .tree li::before{content:"";position:absolute;top:0px;right:50%;border-top:solid 2px #cccccc;width:50%;height:1.19999em}body .tree li::after{content:"";position:absolute;top:0px;right:50%;border-top:solid 2px #cccccc;width:50%;height:1.19999em}body .tree li::after{right:auto;left:50%;border-left:solid 2px #cccccc}body .tree li:only-child{padding-top:0em}body .tree li:only-child::after{display:none}body .tree li:only-child::before{display:none}body .tree li:first-child::before{border-style:none;border-width:0px}body .tree li:first-child::after{border-radius:5px 0px 0px 0px}body .tree li:last-child::after{border-style:none;border-width:0px}body .tree li:last-child::before{border-right:solid 2px #cccccc;border-radius:0px 5px 0px 0px}body .tree ul ul::before{content:"";position:absolute;top:0px;left:50%;border-left:solid 2px #cccccc;width:0px;height:1.19999em}body .tree li div.forest-link{border:solid 2px #cccccc;padding:0.2em 0.29999em 0.2em 0.29999em;text-decoration:none;display:inline-block;border-radius:5px 5px 5px 5px;color:#333333;position:relative;top:2px}body .tree.flipped li div.forest-link{-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}</style><script
async=""
id="MathJax-script"
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
></script>
<link
href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.23.0/themes/prism.min.css"
rel="stylesheet"
/><link rel="preconnect" href="https://fonts.googleapis.com" /><link
rel="preconnect"
href="https://fonts.gstatic.com"
crossorigin
/><link
href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;1,100;1,200;1,300;1,400;1,500;1,600;1,700&family=IBM+Plex+Sans+Condensed:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;1,100;1,200;1,300;1,400;1,500;1,600;1,700&family=IBM+Plex+Sans:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;1,100;1,200;1,300;1,400;1,500;1,600;1,700&family=IBM+Plex+Serif:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;1,100;1,200;1,300;1,400;1,500;1,600;1,700&display=swap"
rel="stylesheet"
/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.23.0/components/prism-core.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.23.0/plugins/autoloader/prism-autoloader.min.js"></script>
<style>
body .ui.container,
body ul {
  font-family: "IBM Plex Sans" !important;
}
body blockquote {
  border-left-width: 3px !important;
  font-style: italic;
}
.headerFont,
.ui.header,
body h1,
h2,
h3,
h4,
h5,
h6 {
  font-family: "IBM Plex Sans Condensed" !important;
}
body p {
  line-height: 1.4;
}
.monoFont,
body code,
pre,
tt {
  font-family: "IBM Plex Mono" !important;
  font-size: 12px !important;
  line-height: 1.4 !important;
}
</style>
<!--replace-end-7--><!--replace-end-4--><!--replace-end-1--></head><body><div class="ui fluid container universe"><!--replace-start-2--><!--replace-start-3--><!--replace-start-6--><div class="ui text container" id="zettel-container" style="position: relative"><div class="zettel-view"><article class="ui raised attached segment zettel-content"><div class="pandoc"><h1 id="title-h1">Awk</h1><blockquote><p>Awk is a programming language designed for text processing and data extraction. It was created in the 1970s and remains widely used today for tasks such as filtering and transforming text data, generating reports, and performing basic calculations. Awk is known for its simplicity and versatility, making it a popular tool for Unix system administrators and data analysts.</p></blockquote><h2 id="invocation">Invocation</h2><p>We can pass an <code>awk</code> program directly on the command line, or reference a <code>.awk</code> script file for more elaborate programs. Input files are separated by spaces:</p><pre><code class="bash language-bash"># CLI
awk [program] file1 file2 file3
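# Concrete sketch of the form above. a.txt and b.txt are hypothetical files,
# created here only so the example runs on its own; awk reads them in order.
printf 'cloud 1\n' > a.txt
printf 'sky 2\n' > b.txt
awk '{ print $1 }' a.txt b.txt
# cloud
# sky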
# Script file
awk -f [ref_to_script_file] file1 file2 file3</code></pre><p>We can also pipe input to it. This piped command receives the output of the <code>echo</code> command and prints the last field of each record:</p><pre><code class="bash language-bash">echo -e "1 2 3 5\n2 2 3 8" | awk '{print $(NF)}'</code></pre><h2 id="syntactic-structure">Syntactic structure</h2><p><code>awk</code> is a line-oriented language: it reads its input one line at a time.</p><p>An <code>awk</code> program consists of a sequence of <strong>pattern-action</strong> statements and optional function definitions.</p><p>For most of the examples we will use this list as the input:</p><pre><code class="language-none">cloud
existence
ministerial
falcon
town
sky
top
bookworm
bookcase
war
Peter 89
Lucia 95
Thomas 76
Marta 67
Joe 92
Alex 78
Sophia 90
Alfred 65
Kate 46</code></pre><blockquote><p><code>awk</code> particularly lends itself to input that is delimited by whitespace or laid out in columns, such as the output of commands like <code>ls</code> and <code>grep</code>.</p></blockquote><h3 id="patterns-and-actions">Patterns and actions</h3><p>The basic structure of an <code>awk</code> script is as follows:</p><pre><code class="language-none">pattern {action}</code></pre><p>A <strong>pattern</strong> is what you want to match against. It can be a regex such as <code>/Joe/</code> or an expression such as <code>$2 >= 90</code>. The <strong>action</strong> is what you want <code>awk</code> to do with each input line that matches the pattern.</p><p>The following script prints the line that matches <code>Joe</code>:</p><pre><code class="bash language-bash">awk '/Joe/ {print}' list.txt</code></pre><p><code>/Joe/</code> is the pattern and <code>{print}</code> is the action.</p><h3 id="lines-records-fields">Lines, records, fields</h3><p><img src="/static/awk-outline.png" /></p><p>When <code>awk</code> reads its input it splits it into <strong>records</strong>, by default one record per line.</p><p>Each record is in turn split into a sequence of <strong>fields</strong>, by default on whitespace.</p><p>The fields are accessed by special variables:</p><ul><li><p><code>$1</code> reads the first field, <code>$2</code> reads the second field and so on.</p></li><li><p>The variable <code>$0</code> refers to the whole record.</p></li></ul><p>So, in the picture, <code>cloud existence ministerial</code> correspond to <code>$1</code> <code>$2</code> <code>$3</code>.</p><h2 id="basic-examples">Basic examples</h2><p><strong><em>Match a pattern</em></strong></p><pre><code class="bash language-bash">awk '/book/ { print }' list.txt
# bookworm
# bookcase</code></pre><p><strong><em>Print all words that are longer than five characters</em></strong></p><pre><code class="bash language-bash">awk 'length($1) > 5 { print $0 }' list.txt</code></pre><p>For each line, if the first field (we only have one field per line) is longer than five characters, print the whole record, <code>$0</code>.</p><p>We actually don’t need to include the <code>{ print $0 }</code> action, as printing the record is the default behaviour. We could have just put <code>awk 'length($1) > 5' list.txt</code>.</p><p><strong><em>Print all words that do not have three characters</em></strong></p><pre><code class="bash language-bash">awk '!(length($1) == 3)' list.txt</code></pre><p>Here we negate by prepending the pattern with <code>!</code> and wrapping it in parentheses. The parentheses are required because <code>!</code> binds more tightly than <code>==</code>.</p><p><strong><em>Return words that are either three characters or four characters in length</em></strong></p><pre><code class="language-none">awk '(length($1) == 3) || (length($1) == 4)' list.txt</code></pre><p>Here we use the logical OR to match against more than one pattern. The parentheses around each comparison are optional in this case, but they make the grouping explicit.</p><p><strong><em>Match and string-interpolate the output</em></strong></p><pre><code class="bash language-bash">awk 'length($1) > 0 {print $1, "has", length($1), "chars"}' list.txt

# (first three lines of output)
# cloud has 5 chars
# existence has 9 chars
# ministerial has 11 chars</code></pre><p><strong><em>Match against a numerical property</em></strong></p><pre><code class="bash language-bash">awk '$2 >= 90 { print $0 }' scores.txt
# Lucia 95
# Joe 92
# Sophia 90</code></pre><p>This returns the records whose second field is a number greater than or equal to 90.</p><p><strong><em>Match a field against a regular expression</em></strong></p><pre><code class="bash language-bash">awk '$1 ~ /^[bc]/ {print $1}' words.txt</code></pre><p>This matches every record whose first field, <code>$1</code>, begins with ‘b’ or ‘c’.</p><p>The tilde is the regex match operator, so its right-hand side must be a regex. To test for an exact string, use <code>==</code> instead.</p><h2 id="syntactic-shorthands">Syntactic shorthands</h2><ul><li>In a statement like <code>awk 'length($1) > 5 { print $0 }' list.txt</code> we don’t need to include the <code>{ print $0 }</code> action, as printing the record is the default behaviour and is implied. We could have just put <code>awk 'length($1) > 5' list.txt</code>.</li></ul><p><a href="https://zetcode.com/lang/awk/">https://zetcode.com/lang/awk/</a></p><h2 id="built-in-variables">Built-in variables</h2><h3 id="nf"><code>NF</code></h3><p>The value of <code>NF</code> is the <strong>number</strong> of <strong>fields</strong> in the current record. <code>awk</code> automatically updates the value of <code>NF</code> every time it reads a record.</p><p>No matter how many fields there are, the last field in a record can always be referred to as <code>$NF</code>.</p><h3 id="nr"><code>NR</code></h3><p><code>NR</code> represents the <strong>number</strong> of <strong>records</strong> read so far. <code>awk</code> increments it each time it reads a record, so on the last record it equals the total number of records read.</p><h3 id="fs"><code>FS</code></h3><p><code>FS</code> represents the <strong>field separator</strong>. Its default value is a single space, which <code>awk</code> treats specially: fields are separated by any run of spaces or tabs. We can specify a different separator with the <code>-F</code> flag, e.g. to separate by comma:</p><pre><code class="bash language-bash">awk -F, '{print $1 }' list.txt</code></pre></div></article><nav class="ui attached segment deemphasized bottomPane" id="neuron-tags-pane"><div><span class="ui basic label zettel-tag" title="Tag">awk</span><span class="ui basic label zettel-tag" title="Tag">shell</span></div></nav><nav class="ui bottom attached icon compact inverted menu blue" id="neuron-nav-bar"><!--replace-start-9--><!--replace-end-9--><a class="right item" href="impulse.html" title="Open Impulse"><i class="wave square icon"></i></a></nav></div></div><!--replace-end-6--><!--replace-end-3--><!--replace-end-2--><div class="ui center aligned container footer-version"><div class="ui tiny image"><a href="https://neuron.zettel.page"><img alt="logo" src="https://raw.githubusercontent.com/srid/neuron/master/assets/neuron.svg" title="Generated by Neuron 1.9.35.3" /></a></div></div></div></body></html>