117 lines
3.1 KiB
Markdown
117 lines
3.1 KiB
Markdown
---
|
||
tags:
|
||
- shell
|
||
---
|
||
|
||
# Text manipulation
|
||
|
||
## Sorting strings: `sort`
|
||
|
||
If you have a `.txt` file containing text strings, each on a new line you can
|
||
use the sort function to quickly put them in alphabetical order:
|
||
|
||
```bash
|
||
sort file.txt
|
||
```
|
||
|
||
Note that this will not save the sort, it only presents it as a standard output.
|
||
To save the sort you need to direct the sort to a file in the standard way:
|
||
|
||
```bash
|
||
sort file.txt > output.txt
|
||
```
|
||
|
||
### Options
|
||
|
||
- `-r`
|
||
- reverse sort
|
||
- `c`
|
||
- check if file is already sorted. If not, it will highlight the strings which
|
||
are not sorted
|
||
|
||
## Find and replace: `sed`
|
||
|
||
The `sed` programme can be used to implement find and replace procedures. In
|
||
`sed`, find and replace are covered by the substitution option: `/s` :
|
||
|
||
```bash
|
||
sed ‘s/word/replacement word/’ file.txt
|
||
```
|
||
|
||
This however will only change the first instance of word to be replaced, in
|
||
order to apply to every instance you need to add the global option: `-g` .
|
||
|
||
As sed is a stream editor, any changes you make using it, will only occur within
|
||
the standard output , they will not be saved to file. In order to save to file
|
||
you need to specify a new file output (using `> output.txt`) in addition to the
|
||
original file. This hasthe benefit of leaving the original file untouched whilst
|
||
ensuring the desired outcome is stored permanently.
|
||
|
||
Alternatively, you can use the `-i` option which will make the changes take
|
||
place in the source file as well as in standard input.
|
||
|
||
Note that this will overwrite the original version of the file and it cannot be
|
||
regained. If this is an issue then it is recommended to include a backup command
|
||
in the overall argument like so:
|
||
|
||
```bash
|
||
sed -i.bak ‘s/word/replacement word/’ file.txt
|
||
```
|
||
|
||
This will create the file `file.txt.bak` in the directory you are working within
|
||
which is the original file before the replacement was carried out.
|
||
|
||
### Remove duplicates
|
||
|
||
We can use the `sort -u` command can be used to remove duplicates:
|
||
|
||
```bash
|
||
sort -u file.txt
|
||
```
|
||
|
||
It is important to sort before attempting to remove duplicates since the `-u`
|
||
flag works on the basis of the strings being adjacent.
|
||
|
||
## Split a large file into multiple smaller files: `split`
|
||
|
||
Suppose you have a file containing 1000 lines. You want to break the file up
|
||
into five separate files, each containing two hundred lines. You can use `split`
|
||
to accomplish this, like so:
|
||
|
||
```bash
|
||
split -l 200 big-file.txt new-files
|
||
```
|
||
|
||
`split` will categorise the resulting five files as follows:
|
||
|
||
- new-file-aa,
|
||
- new-file-ab
|
||
- new-file-ac,
|
||
- newfile-ad,
|
||
- new-file-ae.
|
||
|
||
If you would rather have numeric suffixes, use the option `-d` . You can also
|
||
split a file by its number of bytes, using the option `-b` and specifying a
|
||
constituent file size.
|
||
|
||
## Merge multiple files into one with `cat`
|
||
|
||
We can use `cat` read multiple files at once and then append a redirect to save
|
||
them to a file:
|
||
|
||
```bash
|
||
cat file_a.txt file_b.txt file_c.txt > merged-file.txt
|
||
```
|
||
|
||
## Count lines, words, etc: `wc`
|
||
|
||
To count words:
|
||
|
||
```bash
|
||
wc file.txt
|
||
```
|
||
|
||
When we use the command three numbers are outputted, in order: lines, words,
|
||
bytes.
|
||
|
||
You can use modifiers to get just one of the numbers: `-l`, `-w` , `-b` .
|