---
tags:
  - shell
---

# Text manipulation

## Sorting strings: `sort`

If you have a `.txt` file containing text strings, each on a new line, you can
use the `sort` command to quickly put them in alphabetical order:

```bash
sort file.txt
```

Note that this does not save the sorted result; it only writes it to standard
output. To save it, redirect the output to a file in the standard way:

```bash
sort file.txt > output.txt
```

### Options

- `-r`
  - reverse sort
- `-c`
  - check whether the file is already sorted. If it is not, `sort` reports the
    first out-of-order line and exits with a non-zero status

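The two options above can be tried out with a short script. This is a minimal sketch: the file `fruit.txt` and its contents are made up for illustration, and everything happens in a temporary directory.

```bash
# Hypothetical sample data, created in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'pear\napple\nmango\n' > "$tmpdir/fruit.txt"

# -r: reverse alphabetical order.
reversed=$(sort -r "$tmpdir/fruit.txt")
echo "$reversed"

# -c: exits 0 silently when the file is sorted; otherwise it reports the
# first out-of-order line on standard error and exits non-zero.
if sort -c "$tmpdir/fruit.txt" 2>/dev/null; then
  check_result="sorted"
else
  check_result="not sorted"
fi
echo "$check_result"

rm -r "$tmpdir"
```

Because `-c` communicates through its exit status, it fits naturally into an `if` test like the one shown.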
## Find and replace: `sed`

The `sed` programme can be used to implement find and replace procedures. In
`sed`, find and replace is handled by the substitute command, `s`:

```bash
sed 's/word/replacement word/' file.txt
```

This, however, only changes the first instance of the word on each line. To
replace every instance, add the global flag `g` to the end of the expression:
`s/word/replacement word/g`.

As `sed` is a stream editor, any changes you make with it only appear in
standard output; they are not saved to the file. To save them, redirect the
output to a new file (using `> output.txt`) after the original file name. This
has the benefit of leaving the original file untouched whilst ensuring the
desired outcome is stored permanently.

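As a rough illustration of the two points above, the following sketch compares substitution with and without the `g` flag and saves the result through a redirect. The sample text and the replacement `term` are invented for the example; the original file is left unchanged throughout.

```bash
# Hypothetical sample data, created in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'word word\nanother word\n' > "$tmpdir/file.txt"

# Without g: only the first match on each line is replaced.
first_only=$(sed 's/word/term/' "$tmpdir/file.txt")

# With g: every match on each line is replaced; the redirect saves the result.
sed 's/word/term/g' "$tmpdir/file.txt" > "$tmpdir/output.txt"
all_matches=$(cat "$tmpdir/output.txt")

# The source file still holds its original contents.
original=$(cat "$tmpdir/file.txt")

echo "$first_only"
echo "$all_matches"
rm -r "$tmpdir"
```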
Alternatively, you can use the `-i` option, which edits the source file in place
instead of writing the result to standard output.

Note that this overwrites the original version of the file, which cannot then be
recovered. If this is an issue, it is recommended to attach a backup suffix to
the `-i` option, like so:

```bash
sed -i.bak 's/word/replacement word/' file.txt
```

This creates the file `file.txt.bak` in the same directory as `file.txt`,
holding the original contents from before the replacement was carried out.

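A quick way to convince yourself of the backup behaviour is to run it on a disposable file. The sketch below assumes a one-line sample file invented for the purpose.

```bash
# Hypothetical sample data, created in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'old value\n' > "$tmpdir/file.txt"

# Edit in place, keeping the untouched original as file.txt.bak.
sed -i.bak 's/old/new/' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")
backup=$(cat "$tmpdir/file.txt.bak")
echo "edited: $edited"
echo "backup: $backup"

rm -r "$tmpdir"
```

If the edit turns out to be wrong, moving `file.txt.bak` back over `file.txt` restores the original.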
### Remove duplicates

The `sort -u` command can be used to remove duplicates:

```bash
sort -u file.txt
```

There is no need to sort the file beforehand: `-u` sorts the lines and then
discards duplicates, which works because sorting places identical strings next
to each other.

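The role of adjacency can be seen by comparing `sort -u` with the separate `uniq` command, which is brought in here only for contrast. This is a small sketch with invented data in which the duplicate lines are deliberately not next to each other.

```bash
# Hypothetical sample data: duplicates that are not adjacent.
tmpdir=$(mktemp -d)
printf 'b\na\nb\na\n' > "$tmpdir/file.txt"

# sort -u: sort first, then keep one copy of each line.
deduped=$(sort -u "$tmpdir/file.txt")

# uniq on its own only collapses adjacent repeats, so nothing is removed here.
unsorted_uniq=$(uniq "$tmpdir/file.txt")

echo "$deduped"
echo "$unsorted_uniq"
rm -r "$tmpdir"
```

Only the `sort -u` result is free of duplicates; `uniq` would need its input sorted first to achieve the same.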
## Split a large file into multiple smaller files: `split`

Suppose you have a file containing 1000 lines. You want to break the file up
into five separate files, each containing two hundred lines. You can use `split`
to accomplish this, like so:

```bash
split -l 200 big-file.txt new-files
```

`split` appends a two-letter suffix to the prefix you supplied, so the resulting
five files are named as follows:

- `new-filesaa`
- `new-filesab`
- `new-filesac`
- `new-filesad`
- `new-filesae`

If you would rather have numeric suffixes, use the option `-d`. You can also
split a file by size in bytes, using the option `-b` and specifying the size of
each constituent file.

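Both options can be sketched on a generated file. The example assumes GNU coreutils (for `-d`) and uses made-up prefixes `part-` and `chunk-`; the 1000-line input is produced with `seq` purely for illustration.

```bash
# Hypothetical 1000-line input, created in a throwaway directory.
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/big-file.txt"

# -d: numeric suffixes, giving part-00 through part-04 for five 200-line pieces.
split -l 200 -d "$tmpdir/big-file.txt" "$tmpdir/part-"
numeric_files=$(cd "$tmpdir" && echo part-*)
lines_in_first=$(wc -l < "$tmpdir/part-00")

# -b: split by size instead, here into pieces of at most 1024 bytes each.
split -b 1024 "$tmpdir/big-file.txt" "$tmpdir/chunk-"
chunk_count=$(find "$tmpdir" -name 'chunk-*' | wc -l)

echo "$numeric_files"
echo "lines in part-00: $lines_in_first"
echo "chunks: $chunk_count"
rm -r "$tmpdir"
```

Note that `-b` cuts at byte boundaries, so a piece may end in the middle of a line.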
## Merge multiple files into one with `cat`

We can use `cat` to read multiple files at once and then append a redirect to
save them to a file:

```bash
cat file_a.txt file_b.txt file_c.txt > merged-file.txt
```

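One practical use of this is to reassemble the pieces produced by `split`. The following sketch assumes a generated 1000-line file and a made-up prefix `piece-`, and checks with `cmp` that the merged file matches the original.

```bash
# Hypothetical 1000-line input, split into five pieces in a throwaway directory.
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/big-file.txt"
split -l 200 "$tmpdir/big-file.txt" "$tmpdir/piece-"

# The shell expands piece-* in sorted order (piece-aa, piece-ab, ...),
# so cat joins the pieces in their original sequence.
cat "$tmpdir"/piece-* > "$tmpdir/rebuilt.txt"

# cmp -s exits 0 only when the two files are byte-for-byte identical.
if cmp -s "$tmpdir/big-file.txt" "$tmpdir/rebuilt.txt"; then
  roundtrip="identical"
else
  roundtrip="different"
fi
echo "$roundtrip"
rm -r "$tmpdir"
```

`cat` writes the files in the order they are listed, so the order of the arguments determines the order of the merged content.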
## Count lines, words, etc: `wc`

To count the lines, words and bytes in a file:

```bash
wc file.txt
```

When we use the command, three numbers are output, in order: lines, words,
bytes.

You can use modifiers to get just one of the numbers: `-l`, `-w`, `-c`.

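The individual modifiers can be sketched as follows, using a two-line sample file invented for the example. Reading the file through standard input is a choice made here so that `wc` prints only the number and not the file name.

```bash
# Hypothetical sample data: 2 lines, 3 words, 14 bytes.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/file.txt"

# Each modifier limits the output to a single count.
lines=$(wc -l < "$tmpdir/file.txt")
words=$(wc -w < "$tmpdir/file.txt")
bytes=$(wc -c < "$tmpdir/file.txt")

echo "lines=$lines words=$words bytes=$bytes"
rm -r "$tmpdir"
```

Some `wc` implementations pad the number with leading spaces, so compare the values numerically rather than as strings.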