Common Bash programming errors (continued)

Original author: Greg Wooledge
Translation.
I continue presenting the community with a translation of Bash Pitfalls.
Part one.
Initial publication of the translation.

11. cat file | sed s/foo/bar/ > file

You cannot read from a file and write to it in the same pipeline: the shell sets up the > file redirection, truncating the file, independently of when cat gets around to reading it. Depending on how the pipeline is set up, the file may be truncated to zero length (or to a size equal to the buffer the operating system allocates for the pipe), or it may grow without bound until it fills all available disk space or hits a file size limit imposed by the operating system or a quota.

If you want to change a file in any way other than appending data to its end, you must use a temporary file at some point. For example (this works in every shell):

 sed 's/foo/bar/g' file > tmpfile && mv tmpfile file

The following snippet will only work when using GNU sed 4.x and higher:

 sed -i 's/foo/bar/g' file

Note that this also creates a temporary file and then renames it; it just does so behind the scenes.
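For illustration, here is a minimal sketch of doing the same thing explicitly in bash. The helper name sed_in_place is hypothetical, and the sketch assumes a mktemp that accepts a template argument (the GNU and BSD versions both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a sed expression to a file "in place" by
# writing to a temporary file and then renaming it over the original.
# Usage: sed_in_place 's/foo/bar/g' file
sed_in_place() {
    local expr=$1 file=$2 tmp
    # Create the temp file next to the original so the final mv is a
    # rename within one filesystem. Assumes mktemp accepts a template.
    tmp=$(mktemp "$file.XXXXXX") || return
    if sed "$expr" "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"   # sed failed: leave the original untouched
        return 1
    fi
}
```

Unlike sed -i, this sketch makes no attempt to preserve the original file's permissions: the replacement gets whatever mode mktemp gave it (typically 0600).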

The BSD version of sed requires an argument to -i: the extension appended to the name of the backup copy. If you are confident in your script, you can pass an empty extension, which skips the backup:

 sed -i '' 's/foo/bar/g' file

You can also use perl 5.x, which is probably more common than sed 4.x:

 perl -pi -e 's/foo/bar/g' file

Various aspects of mass string replacement across many files are discussed in Bash FAQ #21.

12. echo $foo

This relatively innocent-looking command can have unpleasant consequences. Because $foo is not quoted, its value is not only split into words, but any glob patterns it contains are also expanded into the names of matching files. As a result, bash programmers sometimes wrongly conclude that their variables hold incorrect values, when in fact the variables are fine: it is echo that displays them after bash has applied its expansion rules, and that is what causes the confusion.

 msg="Please enter a file name of the form *.zip"
 echo $msg

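The message is split into words, and any glob patterns in it, such as *.zip, are expanded. Suppose the current directory happens to contain freenfss.zip and lw35nfss.zip. What will users of your script think when they see:

 Please enter a file name of the form freenfss.zip lw35nfss.zip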

Here is another example:

 VAR=*.zip      # VAR contains an asterisk, a dot, and the word "zip"
 echo "$VAR"    # prints *.zip
 echo $VAR      # lists the files whose names end in .zip
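A small sketch that prints the same variable both ways; the function name show_both is hypothetical. Run it in a directory that contains some .zip files to see the unquoted glob expand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one value quoted and unquoted.
show_both() {
    local VAR='*.zip'        # the literal characters *, ., z, i, p
    echo "quoted:   $VAR"    # always prints: quoted:   *.zip
    echo "unquoted:" $VAR    # the glob expands to matching file names (if any)
}
```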

In fact, echo cannot be used completely safely at all. If a variable contains just the two characters -n, echo treats them as an option rather than as data to print, and outputs nothing at all, even if the variable is quoted. The only reliable way to print a variable's value is printf: printf "%s\n" "$foo".
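A minimal sketch of the difference, assuming bash's builtin echo (other shells' echo implementations treat such arguments differently); the value -n simply stands for any data that looks like an option.

```shell
#!/usr/bin/env bash
# A value that happens to look like an echo option.
foo='-n'
echo "$foo"            # bash's builtin echo takes -n as an option and prints nothing
printf '%s\n' "$foo"   # prints the value verbatim: -n
```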

13. $foo=bar

No, you do not put a $ in front of a variable's name when assigning to it. This is not Perl. Just write:

 foo=bar

14. foo = bar

No, you cannot put spaces around = when assigning a value to a variable. This is not C. When you write foo = bar, the shell splits it into three words: the first, foo, is taken as the name of a command, and the other two as its arguments.

For the same reason, the following expressions are also incorrect:

 foo= bar    # WRONG!
 foo =bar    # WRONG!
 $foo = bar  # ABSOLUTELY WRONG!

 foo=bar     # Right.
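A sketch of how bash parses the spacing variants. The function bar is a stand-in defined only for this demonstration, and the behaviour of foo= bar shown here is bash's default (non-POSIX) mode, in which an assignment prefixed to a function call lasts only for that call.

```shell
#!/usr/bin/env bash
# 'bar' is a stand-in command defined only for this demonstration.
bar() { echo "bar ran with foo=[$foo]"; }

foo=value     # assignment: foo now holds "value"
echo "$foo"   # prints: value

foo= bar      # not an assignment of "bar": runs bar with foo set to ""
              # for that one command; prints: bar ran with foo=[]
echo "$foo"   # prints: value (the temporary empty value did not stick)

# foo =bar    would run a command named 'foo' with the argument '=bar'
# foo = bar   would run a command named 'foo' with arguments '=' and 'bar'
```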

15. echo << EOF

Here documents are a handy way to embed large blocks of text data in a script. When the shell encounters this construct, it feeds the lines up to the specified marker (here, EOF) to the command's standard input. Unfortunately, echo does not read from STDIN.

 # Wrong:
 echo <<EOF
 Hello world
 EOF

 # Right:
 cat <<EOF
 Hello world
 EOF
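A slightly larger sketch; the helper name greet is hypothetical. It also shows a related detail of here documents: with an unquoted marker the body undergoes parameter expansion, while quoting the marker ('EOF') passes the body through literally. Note that the closing marker must start at the beginning of its line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a greeting through cat, which reads the
# here document from its standard input.
greet() {
    # Unquoted marker: $1 in the body is expanded.
    cat <<EOF
Hello, $1!
EOF
    # Quoted marker: the body is passed through literally.
    cat <<'EOF'
Literal: $1 is not expanded here
EOF
}

greet world
```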

16. su -c 'some command'

On Linux this syntax is valid and will not cause an error. The problem is that on some systems (such as FreeBSD or Solaris), su's own -c option means something entirely different. On FreeBSD, for example, -c specifies the login class whose limits apply while the command runs, and arguments for the shell must come after the name of the target user. If the user name is omitted, -c is taken as an option to su itself, not to the new shell. It is therefore recommended to always specify the target user name, whatever the system (who knows which platforms your scripts will end up running on...):

 su root -c 'some command' # Correct.

17. cd /foo; bar

If you do not check whether cd succeeded, then when it fails, bar may be executed in a directory other than the one the developer intended. This can be a disaster if bar contains something like rm *.

So always check the exit status of cd. The simplest way:

 cd /foo && bar

If cd is followed by more than one command, you can write this:

 cd /foo || exit 1
 bar
 bat ...    # Lots of commands.

cd itself reports the failure on stderr, with a message such as "bash: cd: /foo: No such file or directory". If you want to add your own message on stdout, use command grouping:

 cd /net || { echo "Can't read /net. Make sure you've logged in to the Samba network, and try again."; exit 1; }

Note the space between { and echo, and the semicolon before the closing }.
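The same pattern wrapped in a function, as a sketch; the name enter_dir and its message are hypothetical. It uses return instead of exit so it can be tried in an interactive shell without closing it; in a script, exit 1 as above is the usual choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enter a directory or print a custom error.
enter_dir() {
    cd "$1" || { echo "Can't enter $1. Check that it exists and try again."; return 1; }
    # Only reached if cd succeeded.
    echo "Now in: $PWD"
}
```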

Some people put set -e at the top of their scripts so that the script aborts as soon as any command returns a non-zero status. Use this trick with great care: many common commands return a non-zero status merely as a warning, and such results should not necessarily be treated as fatal errors.
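A sketch of the kind of surprise set -e can cause, using grep, which exits with status 1 when it simply finds no match. The helper names are hypothetical, and each helper runs its script in a separate bash -e process so the option cannot leak into the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "reached" only if grep did not abort the script.
strict_search() {
    bash -e -c 'grep -q needle <<< "$1"; echo "reached"' _ "$1"
}

# Same, but the expected "no match" status is tolerated explicitly.
tolerant_search() {
    bash -e -c 'grep -q needle <<< "$1" || true; echo "reached"' _ "$1"
}

strict_search "haystack" || echo "aborted with status $?"   # aborted with status 1
tolerant_search "haystack"                                   # reached
```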

By the way, if you work with directories a lot in bash scripts, reread the parts of man bash about pushd, popd and dirs. Perhaps all that code of yours stuffed with cd and pwd is simply unnecessary :).
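As a sketch of what the directory stack can replace, here is a hypothetical helper that runs a command in another directory and comes back, without saving $PWD by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command inside a directory, then return.
# Usage: in_dir /some/dir command [args...]
in_dir() {
    local dir=$1 status
    shift
    pushd "$dir" > /dev/null || return   # pushd prints the stack; silence it
    "$@"
    status=$?
    popd > /dev/null                     # back to where we started
    return "$status"
}
```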

Back to the point. Compare this snippet:

 find ... -type d | while read subdir; do
     cd "$subdir" && whatever && ... && cd -
 done

with this:

 find ... -type d | while read subdir; do
     (cd "$subdir" && whatever && ...)
 done

The parentheses force cd and the commands after it to run in a subshell, so on the next iteration of the loop we are back in the original directory, whether the cd succeeded or failed. We do not have to return manually.

Moreover, the first of these two snippets has another bug: if one of the whatever commands fails, we may never get back to the starting directory. To fix that without a subshell, you would have to do something like cd "$ORIGINAL_DIR" at the end of each iteration, which adds a little more confusion to your scripts.
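A runnable sketch of the subshell version. The helper name and the .visited marker file are hypothetical; the sketch assumes a find that supports -mindepth (GNU and BSD find do, POSIX find does not) and reads names with IFS= read -r so leading spaces and backslashes survive. Returning to the original directory matters most when find prints relative paths such as ./a and ./a/b.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a marker file in every subdirectory of $1.
mark_subdirs() {
    find "$1" -mindepth 1 -type d | while IFS= read -r subdir; do
        # cd runs in a subshell, so the next iteration starts from the
        # original directory again and relative paths stay valid.
        (cd "$subdir" && touch .visited && echo "visited $subdir")
    done
}
```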

18. [ bar == "$foo" ]

The == operator is not a portable argument to the [ command: bash's [ accepts it, but POSIX shells such as dash reject it. Use = instead, or replace [ with the [[ keyword:

 [ bar = "$foo" ] && echo yes
 [[ bar == $foo ]] && echo yes
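A sketch of both comparisons (the function names are hypothetical). It also shows a detail worth knowing about [[: an unquoted right-hand side of == is treated as a glob pattern, so quote it when you want a literal string comparison.

```shell
#!/usr/bin/env bash
is_bar_posix()    { [ bar = "$1" ]; }      # portable: single '=' with [
is_bar_bash()     { [[ bar == "$1" ]]; }   # quoted: literal comparison
matches_pattern() { [[ bar == $1 ]]; }     # unquoted on purpose: $1 is a pattern

is_bar_posix bar && echo "posix: equal"
matches_pattern 'b*' && echo "pattern: matched"
```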

19. for i in {1..10}; do ./something &; done

You cannot put a semicolon ";" right after &. Just remove this extra character:

 for i in {1..10}; do ./something & done

The & character is itself a command terminator, just like ; or a newline. You cannot put one directly after another.
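A sketch of the corrected loop in a function whose background jobs each write one file, followed by wait, the bash builtin that blocks until background jobs have finished. The function name and output layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the squares of 1..5 in background jobs,
# one output file per job in directory $1, then wait for all of them.
squares_in_background() {
    local out=$1 i
    # '&' ends each command and sends it to the background; no ';' after it.
    for i in {1..5}; do echo "$(( i * i ))" > "$out/$i" & done
    wait   # block until every background job has finished
}
```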

20. cmd1 && cmd2 || cmd3

Many people like to use && and || as shorthand for if ... then ... else ... fi. In some cases this is perfectly safe:

[[ -s $errorlog ]] && echo "Uh oh, there were some errors." || echo "Successful."

In general, however, this construct is not a full equivalent of if ... fi, because cmd2 (the command between && and ||) produces an exit status of its own, and if that status is not 0, the command after || runs as well. A simple example that leaves many people baffled:

 i=0
 true && ((i++)) || ((i--))
 echo $i    # prints 0

What happened? You would expect i to end up as 1, yet it ends up as 0: both i++ and i-- were executed. The ((i++)) command evaluates a C-style arithmetic expression and sets its exit status from the result. The value of the expression i++ is 0 (the value of i before the increment), and, as in C, an expression whose value is 0 counts as false. So ((i++)) with i equal to 0 returns exit status 1 (false), and ((i--)) runs.

This would not happen with the pre-increment operator, because ++i evaluates to 1, so ((++i)) returns true:

 i=0
 true && ((++i)) || ((--i))
 echo $i    # prints 1

But we were simply lucky: the code works only thanks to a coincidental combination of circumstances. So you cannot rely on x && y || z if there is the slightest chance that y will return false (the last snippet would misbehave if i started at -1 instead of 0).

If you need reliability, are unsure how your code actually works, or did not follow the previous paragraphs, don't be lazy: write if ... fi in your scripts:

 i=0
 if true; then
     ((i++))
 else
     ((i--))
 fi
 echo $i    # prints 1
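A sketch that puts both forms side by side in functions (the names are hypothetical), each starting from i=0 and using the post-increment:

```shell
#!/usr/bin/env bash
with_shorthand() {
    local i=0
    # ((i++)) evaluates to 0 (i's old value), which counts as false,
    # so the || branch runs as well and undoes the increment.
    true && ((i++)) || ((i--))
    echo "$i"
}

with_if() {
    local i=0
    if true; then
        ((i++))   # returns 1 (false), but nothing tests that status here
    else
        ((i--))
    fi
    echo "$i"
}

echo "shorthand: $(with_shorthand), if: $(with_if)"   # shorthand: 0, if: 1
```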

The same applies to the Bourne shell:

 # Both command blocks are executed:
 $ true && { echo true; false; } || { echo false; true; }
 true
 false

21. UTF-8 and the BOM (byte order mark)

In short: on Unix, UTF-8 text does not use a byte order mark. The encoding of a text is determined by the locale, the file's MIME type, or other metadata. Although a BOM does not make a UTF-8 document any harder for a human to read, it can cause problems when such files are interpreted automatically as scripts, source code, configuration files, and so on. Files that start with a BOM should be treated as foreign, just like files with DOS line endings.

In shell scripts: “Where UTF-8 is used transparently in 8-bit environments, the BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of #! at the beginning of Unix shell scripts.”
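A sketch of checking for and removing a UTF-8 BOM (the bytes EF BB BF) at the start of a file. The helper names are hypothetical; the sketch assumes head -c, which GNU and BSD head both support, and rewrites the file's contents from a temporary copy so the file keeps its permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with EF BB BF.
has_utf8_bom() {
    [ "$(head -c 3 -- "$1" | od -A n -t x1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: remove a leading BOM, if there is one.
strip_utf8_bom() {
    local file=$1 tmp status
    has_utf8_bom "$file" || return 0        # nothing to do
    tmp=$(mktemp) || return
    # Copy everything from byte 4 on into a temp file first: as pitfall 11
    # explains, reading and writing the same file in one pipeline is unsafe.
    tail -c +4 -- "$file" > "$tmp" && cat -- "$tmp" > "$file"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```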