In an earlier post, "Count number of characters, words and lines in input", we saw how to count the number of characters, words and lines. Today, we will look at the pitfalls of word counting.
K&R has a separate exercise for this. Exercise 1-11 (in the second edition) asks: "How would you test the word count program? What kinds of input are most likely to uncover bugs if there are any?"
When we wrote this program earlier, we did not handle many cases. There are caveats that need to be identified and addressed.
First, let's list the checks we usually need when counting words.
1. Check for very short words.
2. Check for very long words.
3. Check for words that are separated by a newline. For example kernel
trap, where kernel is at the end of one line and trap follows on the next line.
4. Treat words like "isn't" and "tour's" as single words.
5. Check the overall file size, for example whether it is under 2 GB (a signed 32-bit counter cannot count past 2^31 - 1 characters).
6. Check for mistyped words like "kernel - trap", which contain spaces in the middle, or words that use a hyphen instead of a space, e.g. kernel-trap.
7. Check for non-ASCII characters.
8. Check for different encodings.
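To make a few of these checks concrete, here is a minimal sketch of a table of test inputs. It assumes the K&R rule that a word is any run of characters other than space, tab, or newline. The names count_words, wc_cases and check_wc_cases are hypothetical helpers made up for this illustration, and the expected counts only hold under that rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IN  1   /* inside a word */
#define OUT 0   /* outside a word */

/* Count words in a NUL-terminated string using the K&R rule:
   a word is any run of characters that are not space, tab, or newline.
   (count_words is a hypothetical name used only for this sketch.) */
long count_words(const char *s)
{
    long nw = 0;
    int state = OUT;

    for (; *s != '\0'; ++s) {
        if (*s == ' ' || *s == '\n' || *s == '\t')
            state = OUT;
        else if (state == OUT) {
            state = IN;
            ++nw;
        }
    }
    return nw;
}

/* One test input and the count expected under the rule above. */
struct wc_case {
    const char *input;
    long expected;
};

static const struct wc_case wc_cases[] = {
    { "", 0 },                         /* empty input */
    { " \t\n", 0 },                    /* only separators */
    { "a I", 2 },                      /* very short words */
    { "kernel\ntrap", 2 },             /* words split by a newline */
    { "isn't tour's", 2 },             /* apostrophes stay inside a word */
    { "kernel-trap", 1 },              /* a hyphen is not a separator */
    { "kernel - trap", 3 },            /* a lone hyphen counts as a word */
    { "na\xc3\xafve caf\xc3\xa9", 2 }  /* UTF-8 bytes are not separators */
};

/* Assert every case, plus one very long word; return the number of checks run. */
int check_wc_cases(void)
{
    static char longword[10001];       /* zero-initialised, so NUL-terminated */
    size_t i;
    int checked = 0;

    for (i = 0; i < sizeof wc_cases / sizeof wc_cases[0]; ++i) {
        assert(count_words(wc_cases[i].input) == wc_cases[i].expected);
        ++checked;
    }

    memset(longword, 'x', sizeof longword - 1);
    assert(count_words(longword) == 1);
    return checked + 1;
}
```

Calling check_wc_cases() from main runs all the checks and aborts on the first mismatch. Note that "kernel - trap" counts as three words here, so whether a hyphen should act as a separator is a decision the program has to make explicitly. The file size and encoding checks are not covered by this sketch.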
Please share your thoughts if I have missed any checks.