Grep
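#!/bin/bash
# json_dict_words $dictionary $text
# Print a JSON array of the dictionary entries found in $text (nothing if none match).
function json_dict_words {
  local arr=()
  # grep: -F fixed strings, -w whole words, -i ignore case, -o print only the match
  while IFS= read -r line; do
    arr+=("$line")
  done <<< "$(grep -Fwio -f "${HOME}/share/dict/${1}" <<< "$2" | sort -u)"
  # With no matches the here-string still yields one empty line, so test the content
  if [[ -n ${arr[*]} ]]; then
    jq -cn '$ARGS.positional' --args "${arr[@]}"
  fi
}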
ExampleGroup 'Create a JSON array of phrases matching those in a dictionary'
  Example 'find a word in the text'
    When call json_dict_words 'actors' 'Appel @ Padstal'
    The output should equal '["Appel"]'
    The status should be success
    The error should be blank
  End
  Example 'find a phrase in the text'
    When call json_dict_words 'actors' 'n Aand saam Jakkie Louw (ten bate van SAACA) @ Die Centurion Teater'
    The output should equal '["Jakkie Louw"]'
    The status should be success
    The error should be blank
  End
  Example 'find two phrases in the text'
    When call json_dict_words 'actors' 'Emile Alexander & Dakes LIVE in Johannesburg at Gatzbys LIVE, Midrand - 10 February 2024'
    The output should equal '["Dakes","Emile Alexander"]'
    The status should be success
    The error should be blank
  End
  Example 'no matches'
    When call json_dict_words 'actors' 'Nothing in dictionary'
    The output should equal ''
    The status should be success
    The error should be blank
  End
End
As is common, the above uses IFS= read -r, and courtesy of ShellCheck's SC2162 I discovered that IFS= clears the internal field separator, preventing read from stripping leading and trailing whitespace, which I actually usually want.
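To make the difference concrete, here is a minimal sketch comparing read -r with and without IFS=. The variable names (input, stripped, verbatim) and the padded sample string are made up for illustration and are not part of the function above.

```shell
#!/bin/bash
# Hypothetical sample line with two spaces of padding on each side.
input='  Jakkie Louw  '

# Default IFS: read trims leading and trailing whitespace from the line.
read -r stripped <<< "$input"

# Empty IFS for this one command: the line is stored exactly as read.
IFS= read -r verbatim <<< "$input"

# Brackets make the surviving whitespace visible.
printf '[%s]\n' "$stripped" "$verbatim"
```

The first bracketed line comes out without the padding and the second keeps it, so dropping IFS= is a cheap way to trim lines when that is the behaviour you are after.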
Reversing this process by converting comma-separated strings back to “line-oriented” strings is covered in Indexed Arrays.