Frontier Software

Grep

manual

wikibooks

#!/bin/bash

# json_dict_words $dictionary $text
# Prints a JSON array of the phrases from ${HOME}/share/dict/$dictionary found in $text.
function json_dict_words {
  local arr=()
  while IFS= read -r line; do
    arr+=("$line")
  done <<< "$(grep -Fwio -f "${HOME}/share/dict/${1}" <<< "$2" | sort -u)"
  if [[ -n ${arr[*]} ]]; then
    jq -cn '$ARGS.positional' --args "${arr[@]}"
  fi
}

ExampleGroup 'Create a JSON array of phrases matching those in a dictionary'

  Example 'find a word in the text'
    When call json_dict_words 'actors' 'Appel @ Padstal'
    The output should equal '["Appel"]'
    The status should be success
    The error should be blank
  End

  Example 'find a phrase in the text'
    When call json_dict_words 'actors' 'n Aand saam Jakkie Louw (ten bate van SAACA) @ Die Centurion Teater'
    The output should equal '["Jakkie Louw"]'
    The status should be success
    The error should be blank
  End

  Example 'find two phrases in the text'
    When call json_dict_words 'actors' 'Emile Alexander & Dakes LIVE in Johannesburg at Gatzbys LIVE, Midrand - 10 February 2024'
    The output should equal '["Dakes","Emile Alexander"]'
    The status should be success
    The error should be blank
  End

  Example 'no matches'
    When call json_dict_words 'actors' 'Nothing in dictionary'
    The output should equal ''
    The status should be success
    The error should be blank
  End

End
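The grep call inside json_dict_words can be tried on its own, without a dictionary under ${HOME}/share/dict. The following is a minimal sketch, assuming GNU grep; the helper name match_phrases and the temporary dictionary file are hypothetical and exist only for illustration. The flags are -F (fixed strings rather than regular expressions), -w (whole words only), -i (ignore case), -o (print only the matched text) and -f (read the patterns from a file).

```shell
#!/bin/bash

# match_phrases $dictionary_file $text  (hypothetical helper for illustration)
# Prints each dictionary phrase found in $text once, one per line, sorted.
match_phrases() {
  printf '%s\n' "$2" | grep -Fwio -f "$1" | sort -u
}

# Throwaway dictionary with one phrase per line.
dict=$(mktemp)
printf '%s\n' 'Dakes' 'Emile Alexander' 'Jakkie Louw' > "$dict"

# Prints "Dakes" and then "Emile Alexander", each on its own line.
match_phrases "$dict" 'Emile Alexander & Dakes LIVE in Johannesburg'

rm -f "$dict"
```

Because of -o and -i, each match is printed as it is spelled in the text, not as it is spelled in the dictionary.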

As is common, json_dict_words uses IFS= read -r. Courtesy of ShellCheck's SC2162, I discovered that IFS= clears the internal field separator, which prevents read from stripping leading and trailing whitespace, though that stripping is something I usually want.
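The difference is easy to see side by side. This is a small sketch; the function names read_verbatim and read_trimmed are made up for the demonstration, and the square brackets are printed only to make the surrounding whitespace visible.

```shell
#!/bin/bash

# Hypothetical helper: with IFS emptied for this one read,
# leading and trailing whitespace is kept.
read_verbatim() {
  local line
  IFS= read -r line
  printf '[%s]\n' "$line"
}

# Hypothetical helper: with the default IFS,
# read strips leading and trailing whitespace.
read_trimmed() {
  local line
  read -r line
  printf '[%s]\n' "$line"
}

printf '  Jakkie Louw  \n' | read_verbatim   # prints [  Jakkie Louw  ]
printf '  Jakkie Louw  \n' | read_trimmed    # prints [Jakkie Louw]
```

In both cases -r stops read from treating backslashes as escape characters, which is what SC2162 warns about.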

Reversing this process by converting comma-separated strings back to “line-oriented” strings is covered in Indexed Arrays.