
Writing a haiku-detecting bot for Slack
At Metal Toad, we have several bots integrated into Slack. Some are more useful (TicketBot, which detects mentions of JIRA tickets and provides links) and some are more whimsical (plusplus, which lets everyone give their coworkers points for whatever reason). I wanted to get in on this, so I decided to add to the latter category and write a bot that would detect when someone inadvertently wrote a haiku. Here's how I did it; maybe it will inspire you to write something too.
The first step was to find a way to convert messages into syllable counts. I didn't find any readily available data with the number of syllables in English words, but I did find the CMU Pronouncing Dictionary. This is intended for speech recognition and synthesis applications, and maps words to phonemes. For example:
QUIZZICAL K W IH1 Z AH0 K AH0 L
The numbers indicate the stresses on the syllables, so by counting the number of tokens that end with a digit, we can get the number of syllables in a word. I wrote a Python script to output a JSON file mapping words to syllable counts:
    import codecs, json, string

    lines = [line.strip() for line in codecs.open('cmudict-0.7b.txt', 'r', 'iso-8859-1') if line[0] != ';']
    syllables = {}
    digits = tuple(string.digits)
    for line in lines:
        tokens = line.split(' ')
        count = len([token for token in tokens[1:] if token.endswith(digits)])
        syllables[tokens[0]] = count
    with codecs.open('syllables.json', 'w', 'utf-8') as output:
        json.dump(syllables, output, ensure_ascii=False, indent=0, sort_keys=True)
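As a quick check of the counting rule, here is a minimal sketch that applies it to a single dictionary entry. The helper name count_syllables is hypothetical and not part of the script above; it only illustrates the "count the tokens ending in a digit" idea.

```python
import string

DIGITS = tuple(string.digits)

def count_syllables(entry):
    """Count the phoneme tokens in one dictionary line that end with a stress digit."""
    tokens = entry.split()
    # tokens[0] is the word itself; everything after it is a phoneme
    return sum(1 for token in tokens[1:] if token.endswith(DIGITS))

# IH1, AH0 and AH0 carry stress digits, so QUIZZICAL has three syllables
print(count_syllables("QUIZZICAL K W IH1 Z AH0 K AH0 L"))  # 3
```

Only vowel phonemes carry a stress digit, which is why counting them gives one per syllable.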
With the necessary data in place, I started on the bot itself. Our bots run inside Hubot, so I used its native language, CoffeeScript (anything else that compiles to JavaScript would have worked too). I needed to write a custom listener that would read every message in the channel and, if it matched the 5/7/5 syllable format of a haiku, output a message celebrating the accidental artistry of the author. Hubot's robot.listen works as follows:
    module.exports = (robot) ->
      robot.listen(
        (message) ->
          is_haiku message
        (response) ->
          response.send ":leaves: Haiku detected! :fallen_leaf:"
      )
If the first function, which is passed message, returns true, the second function is called, which sends a message to the Slack channel. That's it! Except for writing is_haiku, of course. Here's how I did that:
    is_haiku = (message) ->
      if not message.text
        return false
      words = message.text.split ' '
      start = 0
      for line in [5, 7, 5]
        result = starts_with words[start..], line
        if result == false
          return false
        start += result
      start == words.length
We split the message text into an array of words, then see if the array starts with words totalling five, then seven, then five syllables. If we have consumed all of the words after that, then the message matches the haiku pattern, and we return true. starts_with is the part that actually uses the syllables data:
    starts_with = (words, count) ->
      consumed = 0
      re = /\W*\b(.+)\b\W*/
      for word in words
        # replace smart quotes/dashes with plain ones
        word = word.toUpperCase()
          .replace(/[\u2018\u2019]/g, "'")
          .replace(/[\u201C\u201D]/g, '"')
          .replace(/\u2014/g, '-')
        matches = word.match(re)
        if matches == null
          # no word characters, skip this word
          consumed += 1
          continue
        word = matches[1]
        if word of custom_words
          count -= custom_words[word]
        else if word of syllables
          count -= syllables[word]
        else
          # unknown word
          return false
        consumed += 1
        if count == 0
          return consumed
        if count < 0
          return false
      return false
This works by starting at the first word, sanitising it so that smart quotes won't prevent us from finding the word in our syllables data, checking whether the word has any word characters in it and skipping it if not, then seeing if we know how many syllables the word has. If it's an unknown word, we don't know how many syllables it has, and we have to return false, denying that the message is a haiku. Otherwise, we subtract the word's syllables from count, the number of syllables still needed for this line (five or seven to begin with). The remainder can be: greater than zero, in which case we do the same thing with the next word; less than zero, meaning we've overshot our target and the message is not a haiku; or exactly zero, meaning we have encountered just the right number of words for this line of the haiku. In that last case, the number of words consumed is returned so that the next call to starts_with can be passed new words instead of starting at the beginning of the message again. If we run out of words before reaching the target, we return false as well.
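To make the line-by-line matching easier to experiment with outside Hubot, here is a rough Python port of the two functions. It is a sketch under assumptions, not the bot's actual code: the SYLLABLES table is a hand-written stand-in for syllables.json, custom words are left out, and failure is signalled with None rather than false so that it cannot be confused with a count of zero.

```python
import re

# Toy stand-in for syllables.json (values assumed for illustration only)
SYLLABLES = {
    "I": 1, "USED": 1, "TO": 1, "BE": 1, "PUNK": 1,
    "UNTIL": 2, "BROKE": 1, "MY": 1, "SKATEBOARD": 2,
    "AND": 1, "GOT": 1, "A": 1, "HAIRCUT": 2,
}

WORD_RE = re.compile(r"\W*\b(.+)\b\W*")
# Map smart quotes and em dashes to their plain equivalents
PLAIN = str.maketrans({"\u2018": "'", "\u2019": "'",
                       "\u201c": '"', "\u201d": '"', "\u2014": "-"})

def starts_with(words, count):
    """Return how many leading words add up to exactly `count` syllables, else None."""
    consumed = 0
    for word in words:
        match = WORD_RE.search(word.upper().translate(PLAIN))
        if match is None:
            # no word characters, skip this token
            consumed += 1
            continue
        key = match.group(1)
        if key not in SYLLABLES:
            return None  # unknown word
        count -= SYLLABLES[key]
        consumed += 1
        if count == 0:
            return consumed
        if count < 0:
            return None  # overshot the target
    return None  # ran out of words

def is_haiku(text):
    """Check whether the words of `text` split into 5, 7 and 5 syllables."""
    if not text:
        return False
    words = text.split(" ")
    start = 0
    for target in (5, 7, 5):
        consumed = starts_with(words[start:], target)
        if consumed is None:
            return False
        start += consumed
    return start == len(words)

print(is_haiku("I used to be punk until I broke my skateboard and got a haircut."))  # True
```

The same message with one extra word on the end would return False, because the final check requires that every word was consumed by one of the three lines.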
With this, haikubot was ready to detect gems like the following from Aaron Amstutz:
I used to be punk
until I broke my skateboard
and got a haircut.
This is all well and good, but many messages that might be haiku would be ignored if they contained someone's name or some other word that is not in the dictionary. I wanted haikubot to be able to learn new words. To do this, I used robot.hear instead of robot.listen. This takes a regular expression instead of a matcher function:
    robot.hear /haikubot learn (\S+) (\d+)/i, (response) ->
      count = parseInt(response.match[2])
      if count > 0
        custom_words[response.match[1].toUpperCase()] = count
        persist_custom_words()
        response.send "Thanks for teaching me!"
      return
The groups matched in the regex are available through the array response.match. Any message containing haikubot learn followed by a word and then one or more digits is handled by adding the specified word and syllable count to a variable, custom_words. You might've noticed above that this variable is used alongside syllables in starts_with. persist_custom_words saves the custom words so that they are preserved if Hubot needs to be restarted.
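The parsing half of the learn command can be sketched in Python as well. This is only an illustration of the regex and the capture groups: handle_learn is a hypothetical name, the dictionary is passed in explicitly, and persistence is left out because it depends on how the bot stores its data.

```python
import re

# Same pattern as the robot.hear listener, case-insensitive and unanchored
LEARN_RE = re.compile(r"haikubot learn (\S+) (\d+)", re.IGNORECASE)

def handle_learn(text, custom_words):
    """Store WORD -> syllable count if `text` contains a valid learn command."""
    match = LEARN_RE.search(text)
    if match is None:
        return False
    count = int(match.group(2))
    if count <= 0:
        # a word with zero syllables is rejected, as in the listener above
        return False
    custom_words[match.group(1).upper()] = count
    return True

custom_words = {}
handle_learn("haikubot learn Hubot 2", custom_words)
print(custom_words)  # {'HUBOT': 2}
```

Keys are stored in upper case so that they line up with the upper-cased words that starts_with looks up.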
I added a few other robot.hear commands: forget, to remove custom words that were erroneously added; list, to display all the custom words that haikubot knows; and help, to tell users what commands are available. The full code is available here. There are a few other features I'd like to add some time: posting all results to a #haiku channel; highlighting results that seem especially good (perhaps those with punctuation between lines, or 'terminal'-seeming words at the end of lines); and making detection more robust. Hope you have fun with it!