One of the biggest challenges was ensuring accurate and reliable
speech-to-text recognition of addresses. For users with a Singaporean accent,
and especially for non-English-speaking pioneers, getting the right address
from a voice command was difficult. Complex and native addresses made this
worse. My workaround was to prompt for postal addresses, then improve the
reliability of reading floor numbers, unit numbers and postal codes from voice
commands. The results were localised through a dictionary of corrections,
generated from a series of tests conducted by locals.
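
To illustrate the idea, here is a minimal sketch of how a corrections dictionary could be applied to a transcript before extracting the floor, unit and postal code. The entries in `CORRECTIONS`, the function names and the regular expressions are my assumptions for illustration; the actual dictionary came from the tests with local speakers and is not reproduced here.

```python
import re

# Hypothetical entries for illustration only; the real dictionary was built
# from tests with local speakers.
CORRECTIONS = {
    "tree": "three",
    "fort": "four",
    "sick": "six",
    "hex": "#",
    "hash": "#",
    "dash": "-",
}

# Spoken digits mapped to numerals.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def normalise(transcript: str) -> str:
    """Apply the corrections, then turn spoken digits into numerals."""
    tokens = []
    for word in transcript.lower().split():
        word = CORRECTIONS.get(word, word)      # fix a known mis-hearing
        tokens.append(DIGITS.get(word, word))   # "three" -> "3"
    text = " ".join(tokens)
    # Join digits that were spoken one at a time: "5 6 0" -> "560".
    return re.sub(r"(?<=\d) (?=\d)", "", text)


def parse_address(transcript: str) -> dict:
    """Extract floor, unit and 6-digit postal code, or None where missing."""
    text = normalise(transcript)
    # Floor and unit, e.g. "03 - 12", not embedded in a longer number.
    unit = re.search(r"(?<!\d)(\d{1,2})\s*-\s*(\d{1,4})(?!\d)", text)
    # Singapore postal codes are six digits long.
    postal = re.search(r"\b(\d{6})\b", text)
    return {
        "floor": unit.group(1) if unit else None,
        "unit": unit.group(2) if unit else None,
        "postal_code": postal.group(1) if postal else None,
    }
```

With these assumed entries, a transcript such as "hex oh tree dash one two postal code five sick zero one two tree" is normalised to "# 03 - 12 postal code 560123" before the fields are extracted. Keeping the corrections in a plain dictionary means new mis-hearings found in testing can be added without touching the parsing logic.
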

This agent was built before Google Duplex became available, so one-click
voice call integrations were not available, at least not directly. I integrated
the agent with the Twilio Programmable Voice service to receive and parse
commands through phone calls, and kept the Dialogflow agent webhook generic
enough to support services other than Google Assistant, such as a Telegram bot
and Facebook Messenger.
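
As a rough sketch of what "generic" can mean here, the handler below reads only the platform-neutral fields of a Dialogflow ES (v2) webhook request and replies with plain `fulfillmentText`, which any text or voice channel can render. The intent name `provide_address`, the `postal_code` parameter and the function names are hypothetical; this is not the original webhook code.

```python
from typing import Any, Callable, Dict


def provide_address(params: Dict[str, Any]) -> str:
    """Hypothetical intent handler; returns plain text only."""
    postal = params.get("postal_code")
    if not postal:
        return "Which postal code should I use?"
    return f"Got it, postal code {postal}."


# Map Dialogflow intent display names to handlers (names are assumptions).
INTENT_HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "provide_address": provide_address,
}


def handle_webhook(request_json: Dict[str, Any]) -> Dict[str, Any]:
    """Build a platform-neutral reply from a Dialogflow ES v2 webhook request."""
    query = request_json.get("queryResult") or {}
    intent = (query.get("intent") or {}).get("displayName", "")
    params = query.get("parameters") or {}
    # The calling platform is available here if a handler ever needs it,
    # but the reply itself does not depend on it.
    source = (request_json.get("originalDetectIntentRequest") or {}).get(
        "source", "unknown"
    )
    handler = INTENT_HANDLERS.get(intent)
    text = handler(params) if handler else "Sorry, I did not understand that."
    # Plain fulfillmentText avoids rich responses tied to a single platform.
    return {"fulfillmentText": text}
```

Because the reply carries no platform-specific payload, the same function can sit behind whichever HTTP framework serves the webhook, and the text can be read out over a phone call or shown in a chat window.
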

Another issue I encountered was the cold-start time when serving the webhook
from Firebase Functions. Because cold starts were slow, connections with the
agent tended to time out and break off midway, especially when traffic to the
webhook was low or sporadic. Switching over to Google App Engine improved the
experience tremendously.