Edgar Cervantes / Android Authority
The past few months have been a whirlwind in the tech world. One minute we’re less than impressed by Dall-E’s low-resolution AI-generated images, the next we’re somehow chatting with Bing, our new favorite search engine. I can’t keep up. Every day there’s a new Twitter thread about a ground-breaking AI tool, a new way to use ChatGPT or Midjourney, or a new capability built on the ChatGPT API. And somehow we’re already on GPT-4? But through all of this, one thought keeps coming back to me: most of the time, I don’t need AI when I’m looking at a screen. Instead, I’d rather have this ChatGPT-like conversational capability as a voice assistant on my Nest smart speakers.
There are two reasons for this. One, Google Assistant has always been slow to understand and answer any slightly complex question, and it feels more dated by the minute. Two, conversational AI makes more sense with a voice interface than with a screen. Let me explain.
Google Assistant feels a bit outdated today, like Alexa and Siri.
Robert Triggs / Android Authority
Over the years, Google Assistant’s strength has been its ability to understand and execute voice commands issued in natural language. Ask “Who wrote Pride and Prejudice?” or “Who is the author of Pride and Prejudice?” or “Who is the author behind Pride and Prejudice?” and in all three cases it answers Jane Austen. You can try dozens of other ways to phrase that question and it will still get it right.
This makes Google Assistant a useful tool for setting reminders and timers, adding meetings, asking general knowledge questions, playing specific songs, and controlling the smart home. You don’t need to remember the exact command to turn off the light; you can just say it however it comes naturally to you.
Assistant is good at carrying out the commands it has learned. Answering open-ended questions, however, is its greatest weakness.
But dig a little under the surface and the cracks start to show. Instead of playing the original song you want, it might play an acoustic version, a remix, or, heaven forbid, a cover. It can also give you advice on how to clean your kitchen instead of telling the smart vacuum to clean the kitchen as you intended.
But nothing is more frustrating than asking Assistant an open-ended question. You’ll hear it ramble on endlessly, citing a particular site that may or may not actually answer your question. Basically, it reads you the first Google search result with zero regard for context. It’s very wordy, often confused, and often unable to dig down a few layers to find an answer. Let me show you three examples that illustrate this.
Assistant is very talkative, often confused, and often unable to find answers.
Example 1 – confusion: My husband and I were talking about our upcoming trip to the Czech Republic and how good the train system is, making day trips and commuting easy. I asked if it was easy to travel by train in the Czech Republic, and it gave me directions to the Czech Republic from where I am now. Repeating the question with “within” instead of “in” didn’t help.
Example 2 – unable to answer: I have been fighting with my Olympus camera settings. I came across a whole menu with no explanation; the options were LF, LN, MN, and SN. So I asked my speaker about it, and it said it couldn’t compare the settings. Then it asked me if I wanted to know the difference (uh, repeating my question?). I said yes, and it just stopped. There was no answer.
Example 3 – wordiness: After my recent trip to Barcelona, I was thinking about Spain’s political system and asked Google if the country had a parliament. The answer was a snippet from a website that started by describing the two houses and only then told me that those count as a bicameral parliamentary system.
Now compare the responses from a traditional voice assistant above to what a large language model like ChatGPT can provide. ChatGPT understood the intent behind the same question about train travel in the Czech Republic, started by saying yes, gave me a quick answer, and then explained the benefits of the train system. Since it said more than I wanted, I limited its answers to one sentence for the following questions. It understood them both, explaining what the camera settings are and starting with “yes” when describing the Spanish parliament.
There is no command to limit Google’s answer to one sentence or to limit how long it talks. Also, current voice assistants can’t integrate answers from multiple sources, which is one of the strengths of ChatGPT and other large language models.
Conversational AI: On-screen vs. voice interaction
Adam Molina / Android Authority
There are thousands and thousands of potential uses for conversational AI like ChatGPT, but one of the most interesting I’ve found for my own use is its ability to synthesize answers from multiple sources while understanding the constraints of the question. You can make it talk less, as I showed in the examples above, ask it to explain complex concepts as if you were five years old, or give it any restrictions to tailor the search to exactly what you want.
This is why voice interaction with this type of AI makes more sense. When I have a screen in front of me, I can skim multiple results in a second, quickly tell which ones are irrelevant, and choose to expand only on what I want to read more about. When I use voice commands, I have no choice but to listen to the one answer that Google Assistant gives me, and as we noted earlier, this answer can sometimes be far from satisfactory.
When I look at a screen, I can go through multiple results in a second. When I use audio, I can only listen to one answer I get. So far, that answer is not very good.
Google can tell me exactly when Real Madrid’s next game is or how tall the French president is, but I don’t dare ask whether I can make a cocktail with yogurt liqueur and amaretto but no egg white, or whether there is a direct train from Paris to Rome. I can imagine all the ways it will fumble those questions before I even try, forcing me to pull out my phone and start a long Google or Bing search session to answer them.
And that’s the thing. If all Google Assistant does is drone on for two minutes while reading a snippet from the first search result, that’s a waste of time. I’d rather pull out my phone and search there. At least I can go through more than one result in a few seconds.
I don’t want to single out Google here. The current implementations of Amazon’s Alexa and Apple’s Siri wouldn’t save me any more search time, or compel me to use them any more, than Google’s. And here’s where I stand with any voice assistant today: I only use it for some smart home controls and very basic searches and queries.
If I had an AI voice assistant that could curate content from multiple sources and give me short, satisfying answers, I’d use it over and over again.
But if I had an AI voice assistant like ChatGPT that consolidated content from multiple sources and gave me a concise and satisfying answer, I would turn to it every time I wanted to ask something. Rather than pull out my phone, look at a screen, and get lost for half an hour, I’d ask it and stay focused on what I’m doing.
ChatGPT isn’t perfect, but I need a voice assistant like it in my Nest speakers
Rita El Khoury / Android Authority
While I keep extolling the virtues of ChatGPT, I don’t want it in my Nest speaker or any other smart speaker as it stands. Its training data is outdated, it’s wordy unless you limit the output to a sentence (but again, I appreciate being able to do that), it doesn’t cite sources, it’s much better in English than in other languages, and it obviously can’t control my smart home or add events to my calendar, among other limitations.
What I would like to see is the Google equivalent. Call it Google Bard or Assistant 2.0 if you want, but here’s how I picture my voice interaction with it:
- It should be able to handle the same requests (smart home, conversions, reminders, calendar, etc…) that the current version does.
- It should also provide an intelligent, natural-language AI that can curate content across multiple sources on the web and take into account any constraints or parameters that I may impose.
- For the sake of brevity and immediacy, answers should not mention the names of the sources out loud and should be limited to one sentence (unless I ask otherwise). But I should be able to ask it for more details and longer explanations.
- And for accuracy and further learning, a notification should always be sent to my phone with the answer, sources used, and an option to do a full search and tap to learn more.
- I need to be able to moderate it and limit the use of certain sources to avoid content that I think is low quality or inaccurate.
This is the evolution of the AI voice assistant that I would get behind and start using. Only time will tell if Google will take things in this direction or choose another path.