Siri is many people’s new best friend. This voice recognition system has been a feature of iPhones since the iPhone 4s was introduced. Siri demonstrates a facility that puts most computer-resident voice recognition software to shame.
The Siri software seems to comprehend users eerily well, in a way that even some human listeners cannot manage. The addition of this feature transforms the computer functions of an iPhone into something almost universally accessible. How this feature has been (nearly) perfected is a mix of technology and human skill.
Siri consists of two separate technologies. There is the voice recognition part, and there is the speech generation (text-to-speech) part. Both are awesomely complex. According to the website of the Human Interface Laboratory at Washington University, voice recognition is “the technology by which sounds, words or phrases spoken by humans are converted into electrical signals” (Baumann).
Then, “these signals are transformed into coding patterns to which meaning has been assigned” (Baumann). To recognize words from among all the sounds that humans make when they produce speech, the voice recognition software must use “speech unit extraction and concatenation” (Baumann).
To accomplish this, “Sentence and word units which maximize the weighted sum of linguistic likelihood, amount of information, confidence measure, and grammatical likelihood of concatenated units are extracted from the speech recognition results and concatenated for producing summaries” (Baumann). In plain terms, the software relies on a complex amalgam of grammar, vocabulary, and lots of statistics.
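The weighted sum in the quotation can be pictured with a toy sketch in Python. Everything here is an assumption made for illustration: the `Unit` class, the four score fields, the equal weights, and the sample numbers are invented, and nothing suggests that Siri or the system Baumann describes is implemented this way. The sketch only shows the general idea of scoring each candidate unit, keeping the highest-scoring ones, and concatenating them in their original order.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A candidate word or phrase taken from a recognition result (hypothetical)."""
    text: str
    position: int        # where the unit occurred in the utterance
    linguistic: float    # linguistic likelihood
    information: float   # amount of information
    confidence: float    # recognizer confidence measure
    grammatical: float   # grammatical likelihood


# Assumed weights; a real system would tune these on data.
WEIGHTS = {"linguistic": 1.0, "information": 1.0,
           "confidence": 1.0, "grammatical": 1.0}


def score(unit, weights=WEIGHTS):
    """Weighted sum of the four measures named in the quotation."""
    return (weights["linguistic"] * unit.linguistic
            + weights["information"] * unit.information
            + weights["confidence"] * unit.confidence
            + weights["grammatical"] * unit.grammatical)


def summarize(units, keep, weights=WEIGHTS):
    """Keep the `keep` best-scoring units and join them in spoken order."""
    best = sorted(units, key=lambda u: score(u, weights), reverse=True)[:keep]
    return " ".join(u.text for u in sorted(best, key=lambda u: u.position))


# Invented scores for a short utterance.
units = [
    Unit("the", 0, 0.1, 0.1, 0.9, 0.5),
    Unit("Lakers", 1, 0.8, 0.9, 0.9, 0.7),
    Unit("um", 2, 0.1, 0.0, 0.3, 0.1),
    Unit("won", 3, 0.7, 0.8, 0.8, 0.8),
]
print(summarize(units, keep=2))
```

Filler such as “um” scores low on every measure and drops out, while the content words survive. Changing the weights changes which measure dominates the selection.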
The voice production part of the system is, if anything, even more magical-seeming. The voice that comes out in the form of Siri actually sounds human, and it is, fundamentally.
A live woman or man, or perhaps more than one person, recorded hundreds of spoken words and parts of words. The speakers recorded them in different tones and rhythms to create the richest possible range of English speech. These sounds were then used as a sound bank from which to extract individual phonemes, the smallest identifiable elements of speech (Anderson).
As Anderson describes, “Words and sentences are analyzed, catalogued, and tagged in a big database, a complicated job involving a team of dedicated linguists, as well as proprietary linguistic software.” In the next step, a piece of software called a “text-to-speech engine” searches for sounds, in real time, to combine with other sounds (Anderson).
These are strung together, as beads are on a string, to create novel words and sentences like “727 South Warnock Street” or “the Lakers won by 2 points” when users need this information.
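The beads-on-a-string idea can also be sketched in a few lines of Python. This is a deliberately simplified illustration, not Apple's method or the engine Anderson describes: the `SOUND_BANK` contents, the text fragments used as unit labels, and the integers standing in for audio samples are all made up. A real engine works with phonetic units and weighs how smoothly neighboring recordings join, which this sketch ignores.

```python
# Hypothetical sound bank: each recorded unit maps to its audio samples.
# Integers stand in for real waveform data.
SOUND_BANK = {
    "lak": [1],
    "ers": [2],
    "won": [3, 4],
    "la": [9],
}


def segment(word, bank):
    """Split a word into recorded units, always taking the longest match.

    Greedy matching is an assumption made for brevity; it can fail on
    words that a smarter search could still cover.
    """
    units = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in bank:
                units.append(word[i:j])
                i = j
                break
        else:
            raise KeyError(f"no recorded unit covers {word[i:]!r}")
    return units


def synthesize(text, bank):
    """Look up each unit and string the samples together in order."""
    samples = []
    for word in text.lower().split():
        for unit in segment(word, bank):
            samples.extend(bank[unit])
    return samples


print(segment("lakers", SOUND_BANK))
print(synthesize("Lakers won", SOUND_BANK))
```

The point of the sketch is that the phrase “Lakers won” never has to be recorded as a whole. It is assembled on demand from smaller pieces that already exist in the bank.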
The inclusion of Siri in an iPhone does more than make forgetting one’s grocery list a thing of the past. It adds safety when driving, and turns the hand-held device into a mobile business office. A user can generate all sorts of correspondence using the voice recognition function to dictate text.
Using the iPhone’s accessibility settings for ‘speaking text’, a user can have almost anything on the screen read aloud. Siri even seems to have inspired a full-length movie – Her. The lonely protagonist falls in love with a sweet-voiced operating system that sounds suspiciously like Siri (IMDB).
Admittedly, the artificial intelligence and the relationship depicted are probably a few years away. However, many people already depend heavily on the way that Siri helps them connect to the internet and use their iPhones. Siri is also not the only voice in the marketplace.
More and more pieces of equipment (for example, RSS readers) and other venues now use some form of voice generation or voice recognition. Anderson notes that “Google, Apple, Microsoft, and even Amazon have all invested heavily in speech” (Anderson). Voice recognition and generation, whether in the form of Siri or some other software, are a welcome and useful part of users’ lives now.
Works Cited
Anderson, Lessley. “Machine Language: How Siri Found Its Voice.” 2013. The Verge. Web.
Baumann, James. “Voice Recognition.” 2014. Washington University. Web.
IMDB. “Her.” 2014. Web.