Machine Learning and Audio Narration
Jan. 28th, 2023 02:11 pm
I wrote a hypothetical about the future of machine learning and creative works a few months ago. I am still thinking about this subject. It's hard not to.
I received an invitation this morning to submit one of my books to Apple's new digital narration program.
My choice here, in one sense, is easy. These are the actual categories Apple is accepting:
Primary category must be romance or fiction (literary, historical, and women’s fiction are eligible; mysteries and thrillers, and science fiction and fantasy are not currently supported).
I do have several titles where "the primary category is romance". However, all but one of them are also fantasy. My sole contemporary romance is You Thought You Wanted to Be Level 99 but Really You Wanted To Be a Better Person. And this title is full of sections like this:
From afar, Jadarea says to Razgathak and you, “sorry”
From afar, Jadarea says to Razgathak and you, “couldn’t think what else to do”
From afar, you say to Razgathak and Jadarea, “It was a selfless and generous act on your part, Jadarea; by no means should you apologize.”
From afar, Razgathak says to Jadarea and you, “Yeah”
From afar, Razgathak says to Jadarea and you, “What Cae said. ty”
From afar, Razgathak says to Jadarea and you, “That is. Thank you. Seriously, that was amazing. We should all be dead.”
From afar, you say to Razgathak and Jadarea, “Just so. Allow me a few moments to relog; I shall message as soon as I reconnect.”
From afar, Razgathak says to Jadarea and you, “wait”
From afar, Razgathak says to Jadarea and you, “Loot Cae’s corpse first, just in case.”
From afar, you say to Razgathak and Jadarea, “Oh yes. That would be wise.”
In text, it's a little awkward if you are not familiar with the conventions of MMOs and/or MU*s, but a reader is likely to catch on and skim the dialogue tags rather than reading them. But in audio form, it would be tedious in the extreme. Beyond that, it has many fictitious names, like Razgathak and Jadarea, that a digital narrator will not recognize or know how to pronounce. (This might be one reason sff is excluded from their initial plans, though that is pure speculation on my part.)
Even if I submitted a title, it's unlikely it would be accepted. If it were accepted, it would be unlikely to garner many sales. I will not submit a title at this time, and I give up very little in making this choice.
I listened to the audio samples for romance titles. Compared to the voice that gives directions on my GPS or that reads text messages aloud in my car, they're fantastic. They do not sound mechanical and they vary their emphasis on parts of a sentence in a generally sensible way. But compared to a human narrator: meh. They are clearly inferior to a skilled human narrator. In some ways, they're inferior even to, say, me reading aloud, and I have zero skill at voice narration.
At least one of my readers is blind. She reads my books via screenreader. I think about her when I think about digital narration, more than anyone else. The machine-learning image and text generators make it easy for humans who want to turn their ideas into images or text without putting in the time to draw/write it themselves. They are a benefit to the human creator who wants such a tool. The benefit to human audiences -- people who want to view art or read stories -- is much less clear. As an audience member, I find ML text to have negative value -- searches are more likely now to turn up machines spouting confident and unidentified lies. ML images are less annoying, but I am still rarely glad to have seen them, and I rarely feel that they have enriched my experience.
Digital narration, by contrast, has a clear case for improving accessibility to audiences who literally cannot read stories in written form.
Beyond that, if a book is written by a human and narrated by a computer, it is absolutely clear that the human author also owns the copyright to the resulting audio. There is no legal question that the human who wrote the words has done enough creative work to be entitled to copyright protection. By contrast, the legal question of "can you own the copyright on an image generated by a computer based on your prompt?" is not yet settled, but the U.S. Copyright Office (part of the Library of Congress) is at present refusing to register copyrights for such work. (For the curious, Legal Eagle did a video on AI and the law, with an emphasis on image generators and related lawsuits.)
It's possible that digital narration is subject to some of the same legal issues that afflict machine-learning image and text generation. I don't know how Apple trained their digital narrators or if human narrators would have grounds to claim infringement. It seems less likely; Apple is not using their digital narrators to cobble together the content of books based on a training set. At most, they are copying the style of human narrators, and 'style' (as Legal Eagle notes) is not something you can copyright.
That doesn't mean there aren't ethical questions here. I am a Luddite about ML image and text generation. I hate it. I hate the idea of it. Writing stories and illustrating are my hobbies. Computers can have my day job and may the world take much joy from the result, but I don't want a computer coming for my hobbies.
But I am very aware that, as a person who neither does voice narration nor listens to audiobooks, I have no such personal bias when it comes to digital narration. Audio narration is also an art form, if not as popular a hobby as writing or drawing. Is there a meaningful difference between asking a computer to read my book aloud so I don't have to, and asking a computer to paint a picture for me so I don't have to?
So my reasons for self-selecting out of the Apple offer are:
- I have only one title that suits their criteria
- That one title is a terrible candidate for other reasons
- I am not familiar enough with the tech behind digital narration to know the legal and ethical issues, if any
- I feel some solidarity with voice actors who hate the idea of machine narration
If Apple had said "we're looking for all romance, including fantasy", would my other two reasons be enough? I don't know. If I didn't have the last two reasons, I'd throw A Rational Arrangement or Level 99 at them anyway; no cost to trying.
None of my books have audio editions. I have zero objections to audiobooks, but paying for professional narration of any of my titles would cost more than I have earned from any one title. The odds of making back such an investment are minuscule. And I am not interested in acquiring the skills and equipment to edit my own audio.
That last gets to what bothers me most about all of this. Half the work of audio narration is editing. I know multiple people who would be happy to record their own audio narration -- if only they didn't have to edit out all the pops and wheezes and mouth noises and weird pauses and whatnot. If I didn't already know how tedious and/or expensive it was to edit audio, I might be willing to invest the time in learning to narrate and record my own audio. Why are developers insistent on replacing the part that humans excel at and in some cases enjoy -- reading aloud -- instead of automating the part humans generally hate -- getting rid of extraneous noises? You don't hear people complaining that ProWritingAid will put proofreaders out of work because (a) it won't but also (b) hardly anyone likes doing proofreading anyway. Luddites are rare in modern times because most modern people are used to the idea that if their job gets automated, they'll find new work in another area, and very few of them loved the old work. Much of my professional career has centered on "I hate doing this, let me see if I can make the computer do it for me." I never worried that I would run out of things I hated doing to automate.
Tl;dr: dear innovators, please direct your efforts at having machines perform chores and not leisure activities k thx.