Friday, October 25, 2024

Adventures in AI

 Everyone's talking about AI these days, with half the web, it seems, taking advantage of various developers making their AI models and applications publicly accessible to create all manner of imagery, video and text.  So, I thought that I'd join in and start conducting some AI experiments of my own, but in audio.  Actually, I was spurred into action after reading an online article about Google's NotebookLM application, which is currently free to use while it is in development.  It is intended as a tool for analysing and summarising sources fed into it and outputting its conclusions in several formats, such as bullet-point reports, digests, etc.  You can input all manner of sources, including text documents, video files and audio files, several at once if you choose.  As well as its written reports, it can also create an audio analysis of the sources - not in straight narrative form, but presented as a conversation between two AI characters, who sound disarmingly real, as does their conversation.  If I didn't know that this audio 'Deep Dive' was AI-created, I'd swear that these were genuine human beings chatting.

Naturally, I decided to test its capabilities by feeding it a story from The Sleaze, specifically 'Spanking the Monkey', which had been inspired by a BBC article about a monkey torture ring that I'd read.  The results were quite fascinating.  There is something almost surreal in hearing a satirical story, made up of invented facts and featuring completely made-up characters, being discussed and dissected as if it were a real news story.  The AI seemed to have grasped the main thrust of the story, but its analysis approached it from some fairly oblique angles, presumably because these better suited its presentation in the format of a discussion.  There were, however, some points it appeared to have become confused over: it got the name of one character wrong and subsequently skipped over the entire point of that character's sub-plot.  But at least it didn't appear to have fabricated anything in order to plug gaps created by a failure to grasp some aspects of the story (a common complaint levelled at many of the current AI models).

My next experiment involved inputting a post from this blog - 'Modern Movies are still Rubbish (Well, Some of Them)' - to see what the AI made of something written from a personal perspective, expressing personal opinions on a fairly esoteric subject.  Again, the results were fascinating.  The 'Deep Dive' indicated that the more personal, opinion-based nature of the piece had been recognised, and it was approached as such.  It also exhibited a background knowledge of B-movies beyond the article, knowing what they were.  But it stumbled over the chronology of the films discussed and, at one point, actually made something up, stating that David L Hewitt had paid off the lab bill for his film The Lucifer Complex!  Quite where this came from, I don't know.  I've been over the source post several times and remain mystified - the post makes clear that he lost control of the movie and that somebody else bought the film from the lab and released it (with additional footage that tried to bridge the gaps left by scenes Hewitt had never filmed).  The audio ends with one of the AI characters even inviting replies via social media!

I've also been experimenting with AI voices.  Text-to-Speech (TTS) apps have been around for a long time, but the last time I used one, the voices were all somewhat metallic-sounding and electronic; nowadays they use far more realistic AI-created voices.  Moreover, there are now TTS apps which offer a range of celebrity voices.  These are usually paid services, but I found one that gives you the opportunity to test voices by creating twelve-second clips of text read by them.  The results aren't downloadable, but you can, of course, use Audacity (if you have it installed - other audio editing apps are available) to record any audio played on your laptop.  So, I created a series of twelve-second clips, using a facsimile of Donald Trump's voice, which were actually lines from a script I'd written, recorded them, then edited them together in Audacity to create a series of fake 'sound bites' of Trump.  While it is still obvious that these aren't real, the voice comes close to being convincing.  Getting the rhythm of Trump's speech was the biggest challenge, achieved via the punctuation of the script and some judicious editing.  Also, as with most TTS-created speech, it benefited from slowing the tempo down slightly.  I also added some reverb and background crowd noises to give the impression that these were audio clips taken from a Trump (or 'Trumper', as, for legal reasons, his AI counterpart is referred to in the clips) rally.
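For the curious, the assembly steps described above - splicing short clips together, slowing the tempo slightly, mixing in background crowd noise - can be sketched in plain Python with only the standard library.  This isn't what I actually used (the real clips came from a TTS site, recorded via Audacity), and the clip data here is purely synthetic stand-in audio (sine tones), but it shows the shape of the process:

```python
import math
import random
import struct
import wave

RATE = 16000  # samples per second, mono

def tone(freq, secs):
    """Stand-in for a recorded TTS clip: a plain sine tone, floats in [-0.3, 0.3]."""
    n = int(RATE * secs)
    return [0.3 * math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]

def concat(clips, gap_secs=0.2):
    """Splice clips end-to-end with a short silence between them."""
    silence = [0.0] * int(RATE * gap_secs)
    out = []
    for i, clip in enumerate(clips):
        if i:
            out += silence
        out += clip
    return out

def slow_down(samples, factor=1.05):
    """Stretch the audio by ~5% via linear interpolation - a crude tempo
    slow-down that also lowers the pitch slightly, fine for a sketch."""
    n = int(len(samples) * factor)
    out = []
    for i in range(n):
        pos = i / factor
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)
    return out

def add_crowd(samples, level=0.05, seed=1):
    """Mix in low-level random noise as a stand-in for crowd sound."""
    rng = random.Random(seed)
    return [s + rng.uniform(-level, level) for s in samples]

def write_wav(path, samples):
    """Write float samples out as a 16-bit mono WAV file."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))

clips = [tone(220, 0.5), tone(330, 0.5)]  # stand-ins for the recorded clips
montage = add_crowd(slow_down(concat(clips)))
write_wav("soundbites.wav", montage)
```

Real reverb is a convolution job well beyond this sketch, which is one reason a proper editor like Audacity earns its keep.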

So, what was the point of all this activity?  Well, it is my plan to produce a complete podcast using AI, probably centred around some of those NotebookLM 'Deep Dives', with an AI TTS-created narrator providing a framework.  In the meantime, the B-movie 'Deep Dive' (complete with the condiment business) forms part of an upcoming podcast, which also features the 'Trump' clips, to be published soon on the Onsug.
