Art is what you can get away with.
– Marshall McLuhan
[All the images in this post were produced with generative AI – Midjourney, DALL-E 2, Stable Diffusion.]
I’d like to give you my thoughts on the recent amazing developments in AI (Artificial Intelligence).
I’m a retired (emeritus) professor of computer science at the University of Victoria, Canada. I ought to know a bit about AI because I taught the Department’s introduction to AI course many times.
All I can say is thank God I’m retired. I couldn’t have kept up with the breakthroughs in translation, game playing, and especially generative AI.
When I taught AI, it was mainly Good Old Fashioned AI (GOFAI). GOFAI is largely searching of trees and graphs. I retired in 2015, just before the death of GOFAI. I dodged a bullet.
I am in awe of NFAI (New-Fangled AI) yet I still don’t completely understand how it works. But I do understand GOFAI and I’d like to share my awe of NFAI and my understanding of why GOFAI is not awesome.
Seek and Ye Shall Find
For a long time AI was almost a joke amongst non-AI computer scientists. There was so much hype but the hyped potential breakthroughs never materialized. One common quip was that AI was actually natural stupidity.
Many departments, like my own, basically boycotted the subject, maybe only offering a single introductory course.
The heart of GOFAI is searching – of trees and, more generally, graphs. For many decades the benchmark for tree searching was chess. Generations (literally) of AI researchers followed the program first proposed by Norbert Wiener in the 1940s, based on searching the chess game tree. Every ten years AI evangelists would promise that computer chess mastery was only ten years away.
Wiener’s idea, described in his pioneering book Cybernetics, was a min/max search of the game tree, resorting to a heuristic to evaluate positions when the search got too deep.
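To make the idea concrete, here is a minimal sketch of a depth-limited minimax search. It is not Wiener's or Deep Blue's code. The game is a made-up stand-in for chess (a pile of stones, each player takes one to three, whoever takes the last stone wins), and the names `moves`, `heuristic`, `minimax`, and `best_move` are my own assumptions for illustration.

```python
def moves(pile):
    """Legal moves in the toy game: take 1, 2, or 3 stones."""
    return [take for take in (1, 2, 3) if take <= pile]


def heuristic(pile):
    """Placeholder evaluation used when the search is cut off; 0 means 'unclear'."""
    return 0


def minimax(pile, depth, maximizing):
    """Score a position from the maximizing player's point of view.

    +1 means the maximizer wins, -1 means the minimizer wins.
    """
    if pile == 0:
        # The previous player took the last stone, so the side to move has lost.
        return -1 if maximizing else 1
    if depth == 0:
        # Search got too deep: fall back on the heuristic.
        return heuristic(pile)
    scores = [minimax(pile - take, depth - 1, not maximizing) for take in moves(pile)]
    return max(scores) if maximizing else min(scores)


def best_move(pile, depth):
    """Pick the move whose resulting position scores best for the maximizer."""
    return max(moves(pile), key=lambda take: minimax(pile - take, depth - 1, False))
```

With five stones, `best_move(5, 10)` takes one stone and leaves the opponent four, a lost position. A real chess program differs mainly in scale: far more moves per position, and a heuristic that does real work.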
The chess game tree gets big very quickly and it wasn’t until decades later (the late 1990s) that IBM marshalled the horsepower to realize Wiener’s dream. They built a special purpose machine, Deep Blue, capable of examining 100 million positions per second. Deep Blue eventually won, first a game, then a whole match, against Garry Kasparov, the world champion.
Deep Blue was the high water mark of GOFAI and there was no real followup. Deep Blue’s successor, Watson, could win at Jeopardy! but commercial applications never materialized.
AlphaGo and AlphaZero
I was impressed by Deep Blue but wondered about the game of Go (Baduk, Wei-chi). The board is 19×19 and the game tree is incomparably bigger than that of chess. If you’d asked me at the time I would have said Go mastery was inconceivable (which, if we had to use GOFAI, was true).
Then in 2016 the unthinkable occurred: a program, called “AlphaGo”, started beating human Go champions. It did not use Wiener’s approach; instead it used Machine Learning (ML).
AlphaGo trained by playing millions of games against itself. Originally it was given hundreds of thousands of expert level human games but its successor, AlphaZero, dispensed with them and simply taught itself. It took only a few hours to reach expert level, which for humans took hundreds of years. Variants of the software mastered chess and shogi in a similar fashion.
About the same time users of Google Translate noticed a sudden dramatic increase in the quality of its translations, although Google at the time said nothing. It turned out that Google had silently switched from statistical translation to a neural net based approach.
In statistical translation there are typically many possibilities for translating a particular phrase. The machine uses the neighbouring phrases to compute the probability of each translation, then takes the most likely.
The problem is that the information which determines which translation is correct may lie far away and the local phrases may be irrelevant. For example, there are four French translations of the English word important: important, importante, importants, importantes. Which one to choose depends on what the adjective refers to, and the reference may be far away. Statistical translation never produced reliably good results.
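A toy sketch shows the failure. This is not how any real statistical translator was built. The counts in `COUNTS`, the example sentence, and the function `translate_important` are all invented for illustration, and the only thing modelled is a window of neighbouring words.

```python
# Hypothetical counts: how often each French form was seen near an English noun.
COUNTS = {
    "decision": {"importante": 9, "important": 1},
    "decisions": {"importantes": 9, "importants": 1},
    "document": {"important": 9, "importante": 1},
    "documents": {"importants": 9, "importantes": 1},
}
FORMS = ["important", "importante", "importants", "importantes"]

SENTENCE = "the decisions that the committee took last week were very important".split()


def translate_important(words, position, window):
    """Pick the most probable French form using only words within `window` of `position`."""
    lo = max(0, position - window)
    hi = min(len(words), position + window + 1)
    totals = {form: 1 for form in FORMS}  # add-one smoothing: no form is impossible
    for word in words[lo:hi]:
        for form, count in COUNTS.get(word, {}).items():
            totals[form] += count
    norm = sum(totals.values())
    probs = {form: total / norm for form, total in totals.items()}
    # Ties go to the first form in FORMS, i.e. the masculine singular default.
    return max(FORMS, key=lambda form: probs[form])


near = translate_important(SENTENCE, 10, window=2)   # sees only "were very"
far = translate_important(SENTENCE, 10, window=10)   # reaches back to "decisions"
```

With a window of two words the noun is out of reach, so `near` falls back to the default "important". Only the wide window finds "decisions" and picks the correct "importantes".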
Neural networks, on the other hand, are connected in a sequence of layers, and every node on each layer can in principle be connected to any node in the previous layer. Thus the translation of any phrase can be influenced by any words anywhere in the text, not just neighbouring words. At least that’s how I understand it.
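Here is a minimal sketch of a single fully connected layer, just to show the "every node sees every input" property. The weights are arbitrary numbers I made up, and real translation networks are vastly larger and structured differently. The function name `dense_layer` is an assumption for illustration.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of ALL inputs, then ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(row, inputs))
        outputs.append(max(0.0, total))
    return outputs


# Two output nodes, three input nodes; the values are arbitrary.
WEIGHTS = [[0.5, 0.25, 1.0], [1.0, 0.5, 0.25]]
BIASES = [0.0, 0.0]

before = dense_layer([1.0, 2.0, 0.0], WEIGHTS, BIASES)
after = dense_layer([1.0, 2.0, 4.0], WEIGHTS, BIASES)  # only the last input changed
```

Changing only the last input changes both outputs. No node is restricted to its neighbours.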
Currently Google’s translations are spectacularly good (and DeepL’s even better). In translating into French, for example, Google usually gets the pronouns right as well as the gender/number agreements. If you want perfect results you still have to at least proofread the output because occasionally there are mistakes, usually minor.
Here is a translation from Le Monde. It’s perfect.
Since the beginning of the conflict, Russia has refused to describe its invasion of Ukraine as a war, calling it a “special military operation”. Shortly after the conflict broke out, the authorities banned the Russian media from using any information other than official statements on the subject and banned the use of words such as “war” and “invasion”, under penalty of being punished by the law.
GOFAI never produced really good translations, relying as it did on facts and rules and probabilities. I remember the whole class getting a good laugh out of the mistakes. Once, translating a news article about Iraq from German, it claimed Saddam “warned the US before an attack” when what it should have said was “Saddam warned the US of an attack”. Modern systems would never make such a basic mistake.
Then came GPT-3 (and more recently, GPT-4) and the other systems that generate text. Originally they were merely predictors that would complete incomplete sentences. But now they run on and produce whole articles. Which (usually) make sense. Here’s an example:
Author Prompt: Write me an article about creativity please
GPT-3 Response: I’ll be honest. I don’t know much about creativity. I’m not a psychologist and I’m not a creative myself. I’m just a guy who likes to read about psychology and creativity. What I can do, though, is write you an article about creativity that’s based on what I have read. That way you get a proper article, but you don’t have to pay for the expertise of a psychologist or a creative person.
You often hear that generative text is just sophisticated auto-complete, that something like GPT-3 is repeatedly predicting the next few words. This can’t be the whole story. GPT-3 texts typically have a beginning, middle, and end, and a blinkered what-comes-next algorithm wouldn’t produce this kind of structure.
What’s misleading is that readers may assume that the next few words are produced based on the last few words but that’s not how the AI chatbots work. The next few words are based on the whole document so far and the neural network allows distant parts to be taken into account.
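The difference can be sketched with a toy generation loop. The "model" here, `toy_predictor`, is a hand-written stand-in I invented. It is nothing like GPT-3 internally, and only the shape of the loop is the point: the predictor is handed the whole text so far.

```python
def generate(prompt, predict_next, max_tokens):
    """Autoregressive loop: each new token is chosen from the whole text so far."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = predict_next(tokens)  # the predictor sees every earlier token
        if token is None:
            break
        tokens.append(token)
    return tokens


def toy_predictor(tokens):
    """Hypothetical stand-in for a language model: beginning, three middles, end."""
    if "end" in tokens:
        return None
    if "beginning" not in tokens:
        return "beginning"
    if tokens.count("middle") < 3:
        return "middle"
    return "end"


# Predictor sees the whole document so far.
whole = generate(["title"], toy_predictor, 10)

# Same predictor, but blinkered: it only ever sees the last two tokens.
blinkered = generate(["title"], lambda tokens: toy_predictor(tokens[-2:]), 8)
```

Given the whole context, the loop produces a beginning, a middle, and an end, then stops. The blinkered version loses track of where it is and keeps starting over without ever reaching "end".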
GOFAI would content itself with using only the last few words and never achieved anything along these lines. But then my mind was well and truly boggled by …
Along came DALL-E and DALL-E 2. But it wasn’t till Stable Diffusion was released that I started paying attention. Of course there were the pictures of astronauts on horseback and cats wearing sunglasses. But what really impressed me were the pictures in the style of well-known artists. Here are two of my favourites:
The first is an abstract image in the style of Picasso. I can’t find the original but Midjourney’s version is just marvellous. I’d have no hesitation in printing it, framing it, and hanging it on my wall.
My second favourite is a wonderful portrait of Superman – ‘by’ Rembrandt! As one observer commented, “those eyes have seen some stuff!”
But even the cheesy astronaut image is impressive.
The striking fact is that you can’t see the astronaut’s left leg. The image generator seems to understand that you can’t see through opaque objects (namely, the horse).
GOFAI would need literally hundreds of rules just about what to do when bodies overlap, what to show, which objects are transparent and to what degree, and so on.
OK let’s go all in – let’s look at a cat wearing sunglasses. Ew, cheesy – but there’s something remarkable about the image.
It’s the reflections in the lenses of the sunglasses. Not only are they visible, but the reflections are, correctly, the same. How does Midjourney coordinate the images in separate parts of the picture?
My guess is it’s the same reason neural net translators can coordinate different parts of a text – neural nets can combine different parts of an image. They’re not limited to purely local computations.
A closer look
When I see this image I have to ask, where did all this come from? Midjourney is trained on 5 billion images but condenses this training to 5 GB. That works out to about one byte per training image, so there’s not enough room to include exact copies of images found in the training set. We can assume that this (apparent) photo does not exist as-is on the internet.
In particular, what about the blue feathers on either side of the subject’s neck (they are not mirror images)? Where did they come from? Did one of the training images have them?
The mystery is that this image is the result of combining training set images, but how are they put together? The best GOFAI could do is chop up the training images and put them together like a badly fitting jigsaw puzzle with visible seams and limited symmetry.
The social implications of AI technology
It is questionable if all the mechanical inventions yet made have lightened the day’s toil of any human being.
~ John Stuart Mill
There is a lot of controversy over Midjourney and other generative image programs.
The first question is, are these images art? I think some of the images presented here are definitely art, even good art. If you’re not convinced, have another ‘Rembrandt’.
The second question is, is imitating the style of certain artists fair? I don’t know, but there seems no way to stop it. Currently nothing stops a human artist from studying other human artists and imitating their styles. Midjourney and similar tools are just especially good at this.
In a sense, this imitation broadens the exposure of the imitated artists. Now everyone can have, say, a Monet of their own.
Finally, a vital question is, how will this affect today’s working artists? Here the answer is not so optimistic.
Generative AI is not the first disruptive technology. There’s photography (the closest analog), digital art in general, the telephone, the automobile, the record player, the printing press, and so on.
Each of these had the effect of obsoleting the skills of whole professions. It didn’t wipe them out, but the vast increase in productivity put large numbers out of work. And those that remained had to acquire and use the new tools. Because of economic competition they had to work harder than ever to keep up.
Labor-saving technology inevitably becomes profit-generating technology. The tractor is an example. Initially it (and farm machinery in general) was marketed as labor-saving. But eventually competition forced every farmer to get machinery or sell out (which most had to do). The result was the same or more food produced by a fraction of the former number of farmers, working their butts off.
So I predict generative AI will indeed be a real threat to the careers and livelihoods of working commercial artists. Why should an editor commission an artist to produce an illustration for an article on Superman when typing in a prompt – a short paragraph – can produce an image like the one above? Almost instantly. For free.
I know this sounds pessimistic but it’s totally in line with the history of other disruptive technologies (aren’t all new technologies disruptive?). When the camera was perfected, it put portrait painters out of work. Why commission a portrait when a semiskilled person can just point a camera and press a button?
But it’s not all bad news. Photography was not as simple as it seemed. Soon everyone realized you needed skilled photographers to take really good pictures. Similarly generative art is not that simple – you need people with a flair for prompts to produce good results.
Furthermore, photography made it possible for every family to have portraits, not just well-off families. Soon everyone could take pictures, and some got very good at it. Painting did not die out; many artists turned to landscapes and the like, meant for the general public. The result was more good art for everyone.
In short, the downside is that many existing artists will be forced out of the profession while those that stay will be forced to learn the new tools and inevitably work harder with them.
The upside is vastly more art to enjoy, most of it original. It will be like drinking from a firehose.
“Art is what you can get away with” is Warhol.