Google swings for the fences with PaLM 2 and Gemini AI systems

Google used its I/O conference to launch a new LLM that surpasses GPT-4 on certain metrics – as well as to tease something far more powerful

Badly bloodied by OpenAI's GPT-4, Google has struck back with a new, more powerful large language model (LLM) to upgrade Bard and create a suite of new AI services – starting with a model targeted at doctors. It also teased its next-gen Gemini AI.

Launched at the company's I/O '23 conference, the PaLM 2 model leapfrogs its predecessor on pretty much every metric, according to a technical report, but Google chose to highlight three areas in which it believes the new model is particularly strong.

The first is multilingual capability. PaLM 2's training data included a greater percentage of non-English text, and the model can now pass a bunch of different language exams at a "mastery" level. It's now outperforming Google's own Translate engine, and displaying a nuanced understanding of languages, idioms, metaphors and the cultures behind them.

The second is "reasoning" – there's been a keen focus on maths and scientific papers in the training data, and Google says it's displaying "improved capabilities in logic, common sense reasoning, and mathematics." Maths in particular is an area where LLMs as a whole have struggled; it's just not their forte – and indeed, while PaLM 2 does beat GPT-4 on selected benchmarks, the gains here appear incremental rather than revolutionary.

The third is coding, an area of immense potential for these LLMs. Google claims PaLM 2 is super-capable with Python and JavaScript, but also very strong in a range of more specialized programming languages.
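For context on how coding ability gets measured, LLMs are typically scored on function-completion benchmarks such as HumanEval: the model is shown a function signature and docstring, asked to write the body, and its answer is run against hidden unit tests. Here's a minimal sketch of one such task (the problem is drawn from the public HumanEval set; the specific grading setup shown is illustrative, not from Google's report):

```python
# HumanEval-style task: the model receives the signature and docstring
# below and must generate the function body. A grader then executes
# unit tests against whatever the model produced.

def has_close_elements(numbers: list[float], threshold: float) -> bool:
    """Return True if any two numbers in the list are closer to
    each other than the given threshold."""
    # A correct completion (what a capable model should emit):
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False

# The grader runs hidden assertions like these against the completion:
assert has_close_elements([1.0, 2.0, 3.0], 0.5) is False
assert has_close_elements([1.0, 2.8, 3.0], 0.5) is True
```

A model "passes" a task only if every hidden test succeeds, which is why small logic slips – an off-by-one in the inner loop, say – count as total failures on these benchmarks.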

Introducing PaLM 2, Google’s next generation large language model | Research Bytes

PaLM 2 has already rolled out as part of the company's embattled Bard search AI. It's also now coming to Workspace, including Gmail and Google Docs, in the form of a collaborative "Duet AI" that can generate images and words for your projects, help you brainstorm, organize spreadsheets, analyze and label data, and do a bunch of other little things designed to get tasks to the finish line.

But perhaps more interestingly, Google is leaping into the arena of industry-specific AI models, starting out with one targeted specifically at doctors.

Med-PaLM 2 had most of its alignment and human-feedback work done by groups of health researchers and medical professionals. As a result, it's the first AI to achieve an "expert" level on tests designed to mimic US medical licensing exams. It can answer all sorts of health-related questions and is well-versed in a wide range of medical literature.

Like GPT-4, PaLM 2 is also beginning to gain multimodal capabilities – the ability to understand images and other media in the same way it "understands" text. In the context of Med-PaLM 2, this means it'll soon be able to look at your X-rays and other medical scans, and report on them – an area in which AIs have excelled in early trials, sometimes outperforming medical specialists.

Med-PaLM 2, our expert-level medical LLM | Research Bytes

Google will open this tool up to a small group of users in the coming months, aiming to "identify safe, helpful use cases" that could see Med-PaLM 2 rolling out into doctors' offices. This is both an exciting and daunting prospect; it promises a leap forward in healthcare and could put incredible tools in the hands of medical professionals.

At the same time, it's hard to ignore the fact that many of humanity's best and brightest students have gone into the medical field. ChatGPT is already looking much, much better at answering medical questions than human doctors – as judged by healthcare professionals themselves, complete with superior bedside manner and empathy – and when these machines inevitably expand their abilities to outperform doctors across the board, it'll be yet another helping of humble pie for a species that considers itself pretty special.

Google also used the opportunity to announce a restructuring effort it hopes will "significantly accelerate" the development of next-generation AIs, merging the Google Research Brain team with DeepMind to form Google DeepMind.

And with this, the company revealed what sounds like an absolute beast of an AI in development: "we’re already at work on Gemini — our next model created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning. Gemini is still in training, but it’s already exhibiting multimodal capabilities never before seen in prior models. Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2, to ensure it can be deployed across different products, applications, and devices for everyone’s benefit."

Training Gemini from day one on audio, video, images and other media – as well as text, and the ability to use other tools and APIs – means this thing is designed to learn even more like humans do than the big LLMs of today, and its ability to interact with the outside world in a range of ways beyond just a text window is baked in rather than tacked on. It could well prove as much of a leap forward as anything else we've seen in the last six months – a sobering thought in itself.

On paper, today's announcements appear to show solid progress from Google, bringing it close to where OpenAI's been at for a few months now with GPT-4. The stock market certainly seemed satisfied, bumping Alphabet stock up more than 4% – but it'll be interesting to see how PaLM 2 performs in the harsh light of the real world over the coming weeks.

You can see the entire Google I/O keynote presentation in the video below.

Google Keynote (Google I/O ‘23)

Source: Google AI

I’ve been told for 25 years that one computer or another is “going to be able to replace doctors”. That isn’t the case. These language models are not “artificial intelligences”, but rather statistical tricks that cobble together an output most likely to be approved of by the estimated requestor. There is absolutely nothing the least bit intelligent about them, and only a fool would depend on them, as the possibility of them being “wrong” carries too much potential for harm in the medical world.

The one truth I have learned in 25 years of being on the edge of technology in medical practice: the only people who should never be allowed to touch medical software are software engineers – they never get it right, for many reasons.
There is a Wolfram plug-in for ChatGPT that enables it to do advanced calculations.
@Drjohnf, you seem to think that human doctors are infallible. Though they are statistics-based models, they have been shown to be more accurate than the average doctor.
Joy Parr
@Drjohnf: You're very dismissive. I think you're wrong. Time will tell. You won't have long to wait.
@Drjohnf While I've definitely seen the limitations of ChatGPT, I've also seen the limitations of human doctors. So far, everything I've received from a doctor in the last 10 years has been from them looking for the statistically most likely result as provided by an expert system, or more commonly now, via googling. If AI can do the same thing but 1,000x faster, and never needs to sleep or eat or take breaks, I think that would be a huge win.

It would hopefully free doctors up from basic diagnoses of illness, and give them time to do research, or help deal with conditions that didn't fit the most common statistical case.

I do think the risk is high though; current AI, because of its increasing similarity to humans, seems to make the same kind of errors that people do. e.g. if you test ChatGPT on basic math (pre-Wolfram Alpha plugin) it gets sloppy when doing more than 7-digit addition. That said, it's easier to fix one AI and make it more rigorous in the future than it is to fix every single future med student, and it takes a lot less time to train an AI.

As far as being cost effective, I think AI has a pretty steep initial cost, but scales very easily. So 1,000 doctors is probably a lot more cost effective than 1 doctor AI, but 1 doctor AI might be cheaper than 100,000 doctors. And if it's cheaper, that theoretically frees up resources to develop more cures for more diseases, or more treatments for more patients.
Legal liability alone means doctors will not be completely "replaced" any time soon – in the same way being able to Google something didn't replace professionals – but AI will change many jobs in much the same way the internet did.