Last Week in AI (03.18.24 – 03.22.24)

Welcome to Last Week in AI, a post I publish every Friday to share a couple of things I've discovered in the world of AI over the last week. I spend way too much time in Discord, on Twitter, and browsing Reddit so you don't have to!

If you have a tip or something you think should be included in next week’s post, send an email to keanan@floorboardai.com with more info.

This week, I finally got my hands on Google's Gemini AI and tested its ability to understand and parse video, and we talk a bit about how people are using advances in LLMs to make their own prompts even better.

Let’s dive in!

Hands-on with Google’s Gemini Model

I finally got access to Gemini this week and, inspired by this Tweet of Josh's, wanted to see how good its vision actually was.

I recorded a 20-second video of myself holding a few different Pokemon cards up to my phone camera and gave Gemini this prompt:

Output a JSON object that contains all the pokemon cards shown in the attached video. For each card include: the pokemon's name, and a list of their attacks. For each attack, provide the name, the numerical damage (or null if there is no numerical damage) and the description of the attack (or null if there is no description). Note, attack damage might be in slightly non-numeric forms such as 6x or 50+.

Here's an example:

{
  "pokemonName": "Pikachu",
  "energyType": "Electric",
  "attacks": [
    {
      "name": "Thundershock",
      "damage": "60",
      "description": null
    },
    {
      "name": "Quick Attack",
      "damage": null,
      "description": "A quick attack to reduce the opponent's ability to attack back"
    }
  ]
}

Here’s a screenshot from the video I gave it.

And the JSON I got back was this…

{
  "pokemonName": "Doublade",
  "attacks": [
    {
      "name": "Swords Dance",
      "damage": null,
      "description": "During your next turn, this Pokemon's Slicing Blade attack does 80 more damage (before applying Weakness and Resistance)."
    },
    {
      "name": "Slicing Blade",
      "damage": "40",
      "description": null
    }
  ]
},

Which was surprisingly accurate! It wasn’t perfect, so you’d still want to double-check the data, and it seemed less reliable than the image parsing inside ChatGPT or some of the other tools, but I’m sure it will continue to improve.
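If you want to try something similar from code, here’s a rough sketch of what the call could look like with Google’s google-generativeai Python SDK. Treat it as an illustration rather than my exact setup: the model name, file path, and API-key handling are all placeholders you’d swap for your own.

import json
import os
import time

import google.generativeai as genai

# Placeholder setup: assumes your API key is in the GOOGLE_API_KEY env var.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Upload the video through the File API and wait for it to finish processing.
video = genai.upload_file(path="pokemon_cards.mp4")
while video.state.name == "PROCESSING":
    time.sleep(2)
    video = genai.get_file(video.name)

prompt = (
    "Output a JSON object that contains all the pokemon cards shown in the "
    "attached video. For each card include: the pokemon's name, and a list of "
    "their attacks. For each attack, provide the name, the numerical damage "
    "(or null) and the description (or null)."
)

# Model name is a placeholder; use whichever Gemini model you have access to.
model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([video, prompt])

# The reply comes back as text; you may need to strip markdown fences before
# parsing, and you should spot-check the result -- as noted above, the
# extraction isn't always 100% accurate.
cards = json.loads(response.text)
print(json.dumps(cards, indent=2))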

Asking LLMs to help you write better prompts

This is a bit of a meta discovery that’s been developing over the past few weeks: Matt figured out a way to get an AI to take a task from a person, generate a bunch of candidate prompts, rank them, and give you back the very best version.
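To give you a feel for the idea, here’s a rough sketch of what that kind of loop could look like. This isn’t Matt’s actual tool; the ask_llm helper is a stand-in for whatever model call you’d wire in, and the scoring prompt is just one way you might have the model rank its own candidates.

# Sketch of an LLM-driven prompt optimizer: draft candidate prompts for a
# task, have the model score each one, and return the best.

def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to whatever LLM you use and return its reply."""
    raise NotImplementedError

def best_prompt_for(task: str, n_candidates: int = 5) -> str:
    # 1. Ask the model to draft several candidate prompts for the task.
    candidates = [
        ask_llm(f"Write a detailed prompt that would get an LLM to do this task well:\n{task}")
        for _ in range(n_candidates)
    ]

    # 2. Ask the model to rate each candidate on a simple 1-10 scale.
    def score(candidate: str) -> float:
        reply = ask_llm(
            "On a scale of 1-10, how well would this prompt accomplish the task "
            f"below? Reply with just the number.\n\nTask: {task}\n\nPrompt: {candidate}"
        )
        try:
            return float(reply.strip())
        except ValueError:
            return 0.0

    # 3. Return the highest-scoring candidate.
    return max(candidates, key=score)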

It’s wild to me that the list of jobs AI will replace now includes a job AI itself created: “prompt engineering”. I’m looking forward to taking a few of the prompts I’ve been struggling with and having this tool evaluate them and make them even better.

See you next week!

If you’ve made it this far and enjoyed “Last Week in AI”, please drop your email down below so next week’s edition goes straight to your inbox. Talk soon!