Text-to-video technology has been around for some time now, with companies like Meta and Google showcasing demos of what’s to come. Until now, however, very few real text-to-video systems have been available to the public. We have seen nice tools for creating cool animation effects, but they don’t offer anything close to what text-to-image technology already gives us.
There is now an open-source 1.7-billion-parameter text-to-video diffusion model available: ModelScope text-to-video synthesis, which you can play with through a Hugging Face Space. One catch to note is that many of the videos the model was trained on appear to have been taken from Shutterstock. As a result, many of the videos the model produces have Shutterstock watermarks across them.
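If you would rather run the model yourself than wait on the Space, it can be loaded with Hugging Face’s diffusers library. Here is a minimal sketch, assuming the weights are published under the Hub ID damo-vilab/text-to-video-ms-1.7b (check the model card for the current ID) and that you have a CUDA GPU with enough memory:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Hub ID assumed from the ModelScope text-to-video release;
# verify it on the model card before running.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = pipe.to("cuda")

# Generate a short clip from a text prompt.
# Note: on newer diffusers versions the output is batched,
# so you may need `.frames[0]` instead of `.frames`.
frames = pipe("A blue pig riding a bicycle", num_inference_steps=25).frames
video_path = export_to_video(frames)  # writes an .mp4 and returns its path
print(video_path)
```

Expect the first run to download several gigabytes of weights; after that, a clip of a couple of seconds generates in well under a minute on a recent GPU.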
To try this technology yourself, visit the ModelScope text-to-video synthesis Space on Hugging Face. You can use it for free, or duplicate the Space to run it on dedicated hardware; you will need a credit card for that, though the price is very cheap at the moment.
To use it, enter a prompt and it will generate a video. With so many people trying it out right now, though, you may need to retry a few times if you get an error. It’s the hottest thing at the moment, and everyone wants a go.
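If you keep hitting busy errors, you can also call the Space programmatically and let a script do the retrying. Below is a rough sketch using the gradio_client package; the Space ID and the /predict endpoint name are assumptions on my part, so check the Space’s “Use via API” link for the actual values:

```python
import time
from gradio_client import Client

# Space ID and endpoint name are assumptions; the "Use via API"
# link on the Space page shows the real ones.
client = Client("damo-vilab/modelscope-text-to-video-synthesis")

def generate(prompt: str, retries: int = 5, wait: float = 30.0) -> str:
    """Submit a prompt, retrying while the Space is overloaded."""
    for attempt in range(retries):
        try:
            # Returns a local path to the downloaded video file.
            return client.predict(prompt, api_name="/predict")
        except Exception as err:
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError("Space stayed busy; try again later.")

print(generate("Pink planet, seen from space."))
```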
Here are some examples that I made. I tried to give the tool a good challenge, so don’t judge the videos too strictly. After all, this is just the beginning!
Prompt:
A blue pig riding a bicycle
Prompt:
Pink planet, seen from space.
Prompt:
A child swimming in a pool.