First, let’s talk about OpenAI, the company behind Sora

My recent source of joy ⬇️

Video generation models as world simulators

Yes, it’s quite a bold title~~

Back to the point. Let’s start by quoting Lu Qi’s view of large models from last year. Note that Sora had not yet appeared at the time.

From a technical perspective, what did OpenAI achieve, and how did it bring about the era of large models?

Why talk about OpenAI rather than Google or Microsoft? To be honest, because I know that thousands of people at Microsoft work on this too, but they are not as good as OpenAI. At first, Bill Gates did not believe in OpenAI at all; about 6 months ago he still did not. Then, about 4 months ago, he saw the GPT-4 demo (a product prototype) and was stunned. He wrote an article saying: “It’s a shock, this thing is amazing.” People inside Google were dumbfounded too.

The key technologies that OpenAI has developed along the way:

GPT-1 was the first to use pre-training to achieve efficient training for language understanding;
GPT-2 mainly used transfer learning, applying what was learned in pre-training across a variety of tasks and further improving language understanding;
DALL·E moved into another modality, images;
GPT-3 focused mainly on generalization, especially few-shot (small-sample) generalization;
GPT-3.5’s biggest breakthroughs were instruction following and tuning;
GPT-4 began the shift to engineering;
The plugins launched in March 2023 were about building an ecosystem.
In February 2024, Sora combined the Transformer with the diffusion model, delivering real text-to-video capability with output realistic enough to pass for real footage.

World model – how far away?

If AI can understand the physical world in motion, is that a world model? What would it mean?

If AI cannot understand new physical laws, it cannot be considered to equal or surpass human intelligence.

The Sora we see today, which appears to understand the world, actually learns only the “correlation” between objects, not the “causality”, so I do not think it is a real world model.
The real world follows specific regularities: water flows from high to low, a high-five makes a sound, and the mouth moves when someone eats something delicious. At this stage Sora only simulates images and visual effects. That can cover the world as we see it, but it is certainly not an accurate or complete representation.
AI is still a long way from discovering something like Newton’s second law, “F = ma”, on its own.

However, it may just be a matter of time.

Why not us?

People: the most important

Why has OpenAI been able to work through the entire AGI technology stack?

First: I believe that complete devotion to the problem itself, and a pure, selfless focus, are the indispensable driving force behind any successful technology.

If you look at the core figures of the Sora team, you will notice that less than a year passed between Bill Peebles finishing his Ph.D. and starting work. Yet he does not seem to be bothered by daily trivialities such as performance reviews, OKRs, or slide-deck status reports. Presumably he is also free of the pressure of buying a car or a house. Instead, his energy can go into actually solving the text-to-video problem.

It is only about this one thing! Nothing to do with the outside world! Nothing to do with material things! Nothing to do with fame and fortune! Execution with a pure heart and few desires!

Why does OpenAI “win” every time? It is definitely not just luck.

Second: a willingness to take risks and a tolerance for failure are what we should learn.

Xie Saining wrote online that the Sora team has been working day and night, in a 996-like state, for a long time. The team must have failed many times, but it is not satisfied with the status quo, is willing to try, dares to take on challenges, and is not afraid of hitting walls while exploring.

So very few of us can manage both of the points above... right?

Environment: important? Not that important

The strength of OpenAI – what is the key strength?

Real technological innovation is not just the product of research labs or top conference papers. It has to be turned into technology that society widely accepts, through an understanding of market demand combined with excellent product design and strong execution, so that it brings about big changes.

OpenAI is not the inventor of the Transformer model, and Stable Diffusion is not the inventor of the diffusion model.
These underlying technologies, which seemed very simple to professionals, were cleverly combined and produced this technological revolution at the right time and in the right way.
However, it is undeniable that Altman really is a marketing genius~!
Sora was not a solo effort: each person on the team has unique skills. They gathered the seven Dragon Balls and summoned Shenlong!

Google’s strength is hard to describe now?

Isn’t there another American company that keeps “sewing wedding clothes for others”, doing the hard work that someone else ends up benefiting from? – Google~

However, the newly released “Gemma: Introducing new state-of-the-art open models” is in the same style as Gemini 1.5 Pro and the earlier releases. At first glance the whole announcement is full of eye-catching benchmark numbers, and it loudly proclaims “open”. When will Chinese support be added? That would give me the courage to test it.
It feels like an elephant that refuses to admit defeat, moving forward in its own determined direction. You cannot call it stubborn, but you cannot call it nimble either. It does seem truly bulky, and it has trouble turning around or running forward.
In any case, from open-sourcing TensorFlow to the Transformer to today’s Gemma, I salute DeepMind! YYDS

We – why not us, how should we pursue it?

We humans should think about how to build good tools rather than compete with them.
Whether it is a computer or artificial intelligence, it is a controller. What matters is not what it can compute, but what it can control and what it can output. (I forget who said this.)
Therefore its most important commercial value lies in which productive forces and production relations it can control to promote industrial progress and resource utilization, rather than in merely producing entertainment clips.

One more thing

To those of you who are still working, and those of you reading this blog:

As technology iterates rapidly, there is no need to be so hard on yourself that you end up belittling yourself.

What will ride the wave and benefit from the evolution of large models?

What will be washed away, or even eliminated, by the evolution of large models?

We need to think carefully about these two questions.

Do what you are good at and leave the rest to time.

Talking about what I see: Sora, the world model and us

By savesoff