OpenAI o1 Model

Overview

Areas where LLMs need to be strengthened

The three basic capabilities of a large model:

  • Language understanding and expression: largely solved by GPT-3
  • World knowledge storage: solved to a large extent by GPT-4
  • Logical reasoning: the weakest link, and the area where the o1 model makes significant progress

Theory

The o1 model strengthens logical reasoning as follows: after receiving a question, it automatically generates a chain of thought (CoT) and then generates the answer, so users no longer need to hand-write CoT prompts.

To give o1 the ability to generate CoT automatically, reinforcement learning (RL) is used to improve the model's ability to find the correct reasoning path on its own.

A new paradigm?

OpenAI found that, in both RL training and inference, the model's reasoning performance (for example, the probability of finding the correct CoT path) increases as compute increases. This may amount to a scaling law for RL: adding compute could keep improving the model's logical reasoning. It may be a new paradigm.

Target scenarios

Thanks to its stronger logical reasoning, the o1 model suits programming, mathematics, physics, and similar scenarios, but it may be worse at writing articles. It is a bit like a science student who is strong in only some subjects.

In my experience, there is no obvious gap between the o1 model and Sonnet 3.5 when it comes to programming.

A family of models

  • o1: not yet released; the most capable
  • o1-preview: available only in ChatGPT (paid members) and through the API (big spenders who have paid more than US$1,000)
  • o1-mini: lacks much of the world knowledge of o1-preview. It suits scenarios that do not need much world knowledge, such as programming and mathematics, and it responds quickly.

OpenAI will probably go on to launch a series of models with "stronger logical reasoning and more specialized subjects."

An immature model

The o1 series has not been fully productized yet: it does not support vision, streaming output, function calling, and so on. That is why it is called a preview.

More expensive

Going by the table below, o1-preview costs 3 times as much as GPT-4o for input tokens and 4 times as much for output tokens.

In addition, the hidden prompt (CoT) generated during reasoning also consumes tokens, which raises the cost further without being visible.

Model        Input price (per million tokens)   Output price (per million tokens)
o1-preview   $15                                $60
o1-mini      $3                                 $12
GPT-4o       $5                                 $15

Core principle

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.

OpenAI o1 thinks before it answers and can produce a long internal chain-of-thought before responding to the user.

After receiving a question, the model first generates an internal chain of thought and then produces an answer based on that CoT.

Training the LLM with reinforcement learning (RL) enables it to generate this chain of thought automatically.

Target users and scenarios

They can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

Performance: improved logical reasoning

The next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of the problems, while the reasoning model scored 83%. Its coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. You can read more about this in our technical research post.

Improvements across the board

  • Coding: improved by more than 300%
  • Mathematics, chemistry, biology, physics: improved by about 40%

image-20240914133949224

Test results

image-20240913165916756

Access restrictions

  • ChatGPT Plus users can access the o1 models in ChatGPT.
  • The weekly rate limit is 30 messages for o1-preview and 50 messages for o1-mini.
  • API access requires usage tier 5; the rate limit is 20 RPM (requests per minute).

[Usage tiers]

Tier     Qualification
Tier 1   $5 paid
Tier 2   $50 paid and 7+ days since first successful payment
Tier 3   $100 paid and 7+ days since first successful payment
Tier 4   $250 paid and 14+ days since first successful payment
Tier 5   $1,000 paid and 30+ days since first successful payment
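
The tier table can be read as a simple lookup. The sketch below is only an illustration of that reading: the helper names `usageTier` and `canUseO1Api` are hypothetical, the thresholds are copied from the table, and returning 0 for an account that has not reached tier 1 is an assumption.

```javascript
// Thresholds copied from the tier table, highest tier first.
const TIERS = [
  { tier: 5, minPaidUsd: 1000, minDays: 30 },
  { tier: 4, minPaidUsd: 250, minDays: 14 },
  { tier: 3, minPaidUsd: 100, minDays: 7 },
  { tier: 2, minPaidUsd: 50, minDays: 7 },
  { tier: 1, minPaidUsd: 5, minDays: 0 },
];

// Hypothetical helper: highest tier whose payment and waiting-period
// conditions are both met; 0 (an assumption) when none applies.
function usageTier(paidUsd, daysSinceFirstPayment) {
  const match = TIERS.find(
    (t) => paidUsd >= t.minPaidUsd && daysSinceFirstPayment >= t.minDays
  );
  return match ? match.tier : 0;
}

// Per the access list above, the o1 API requires tier 5.
function canUseO1Api(paidUsd, daysSinceFirstPayment) {
  return usageTier(paidUsd, daysSinceFirstPayment) >= 5;
}

console.log(usageTier(1000, 10)); // paid enough for tier 5, but only 10 days in
console.log(canUseO1Api(1000, 30));
```

Note that paying US$1,000 is not sufficient on its own: under this reading, the 30-day waiting period must also have passed.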

Pricing

Going by the table below, o1-preview costs 3 times as much as GPT-4o for input tokens and 4 times as much for output tokens. The hidden CoT also consumes tokens.

Model        Input price (per million tokens)   Output price (per million tokens)
o1-preview   $15                                $60
o1-mini      $3                                 $12
GPT-4o       $5                                 $15
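
To see how much the hidden CoT can matter, here is a rough cost estimate based on the prices in the table. It is a sketch under assumptions: `estimateCostUsd` is a hypothetical helper, the token counts are made up for illustration, and the hidden reasoning tokens are assumed to be billed at the output rate.

```javascript
// Prices in USD per million tokens, copied from the pricing table.
const PRICES = {
  "o1-preview": { input: 15, output: 60 },
  "o1-mini": { input: 3, output: 12 },
  "gpt-4o": { input: 5, output: 15 },
};

// Hypothetical helper: estimated cost of one request in USD.
// reasoningTokens is the hidden CoT; this sketch ASSUMES it is billed
// like ordinary output tokens.
function estimateCostUsd(model, inputTokens, visibleOutputTokens, reasoningTokens = 0) {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  const billedOutput = visibleOutputTokens + reasoningTokens;
  return (inputTokens * price.input + billedOutput * price.output) / 1e6;
}

// Same visible request on both models; the o1-preview call also spends
// an illustrative 3,000 hidden reasoning tokens.
console.log(estimateCostUsd("gpt-4o", 1000, 500));
console.log(estimateCostUsd("o1-preview", 1000, 500, 3000));
```

With these illustrative numbers the o1-preview request comes to about $0.225 against about $0.0125 for GPT-4o, so the effective gap can be much larger than the list-price ratio suggests.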

Limitations

The model cannot browse the web for information, and it does not accept file or image uploads.

API limitations:

  • Modalities: text only, images are not supported.
  • Message types: user and assistant messages only, system messages are not supported.
  • Streaming: not supported.
  • Tools: tools, function calling, and response format parameters are not supported.
  • Logprobs: not supported.
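
One practical consequence of these limits is that a request written for GPT-4o has to be adapted before it is sent to o1. The following is a minimal sketch, assuming a Chat Completions-style request body with string message contents; `adaptRequestForO1` is a hypothetical helper, and the parameter names in `UNSUPPORTED_PARAMS` are assumed to correspond to the features listed above. It only builds the request object and does not call any API.

```javascript
// Request fields assumed to map to the unsupported features listed above
// (streaming, tools / function calling, response format, logprobs).
const UNSUPPORTED_PARAMS = [
  "stream", "tools", "tool_choice", "functions", "function_call",
  "response_format", "logprobs", "top_logprobs",
];

// Hypothetical helper: returns a copy of the request that respects the limits.
function adaptRequestForO1(request, model = "o1-preview") {
  // System messages are not supported, so collect their text...
  const systemText = request.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const messages = request.messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ role: m.role, content: m.content }));

  // ...and prepend it to the first user message instead.
  if (systemText) {
    const firstUser = messages.findIndex((m) => m.role === "user");
    if (firstUser === -1) {
      messages.unshift({ role: "user", content: systemText });
    } else {
      messages[firstUser].content = `${systemText}\n\n${messages[firstUser].content}`;
    }
  }

  const adapted = { ...request, model, messages };
  for (const key of UNSUPPORTED_PARAMS) delete adapted[key];
  return adapted;
}

const exampleRequest = adaptRequestForO1({
  model: "gpt-4o",
  stream: true,
  messages: [
    { role: "system", content: "Answer concisely." },
    { role: "user", content: "How many r's are in strawberry?" },
  ],
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

Folding the system text into the first user message is one possible workaround, not an officially documented one; the original request object is left unmodified.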

Examples

Example 1: Logic puzzle

Question

A princess is as old as the prince will be when the princess is twice the age that the prince was when the princess' age was half the sum of their present ages. Provide all solutions to this question.

Explanation

Reference version

o1-preview's reply

Internal prompt (hidden CoT)

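Breaking down the problem

I am working on a complex age problem. I define the points in time and the ages as variables and analyze the age gaps, trying to find a solution that satisfies the conditions.

Calculating the time difference

I start by computing the sum of the prince's and the princess's ages, P + Q. Then I determine half of the princess's age, Q', and calculate the time elapsed from then until now, which is (Q - P)/2 years. This gives me a better sense of the flow of time.

Analyzing the time difference

I try to simplify the formula for the time difference and find a contradiction when computing the ages of the princess and the prince. Looking back, the princess's age was half the sum of their current ages.

Assume the princess's current age is Q. Then at time t1, the princess's age is Q - t1.

According to the problem, at time t1: Q - t1 = (P + Q)/2

Solving the equation: t1 = Q - (P + Q)/2 = [2Q - P - Q]/2 = (Q - P)/2

Similarly, the princess's age is P - t1 = P - [(Q - P)/2] = [2P - Q + P]/2 = (3P - Q)/2

Adjusting the calculation details

While working out the prince's age, I adjust how time t1 is calculated. After simplifying the formula, I find an error in the calculation of the prince's age and check each step for accuracy.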

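Reviewing the equations

I am trying to rewrite and simplify the equations but have run into some trouble. Let me see where it went wrong; it may be a small arithmetic slip. With a skeptical eye, I re-examine each step to ensure accuracy.

Calculating the prince's age

I calculate the prince's age at time t1 as (3P - Q)/2. Next, I explore the relationship between the princess's future age and the prince's age at t1.

Compute the future time t2 at which the princess's age is double the prince's age back then. First compute the prince's current age P1, then use the future age to determine t2.

Analyzing the age growth

The princess's age will be Q + t2 and the prince's age will be P + t2, which gives t2 = 3P - 2Q.

Exploring the age relationship

The princess's age equals twice the prince's future age; the conditions involve the change in the princess's age and the history of the prince's age.

Deeper analysis

Starting from the change in the princess's age, I organize the history of the prince's age, which reveals the complex relationship between them.

Correcting the equation and calculating

I am correcting the age equation and recomputing the age difference. In the end, the princess's age comes out as three quarters of the prince's age. This process makes me reflect on the relationship between age and time.

Noting:

Prince's age = P + t2 = (3Q/4) + (Q/4) = (4Q/4) = Q

In terms of t2: princess = (5Q/4), prince = Q

Thinking now: the princess's age matches the prince's future age, on the condition that the princess's age is half of the sum of half the prince's age.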

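Probing the relationship between the ages

I am analyzing the questions involving the prince's age. Assuming the prince's age is P = 0.75Q, I can obtain the prince's current age and examine its value carefully.

Determining the age relationship

I analyzed the age relationship between the princess and the prince and calculated that they are 6 and 8 years old respectively. One year ago the princess was 7, which matches half the sum of their current ages.

Determining the mathematical relationship

The prince is 6 years old and the princess is 8. The princess's future age is 10, at which point the prince is 8.

Half the sum of the princess's and the prince's current ages works out to 3.5.

Solving the age problem

While trying to find the age difference between the princess and the prince, I find that the princess is 1 year older than the prince. The prince will age over the next 2 years, but that does not immediately help with the answer.

Trying different numbers

I am exploring different integer solutions. Choosing Q = 16 gives P = 12. Next, I compute t1 = 2 years, when the princess was 14. Checking these conditions yields the future ages of the princess and the prince.

Analyzing the age difference

I am exploring the age difference between the prince and the princess. The prince is currently 12 and will be 20 in 8 years. When the princess is 20, the prince is exactly 16. This idea is exciting.

Looking for the answer

Considering that Q must be a multiple of 4, P can be written as 3k. This lets us compute a family of possible age solutions.

Calculating their current age difference

At present, the prince and the princess are 3k and 4k years old respectively. I am trying to find how old the prince will be when the princess is twice the prince's age.

Analyzing the age difference

I am thinking about the age difference between the princess and the prince. The princess is now 5k and the prince 3k; twice the princess's age will be half of the prince's age at that time plus the sum of their current ages.

Exploring the age relationship

I am analyzing the age relationship between the prince and the princess and infer that when the princess is 5k, the prince is 5k/2. Thanks for reading!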

Final answer

Conclusion:

The only integer solution satisfying all conditions is:

Prince's age: 6 years
Princess's age: 8 years
Answer: The prince is 6 years old, and the princess is 8 years old.
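
The answer can be checked mechanically. The sketch below translates the riddle into three steps and brute-forces integer ages; the function names are hypothetical, and requiring both time offsets to be positive is my reading of "was" and "will be". All quantities are kept doubled where needed so the check stays in integers.

```javascript
// Returns true if the present ages (prince, princess) satisfy the riddle.
function satisfiesRiddle(prince, princess) {
  // Step 1: t1 years ago the princess was half the sum of their present ages:
  //   princess - t1 = (prince + princess) / 2  =>  2 * t1 = princess - prince
  const twiceT1 = princess - prince;
  if (twiceT1 <= 0) return false; // that moment must lie in the past

  // The prince's age at that moment, doubled: 2 * (prince - t1).
  const twicePrinceThen = 2 * prince - twiceT1;

  // Step 2: in t2 years the princess will be twice the prince's age back then.
  const t2 = twicePrinceThen - princess;
  if (t2 <= 0) return false; // that moment must lie in the future

  // Step 3: at that future moment the prince is as old as the princess is now.
  return prince + t2 === princess;
}

// Hypothetical helper: every integer pair up to maxAge that passes the check.
function findSolutions(maxAge) {
  const solutions = [];
  for (let princess = 1; princess <= maxAge; princess++) {
    for (let prince = 1; prince <= maxAge; prince++) {
      if (satisfiesRiddle(prince, princess)) solutions.push({ prince, princess });
    }
  }
  return solutions;
}

console.log(findSolutions(20));
```

For ages up to 20 this finds (prince, princess) = (3, 4), (6, 8), (9, 12), (12, 16), and (15, 20). Prince 6 and princess 8 is therefore one valid answer, but any pair in the ratio 3:4 passes the check, which agrees with the 3k and 4k family mentioned in the hidden reasoning rather than with the "only integer solution" wording of the final answer.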

Example 2: Programming

Prompt

make a game called "squirrel finder" using HTML5 and JavaScript.
- You are a koala icon that can move quickly using the arrow keys.
- A strawberry randomly spawns every second and bounces around.
- If a strawberry touches you, you die.
- After three seconds, a squirrel icon spawns and bounces around. If you touch it, you win.
- Make sure all icons are 40 pixels by 40 pixels.
- Use retro colors and a dark background.
- Put "openai" in the game screen and a timer in the corner.
- Display the instructions before the game starts.
- Automatically restart the game when you win or lose.
- Give the code in a single file.

The code generated by o1 runs as expected:

image-20240913172757406

<!DOCTYPE html>
<html>
<head>
  <title>Squirrel Finder</title>
  <style>
    body {
      margin: 0;
      overflow: hidden;
      background-color: #000;
      color: #fff;
      font-family: 'Courier New', monospace;
    }
    #gameCanvas {
      display: block;
      margin: 0 auto;
      background-color: #000;
    }
    #instructions {
      position: absolute;
      top: 20%;
      width: 100%;
      text-align: center;
      font-size: 24px;
    }
    #openai {
      position: absolute;
      bottom: 10px;
      left: 10px;
      font-size: 14px;
      color: #fff;
    }
    #timer {
      position: absolute;
      top: 10px;
      right: 10px;
      font-size: 18px;
      color: #fff;
    }
  </style>
</head>
<body>
  <canvas id="gameCanvas" width="800" height="600"></canvas>
  <div id="instructions">
    <h1>Squirrel Finder</h1>
    <p>You are the Koala. Use arrow keys to move.</p>
    <p>Avoid the strawberries!</p>
    <p>Find the squirrel to win!</p>
    <p>Press any key to start.</p>
  </div>
  <div id="openai">openai</div>
  <div id="timer">0</div>
  <script>
    (function(){
      const canvas = document.getElementById('gameCanvas');
      const ctx = canvas.getContext('2d');

      let instructions = document.getElementById('instructions');
      let timerDisplay = document.getElementById('timer');

      let keys = {};

      let player = {
        x: canvas.width / 2 - 20,
        y: canvas.height / 2 - 20,
        width: 40,
        height: 40,
        speed: 7,
        color: '#00ff00' // Green
      };

      let strawberries = [];
      let squirrel = null;

      let gameTime = 0;
      let gameStarted = false;
      let gameOver = false;
      let gameWin = false;

      function startGame() {
        instructions.style.display = 'none';
        gameStarted = true;
        gameTime = 0;
        gameOver = false;
        gameWin = false;
        player.x = canvas.width / 2 - 20;
        player.y = canvas.height / 2 - 20;
        strawberries = [];
        squirrel = null;
        lastStrawberrySpawn = 0;
        lastTime = performance.now();
        requestAnimationFrame(gameLoop);
      }

      document.addEventListener('keydown', function(e) {
        if (!gameStarted) {
          startGame();
        }
        keys[e.key] = true;
      });

      document.addEventListener('keyup', function(e) {
        keys[e.key] = false;
      });

      let lastTime = performance.now();
      let lastStrawberrySpawn = 0;

      function gameLoop(timestamp) {
        let deltaTime = (timestamp - lastTime) / 1000;
        lastTime = timestamp;

        update(deltaTime);
        render();

        if (!gameOver && !gameWin) {
          requestAnimationFrame(gameLoop);
        } else {
          setTimeout(startGame, 2000); // Restart game after 2 seconds
        }
      }

      function update(deltaTime) {
        if (!gameStarted) return;

        gameTime += deltaTime;
        timerDisplay.textContent = Math.floor(gameTime);

        // Move player
        if (keys['ArrowUp']) player.y -= player.speed;
        if (keys['ArrowDown']) player.y += player.speed;
        if (keys['ArrowLeft']) player.x -= player.speed;
        if (keys['ArrowRight']) player.x += player.speed;

        // Keep player on screen
        if (player.x < 0) player.x = 0;
        if (player.x + player.width > canvas.width) player.x = canvas.width - player.width;
        if (player.y < 0) player.y = 0;
        if (player.y + player.height > canvas.height) player.y = canvas.height - player.height;

        // Spawn strawberries
        if (gameTime - lastStrawberrySpawn > 1) {
          lastStrawberrySpawn = gameTime;
          spawnStrawberry();
        }

        // Spawn squirrel after 3 seconds
        if (gameTime >= 3 && squirrel === null) {
          spawnSquirrel();
        }

        // Update strawberries
        strawberries.forEach(function(s) {
          s.x += s.vx;
          s.y += s.vy;

          // Bounce off walls
          if (s.x < 0 || s.x + s.width > canvas.width) s.vx *= -1;
          if (s.y < 0 || s.y + s.height > canvas.height) s.vy *= -1;

          // Check collision with player
          if (collision(player, s)) {
            gameOver = true;
          }
        });

        // Update squirrel
        if (squirrel) {
          squirrel.x += squirrel.vx;
          squirrel.y += squirrel.vy;

          // Bounce off walls
          if (squirrel.x < 0 || squirrel.x + squirrel.width > canvas.width) squirrel.vx *= -1;
          if (squirrel.y < 0 || squirrel.y + squirrel.height > canvas.height) squirrel.vy *= -1;

          // Check collision with player
          if (collision(player, squirrel)) {
            gameWin = true;
          }
        }
      }

      function render() {
        // Clear canvas
        ctx.fillStyle = '#000';
        ctx.fillRect(0, 0, canvas.width, canvas.height);

        // Draw player
        ctx.fillStyle = player.color;
        ctx.fillRect(player.x, player.y, player.width, player.height);

        // Draw strawberries
        strawberries.forEach(function(s) {
          ctx.fillStyle = '#ff0000'; // Red
          ctx.fillRect(s.x, s.y, s.width, s.height);
        });

        // Draw squirrel
        if (squirrel) {
          ctx.fillStyle = '#ffff00'; // Yellow
          ctx.fillRect(squirrel.x, squirrel.y, squirrel.width, squirrel.height);
        }

        // Display game over or win message
        if (gameOver) {
          ctx.fillStyle = '#ff0000';
          ctx.font = '48px Courier New';
          ctx.fillText('You Died!', canvas.width / 2 - 100, canvas.height / 2);
        }

        if (gameWin) {
          ctx.fillStyle = '#00ff00';
          ctx.font = '48px Courier New';
          ctx.fillText('You Win!', canvas.width / 2 - 100, canvas.height / 2);
        }
      }

      function spawnStrawberry() {
        let s = {
          x: Math.random() * (canvas.width - 40),
          y: Math.random() * (canvas.height - 40),
          width: 40,
          height: 40,
          vx: (Math.random() * 4 + 1) * (Math.random() < 0.5 ? -1 : 1),
          vy: (Math.random() * 4 + 1) * (Math.random() < 0.5 ? -1 : 1)
        };
        strawberries.push(s);
      }

      function spawnSquirrel() {
        squirrel = {
          x: Math.random() * (canvas.width - 40),
          y: Math.random() * (canvas.height - 40),
          width: 40,
          height: 40,
          vx: (Math.random() * 3 + 2) * (Math.random() < 0.5 ? -1 : 1),
          vy: (Math.random() * 3 + 2) * (Math.random() < 0.5 ? -1 : 1)
        };
      }

      function collision(a, b) {
        return a.x < b.x + b.width && a.x + a.width > b.x &&
               a.y < b.y + b.height && a.y + a.height > b.y;
      }

    })();
  </script>
</body>
</html>

Sonnet 3.5: the generated code also works as expected.

As an aside, Claude's interface is noticeably better: the result can be previewed directly.

image-20240913172956926

Comparing the two programs, the Sonnet 3.5 result is better.

Although both implementations meet the core requirements, Code 2 (Sonnet 3.5) is slightly better overall:

  1. It has a more organized, object-oriented structure that is easier to maintain and extend.
  2. It uses subtler colors that better suit a retro style.
  3. It uses classes to represent game objects, which promotes code reuse.

However, Code 1 (o1) has a slight performance advantage because it uses requestAnimationFrame.

Example 3: Reasoning

Question: How many r's are in strawberry?

GPT-4o: wrong result

There is 1 letter "r" in "strawberry".

Sonnet 3.5: wrong result

There are 2 r's in "strawberry".
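
For reference, the correct count is easy to confirm with a few lines of code. This is only a sanity check of the expected answer, and `countLetter` is a hypothetical helper name.

```javascript
// Count case-insensitive occurrences of a single letter in a word.
function countLetter(word, letter) {
  const target = letter.toLowerCase();
  return [...word.toLowerCase()].filter((ch) => ch === target).length;
}

console.log(countLetter("strawberry", "r")); // prints 3
```

Both answers quoted above are therefore wrong: "strawberry" contains three r's.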

Example 4: Translating a corrupted sentence

To be added.

References

Introducing OpenAI o1-preview

OpenAI post

SuperCLUE test results