Open-source world model "Waypoint-1" released: runs locally on consumer GPUs, sustains a coherent world simulation, and reaches 360p at 60 fps

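On Tuesday, January 20, US AI startup Overworld released "Waypoint-1", an open-source (Apache-2.0 license) world model that runs entirely in a local environment, as a research preview. The first weights published on Hugging Face are the lightweight "Waypoint-1-Small", with approximately 230 million parameters.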

Today, we’re releasing a research preview of our real-time, local-first world model built for interactive, playable AI-worlds

60fps, locally run, all on consumer-grade hardware.

Come take a look pic.twitter.com/QFmZNzhrr0
— Overworld (@overworld_ai) January 20, 2026

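"Waypoint-1" is a world model that achieves 360p resolution at up to 60 fps within a local environment, without going through an external server. It keeps a "persistent world state" in local memory and continuously updates the world in real time in response to the user's movement and actions. According to the company, it runs on typical consumer GPUs and enables low-latency rendering and interaction.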
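
As a rough illustration of what keeping a persistent world state and updating it per user action might look like, the Python sketch below holds one state object in memory and mutates it on each input. `WorldState`, `step`, and `run` are hypothetical names chosen for this example, not Overworld's API, and the stub only moves a 2D position where a real world model would generate the next frame. The 60 fps figure quoted above implies a budget of roughly 16.7 ms per frame.

```python
from dataclasses import dataclass, field

TARGET_FPS = 60                       # figure quoted in the article
FRAME_BUDGET_MS = 1000 / TARGET_FPS   # roughly 16.7 ms available per frame

# Hypothetical action set for this sketch only.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class WorldState:
    """Hypothetical stand-in for a world state kept in local memory."""
    position: tuple = (0, 0)
    history: list = field(default_factory=list)  # actions applied so far


def step(state: WorldState, action: str) -> WorldState:
    """Apply one user action to the existing state instead of resetting it.

    A real world model would generate the next frame here; this stub only
    moves a 2D position so that the loop structure is visible.
    """
    dx, dy = MOVES.get(action, (0, 0))  # unknown actions leave the position as-is
    x, y = state.position
    state.position = (x + dx, y + dy)
    state.history.append(action)
    return state


def run(actions):
    """Feed a sequence of actions through the same persistent state."""
    state = WorldState()
    for action in actions:
        state = step(state, action)
    return state
```

Calling `run(["up", "up", "right"])` returns a state whose position reflects all three inputs, because each step builds on the previous one rather than starting over.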

Project Genie is an impressive demonstration of what world models can do.

But there’s a difference between seeing the future and being able to build with it today.

This is what running locally looks like pic.twitter.com/5p0dkRWjQo
— Overworld (@overworld_ai) January 30, 2026

Overworld positions Waypoint-1 as a foundation model for General-Purpose Computer Control (GPC), and defines it as a Visual Action Model (VAM) that predicts the next action to take directly from the pixels on a computer screen.

Waypoint-1 hackathon January 20th in SF
Model weights + compute on day one.

Build something cool with world models, winner takes home an RTX 5090.

1/2 pic.twitter.com/9zVErKFz6T
— Overworld (@overworld_ai) January 16, 2026

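On the technical side, it adopts a high-precision action tokenizer that converts complex operations such as on-screen coordinates, clicks, and key inputs into tokens (sequences of numbers) that the AI can process easily. Training uses a large volume of human demonstration data recorded from people actually operating computers, which eliminates unnatural movement and delivers intuitive control.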
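
To make the idea of an action tokenizer concrete, here is a minimal Python sketch of one way pointer coordinates, key presses, and clicks could be mapped to integer tokens. The grid size, the key and button lists, and the token id layout are all assumptions made for illustration. The article does not describe the actual vocabulary or encoding used by Waypoint-1.

```python
GRID = 32                                  # assumed: each screen axis is quantized into 32 bins
KEYS = ["w", "a", "s", "d", "space"]       # assumed key set
BUTTONS = ["left_click", "right_click"]    # assumed button set

# Assumed token id layout: x bins, then y bins, then keys, then buttons.
X_OFFSET = 0
Y_OFFSET = GRID
KEY_OFFSET = 2 * GRID
BUTTON_OFFSET = KEY_OFFSET + len(KEYS)
VOCAB_SIZE = BUTTON_OFFSET + len(BUTTONS)


def tokenize_action(x, y, width, height, keys=(), buttons=()):
    """Encode a pointer position plus pressed keys and buttons as integer tokens."""
    # Quantize the coordinates; clamp so the right/bottom edge falls in the last bin.
    x_bin = min(int(x / width * GRID), GRID - 1)
    y_bin = min(int(y / height * GRID), GRID - 1)
    tokens = [X_OFFSET + x_bin, Y_OFFSET + y_bin]
    tokens += [KEY_OFFSET + KEYS.index(k) for k in keys]
    tokens += [BUTTON_OFFSET + BUTTONS.index(b) for b in buttons]
    return tokens


# Pointer at the center of an assumed 640x360 frame, holding "w" and left-clicking.
print(tokenize_action(320, 180, 640, 360, keys=["w"], buttons=["left_click"]))
# prints [16, 48, 64, 69]
```

Quantizing coordinates into bins keeps the vocabulary small (71 tokens here) at the cost of pointer precision, and a finer grid trades the other way.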

A short clip from our real-time, local-first world model.

The world stays coherent as you move with new content generated in context, rather than resetting frame to frame.

New checkpoint coming soon. Research preview is live at https://t.co/kJlSV2AGLc pic.twitter.com/CcDTLDcrQY
— Overworld (@overworld_ai) January 29, 2026

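The model weights provided in this release, "Waypoint-1-Small", keep the parameter count relatively small at approximately 230 million while maintaining advanced inference capabilities. A higher-tier version, "Waypoint-1-Medium", with approximately 610 million parameters, is also being prepared for release.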
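
For a sense of scale, the Python sketch below estimates how much memory the weights alone would occupy at the parameter counts quoted in this article. The bytes-per-parameter values are the standard sizes for each numeric format, but the precision the released checkpoints actually use is not stated here, so fp16 is an assumption. Activations and other runtime buffers are not counted.

```python
# Standard storage size per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts as reported in this article (approximate).
SMALL_PARAMS = 230_000_000
MEDIUM_PARAMS = 610_000_000


def weight_memory_gib(num_params, dtype="fp16"):
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for name, params in [("Small", SMALL_PARAMS), ("Medium", MEDIUM_PARAMS)]:
    print(f"{name}: {weight_memory_gib(params):.2f} GiB at fp16 (assumed)")
```

Under these assumptions the weights come to roughly 0.43 GiB and 1.14 GiB respectively.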

Step in, move around, and see the world update as you act.

We’ve put up a live Hugging Face demo of our real-time world model so you can check it out. pic.twitter.com/ng4Q9X6BAE
— Overworld (@overworld_ai) January 23, 2026

■The Path to Real-Time Worlds and Why It Matters (official blog)
https://over.world/blog/the-path-to-real-time-worlds-and-why-it-matters

■Waypoint-1: A Foundation Model for General-Purpose Computer Control (Hugging Face blog)
https://huggingface.co/blog/waypoint-1

■Overworld/Waypoint-1-Small (Hugging Face)
https://huggingface.co/Overworld/Waypoint-1-Small

■Overworld AI (GitHub)
https://github.com/Overworldai

Related articles on CGWORLD

●NVIDIA updates "Cosmos", its platform for physical AI: VLM "Reason 2", WFM "Predict 2.5", style transfer technology "Transfer 2.5", and VLA "Isaac GR00T N1.6"

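At CES 2026, NVIDIA announced a new set of open models and tools. Among them was an update to "Cosmos", its comprehensive platform for developing and operating physical AI, with four models announced: "Cosmos Reason 2", "Cosmos Predict 2.5", "Cosmos Transfer 2.5", and "Isaac GR00T N1.6".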
https://cgworld.jp/flashnews/01-202601-NVIDIA-Cosmos.html

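●NVIDIA and others release "LongVie 2", a world model that can generate up to 5 minutes of video: user controllability, prevention of quality degradation over long generations, and consistency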

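A research team from Fudan University, Nanyang Technological University S-Lab, NVIDIA, Tsinghua University, Shanghai AI Laboratory, and Nanjing University PRLab has announced "LongVie 2", an AI model capable of generating long videos of up to 3 to 5 minutes. The source code is available on GitHub and the weights on Hugging Face, but the license is unclear.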
https://cgworld.jp/flashnews/01-202601-LongVie2.html

●Open-source real-time world model "HY World 1.5 (WorldPlay)" released: 720p at 24 fps, with support for both first-person and third-person views

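Tencent's Hunyuan team has announced the real-time world model framework "HY World 1.5 (WorldPlay)" and released it as open source (TENCENT HY-WORLDPLAY COMMUNITY LICENSE). It supports real-time generation at 720p and 24 fps along with first-person and third-person views, and can generate a wide range of scenes, from photorealistic environments to stylized fantasy worlds. It also includes a feature for triggering specific events via text prompts, and can reportedly be applied to expanding a world without limit.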
https://cgworld.jp/flashnews/01-202601-HYWorld15.html