TOPIC · 13 stories

Multimodal

Stories, products, and related signals connected to this tag in Explore.

Stories

Inkling powers podcast clipping app with FFmpeg edits

Venturetwins built a podcast clipping app with Thinking Machines’ Inkling. The app analyzes long-form audio, selects clip candidates by topic or best moment, and directs FFmpeg edits.
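The clip-selection-to-FFmpeg step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Clip` type, the function names, and the output file naming are all hypothetical, and the clip timestamps are assumed to come from whatever model picks the candidates. The sketch only builds FFmpeg argument lists (`-ss`, `-i`, `-t`, `-c copy`) and does not run them.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A hypothetical clip candidate, e.g. as proposed by a model."""
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    label: str    # topic or "best moment" description


def ffmpeg_cut_command(source: str, clip: Clip, out_path: str) -> list[str]:
    """Build an FFmpeg argv that cuts one clip without re-encoding."""
    if clip.end <= clip.start:
        raise ValueError("clip end must be after clip start")
    duration = clip.end - clip.start
    return [
        "ffmpeg",
        "-ss", f"{clip.start:.3f}",  # seek to the clip start (input option)
        "-i", source,
        "-t", f"{duration:.3f}",     # keep only this many seconds
        "-c", "copy",                # stream copy: fast, cuts are not sample-exact
        out_path,
    ]


def plan_clips(source: str, clips: list[Clip]) -> list[list[str]]:
    """Return one FFmpeg command per candidate, numbered in order."""
    return [
        ffmpeg_cut_command(source, clip, f"clip_{i:02d}.mp3")
        for i, clip in enumerate(clips, start=1)
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the candidates have been reviewed. Stream copy is used here for speed; re-encoding would be needed for sample-accurate cut points.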

RELEASE · 15th July

Google DeepMind releases GenCeption for searchable 4D video scenes

Google DeepMind unveiled GenCeption, which turns video into depth, segmentation, camera rays, 3D keypoints, and prompt-steered scene representations. The thread links project and Hugging Face materials.

NEWS · 15th July

Goodside benchmarks GPT-5.6 Sol Pro on 150- and 1,025-Pokémon crossword tests

Goodside compared Claude Fable 5 Max puzzle generation with GPT-5.6 Sol Pro solving attempts. Sol solved a 150-Pokémon empty crossword but failed the 1,025-Pokémon version, with one reported success traced to the answer key.

NEWS · 12th July

Goodside tests GPT-5.6 Sol on fake handwriting and Ghost Font prompts

Goodside tested GPT-5.6 Sol and Claude Fable 5 with fake handwriting, constrained-vocabulary prompts, and Ghost Font. Sol often answered unreadable inputs, while Fable more often refused or pushed back.

NEWS · 11th July

Goodside benchmarks GPT-5.6 Sol hallucinations on binary noise

Goodside fed GPT-5.6 Sol and Claude Fable 5 binary noise and meaningless handwriting. Sol often invented hidden text, while Fable also failed on noise but more often pushed back on scribbles.

WORKFLOW · 10th July

Levelsio builds Caltrack with Claude Code on a VPS and Grok vision

Levelsio exported diet logs from Claude chat into Claude Code on a VPS, then built Caltrack with Telegram, SSH entry, database memory, and Grok vision food scanning. The project replaced unreliable chat memory with a persistent app.

RELEASE · 1w ago

OpenAI rolls out ChatGPT Voice with GPT-Live-1

OpenAI staff said the new ChatGPT Voice is powered by GPT-Live-1 for more natural conversations. The launch also refreshed the ChatGPT shader with Blender prototypes translated to Metal and WebGL using Codex.

RELEASE · 1w ago

Meta AI starts U.S. rollout of Muse Image and Muse Video

Posts say Meta AI began rolling out Muse Image and Muse Video in the U.S. Muse Image is described with web search, code execution, self-critique, multi-reference composition, and Content Seal watermarking.

RELEASE · 1mo ago

xAI opens Grok Imagine 1.5 Preview in the Imagine API

xAI opened Grok Imagine 1.5 Preview in its Imagine API, moving the model from benchmark chatter into direct creator access. Same-day Cloudflare AI Gateway support gives teams another route to run Grok models in production.

RELEASE · 1mo ago

Gemma 4 12B releases with 256K context and unified audio-vision input

Google’s new Gemma 4 12B ships as an encoder-free open model for text, image, audio, and video tasks with a 256K context window. Early GGUF ports and local benchmarks make it a plausible on-device multimodal option for creator tooling and experimentation.

RELEASE · 1mo ago

ElevenLabs claims Speech Engine adds 70-plus voice languages to agents

A sponsored explainer thread described Speech Engine as a WebSocket layer that adds speech-to-text, turn detection, interruption handling, and text-to-speech to existing LLM agents. The pitch is that teams can keep their current model stack and add voice without rebuilding the whole agent.

RELEASE · 2mo ago

SenseNova U1 open-sources unified image-text generation with 2K images in ~15s

Posts report that SenseTime open-sourced SenseNova U1, a unified text-image model with interleaved generation, an 8-step distilled LoRA, and ComfyUI workflows. They cite 2K image generation times of around 15 seconds, with H100 inference reportedly cut to about 2 seconds, so it is worth comparing against your current image pipeline.

RELEASE · 3mo ago

Tencent releases HY-World 2.0 with persistent 3D world export

Tencent released HY-World 2.0 with WorldMirror 2.0 code and weights for turning text, images, or video into persistent 3D scenes. The output includes navigable geometry and camera data instead of disposable video frames.