site:www.marktechpost.com

Comparing the Top 6 OCR (Optical Character Recognition) Models/Systems in 2025

Optical character recognition has moved from plain text extraction to document intelligence. Modern systems must read scanned and digital PDFs in one pass, preserve layout, detect tables, extract key ...

marktechpost

Cache-to-Cache(C2C): Direct Semantic Communication Between Large Language Models via KV-Cache Fusion

Can large language models collaborate without sending a single token of text? a team of researchers from Tsinghua University, Infinigence AI, The Chinese University of Hong Kong, Shanghai AI ...

marktechpost

A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models

AI companies use model specifications to define target behaviors during training and evaluation. Do current specs state the intended behaviors with enough precision, and do frontier models exhibit ...

marktechpost

The Local AI Revolution: Expanding Generative AI with GPT-OSS-20B and the NVIDIA RTX AI PC

The landscape of AI is expanding. Today, many of the most powerful LLMs (large language models) reside primarily in the cloud, offering incredible capabilities but also concerns about privacy and ...

marktechpost

Alibaba’s Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking) With FP8 Checkpoints

Do you actually need a giant VLM when dense Qwen3-VL 4B/8B (Instruct/Thinking) with FP8 runs in low VRAM yet retains 256K→1M context and the full capability surface? Alibaba’s Qwen team has expanded ...

marktechpost

Meet OpenTSLM: A Family of Time-Series Language Models (TSLMs) Revolutionizing Medical Time-Series Analysis

A significant development is set to transform AI in healthcare. Researchers at Stanford University, in collaboration with ETH Zurich and tech leaders including Google Research and Amazon, have ...

marktechpost

DeepSeek Just Released a 3B OCR Model: A 3B VLM Designed for High-Performance OCR and Structured Document Conversion

DeepSeek-AI released 3B DeepSeek-OCR, an end to end OCR and document parsing Vision-Language Model (VLM) system that compresses long text into a small set of vision tokens, then decodes those tokens ...

marktechpost

Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts, Not Fine-Tuning

ACE positions “context engineering” as a first-class alternative to parameter updates. Instead of compressing instructions into short prompts, ACE accumulates and organizes domain-specific tactics ...

marktechpost

OpenAI Introduces ChatGPT Atlas: A Chromium-based browser with a built-in AI agent

Atlas is a Chromium-based browser that keeps a persistent ChatGPT interface in the new tab page and as an “Ask ChatGPT” sidebar on any site. Users can summarize pages, compare products, extract data, ...

marktechpost

What are ‘Computer-Use Agents’? From Web to OS—A Technical Explainer

Computer-use agents (a.k.a. GUI agents) are vision-language models that observe the screen, ground UI elements, and execute bounded UI actions (click, type, scroll, key-combos) to complete tasks in ...

marktechpost

Google Introduces Speech-to-Retrieval (S2R) Approach that Maps a Spoken Query Directly to an Embedding and Retrieves Information without First Converting Speech to Text

In the traditional cascade modeling approach, automatic speech recognition (ASR) first produces a single text string, which is then passed to retrieval. Small transcription errors can change query ...

marktechpost

Anthropic Launches Claude Haiku 4.5: Small AI Model that Delivers Sonnet-4-Level Coding Performance at One-Third the Cost and more than Twice the Speed

The post emphasizes coding parity with Sonnet 4 and computer-use gains relative to Sonnet 4 under these scaffolds. Users should replicate with their own orchestration, tool stacks, and thinking ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results