
DeepSeek Unveils an Open-Source Optical Compression Model: The Chinese AI Startup's Latest Breakthrough

Guest | 2025-10-22 16:23:11
DeepSeek, a Chinese AI startup, has announced an open-source optical compression model aimed at making data storage and transmission more efficient by using optimized algorithms to reduce data redundancy. The release is expected to advance data-processing technology in the AI field and to encourage broader use of big data and wider adoption of AI applications; its concrete performance and application prospects still await further study and validation.

Chinese artificial intelligence firm DeepSeek has released DeepSeek-OCR, an open-source model designed to extract and compress text from images and PDFs, aiming to provide large-scale, high-quality datasets for training large language models (LLMs) and vision-language models (VLMs) while dramatically reducing computational requirements.

The model was made publicly available on GitHub yesterday, accompanied by a research paper titled DeepSeek-OCR: Contexts Optical Compression.

The technology behind DeepSeek-OCR leverages optical compression: textual information is encoded into visual representations (rendered images), allowing long passages of text to be represented by far fewer vision tokens than the equivalent sequence of text tokens.

According to the company, this approach addresses the major computational bottlenecks LLMs face when processing long-form content such as research papers, legal contracts, financial reports, and dialogue histories. By converting text into images, the system allows models to process extensive documents more efficiently; because older context can be stored at progressively lower resolution, it also simulates a gradual forgetting mechanism similar to human memory.
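To make the idea concrete, here is a minimal, self-contained sketch of the "text as image" concept described above. It is not DeepSeek-OCR's actual pipeline: the rendering code, the 4-characters-per-token heuristic, and the one-vision-token-per-16x16-pixel-patch assumption are all illustrative simplifications.

```python
# Illustrative sketch only -- NOT DeepSeek-OCR's actual pipeline.
# Assumptions (not from the source): ~4 characters per text token and
# one vision token per 16x16-pixel patch, both common rules of thumb.
import textwrap

from PIL import Image, ImageDraw, ImageFont


def render_text_to_image(text: str, width: int = 640, height: int = 640) -> Image.Image:
    """Rasterize a passage of text onto a white canvas."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    wrapped = "\n".join(textwrap.wrap(text, width=90))
    draw.multiline_text((10, 10), wrapped, fill="black", font=ImageFont.load_default())
    return img


def estimate_tokens(text: str, img: Image.Image, patch: int = 16) -> tuple[int, int]:
    """Rough token estimates: characters/4 for text, pixel patches for vision."""
    text_tokens = max(1, len(text) // 4)
    vision_tokens = (img.width // patch) * (img.height // patch)
    return text_tokens, vision_tokens


if __name__ == "__main__":
    passage = "Long documents become images instead of token sequences. " * 300
    page = render_text_to_image(passage)
    t, v = estimate_tokens(passage, page)
    print(f"~{t} text tokens vs ~{v} vision tokens ({t / v:.1f}x compression)")
```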

Performance metrics shared in the research indicate that DeepSeek-OCR can achieve over 96% decoding accuracy at roughly 10x compression (about ten text tokens represented per vision token), around 90% at compression ratios of 10–12x, and roughly 60% at 20x.
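Taken at face value, those ratios translate into concrete token budgets. The short calculation below reuses the accuracy figures reported above; the 10,000-token document is an arbitrary example, not a number from the paper.

```python
# Back-of-the-envelope arithmetic using the accuracy figures reported above.
# The 10,000-token document size is an arbitrary example, not from the paper.
reported = [
    (10, 0.96),  # ~10x compression -> >96% decoding accuracy
    (12, 0.90),  # 10-12x           -> ~90%
    (20, 0.60),  # ~20x             -> ~60%
]

text_tokens = 10_000  # hypothetical long document
for ratio, accuracy in reported:
    vision_tokens = text_tokens // ratio
    print(f"{ratio:>2}x: {text_tokens} text tokens -> ~{vision_tokens} vision tokens, "
          f"expected decoding accuracy ~{accuracy:.0%}")
```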

This demonstrates that compact language models can effectively decode compressed visual text, potentially enabling larger models to adopt similar capabilities with fewer resources. The model is also highly scalable: a single A100-40G GPU can reportedly generate more than 200,000 pages of training data per day.
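For a rough sense of that throughput, the arithmetic below converts the quoted figure into pages per second and extrapolates to a small GPU cluster; linear scaling across GPUs is an assumption made for illustration, not a claim from the research.

```python
# Throughput arithmetic based on the single-GPU figure quoted above
# (200,000+ pages/day on one A100-40G). Linear scaling across GPUs is an
# assumption for illustration, not a claim from the paper.
PAGES_PER_DAY_PER_GPU = 200_000


def daily_pages(num_gpus: int) -> int:
    """Estimated pages of training data per day, assuming linear scaling."""
    return PAGES_PER_DAY_PER_GPU * num_gpus


per_second = PAGES_PER_DAY_PER_GPU / (24 * 3600)
print(f"1 GPU  : ~{per_second:.1f} pages/s, {daily_pages(1):,} pages/day")
print(f"20 GPUs: {daily_pages(20):,} pages/day (assuming linear scaling)")
```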

DeepSeek-OCR’s ability to compress long-form textual content opens new possibilities for LLM training, particularly for scenarios requiring the processing of massive amounts of data. By converting dialogues, research materials, and multi-page documents into images, the approach reduces token counts and computational overhead, potentially allowing models to handle larger datasets without a corresponding spike in GPU demand.
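One way to see why fewer tokens means less GPU demand: standard transformer self-attention scales quadratically with sequence length, so cutting tokens tenfold cuts the attention term by roughly a hundredfold. A minimal sketch of that arithmetic, with illustrative sequence lengths:

```python
# Why fewer tokens matters: standard self-attention cost grows with the
# square of sequence length, so compressing tokens 10x cuts the attention
# term by ~100x. Sequence lengths below are illustrative, not from the paper.
def attention_cost(seq_len: int) -> int:
    """Relative cost of the attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


text_tokens = 100_000              # e.g. a very long report as plain text tokens
vision_tokens = text_tokens // 10  # same content at ~10x optical compression

speedup = attention_cost(text_tokens) / attention_cost(vision_tokens)
print(f"attention-cost reduction from 10x token compression: ~{speedup:.0f}x")
```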

The open-source release has already attracted attention within the AI community, with DeepSeek-OCR garnering over 1,400 stars on GitHub shortly after its debut.

Analysts note that while the model represents a significant technical advancement, DeepSeek has been relatively slow to roll out new flagship models such as the anticipated R2. Some experts speculate that this may suggest the company is temporarily falling behind in the rapidly evolving AI field.

Others, however, interpret the cautious pace as a deliberate strategy to strengthen internal capabilities and lay the groundwork for a next-generation AI model.

Copyright and reprint notice

Author: Guest. Original article: https://www.nbdnews.com/post/3646.html, published 2025-10-22 16:23:11.
Reproduction or reposting should credit NBD财经网 and link back to the original.
