Ancient text corpora

Ancient text corpora

Ancient text corpora are the entire collection of texts from the period of ancient history, defined in this article as the period from the beginning of writing up to 300 AD. These corpora are important for the study of literature, history, linguistics, and other fields, and are a fundamental component of the world's cultural heritage. Chinese, Latin, and Greek are examples of ancient languages with significant text corpora, although much of these corpora are known to us via transmission (frequently via medieval manuscript copies) rather than in their original form. These texts – both transmitted and original – provide valuable insights into the history and culture of different regions of the world, and have been studied for centuries by scholars and researchers. Other ancient texts – particularly stone inscriptions and papyrus scrolls – have been published following archaeological research, notably the cuneiform corpus of c.10 million words and the c.5 million words in ancient Egyptian. Through advances in technology and digitization, ancient text corpora are more accessible than ever before. Tools such as the Perseus Digital Library and the Digital Corpus of Sanskrit have made it easier for researchers to access and analyze these texts. == Quantifying the corpora == Two types of ancient texts are known to modern scholars – those that have only survived in younger manuscripts, but whose great age is undisputed (this applies to the bulk of the Chinese, Brahmi, Greek, Latin, Hebrew and Avestan tradition), and those known from original inscriptions, papyri and other manuscripts. Counting of the words in each corpus presents significant methodological challenges – in principle, every single occurrence of a word in the text is counted separately, but in the case of parallel transmission of literary texts, only a single transmission is taken into account. Just as the Book of the Dead and the coffin texts are only included once in the number given for the Egyptian, the Greek and Latin literary works should only be counted according to one manuscript. If, on the other hand, tombs, royal inscriptions or economic documents of certain ancient languages often show a more or less identical form, this is not evaluated as a purely "parallel tradition". Attached prepositions are counted as separate words, except in the case of the definite article in Hebrew, Aramaic and Greek since it has no equivalent in most languages, so its frequency would significantly affect the comparability of numbers. === Languages with known size estimates === === South Asian === Sanskrit (Vedic Sanskrit and Classical Sanskrit) Indus script (3,800 items, c.20,000 characters) Brahmi script Old Tamil Early Indian epigraphy and Indian epic poetry Kharosthi Pali literature List of historic Indian texts === Mesoamerican === Olmec hieroglyphs Maya script === East Asian === Old Chinese Chinese classics The pre-Qin corpus: a collection of ancient Chinese texts written before the Qin dynasty (221 BCE). The corpus includes texts from Confucianism, Taoism, Legalism, and other schools of thought. The pre-Han corpus: a collection of ancient Chinese texts written before the Han dynasty (202 BCE). The corpus includes texts from Confucianism, Taoism, Legalism, and other schools of thought. See the Chinese Text Project Chinese bronze inscriptions, Oracle bone script, Seal script, Clerical script === Central Iranian languages === Prior to 300 AD, the Central Iranian languages are mainly in the form of Sassanid stone inscriptions in the two closely related idioms Middle Persian (Pahlavi scripts and Inscriptional Parthian), there are 5000 for the corpus of Middle Persian (mostly 3rd, but also 4th/5th centuries) and for the corpus of Parthian (3rd century) 3000 words. To what extent some of the Manichaean Middle Persian literary texts may date back to the 3rd century is difficult to estimate; Mani is said to have personally written the Shabuhragan totaling about 5000 words. In any case, if we combine Middle Persian and Parthian, we come to over 10,000 words. === Proto-Sinaitic === Proto-Sinaitic script has no more than about 400 letters (number of words is unknown since the script has not been fully interpreted). To a similar extent, there are probably approximately contemporaneous Proto-Canaanite inscriptions (ibid.). === Anatolian === Luwian cuneiform, approx. 3000 words the Palaic language few hundred words. Hieroglyphic Luwian the Lycian alphabet (the best attested Anatolian successor language written in alphabetic script) with about 5000 words The Lydian alphabet 109 inscriptions comprising about 1500 words The Phrygian alphabet the in-tomb inscriptions from the 2nd and 3rd centuries AD (approx. 1000 words) and in the so-called "old Phrygian" inscriptions less than 300 words The Carian alphabets whose texts, mainly from Egypt, contain around 600 words. === Old Italic === the Umbrian language attested essentially by the sacrificial instructions of the Iguvinian Tables with 5000 words the Oscan language (ibid.) with 2000 words the Messapic language with probably a good 1000 words (the estimate is difficult because most texts in this hardly understandable language do not use word separators) the Venetic language a few hundred words the Faliscan language a few hundred words Cisalpine Celtic inscriptions amount to approximately 2000 words, to which are added a number of glosses by classical authors === Iberia === Iberian scripts, more rarely written in Greek or Latin script, approx. 2500 words Celtiberian script, which refers to Celtic language testimonies in Iberian, but also in Latin script from Spain (approx. 1000 words) Southwest Paleohispanic script, 78 inscriptions, a few hundred words Lusitanian language, three monuments in Latin script, approx. 60 words === Germanic Northern Europe === Runic inscriptions dated before the 4th century amount to about 30 pieces, which contain no more than 50 words in total === Africa === Geʽez script: comparatively few inscriptions with a total of around 1,000 words before 300 AD. Following Christianization in the 4th century, more extensive texts are known. Libyco-Berber alphabet: over 1,000 inscriptions from the Maghreb, which are dated to Roman times. Most texts do not use a word separator; Peust estimates that the total number of words could be around 5,000 Meroitic script (Ancient Nubian): about 900 texts are known, which Peust estimates may contain approximately 10,000 words, albeit with uncertainty from the fact that the word separator is not used consistently in the Meroitic script. === Aegean === The Cretan Linear A inscriptions that have not yet been deciphered are available in about 2500 texts, which contain a total of around 20,000 characters. The total number of words can hardly be determined; Peust tentatively put it in the same order of magnitude as in Meroitic. In addition to the Linear A texts, there are also inscriptions Cretan hieroglyphs of a few hundred characters and texts written in the Greek alphabet, but not in Greek, with a few dozen words Cypriot syllabary in the first millennium BC, in which mostly Greek texts were recorded. The relevant texts comprise around 100 to 200 words. === Micro corpora === There are a significant number of ancient micro-corpus languages. Estimating the total number of attested ancient languages may be as difficult as estimating their corpus size. For example, Greek and Latin sources hand down an enormous amount of foreign-language glosses, the seriousness of which is not always certain. == Preservation and curation == Historic preservation and maintaining ancient text corpora presents several challenges, including issues with preservation, translation, and digitization. Many ancient texts have been lost over time, and those that survive may be damaged or fragmented. Translating ancient languages and scripts requires specialized expertise, and digitizing texts can be time-consuming and resource-intensive. == Corpus linguistics == The field of corpus linguistics studies language as expressed in text corpora. This includes the analysis of word frequency, collocations, grammar, and semantics. Ancient text corpora provide a valuable resource for corpus linguistics research, enabling scholars to explore the evolution of language and culture over time.

Multi-model database

In the field of database design, a multi-model database is a database management system designed to support multiple data models against a single, integrated backend. In contrast, most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. Document, graph, relational, and key–value models are examples of data models that may be supported by a multi-model database. == Background == The relational data model became popular after its publication by Edgar F. Codd in 1970. Due to increasing requirements for horizontal scalability and fault tolerance, NoSQL databases became prominent after 2009. NoSQL databases use a variety of data models, with document, graph, and key–value models being popular. A multi-model database is a database that can store, index and query data in more than one model. For some time, databases have primarily supported only one model, such as: relational database, document-oriented database, graph database or triplestore. A database that combines many of these is multi-model. This should not be confused with multimodal database systems such as Pixeltable or ApertureDB, which focus on unified management of different media types (images, video, audio, text) rather than different data models. For some time, it was all but forgotten (or considered irrelevant) that there were any other database models besides relational. The relational model and notion of third normal form were the default standard for all data storage. However, prior to the dominance of relational data modeling, from about 1980 to 2005, the hierarchical database model was commonly used. Since 2000 or 2010, many NoSQL models that are non-relational, including documents, triples, key–value stores and graphs are popular. Arguably, geospatial data, temporal data, and text data are also separate models, though indexed, queryable text data is generally termed a "search engine" rather than a database. The first time the word "multi-model" has been associated to the databases was on May 30, 2012 in Cologne, Germany, during the Luca Garulli's key note "NoSQL Adoption – What’s the Next Step?". Luca Garulli envisioned the evolution of the 1st generation NoSQL products into new products with more features able to be used by multiple use cases. The idea of multi-model databases can be traced back to Object–Relational Data Management Systems (ORDBMS) in the early 1990s and in a more broader scope even to federated and integrated DBMSs in the early 1980s. An ORDBMS system manages different types of data such as relational, object, text and spatial by plugging domain specific data types, functions and index implementations into the DBMS kernels. A multi-model database is most directly a response to the "polyglot persistence" approach of knitting together multiple database products, each handing a different model, to achieve a multi-model capability as described by Martin Fowler. This strategy has two major disadvantages: it leads to a significant increase in operational complexity, and there is no support for maintaining data consistency across the separate data stores, so multi-model databases have begun to fill in this gap. Multi-model databases are intended to offer the data modeling advantages of polyglot persistence, without its disadvantages. Operational complexity, in particular, is reduced through the use of a single data store. == Benchmarking multi-model databases == As more and more platforms are proposed to deal with multi-model data, there are a few works on benchmarking multi-model databases. For instance, Pluciennik, Oliveira, and UniBench reviewed existing multi-model databases and made an evaluation effort towards comparing multi-model databases and other SQL and NoSQL databases respectively. They pointed out that the advantages of multi-model databases over single-model databases are as follows : == Architecture == The main difference between the available multi-model databases is related to their architectures. Multi-model databases can support different models either within the engine or via different layers on top of the engine. Some products may provide an engine which supports documents and graphs while others provide layers on top of a key-key store. With a layered architecture, each data model is provided via its own component. == User-defined data models == In addition to offering multiple data models in a single data store, some databases allow developers to easily define custom data models. This capability is enabled by ACID transactions with high performance and scalability. In order for a custom data model to support concurrent updates, the database must be able to synchronize updates across multiple keys. ACID transactions, if they are sufficiently performant, allow such synchronization. JSON documents, graphs, and relational tables can all be implemented in a manner that inherits the horizontal scalability and fault-tolerance of the underlying data store. == Theoretical Foundation for Multi-Model Databases == The traditional theory of relations is not enough to accurately describe multi-model database systems. Recent research is focused on developing a new theoretical foundation for these systems. Category theory can provide a unified, rigorous language for modeling, integrating, and transforming different data models. By representing multi-model data as sets and their relationships as functions or relations within the Set category, we can create a formal framework to describe, manipulate, and understand various data models and how they interact.

Xiaomi MiMo

Xiaomi MiMo is a family of large language models (LLMs) developed by Xiaomi. It was initially released in April 2025 with the MiMo-7B model. Currently, MiMo is available for developers through API service. It is used as the key AI model in Xiaomi's "Human x Car x Home" ecosystem. == Development == Xiaomi developed MiMo as a reasoning-focused language model. Its development team was led by Luo Fuli, who had previously worked at DeepSeek before joining Xiaomi in late 2025. The model was trained using multi-token prediction and reinforcement learning, with a particular emphasis on mathematical reasoning and code generation tasks. In March 2026, Xiaomi CEO Lei Jun announced that the company planned to invest at least US$8.7 billion in artificial intelligence over the following three years. == Models == === List of models === === MiMo-7B === MiMo-7B is the first model of this LLM. The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using web pages, academic papers, books, and synthetic reasoning data. MiMo-7B-RL underwent supervised fine-tuning and reinforcement learning on 130,000 mathematics and code problems. MiMo-7B-RL-0530 was released in May 2025. It scaled the fine-tuning dataset from 500,000 to 6 million instances and extended the RL window from 32,000 to 48,000 tokens and improved AIME 2024 scores from 68.2 to 80.1. MiMo-VL-7B was a vision-language model combining a Vision Transformer encoder with the MiMo-7B backbone. It was trained in four stages consuming 2.4 trillion tokens. Its reinforcement learning variant used Mixed On-Policy Reinforcement Learning (MORL) which integrated reward signals across perception, grounding, and reasoning. Xiaomi also released MiMo-Audio-7B, an audio-language model for voice conversion, style transfer, and speech editing. === MiMo-V2-Flash === MiMo-V2-Flash was launched in December 2025. It is a open-sourced Mixture-of-experts model with 309 billion total parameters and 15 billion active parameters. It was trained on 27 trillion tokens using FP8 mixed precision. It used hybrid attention interleaving Sliding Window and Global Attention at a 5:1 ratio. === MiMo-V2-Pro === Xiaomi publicly introduced MiMo-V2-Pro on 18 March 2026. It has over 1 trillion total parameters, 42 billion active, and a 1-million-token context window. Before the official release, the model had appeared anonymously on OpenRouter under the codename "Hunter Alpha," where it drew substantial usage and topped daily charts for several days, according to Xiaomi and Reuters. During its listing on OpenRouter, the model reportedly processed over one trillion tokens in total usage. Xiaomi later said Hunter Alpha was an early internal test build of MiMo-V2-Pro, and Reuters reported that the model had been mistaken by some users for a possible DeepSeek system before Xiaomi confirmed its origin. The model was released as a proprietary API product, and Luo Fuli stated that Xiaomi intended to open-source a variant at an unspecified future date. Xiaomi has partnered with several API web platforms like OpenClaw to launch the model. All these websites initially offered a free trial of this model for a week, but due to the overwhelming response, Xiaomi later extended the free trial period of the model until 2 April 2026. === MiMo-V2-Omni === Alongside MiMo-V2-Pro, Xiaomi launched MiMo-V2-Omni on 18 March 2026. It handles image, video, audio, and text inputs. Before the official release, it was codenamed "Healer Alpha" in OpenRouter. === MiMo-V2-TTS === On the same date as the release of MiMo-V2-Pro and MiMo-V2-Omni, a Text-to-Speech model named MiMo-V2-TTS was released also. It is a speech synthesis model. It was trained on audio data, which makes it capable of emotional transitions, mid-sentence tone shifts, singing, and synthesis of regional dialects like Sichuan, Cantonese, Henan, and Taiwanese. == Licensing == Xiaomi has used different licensing approaches for different models in the MiMo family. The MiMo-7B series and MiMo-V2-Flash were released as open-weight models. MiMo-V2-Flash was published under the MIT license with model weights and inference code available on Hugging Face. MiMo-V2-Pro and MiMo-V2-Omni were released as proprietary models. It was accessible through Xiaomi's API platform and third-party API providers. Luo Fuli stated that Xiaomi intended to open-source a variant of MiMo-V2-Pro. Although, she did not specify any timeline. MiMo-V2-TTS was released as a proprietary model with no publicly available weights.

Controlled natural language

Controlled natural languages (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. The first type of languages (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, IBM's Easy English, are used in the industry to increase the quality of technical documentation, and possibly simplify the semi-automatic translation of the documentation. These languages restrict the writer by general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". The second type of languages have a formal syntax and formal semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc. == Languages == Existing controlled natural languages include: == Encoding == IETF has reserved simple as a BCP 47 variant subtag for simplified versions of languages.

Corona-Warn-App

Corona-Warn-App was the official and open-source COVID-19 contact tracing app used for digital contact tracing in Germany made by SAP and Deutsche Telekom subsidiary T-Systems. It had been downloaded 22.8 million times as of 19 November 2020 and 26.2 million times as of 18 March 2021. The app has been promoted by billboard and broadcast advertisements, e.g. in cooperation with the German Football Association (DFB) and other prominent companies. The German government has announced that the app would no longer exchange tracing information as of April 30, 2023 & would enter hibernation as of June 1, 2023. == Effectiveness == Experts believe that time saved by using the app can be critical for improving the effectiveness contact tracing efforts. Some virologists say when at least 60% of people in Germany use it, it would be very effective. == Functioning == The app works with the Exposure Notification Framework (what is implemented in Google Play Services for Android and in iOS) by using Bluetooth to exchange codes with app users that are within 1.5 meters of each other for a period of at least 10 minutes. Anyone who tests positive for COVID-19 can share this information voluntarily with the app. Other app users are then notified about when, how long and at what distance they had contact with the infected person within a 14-day period. Testing is available for persons on a voluntary basis. === Server architecture === Based on the Client–server model five servers are operated within the app backend: the Corona-Warn-App server. It stores the authorized keys of infected users, referred to as diagnosis keys, from the past 14 days in its database. Stored diagnosis keys are grouped into regularly updated blocks which are transmitted to the Content Delivery Network. This interface supplies the keys for the app clients to download and locally compute a potential exposure risk. the Verification server. It is responsible for documenting the approval of the user to share their positive test result with the app and also to verify the test result. the Portal Server. It generates a so-called teleTAN token if the user did not give their consent to share their test result with the app at first but then changed their mind or if the local public health authority or test laboratory is not connected to the app system yet. the Test Result Server. It saves the test results provided by the local public health authorities or test laboratories for further use within the backend. the Federation Gateway Server. It connects to the national Corona-Warn-App servers of participating EU countries to enable transnational key exchange. By the distribution of the data on different servers the decoupling of the data becomes possible and results in an obstructed tracing of the app users. ==== Report of a positive COVID-19 test ==== The app provides a function to warn other app users by uploading their positive test result on a voluntarily and anonymous basis to the Corona-Warn-App server. In case the local public health authority or test laboratory is already connected to the app system, the user receives a QR-Code when the swab specimen is taken that can be scanned in the app. After scanning the QR-Code und the user getting authorized by the Verification server, the app receives an individual Registration token which gets stored locally and with which the status and the result of the test can be checked manually as well as automatically. If the local public health authority or test laboratory is not connected to the app system yet and the user wants to share their positive test result with other app users, it is required to request a teleTAN token by calling the verification hotline of the app. In both cases, the user can upload their diagnosis keys of the last 14 days to the Corona-Warn-App server in case their consent to share the information is given. The Corona-Warn-App server then verifies the uploaded keys by asking the Verification server if the keys are valid and if they are, the Corona-Warn-App server stores them in its database. == Privacy == The use of the app is voluntary. The app implements decentralized data storage to ensure data privacy. Employers can require that Corona-Warn be installed on company phones, but can not compel its use on private phones. == Funding == The open source app, which costs €20 million to develop is intended to supplement human contact tracing efforts, which Germany put in place during the early stages of the COVID-19 pandemic in Germany. In August 2022, a spokesperson for the German ministry of health announced that the total costs including all additional developments are now estimated to be closer to €150m. == Interoperability == At its start the app only worked in Germany, and Jens Spahn, than Federal Minister of Health (CDU), has said the development of a Europe-wide system is a future goal. With the update published on 19 October 2020 the app supports key-exchanges with the EU Interoperability Gateway and is therefore able to communicate with contact tracing apps from Ireland and Italy. Austria, Belgium, Czech Republic, Croatia, Cyprus, Denmark, Finland, Ireland, Italy, Latvia, Malta, Netherlands, Norway, Poland, Slovenia, Spain and Switzerland had joined the gateway as well and are also able to exchange keys with Corona-Warn-App. The app can be downloaded in many App stores outside of Germany. However, as of August 2021, the app is still unavailable for those of notable national German minorities like Turks, Russians or Ukrainians, who use App stores of their home countries. == Software variants == An unofficial Corona-Warn-App has been released on F-Droid, making the app available without proprietary components on Android phones. == Literature == Thomas Köllmann: Die Corona-Warn-App – Schnittstelle zwischen Datenschutz- und Arbeitsrecht. In: Neue Zeitschrift für Arbeitsrecht. Nr. 13, 10. Juli 2020, S. 831–836.

Message queuing service

A message queueing service is a message-oriented middleware or MOM deployed in a compute cloud using software as a service model. Service subscribers access queues and or topics to exchange data using point-to-point or publish and subscribe patterns. It's important to differentiate between event-driven and message-driven (aka queue driven) services: Event-driven services (e.g. AWS SNS) are decoupled from their consumers. Whereas queue / message driven services (e.g. AWS SQS) are coupled with their consumers. Message queues can be a good buffer to handle spiky workloads but they have a finite capacity. According to Gregor Hohpe, message queues require proper mechanisms (aka flow controls) to avoid filling the queue beyond its manageable capacity and to keep the system stable. == Ordering Guarantees in Message Queues == Amazon SQS FIFO and Azure Service Bus sessions are queue-based messaging systems that provide ordering guarantees within a message group or session attempt but do not necessarily guarantee ordered delivery in cases of retries or failures. In SQS FIFO, messages in the same message group are processed in order, with subsequent messages held until the preceding message is successfully processed or moved to the dead-letter queue (DLQ). Once a message is placed in the DLQ, it is no longer retried, creating a gap in the sequence. However, the remaining messages continue to be delivered in order. Azure Service Bus sessions function similarly by maintaining ordering within a session, provided a single consumer processes messages sequentially. The implementation differs from SQS FIFO but follows the same fundamental ordering principle. In contrast, Apache Kafka is a distributed log-based messaging system that guarantees ordering within individual partitions rather than across the entire topic. Unlike queue-based systems, Kafka retains messages in a durable, append-only log, allowing multiple consumers to read at different offsets. Kafka uses manual offset management, giving consumers control over retries and failure handling. If a consumer fails to process a message, it can delay committing the offset, preventing further progress in that partition while other partitions remain unaffected. This partition-based design enables fault isolation and parallel processing while allowing ordering to be maintained within partitions, depending on consumer handling. == Vendors == Apache Kafka Apache Kafka is a distributed system consisting of servers that store and forward messages between producer client and consumer applications. IBM MQ IBM MQ offers a managed service that can be used on IBM Cloud and Amazon Web Services. Microsoft Azure Service Bus Service Bus offers queues, topics & subscriptions, and rules/actions in order to support publish-subscribe, temporal decoupling, and load balancing scenarios. Azure Service Bus is built on AMQP allowing any existing AMQP 1.0 client stack to interact with Service Bus directly or via existing .Net, Java, Node, and Python clients. Standard and Premium tiers allow for pay as you go or isolated resources at massive scale. Oracle Messaging Cloud Service This service provides a messaging solution for applications for asynchronous communication and is influenced by the Java Message Service (JMS) API specification. Any application platform that understands HTTP can also use Oracle Messaging Cloud Service through the REST interface. For Java applications, Oracle Messaging Cloud Service provides a Java library that implements and extends the JMS 1.1 interface. The Java library implements the JMS API by acting as a client of the REST API. Amazon Simple Queue Service Supports messages natively up to 256K, or up to 2GB by transmitting payload via S3. Highly scalable, durable and resilient. Provides loose-FIFO and 'at least once' delivery in order to provide massive scale. Supports REST API and optional Java Message Service client. Low latency. Utilizes Amazon Web Services. IronMQ Supports messages up to 64k; guarantees order; guarantees once only delivery; no delays retrieving messages. Supports REST API and beanstalkd open source protocol. Runs on multiple clouds including AWS and Rackspace. Scaling must be managed by user. RabbitMQ RabbitMQ is a reliable and mature messaging and streaming broker, which is easy to deploy on cloud environments, on-premises, and on your local machine. Supports AMQP, STOMP, MQTT StormMQ Open platform supports messages up to 50Mb. Uses AMQP to avoid vendor lock-in and provide language neutrality. Locate-It Option allows customers to audit the location of their data at all times and satisfy data protection principles. AnypointMQ An enterprise multi-tenant, cloud messaging service that performs advanced asynchronous messaging scenarios between applications. Anypoint MQ is fully integrated with Anypoint Platform, offering role based access control, client application management, and connectors.

LTX (text-to-video model)

LTX is a family of open source artificial intelligence video foundation models developed by Lightricks, and first released in November 2024. The latest models, LTX-2, create videos based on user prompts. They were preceded by LTX Video, which was released in 2024 as the company's first text-to-video model. LTX-2 is part of the LTX family of video generation models, which form the core technology, alongside LTX Studio, of the LTX ecosystem. == History == === Origins: LTX Video (2024–2025) === In November 2024 Lightricks publicly released its first text-to-video model, LTX Video. It was a 2-billion parameter model, available as open source. In May 2025 Lightricks launched LTXV-13b, a version with 13-billion parameters. Two months later, the model broke the 60 second barrier for generated video. === Release of LTX-2 (2025) === In October 2025 Lightricks announced its latest model, and renamed it LTX-2. The model was described as capable of generating synchronized audio and video at native 4K resolution and up to 50 frames per second (fps), using a variety of conditions and prompts, including text-to-video and image-to-video. Google highlighted the fact that LTX-2 was trained on its infrastructure, and saying it was "The first open source AI video generation model, powered by Google Cloud". Upon its release it was ranked in the top-3 models for image-to-video creation by Artificial Analysis, behind Kling 3.5 by Kling AI and Veo 3.1 by Google. Its text-to-image option was ranked 7th. In addition to its open-source release, Lightricks offers API access to LTX-2, allowing developers to generate videos from text and image prompts through a hosted service without running the model locally. === Open Source Release (2026) === In January 2026, Lightricks officially released the full open-source version of LTX-2, making the model’s complete codebase, weights, and associated tooling publicly available. In March 2026 the company released LTX-2.3, which was accompanied by a desktop video editor enabling the entire model to run locally on consumer hardware. == Technical features == === Advancements over LTX Video === LTX-2 builds upon the LTX Video architecture with several major improvements: Unified audio-video generation producing synchronized dialogue, ambience, and motion Native 4K rendering 50-fps output for cinematic motion Three operational modes (Fast, Pro, Ultra) More efficient diffusion pipelines enabling high fidelity on consumer GPUs === Core capabilities === Text-to-video generation Image-to-video generation Multimodal audiovisual synthesis High-resolution spatial and temporal coherence Configurable quality/performance settings Open-source distribution of weights and datasets == Reception == Initial reception to LTX-2 was broadly positive, with several technology and media outlets highlighting its open-source approach and multimodal capabilities. Open Source For You described LTX-2 as “one of the first AI video systems to combine 4K output, synchronized audio, and an open model release,” noting that it positioned Lightricks as a significant competitor to proprietary systems such as OpenAI's Sora and Google's Veo. IEA Green said that the model “could rewrite the AI filmmaking game,” emphasizing that its 50-fps rendering and unified audio-video generation made it suitable for professional studios and independent creators alike. AI News characterized LTX-2 as a “major step forward in the democratization of cinematic-quality video generation,” praising its consumer-grade hardware efficiency and multi-tier generation modes, while also noting ongoing challenges in long-form temporal stability. FinancialContent reported strong interest among creative agencies, attributing the attention to Lightricks’ decision to release model weights and datasets, which reviewers said enabled “a level of transparency not typically seen in commercial AI video models.” === Benchmarks and rankings === Upon release, LTX-2 ranked third for image-to-video creation in the Artificial Analysis benchmark, behind Kling 3.5 and Veo 3.1, while its text-to-video option ranked seventh. As of early 2026, it was the highest-ranked open-source model in the benchmark. === Limitations === Some early reviewers also pointed out quality limitations. The Ray3 technical review noted occasional inconsistencies in lip-sync and motion tracking during long scenes, though it stated these were “in line with the challenges faced by all current AI video diffusion models” and expected to improve with continued iteration. Like other diffusion-based video generators, LTX-2 can produce artifacts in complex multi-person scenes and may struggle with precise text rendering within generated video.