22 December, 2025 – Tether Data’s AI research division, QVAC, today announced the release of QVAC Genesis II, a major expansion of the world’s largest publicly available synthetic educational dataset for artificial intelligence pre-training. With the addition of 107 billion new tokens, the combined QVAC Genesis dataset now totals 148 billion tokens across 19 educational domains, significantly extending the scale, depth, and reasoning quality of open AI training data.
QVAC Genesis II builds directly on the foundation laid by QVAC Genesis I, which introduced a rigorously validated, education-focused synthetic dataset spanning core STEM disciplines. This second release expands coverage to 10 new domains, including chemistry, computer science, statistics, machine learning, astronomy, geography, econometrics, and electrical engineering, while also regenerating college-level physics using an improved methodology. Together, Genesis I and II form the most comprehensive synthetic educational dataset ever released to the public.
At the core of this release is a new data generation approach called Option-Level Reasoning, designed to extract structured reasoning not only from model failures, but also from correct answers. Rather than treating correct responses as finished outputs, this method systematically analyzes every answer option in a multiple-choice question, reinforcing correct reasoning while explicitly addressing common misconceptions. The result is training data that emphasizes clarity, causality, and decision-making, not just surface-level correctness.
This new approach complements the original Failure Analysis method introduced in Genesis I, forming a dual-method pipeline that ensures every generated question contributes educational value. Independent evaluations show that models trained on Genesis II data demonstrate substantially higher reasoning accuracy and produce clear, unambiguous answers far more consistently than models trained on prior synthetic datasets.
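The release does not publish implementation details, but the routing idea behind the dual-method pipeline can be illustrated with a minimal sketch. It assumes each generated question is multiple-choice with a single known correct option, and that a model's answer is already available. If the model answered incorrectly, the question is turned into a Failure Analysis request. If it answered correctly, it becomes an Option-Level Reasoning request that covers every option. The class and function names (`MCQ`, `failure_analysis_prompt`, `option_level_prompt`, `build_generation_prompt`) and the prompt wording are hypothetical. This is not QVAC's code, and the real pipeline presumably sends these prompts to a generator model rather than just building strings.

```python
from dataclasses import dataclass


@dataclass
class MCQ:
    """Hypothetical multiple-choice question structure, for illustration only."""
    stem: str
    options: dict[str, str]  # option label -> option text
    correct: str             # label of the correct option


def failure_analysis_prompt(q: MCQ, model_answer: str) -> str:
    """Failure Analysis style (hypothetical wording): explain why the chosen option is wrong."""
    return (
        f"Question: {q.stem}\n"
        f"The model chose ({model_answer}) {q.options[model_answer]}, "
        f"but the correct answer is ({q.correct}) {q.options[q.correct]}.\n"
        "Explain the error in reasoning and derive the correct answer step by step."
    )


def option_level_prompt(q: MCQ) -> str:
    """Option-Level Reasoning style (hypothetical wording): address every option, right or wrong."""
    lines = [f"Question: {q.stem}"]
    for label, text in q.options.items():
        verdict = "correct" if label == q.correct else "incorrect"
        lines.append(f"Option ({label}) {text}: explain why it is {verdict}.")
    return "\n".join(lines)


def build_generation_prompt(q: MCQ, model_answer: str) -> str:
    """Route a question so it yields training material whether or not the model was right."""
    if model_answer == q.correct:
        return option_level_prompt(q)
    return failure_analysis_prompt(q, model_answer)


if __name__ == "__main__":
    q = MCQ(
        stem="Which gas do plants absorb for photosynthesis?",
        options={"A": "Oxygen", "B": "Carbon dioxide", "C": "Nitrogen"},
        correct="B",
    )
    print(build_generation_prompt(q, "B"))  # correct answer: option-level prompt
    print(build_generation_prompt(q, "A"))  # wrong answer: failure-analysis prompt
```

In this sketch, no question is discarded: a correct answer still produces an explanation of why each distractor fails. That matches the stated aim that every generated question contributes educational value.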
More than a scale increase, this release reflects a deliberate shift in how educational AI data should be built. While much of the industry focuses on scraping and aggregating ever-larger volumes of text, QVAC’s approach is designed to teach models how to think, reason, and explain, grounding intelligence in understanding rather than imitation.
“Most AI training today optimizes for fluency, not understanding,” said Paolo Ardoino, CEO of Tether. “With this release, we’re pushing beyond volume toward structure, reasoning, and clarity. Intelligence should be built on understanding why something is true, not just predicting what sounds right. By making this dataset open, we’re giving researchers and builders the tools to develop AI that is more reliable, more explainable, and ultimately more useful to society.”
As with Genesis I, the expanded dataset is released openly to support researchers, academic institutions, and independent developers working outside of closed, proprietary systems. It is made available under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, reinforcing QVAC’s commitment to open, community-driven AI research.
The release continues QVAC’s broader mission to advance local, decentralized intelligence, where AI models can be trained, refined, and deployed without dependence on centralized cloud platforms. By strengthening the open foundations of AI training data, Tether Data aims to reduce structural barriers to innovation and ensure that high-quality intelligence remains accessible to the global research community.
The full technical breakdown of the dataset, titled “QVAC Genesis II: Expanding the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for Pre-training,” is available now via the QVAC research blog, alongside access to the dataset and models on Hugging Face. Further information, including a detailed FAQ section, is available on the QVAC website.
About Tether Data
Tether Data, S.A. de C.V. (“Tether Data”) is part of Tether’s broader vision to advance freedom, transparency, and innovation through technology. Its mission is to enable people and organizations to connect and share information directly, without unnecessary intermediaries. By creating secure, peer-to-peer systems, Tether Data gives users greater control over their data, communications, and digital interactions. Tether Data aims to redefine how information flows across networks by replacing centralized models with decentralized infrastructure designed for privacy, efficiency, and resilience. The company’s goal is to make global connectivity faster, safer, and more private, empowering individuals and institutions alike to exchange information freely and securely.
About QVAC
QVAC is Tether Data’s advanced AI research initiative dedicated to building open, decentralized, and adaptive intelligence systems. Its mission is summed up in the motto “Local AI. Infinite Intelligence. No Compromise.” QVAC envisions a world where AI lives and learns on any device, empowering individuals and communities rather than concentrating power in corporate data centers.