Key Takeaways
- QVAC, Tether Data’s AI research division, released QVAC Genesis II, adding 107 billion tokens to what is now the largest public educational synthetic dataset for AI pre‑training.
- Independent evaluations show that models trained on Genesis II data achieve higher reasoning accuracy and give clearer answers than models trained on prior synthetic datasets.
Tether Data’s AI division QVAC has released Genesis II, adding 107 billion tokens to its open-source synthetic dataset for AI pre-training. The full dataset now spans 148 billion tokens across 19 education-focused domains, making it the largest of its kind.
Genesis II expands into fields such as computer science, statistics, and machine learning, and introduces an “Option-Level Reasoning” approach that teaches models to reason through each answer option in multiple-choice questions. This builds on the failure-analysis method QVAC introduced with Genesis I.
Tether CEO Paolo Ardoino said the initiative moves AI beyond fluency toward structured understanding. The dataset is available under a Creative Commons license on QVAC’s blog and Hugging Face, supporting open research and local model development outside centralized AI platforms.