Scale AI Insights: Type-Specific Updates

Scale AI Updates by Year and Month

4 Significant Changes from the Last 6 Months

Date: 29-07-2025
Update: WebGuard Enhances AI Safety for Web Agents
Type: AI Safety and Security
Description: WebGuard, developed by Scale AI with UC Berkeley and Ohio State University, is a pioneering dataset designed to assess and improve the safety of AI web agents. It features 4,939 human-annotated actions from 193 websites, categorized by risk level to guide safe decision-making. Fine-tuning with WebGuard significantly boosts model accuracy: smaller models such as Qwen2.5-VL-7B achieve up to 80% accuracy in identifying high-risk actions. The researchers invite the community to use the public dataset to advance AI safety.

Date: 12-07-2025
Update: Scale AI Team Tackles Controversial AI Statements
Type: Podcast
Description: In Episode 9 of Scale AI’s Human in the Loop podcast, the Enterprise team debates bold AI claims, such as whether coding agents will replace engineers or whether large context windows make fine-tuning obsolete. They explore barriers to enterprise AI adoption, emphasizing human challenges over technical ones, and discuss the reliability of single-agent versus multi-agent systems. The team’s insights, drawn from working with top enterprises, highlight practical strategies for building effective AI systems. Listen to the episode or read the transcript at scale.com for industry insights.

Date: 28-06-2025
Update: Scale AI Recognized for AI Data Innovation
Type: Awards & Honours
Description: Scale AI, a leader in data annotation, earned a spot on TIME’s 2025 list of the 100 Most Influential Companies for its critical role in advancing AI through high-quality data labeling. With over 240,000 gig workers, Scale supports major AI firms, though its recent $14.3 billion Meta deal may shift its ties with rivals such as OpenAI. Its new division, backed by a U.S. Department of Defense contract, is growing rapidly by tailoring AI models for large organizations. Learn more at scale.com.

Date: 26-06-2025
Update: Scale AI’s FORTRESS Enhances AI Safety Evaluation
Type: AI Safety and Security
Description: Scale AI introduces FORTRESS, a benchmark designed to evaluate large language models (LLMs) for national security and public safety risks. Featuring over 1,010 expert-crafted adversarial prompts, FORTRESS assesses model safeguards across domains such as Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) activities, testing robustness against misuse while minimizing over-refusals of benign requests. It provides a balanced, scalable framework with instance-specific rubrics for precise evaluation. Visit scale.com to explore the FORTRESS leaderboard and methodology.