Synthetic Data’s Practical Uses for AI and Agent Systems — KPMG’s Fabiana Clemente Explains
KPMG distinguished engineer Fabiana Clemente discusses practical applications, limitations and best practices for synthetic data on the Generative AI in the Real World podcast, hosted by Ben Lorica. Clemente defines synthetic data as data that does not come from the real world, and describes a spectrum of use cases: test data management, privacy-preserving data replicas for offshore teams, improved fraud detection, and training and evaluating multi-agent AI systems. She warns against two common mistakes: oversimplifying synthetic data generation and failing to match the methodology to the use case. Key takeaways include the need for governance, evaluation metrics and established processes for producing usable synthetic datasets. Clemente notes that text now dominates synthetic-data production, driven by LLMs, but stresses that synthetic data also includes simulation approaches. She highlights synthetic data’s role in agent development, where it is used to create structured knowledge and scenario simulations for multistep tool-using agents, while acknowledging concerns such as feedback loops when models train on AI-generated data. Overall, she positions synthetic data as an important accelerator for AI development when paired with proper planning, governance and tooling.
Neutral
The article is a technical discussion of synthetic data use cases, governance and risks rather than news of a market-moving event, funding round, regulatory change or product launch. For crypto traders, the implications are indirect: improved synthetic data and agent workflows could accelerate development of on-chain analytics, AI-driven trading bots and privacy-preserving data services, which are positive structural developments but not immediate price drivers. Concerns such as feedback loops from models trained on AI-generated data, along with model quality more broadly, could introduce longer-term risks for the reliability of AI tooling. Historically, technical advancements in developer tooling or data infrastructure (e.g., improved oracle or analytics tooling) have produced neutral-to-moderately bullish effects over months as adoption grows. Short-term market volatility is unlikely to be driven by this discussion alone; over the long term, better synthetic data and agent testing could modestly boost crypto infrastructure and algorithmic trading efficiency.