If you don’t know what synthetic data is, don’t worry; you have plenty of company. Synthetic data is information that’s artificially manufactured by machines rather than generated by real-world events. It is created algorithmically and used in test datasets as a stand-in for production or operational data, both to validate mathematical models and, increasingly, to train machine-learning models. This substitute data helps preserve the privacy of personal information and can save IT teams a great deal of time, trouble, and money in the process.
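To make "created algorithmically" concrete, here is a minimal sketch of the simplest possible approach: summarize a real numeric column by its mean and standard deviation, then sample new values from a normal distribution with those parameters. The function names and the sample balances are hypothetical, chosen for illustration only; commercial tools such as Gretel's use far more sophisticated generative models, and nothing here reflects their API.

```python
import random
import statistics

def fit_gaussian(values):
    """Summarize a numeric column by its mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def synthesize(values, n, seed=0):
    """Draw n artificial values from a normal distribution fitted to `values`."""
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical "real" column: account balances from a production table.
real_balances = [1200.0, 950.5, 1830.25, 1010.0, 1475.75, 1322.4]

# Synthetic stand-in: statistically similar, but no value is a real record.
synthetic_balances = synthesize(real_balances, n=1000)
```

The synthetic column roughly preserves the average and spread of the original, which is what lets it stand in for real data in tests. A single fitted distribution ignores correlations between columns, which is one reason production-grade generators rely on learned models instead.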
When machine-learning models are being built, the data has to be clean; errors, duplications, or other hiccups in the real data used to build such models will inevitably surface as problems, costing the company time and money. With more and more artificial intelligence and machine-learning models being deployed, the need for synthetic data is growing rapidly. Analysts have projected that more synthetic than original data will be used to build ML models by the end of the decade.
Several companies are focusing on the commercial use of synthetic data, and one of the first is Gretel, based in San Diego, Calif. The 2-year-old startup on Feb. 1 announced the general availability of its privacy engineering toolkit, which contains APIs and services that enable users to classify, transform, and generate high-quality synthetic data.
Combined, these capabilities remove the privacy bottlenecks that prevent data sharing and stifle innovation in numerous development and workflow processes, CEO Ali Golshan told ZDNet.
“We’ve built a privacy toolkit that’s accessible to all developers and scalable to any enterprise-ready project,” Golshan said. “With Gretel, anyone can classify, anonymize, and synthesize data that’s privacy-proven and highly accurate in just a few clicks. Our advanced privacy guarantees also give users complete control to adjust data privacy levels, based on their project needs, and guard synthetic data against adversarial attacks.”
Golshan said the company has tested its products in an open beta program for more than a year. It has incorporated improvements to its toolkit based on feedback from more than 60 enterprise engagements, a community of thousands of users, and open-source users who have downloaded the SDK more than 70,000 times, the company said.
Gretel has been working with organizations across several vertical industries, Golshan said, including health care, life sciences, finance, and gaming. Some of its recent work includes creating synthetic genomic data and synthetic time-series banking data. Interest in Gretel’s privacy engineering tools is supported by analysts’ forecasts that by 2030, synthetic data will completely overshadow real data in AI models, Golshan said.
“Today, working with data is hard. Gretel is making it easier. By building flexible, secure, and easy to deploy tools to support data-driven developers, Gretel will open a world of progress across industries,” said Max Wessel, Executive Vice President & Chief Learning Officer at SAP.
Advanced Privacy Engineering Made Accessible
Gretel’s all-in-one privacy stack consists of engineering tools that:
create highly accurate, privacy-proven synthetic data;
seed pre-production systems with safe, statistically accurate datasets;
identify and remove sensitive data to reduce PII-related risks;
augment and de-bias datasets to train ML/AI models fairly; and
anonymize sensitive data in real time, for data at scale.
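As a rough illustration of what "identify and remove sensitive data" can mean at its simplest, the sketch below replaces email addresses and dash- or dot-separated 10-digit phone numbers with placeholder tokens. The patterns, the `redact` helper, and the sample record are assumptions made for this example; they are not Gretel's API, and real classifiers cover many more entity types than two regular expressions can.

```python
import re

# Deliberately simple patterns; real PII detection handles many more formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def redact(text):
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

# Hypothetical record containing personally identifiable information.
record = "Contact Jane at jane.doe@example.com or 619-555-0100."
print(redact(record))  # Contact Jane at <EMAIL> or <PHONE>.
```

Note that the name "Jane" survives redaction: pattern matching only catches data with a predictable shape, which is why dedicated tooling pairs it with trained entity recognition.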
Gretel is also previewing an AWS S3 storage connector for its toolkit. Gretel’s services can be accessed through its SaaS cloud offering or CLI for local environments.