Sabrina Ortiz/ZDNET

Most AI providers try to enhance their products by training them on both public information and user data. However, the latter method puts a privacy-conscious company like Apple in a difficult position. How can it improve its Apple Intelligence technology without compromising the privacy of its users? It’s a tough challenge, but the company believes it has found a solution.

Synthetic data vs. real data

OpenAI, Google, Microsoft, and Meta train their products partly by analyzing your chats. The goal is to improve the reliability and accuracy of their AIs by scraping data from real conversations. While you can generally opt out of this type of data sharing, the process for doing so varies for each product. This means the responsibility falls on you to figure out how to sever the connection.

Apple has always prided itself on being more privacy-focused than its tech rivals. To that end, the company has relied on something called synthetic data to train and improve its AI products. Created using Apple’s own large language model (LLM), synthetic data attempts to mimic the essence of real data.

For example, the AI may create a synthetic email that is similar in topic and style to an actual message. The objective is to teach the AI how to summarize that email, a feature already built into Apple Mail.

Apple’s solution: ‘Differential privacy’

The problem with synthetic data is that it can’t replicate the special human touch found in real-world content. This limitation has led Apple to adopt a different approach, built on a technique known as differential privacy. As described by Apple in a blog post published Monday, the approach combines synthetic data with real data. Here’s how it works.

Let’s say Apple wants to teach its AI how to summarize an email.
The company starts by creating a large number of synthetic emails on various topics. Apple then generates an embedding for each synthetic message to capture key elements such as language, topic, and length. These embeddings are sent to Apple users who have opted into analytics sharing on their devices.

Each device selects a small sample of actual user emails and generates its own embeddings. The device then determines which synthetic embeddings most closely match the language, topic, and other characteristics of the user emails. Through differential privacy, Apple identifies which synthetic embeddings were the most similar. In the next step, the company can curate these samples to further refine the data or begin using them to train its AI.

As one example provided by Apple, imagine that an email about playing tennis is one of the top embeddings. A similar message is generated by replacing “tennis” with “soccer” or another sport and added to the list for curation or training. Altering the topic and other elements of each email helps the AI learn how to create better summaries for a wider variety of messages.
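
For readers who want to see the shape of that process, here is a rough Python sketch. It is an illustration under stated assumptions, not Apple’s implementation: the article does not say which privacy mechanism Apple uses, so the sketch substitutes a textbook one (k-ary randomized response), and every function name, the toy three-number embeddings, and the epsilon value are made up for the example. Each device picks the synthetic embedding closest to a user email, reports that pick with deliberate random noise, and the server corrects for the noise only in aggregate.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_synthetic(user_embedding, synthetic_embeddings):
    """On-device step: index of the synthetic embedding most similar to a user email."""
    return max(range(len(synthetic_embeddings)),
               key=lambda i: cosine(user_embedding, synthetic_embeddings[i]))


def randomized_report(true_index, k, epsilon, rng):
    """Assumed mechanism (k-ary randomized response): report the true index
    with probability p, otherwise a uniformly chosen different index."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_index
    other = rng.randrange(k - 1)
    return other if other < true_index else other + 1


def estimate_counts(reports, k, epsilon):
    """Server step: debias the noisy tallies to estimate how often each
    synthetic embedding was truly the closest match."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)  # chance that a given wrong index is reported
    tallies = Counter(reports)
    return [(tallies.get(i, 0) - n * q) / (p - q) for i in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy embeddings standing in for three synthetic emails.
    synthetic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Hypothetical fleet: most devices hold email resembling synthetic message 0.
    user_emails = ([[0.9, 0.1, 0.0]] * 700 + [[0.1, 0.9, 0.1]] * 200
                   + [[0.0, 0.2, 0.8]] * 100)
    reports = [randomized_report(closest_synthetic(e, synthetic),
                                 len(synthetic), 1.0, rng)
               for e in user_emails]
    print(estimate_counts(reports, len(synthetic), epsilon=1.0))
```

In this sketch no single report can be trusted, since any device may have sent a random answer, yet the corrected totals still show which synthetic messages are popular across many devices. A smaller epsilon adds more noise and more privacy, at the cost of needing more devices for a reliable estimate.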