ZDNET’s key takeaways
- Proxy service platform Oxylabs offers an enormous pool of ethically-sourced residential proxies, meaning you’re likely to get good quality data without pushback from the sites you’re visiting.
- Oxylabs’ mix of API and AI made it easy for us to run test calls, and should provide a solid foundation for scraping apps.
- Oxylabs has excellent documentation and videos, which should help you get up and running with its tools. Getting started is a straightforward process.
Oxylabs provides a range of web scraping and related services. These include operating proxy machines, providing developer APIs for accessing and making requests through those machines, and providing supporting services (including a scraping-aware AI) for parsing retrieved data in order to use it in applications.
Enormous proxy pools
Compared to other proxy services like IPRoyal or MarsProxies, Oxylabs offers a much larger pool of residential proxy machines. MarsProxies reports just a million machines in its proxy pool, and IPRoyal reports a pool of 32 million residential machines, while Oxylabs offers more than 175 million residential proxies across 195 countries.
When it comes to uninterrupted scraping operations, the more available machines, the less any one machine will be flagged as intrusive by site operators. This both reduces the load on the sites, and increases the likelihood that scraping operations will succeed.
One thing stuck in my mind reviewing this information: How, exactly, does a company like Oxylabs gain access to 175 million machines, especially since they say they do so ethically? Oxylabs provides a must-read report that discusses their procurement processes and policies.
Also: The best proxy server services: Expert recommended
It turns out that the company pays residential machine owners a small amount in return for use of a slice of their bandwidth. This is all facilitated by a number of different apps that offer users financial reward for participating in these programs.
I’ve bumped into the promotion of these apps before, but I didn’t realize their raison d’être: to provide access to distributed machines for data acquisition networks. No individual computer user is going to get rich off of these participation programs, but if you’re someone who uses bandwidth sparingly, it can be a way to pick up a few extra bucks.
In addition to residential proxies, the company offers ISP proxies (which use residential IPs but are hosted in an ISP's data center for more stability), mobile proxies (which run on and report to sites as mobile devices, for mobile-specific testing), data center proxies (for rock-solid performance, but less anonymity), and dedicated data center proxies (which give you unlimited bandwidth and a dedicated IP for high-performance work).
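To make the proxy types above concrete, here is a minimal Python sketch of how a residential proxy is typically wired into the `requests` library. The hostname, port, and credentials are placeholders, not Oxylabs' actual endpoints; the real values come from your account dashboard.

```python
def build_proxy_config(username: str, password: str,
                       host: str = "proxy.example.com",
                       port: int = 7777) -> dict:
    """Build a proxies mapping for the requests library.

    The host and port here are placeholders standing in for a
    provider's real gateway address.
    """
    proxy_url = f"http://{username}:{password}@{host}:{port}"
    # requests uses the same gateway for both schemes
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxy_config("YOUR_USERNAME", "YOUR_PASSWORD")

# Routing a request through the proxy would then look like
# (commented out to avoid a live network call):
# import requests
# response = requests.get("https://example.com", proxies=proxies, timeout=30)
```

With a residential pool, the gateway typically rotates the exit IP for you on each request, which is what makes the large pool sizes discussed above matter in practice.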
Testing out the coding interface
The folks at Oxylabs gave me access to their coding interface, so I was able to get a feel for what it takes to use their proxies, make data requests, and parse the data for application use.
The company gets kudos for how they provide usage information. They have a very helpful YouTube channel with 425 videos. I only had time to watch a fraction of them, but they are clear, to the point, and very informative.
The company has an easy-to-understand dashboard, which is the starting point for all operations.
They also offer a testing platform, called the API playground. It’s here that you can paste in code segments and see how they perform. Note that the company offers pre-written code blocks for CURL, Python, PHP, C#, Go, Java, Node.js, and JSON. That’s a plus, because many API vendors don’t do this. I always feel more comfortable when I can see code examples in the programming environment I’m using.
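To give a flavor of what the playground's pre-written Python snippets do, here is a hedged sketch of assembling a scraper API request body. The field names (`source`, `url`, `render`) and the endpoint shown in the comment are assumptions modeled on common REST scraping APIs, not verified Oxylabs parameters; the playground generates the exact equivalents for you.

```python
import json

def build_scrape_payload(url: str, render_js: bool = False) -> str:
    """Assemble a JSON request body for a hypothetical scraping endpoint."""
    payload = {
        "source": "universal",  # assumed source name; check the docs
        "url": url,
        "render": "html" if render_js else None,
    }
    # Drop unset fields so the API receives only what we specify
    payload = {k: v for k, v in payload.items() if v is not None}
    return json.dumps(payload)

body = build_scrape_payload("https://example.com/product/123")

# The request itself would be an authenticated POST, for example:
# import requests
# resp = requests.post("https://scraper.example.com/v1/queries",
#                      auth=("USER", "PASS"), data=body,
#                      headers={"Content-Type": "application/json"})
```

The playground's value is that you can paste this kind of snippet in, tweak the parameters, and see the raw response before committing anything to your own codebase.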
Things got really interesting when I started to tinker with the Oxy AI, called OxyCopilot. First, I’m recommending that Oxy change the AI’s name since Copilot is Microsoft’s term and there’s likely to be pushback from Redmond’s trademark enforcement team.
Also: The hidden data crisis threatening your AI transformation plans
That said, OxyCopilot is cool. One of the more challenging aspects of web scraping operations is that once you get the data back, you have to figure out how to extract usable information. Since you're literally getting back an entire HTML page (filled with ads, HTML tags, and a ton of unrelated information), that post-processing step is algorithmically non-trivial.
On the left is scraping data that Oxylabs pulled back during a test scrape in their playground. On the right is the product I was scraping, my favorite tech product of all time. The only odd thing is that I gave OxyCopilot the URL to an English-language page and the preview it's showing is in Spanish, although the pricing information is the same.
Notice how challenging the raw returned data is. But then I did the same operation using OxyCopilot. I started by giving it a URL to scrape.
Then, I skipped past the scraper parameters to give the AI some parsing instructions. All I asked was, “Please extract current product name and price. Indicate if the price is a discounted price or the regular price.”
The result is this interesting form. Note that it did pull the pricing data correctly. It presented the data to me as a JSON block. But the interesting bit is the Parsing Instructions tab at the far right.
What the AI has done is create a JSON structure that you feed into the Oxylabs API when sending a scraping request. The API will follow the instructions embedded in that JSON structure, and give you back just the data you requested.
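As a rough illustration of that workflow, the sketch below embeds a parsing-instructions structure into a request payload. The field names and selector functions here are hypothetical stand-ins for whatever schema OxyCopilot actually emits; the point is the shape of the approach, not the exact keys.

```python
import json

# Illustrative parsing instructions: the selectors and function names
# are hypothetical, standing in for what the AI would generate.
parsing_instructions = {
    "product_name": {
        "_fns": [{"_fn": "xpath_one", "_args": ["//h1/text()"]}]
    },
    "price": {
        "_fns": [{"_fn": "xpath_one",
                  "_args": ["//span[@class='price']/text()"]}]
    },
}

def attach_parsing(payload: dict, instructions: dict) -> dict:
    """Embed parsing instructions into a scraping request payload."""
    enriched = dict(payload)
    enriched["parse"] = True
    enriched["parsing_instructions"] = instructions
    return enriched

request_body = attach_parsing({"url": "https://example.com/product/123"},
                              parsing_instructions)
print(json.dumps(request_body, indent=2))
```

The appeal of the AI-generated version is that you never hand-write those XPath selectors yourself: you describe the fields in plain English, and the structure comes back ready to submit with your next scraping request.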
I've done web page parsing many times before, and it is a very time-consuming, tedious task. This took me less than five minutes.
ZDNET’s buying advice
So, should you use this service? Keep in mind that offerings at this level are business and operational decisions. From the point of view of ethical sourcing, Oxylabs seems like a good choice (especially with 50% off with the code OXYLABS50). And, judging from my limited testing, it's also a good choice from a programming and algorithmic point of view.
As for whether it’s cost-effective, that depends entirely on your use case. Only you and your team can decide that.
Also: How Cisco plans to stop rogue AI agent attacks inside your network
Finally, when it comes to documentation and training materials, Oxylabs is first-rate. I was very impressed with the overall content on their site and on YouTube. It brought me up to speed very quickly.
What about you? Have you used proxy or web scraping services like Oxylabs in your work or research? What challenges have you faced with data collection at scale, and how did you navigate ethical or technical roadblocks? Have you tried integrating AI tools like OxyCopilot to streamline your scraping workflows? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.