Last week, Anthropic (which has a $4B investment from Amazon) announced that Claude 3.5 Sonnet now supports data analysis.
TechCrunch’s headline for this capability was “Anthropic’s AI can now run and write code.” That is technically correct, but don’t get too excited. Claude isn’t going to sit there and simulate all your code for you. The reality is a lot more limited.
The feature that Anthropic announced is similar to ChatGPT’s Advanced Data Analysis. One difference is that Claude’s analysis tool is available to everyone, including free users. ChatGPT’s Advanced Data Analysis is only available to Plus and Enterprise account users paying $20 or more a month.
Generating code
Both ChatGPT Plus and Claude perform their data analysis by writing and running snippets of code that parse and process the data. One key difference is that Claude writes its code in JavaScript while ChatGPT writes its code in Python.
These are interesting choices. Python has a rich ecosystem of numerical analysis libraries like Pandas, NumPy, and SciPy. JavaScript also has a rich ecosystem, but its data and AI offerings are not quite as extensive as those for Python. Python is very strong in machine learning and AI, with frameworks like TensorFlow, PyTorch, and Keras. Python also provides excellent support for big data, although, as you’ll see, nothing about Claude’s current analysis tool can be considered even medium data.
JavaScript, by contrast, is ideally suited for data visualization in web pages. The Anthropic solution uses React, but there are also great visualization libraries like D3.js and Chart.js available for information presentation. I did find it odd that, with such great visualization tools available, the pie charts I generated using Claude tended to chop off the data labels for some of the categories.
When you ask Claude to process data, it gives you its output but also lets you look at the underlying code it generated to do that analysis.
Usage limits
I decided to use Claude to test out its analysis capabilities. I limited my use to the free version. According to Claude’s FAQ, the $20/mo Pro version increases usage limits by five times.
That’s probably necessary for serious use because after about 20 minutes of testing, I got shut down.
I did try opening a new chat, but it didn’t let me back in. After waiting an hour, I was able to ask some more questions.
Writing code to clean up data
To test Claude’s data analysis capabilities, I went to the data.gov website and downloaded a Social Security Administration dataset on baby name usage derived from social security card applications.
This data came in the form of a ZIP file. I extracted 145 comma-separated value (CSV) text files containing baby name data from 1880 to 2023, one file per year.
I first tried to select all the files and import them as a group into Claude. I was informed that Claude would only import five files at once.
So, I decided I’d write a script that would create a single file containing all the data. The gotcha was that each individual file didn’t contain the year as one of the fields. So my script would have to add the year from the file’s name to each record in the file, and then do this for all the files.
Rather than coding this myself, I asked Claude to do it for me.
I need to quickly combine 145 text files on a Macintosh. Each file name consists of the letters yob followed by four numbers, indicating the year followed by .txt. The files themselves are comma separated values. For each file, I need to prepend the year contained in the file name, followed by a comma, to every line in its corresponding file. I then need to combine all 145 files into one master file. How can I do this quickly?
It created a shell script that looked like it would do the job.
I saved the script and ran it.
It worked and did exactly as I asked. The result of running that shell script was a 37MB file. Unfortunately, I soon found out that 37MB exceeded Claude’s upload limit of 30MB. I needed a dataset that was considerably smaller.
Rather than using name data from each year, I figured that if I used name data from only one file per decade, I’d reduce my dataset size to 10% of the original size. So I changed my prompt and fed it back to Claude.
I need to quickly combine 145 text files on a Macintosh. Each file name consists of the letters yob followed by four numbers, indicating the year followed by .txt. The files themselves are comma separated values. For each file has a file name ending in 0.txt, prepend the year contained in the file name followed by a comma to every line in its corresponding file. Then need to combine all files ending in 0.txt into one master file. Write a shell script to do that.
That worked just as well as the first prompt, and I was given a 3.9MB file.
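The shell scripts Claude produced aren't reproduced here. As a rough illustration of the steps both prompts describe, here is a minimal Python sketch (not Claude's output). It assumes the files follow the yobYYYY.txt naming from the prompt. The function name combine_yob_files and the decades_only flag are made up for this example.

```python
import re
from pathlib import Path

# Matches names like yob1880.txt and captures the four-digit year.
YOB_PATTERN = re.compile(r"^yob(\d{4})\.txt$")


def combine_yob_files(source_dir, output_path, decades_only=False):
    """Prepend each file's year to its lines and write one combined file.

    Hypothetical helper for illustration. Returns the number of records written.
    """
    source_dir = Path(source_dir)
    output_path = Path(output_path)
    written = 0
    with output_path.open("w", encoding="utf-8", newline="\n") as out:
        for path in sorted(source_dir.iterdir()):
            match = YOB_PATTERN.match(path.name)
            if match is None:
                continue  # skip anything that is not a yobYYYY.txt file
            year = match.group(1)
            if decades_only and not year.endswith("0"):
                continue  # keep only 1880, 1890, ... when sampling by decade
            with path.open(encoding="utf-8") as src:
                for line in src:
                    line = line.rstrip("\r\n")
                    if line:  # ignore blank lines
                        out.write(f"{year},{line}\n")
                        written += 1
    return written
```

Calling combine_yob_files("names", "all_years.csv") would correspond to the first prompt, and adding decades_only=True to the second. The folder and output names here are placeholders.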
Overall, I was quite pleased with Claude 3.5 Sonnet’s coding work today. I’ve previously run that LLM through my battery of coding tests without much success. So it was nice to see it run smoothly this time. Unfortunately, that was the last part of today’s testing process that ran smoothly.
More limits in Claude
So let’s look at data analysis in Claude. Unfortunately, Claude appears to be very limited in terms of the amount of data it can ingest. Claude says that its Pro version allows “at least 5x the usage compared to our free service” and that “if your conversations are relatively short, you can expect to send at least 45 messages every 5 hours.”
That’s not a lot. And while Claude does say that you can upload five files and 30MB, I found that my combined 3.9MB file was considered a whopping 9124% over its length limit. That file contains 219,181 records.
Okay, fine. So then I tried a file for just one year. The file yob2020.txt is only 561KB and contains just 31,550 records. That file is apparently 1239% over Claude’s length limits.
Doing some math, and assuming you haven’t hit their message usage limits, it looks like Claude limits its data analysis to around 2,000 lines of about 25 characters each.
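For anyone who wants to check that arithmetic, here is a small Python sketch. It rests on two assumptions that Claude's interface doesn't confirm: that "N% over the limit" means the file is (1 + N/100) times the limit, and that 3.9MB and 561KB are decimal units. The function name implied_limit is invented for the example.

```python
def implied_limit(actual, percent_over):
    """Return the limit implied by a size reported as `percent_over` % over it.

    Assumes "N% over" means actual = limit * (1 + N / 100).
    """
    return actual / (1 + percent_over / 100)


# Figures reported in this article; byte sizes assume decimal MB and KB.
files = {
    "one file per decade": {"records": 219_181, "bytes": 3_900_000, "percent_over": 9124},
    "yob2020.txt": {"records": 31_550, "bytes": 561_000, "percent_over": 1239},
}

for name, info in files.items():
    records = implied_limit(info["records"], info["percent_over"])
    size = implied_limit(info["bytes"], info["percent_over"])
    print(f"{name}: about {records:,.0f} records, about {size / 1000:.0f} KB")
```

Under those assumptions, both files point to roughly the same ceiling: a couple of thousand short records and a few tens of kilobytes, which is the ballpark of the estimate above.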
Let’s compare that to ChatGPT Plus, shall we?
Now, yes, I’m using the free Claude version, but if Claude Pro provides 5X capacity, we can generalize (because the company doesn’t provide hard limits) that Claude Pro would max out at about 10,000 25-character lines.
By contrast, I fed 69,215 records with an average of 50 characters per line into ChatGPT Plus and it worked just fine. I fed a 22,797 record dataset consisting of sentiment data from users who uninstalled my apps (with most records containing sentiment phrases as well as fixed data) into ChatGPT Plus and it worked just fine. I fed two files consisting of 170,000+ lines of 3D printer G-code into ChatGPT Plus and it worked just fine.
I have found ChatGPT Plus’s data analysis genuinely helpful and productivity-enhancing. But if a pro account were limited to just 10,000 records or fewer, as Claude Pro seems to be, I probably would have found it an interesting technology demonstration, but not something I could reliably add to my workflow kit bag.
Actually testing Claude’s data analysis
I downloaded about 30 datasets from data.gov before I found one small enough for Claude to examine. That’s a November 2020 dataset of adoptable pets from the Montgomery County Animal Services and Adoption Center in Derwood, Maryland.
This dataset has 85 records of about 190 characters each. Let’s see what it can tell us.
With a prompt of “What can you tell me about this data?” Claude identified the most common pet type (dogs), the most common intake types (owner surrender, then strays, which just seems so sad), and notable patterns and unique features (Molly is a common name).
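To give a sense of what a "most common value" summary like that involves, here is a minimal Python sketch. It is not the JavaScript Claude generated. The column names and sample rows are invented for illustration and may not match the real dataset's headers.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the adoptable-pets file; the real column
# names and values may differ.
SAMPLE = """Animal Type,Intake Type,Pet Name
Dog,Owner Surrender,Molly
Cat,Stray,Tiger
Dog,Stray,Molly
Dog,Owner Surrender,Rex
"""


def summarize_column(csv_text, column):
    """Return (value, count, percent) tuples for one column, most common first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column].strip() for row in reader if row.get(column))
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


for column in ("Animal Type", "Intake Type", "Pet Name"):
    print(column, summarize_column(SAMPLE, column))
```

The same per-category counts are what a pie chart of animal distribution would be drawn from.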
I asked for a pie chart representing animal distribution. It gave me this, which showed the main animal types but left nearly 50% of the chart to “Other.”
I wanted to know what that “Other” category represented. There’s something a bit poignant about the idea that 30-something percent of the “Other” category consists of tropical fish. I have this depressing vision in my head of row upon row of goldfish bowls, each containing one lone goldfish.
Take a look at that chart and the one just above it. Notice that while there’s plenty of space for the chart to show the labels, they’re cut off in both charts. I know there are 30-something percent of tropical fish, but I don’t know the exact percentage because all that’s shown is a “3”.
JavaScript has excellent charting libraries. I would think Anthropic would have been able to tweak the output to fully represent the chart data, especially in landscape view.
Well, that’s a bummer
I was really hoping that Claude’s data analysis features would be on par with those of ChatGPT Plus. Even if Claude’s free version could only do one-fifth of what ChatGPT Plus could, I might have signed up for a subscription.
I really like the idea of sending my data through multiple analysis tools and comparing the results. That alone would have justified my signing up for another $240/year of AI bill.
But since it’s clear from my extrapolations above that the Claude Pro version couldn’t handle even the smallest of the datasets I’ve previously successfully fed into ChatGPT Plus, it certainly doesn’t seem worth the investment.
I’ve reached out to Anthropic for comment but haven’t yet heard back. If the company responds, I’ll update this article with its feedback.
Meanwhile, what do you think? Have you used Advanced Data Analysis in ChatGPT Plus? Are you a Claude or ChatGPT user? When would you or would you ever consider using Claude instead of ChatGPT? Let us know in the comments below.