Show HN: Sourcetable – AI Spreadsheet and Data Platform
188 points by mceoin 2 months ago | 104 comments
Hi HN! I’m Eoin, founder of Sourcetable (https://sourcetable.com).
Sourcetable is an AI-native spreadsheet that syncs with all your data. Users pair with an AI copilot that helps them do their spreadsheet work, as well as more database-centric analysis and SQL.
Sourcetable syncs with databases including Postgres, MySQL, and MongoDB, and 100+ business applications including Stripe, Zendesk, Hubspot, Quickbooks and Google Analytics. That data is available in a spreadsheet, and any models you build automatically update in near-real-time as new data flows in. The core primitives are AI + spreadsheet + data sync + storage + compute.
If you want to play with Sourcetable today, the easiest way is to upload a CSV and start asking questions.
Who is it for? Sourcetable is for analysts, operators and finance folk doing data-centric work in a spreadsheet. Sourcetable’s spreadsheet-based AI assistant understands workbook range selection and can adjust scope context to the datasets you are working with. You can talk directly to your database and SaaS integrations, which is great for analysis, data search and retrieval, SQL writing & editing (including writing joins across different datasets), and automatic chart creation.
Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.
Spreadsheets are the most used analytical tool on the planet. AI is a platform shift with broad applications. We are staying open-minded about users and use cases since everything is so new.
Backstory: I spent ten years working in de-facto operations and technical roles at startups. Sourcetable draws from that experience of needing better data tooling inside spreadsheets, and constantly hacking ad hoc solutions to fill the gap. Andrew (CTO / co-founder) previously had a deep learning company and was initially drawn to the idea that Sourcetable could be an operating system for the web. We’re both Aussie expats in the Bay Area, which is how we met. Internally, we think of Sourcetable as an application platform, with AI applications being a useful and interesting place to focus.
Features & Use Cases:
Talk to your CSV files, spreadsheets, integrations, and datasets using LLMs.
AI + data work: Text-to-SQL, search and retrieval from databases, LLM-based data analysis. (This is an entirely different experience to what Copilot/Gemini & Excel/Sheets provide, since they are thin workbooks and not data platforms.)
AI + spreadsheet work: formula assist, workbook analysis, data cleaning, chart creation, error handling, summarization, chat, etc.
Automated reporting: data is synced, so reports you build stay up to date.
No-code data access: give the business team safe database access so they will leave you alone!
Centralizing data for cross-channel reporting (e.g. Postgres + Stripe + Mailchimp).
Analyzing large CSV files: Sourcetable can handle multi-gigabyte files. (Google Sheets can’t handle large data and the experience in Excel is rather cumbersome.)
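As a rough illustration of the cross-channel reporting case (a database table joined against payment data), here is a minimal sketch of the kind of SQL a text-to-SQL assistant might produce. It uses Python's built-in sqlite3 module purely as a stand-in engine; it is not Sourcetable's implementation, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database standing in for two synced sources.
# All table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT);
CREATE TABLE stripe_charges (
    id INTEGER PRIMARY KEY, customer_email TEXT, amount_cents INTEGER
);
INSERT INTO app_users VALUES
    (1, 'a@example.com', 'pro'),
    (2, 'b@example.com', 'free');
INSERT INTO stripe_charges VALUES
    (1, 'a@example.com', 4900),
    (2, 'a@example.com', 4900),
    (3, 'b@example.com', 900);
""")

# Join the two sources on email and report revenue per plan.
QUERY = """
SELECT u.plan, SUM(c.amount_cents) / 100.0 AS revenue
FROM app_users AS u
JOIN stripe_charges AS c ON c.customer_email = u.email
GROUP BY u.plan
ORDER BY u.plan
"""

rows = conn.execute(QUERY).fetchall()
for plan, revenue in rows:
    print(f"{plan}: {revenue:.2f}")
```

The join key (email) is an assumption made for the example; real sources often need a mapping table or a cleaned-up key before they can be joined reliably.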
Technical Details: Sourcetable was built to be fast. It was also built to scale.
AI: Llama 3 (via Groq), Claude, GPT-4o, LiteLLM, custom LLMs
Frontend: DuckDB, React, ShadCN, AntV / Bizcharts, Plotly, CodeMirror, Hookstate
Backend: DuckDB, Python, Cassandra, Redis, NGINX, Cloudflare
Data Eng & Transformations: Fivetran, dbt, Apache Arrow, SQLGlot
Distributed Computing & Scaling: Daft, Ray, CloudFormation
Other: Linux Namespaces, Dill (U.Queensland)
A huge thank you to the open source community, and a special shout-out to DuckDB for being so damn fast. Thank you also to Groq & Anthropic for the rate limit increases in time for this Show HN post!
–
Feedback: Product feedback is welcome! eoin@sourcetable.com
primitivesuave 2 months ago
This is incredible. I uploaded a CSV with ~6000 rows containing campaign finance data for a particularly corrupt local politician and asked "what was the total contributed amount in [year]". Not only did it produce the correct answer (in around the same amount of time it took me to calculate it on my end) but it also seemed to understand that the spreadsheet was related to campaign finance in the "summary" portion of the response.
The most useful aspect was that I could ask "what was the total contributed amount between January and June of 2020" and get an accurate answer for that as well. Since the date column is provided as an "MM/DD/YYYY" string, I would normally have to do some boilerplate work to sanitize this.
For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output. But overall I was truly blown away that something like this is even possible for a small team to build.
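For comparison, the date-sanitizing boilerplate mentioned above might look something like the following when done by hand. This is only a sketch using the Python standard library: the column names (contributor, date, amount) and the sample rows are made up for illustration and are not taken from the actual file.

```python
import csv
import io
from datetime import date, datetime
from decimal import Decimal

# Hypothetical sample rows; the real file's columns and values are unknown.
SAMPLE_CSV = """contributor,date,amount
A. Smith,01/15/2020,250.00
B. Jones,03/02/2020,1000.00
A. Smith,07/04/2020,500.00
C. Lee,12/31/2019,75.50
"""


def total_between(csv_text, start, end):
    """Sum the amount column for rows dated within [start, end], inclusive."""
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Dates arrive as "MM/DD/YYYY" strings and must be parsed before comparing.
        when = datetime.strptime(row["date"].strip(), "%m/%d/%Y").date()
        if start <= when <= end:
            total += Decimal(row["amount"])
    return total


# Total contributed between January and June of 2020.
print(total_between(SAMPLE_CSV, date(2020, 1, 1), date(2020, 6, 30)))
```

Decimal is used instead of float so that currency amounts add up exactly.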