Cafe Imports is arguably one of the biggest and most popular coffee importers in the world. They maintain a comprehensive database of the green coffee stock they offer across three regions: Europe, North America, and Australia.
This project aims to provide historical analysis of the Cafe Imports coffee supply, in order to understand market trends across the range of coffees available to roasters who are customers of Cafe Imports. The data comes from the pricing sheets that Cafe Imports makes available, as they provide the historical snapshots needed to build the dataset.
You can find the application here.
Approximately twice a month, price sheets are sent out via email, which contain a link to a formatted spreadsheet in PDF format (exported from Microsoft Excel), hosted on their servers.
The Europe pricing sheets contain the following information about Cafe Imports’ green coffees (also shown are the mapped database column names and an example from the sheet):
Pricing Sheet Column Name (database_column_name) – Example
- Origin (origin) – Brazil
- Grade (grade) – Carmo de Minas
- Name (name) – Fazenda Sertao – Yellow Bourbon (SC Bags)
- Process (process) – Natural
- ID (id) – 24704
- Approx EUR/kg (price_eur) – € 9.44
- Approx GBP/kg (price_usd) – £8.16
- Bags Available (bags_available) – 3
- Bag Weight (bag_weight_raw) – 59 Kg
- Cupping Notes (cupping_notes) – Mellow peanut butter and fruit flavors with mild acidity and sweetness.
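To make the column mapping above concrete, here is a minimal Python sketch that turns the example row into typed values keyed by the database column names. The column names and example values are taken from the list above; the helper names (parse_price, parse_bag_weight_kg, normalise_row) and the derived bag_weight_kg field are hypothetical, and the parsing rules are assumptions about how the raw strings might be cleaned, not a description of the project's actual code.

```python
import re

# Example row exactly as listed above, keyed by the pricing-sheet column names.
EXAMPLE_ROW = {
    "Origin": "Brazil",
    "Grade": "Carmo de Minas",
    "Name": "Fazenda Sertao – Yellow Bourbon (SC Bags)",
    "Process": "Natural",
    "ID": "24704",
    "Approx EUR/kg": "€ 9.44",
    "Approx GBP/kg": "£8.16",
    "Bags Available": "3",
    "Bag Weight": "59 Kg",
    "Cupping Notes": "Mellow peanut butter and fruit flavors with mild acidity and sweetness.",
}


def parse_price(text: str) -> float:
    """Extract the number from a price cell such as '€ 9.44' or '£8.16'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no numeric price in {text!r}")
    return float(match.group())


def parse_bag_weight_kg(text: str) -> float:
    """Extract kilograms from a bag-weight cell such as '59 Kg'."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*kg\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unexpected bag weight {text!r}")
    return float(match.group(1))


def normalise_row(raw: dict[str, str]) -> dict[str, object]:
    """Map sheet columns to database columns and convert numeric cells."""
    return {
        "origin": raw["Origin"].strip(),
        "grade": raw["Grade"].strip(),
        "name": raw["Name"].strip(),
        "process": raw["Process"].strip(),
        "id": int(raw["ID"]),
        "price_eur": parse_price(raw["Approx EUR/kg"]),
        # Column name as given in the mapping above (GBP value, price_usd column).
        "price_usd": parse_price(raw["Approx GBP/kg"]),
        "bags_available": int(raw["Bags Available"]),
        "bag_weight_raw": raw["Bag Weight"].strip(),
        # Hypothetical derived field, not part of the documented mapping.
        "bag_weight_kg": parse_bag_weight_kg(raw["Bag Weight"]),
    }


row = normalise_row(EXAMPLE_ROW)
```

Keeping the raw bag-weight string alongside a parsed number means an unexpected unit or format in a later sheet raises an error rather than being silently misread.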
This is almost the same information that can be found on the offerings page of the Cafe Imports website, with the exception of the pricing data for the coffee (which is shown only when the user is logged in, and an account requires an application that must be approved by Cafe Imports).
Additionally, they do not provide historical data, such as previous prices and stock levels, for offerings in the “Archive” view. For spot purchasing, the current listing is sufficient, but deeper analytics are not possible without the historical data.
How the data is collected
- Cafe Imports has all their PDFs available in sequential order via their CDN (content delivery network). These links are discoverable via their newsletter.
- The naming convention is generally https://cdn.cafeimports.com/images/Cafe-Imports-EUR-Spot-1.pdf, where the "-1" suffix is the index of the pricing sheet. (However, the first sheet available has no numerical index and omits the "-{index}" suffix, i.e. /images/Cafe-Imports-EUR-Spot.pdf.)
- A script checks each link in sequence for a valid PDF. When a non-existent link is requested, the CDN tries to correct the lookup by serving another file with a similar filename, or returns a list of documents with similar filenames. When the request redirects away from the expected filename, it is assumed that the indexing has ended and there are no new PDFs to obtain.
- Each PDF is sent to an LLM (gemini-3.1-flash-lite) to perform the data extraction. The data is loaded into an SQLite database. The report_date is taken from the PDF header. For example, this pricing sheet will be entered into the database with a report_date of 2026-03-24, as that is the date found at the top of the pricing sheet.
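The link-discovery step described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the project's actual script: the function names, the Response type, the convention that index 0 stands for the first (unnumbered) sheet, and the max_index safety cap are all hypothetical. The check for the "%PDF-" marker at the start of the body is an added guard that relies only on how PDF files begin.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterator, NamedTuple

BASE = "https://cdn.cafeimports.com/images/Cafe-Imports-EUR-Spot"


class Response(NamedTuple):
    final_url: str  # URL after any redirects were followed
    body: bytes     # raw response body


def sheet_url(index: int) -> str:
    """Build a sheet URL; index 0 is the first sheet, which has no suffix."""
    return f"{BASE}.pdf" if index == 0 else f"{BASE}-{index}.pdf"


def is_expected_pdf(requested_url: str, response: Response) -> bool:
    """True if we were not redirected elsewhere and the body looks like a PDF."""
    return (
        response.final_url == requested_url
        and response.body.startswith(b"%PDF-")
    )


def discover_sheets(
    fetch: Callable[[str], Response], max_index: int = 1000
) -> Iterator[str]:
    """Yield sheet URLs in order, stopping at the first one that fails the check."""
    for index in range(max_index + 1):
        url = sheet_url(index)
        if not is_expected_pdf(url, fetch(url)):
            break
        yield url


def fetch_with_urllib(url: str) -> Response:
    """One possible fetcher; urlopen follows redirects and reports the final URL."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return Response(resp.geturl(), resp.read())
    except urllib.error.HTTPError:
        # Treat HTTP error statuses as "no PDF here" (an assumption).
        return Response(url, b"")
```

Calling list(discover_sheets(fetch_with_urllib)) would then walk the CDN until the first redirect or non-PDF response. Passing the fetcher in as a parameter keeps the stopping logic testable without network access.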
Next steps
- Tidy up the interface to make it more user-friendly.
- Automate the data pipeline.
- Deeper analysis to provide buying recommendations and to identify trends in the relationships between certain farms/producers and Cafe Imports.
Feedback
Feedback is always welcome! hello@greenestbean.com