Advanced Prompt Engineering: Wine Industry

Problem Statement

Prior to meeting with a business owner, I wanted to create a data visualization that communicated potential gains from integrating a data warehouse and leveraging combined datasets. Since I didn't have real data, I needed to generate some quickly. The challenge was to illustrate the power of Data Visualization with a convincingly realistic dataset, on the fly.

Approach

  • Technology
    • ChatGPT Pro
    • Noteable plugin

The Noteable plugin is required for the heavy lifting, available with a ChatGPT Pro license. Noteable is an IDE tailored for Jupyter notebooks.

  • Prompt Framework
    • Initial Prompting: Starting with a basic prompt and observing the AI's response.
    • Refinement: Adjusting the prompt based on the AI's performance to achieve more precision.
    • Iteration: Continuously refining the prompt through multiple iterations, learning from each response to improve the next one.

     

I started by creating a list of the necessary columns needed in the final output, along with creating notes on the predefined properties that the data within each column needed to contain: e.g. realistic wine pricing reflecting vintage values, product names, and a multi-year sales history. This was the jumping off stage where I quickly submitted the prompt to review the results. This allowed me to make early adjustments to my column properties.

To give realistic characteristics to the data, I used a combination of stakeholder interviews, external industry research, and personal industry knowledge.

Multiple iterations and fine-tuning of the prompt took place, in addition to additional instructions to the LLM to make small changes to the final .csv file. The ability to make ongoing updates to the underlying Python that generated the data is a large benefit of using Noteable.

Most Recent Version of the Prompt

-Goal:

Create a realistic dataset for a Direct-to-Consumer (DTC) sales dashboard tailored for a multi-location winery. Utilize the Noteable plugin to execute Python code for data generation and to produce a final .csv file.

Final Output:

  • File Format: .csv
  • File Name: “DTC_Sales.csv”

Process for Communicating Data Specifications:

  1. Column Names and Characteristics:
    • Action: I will provide a list of column names for the "DTC_Sales.csv" file, along with detailed descriptions of the data characteristics for each column.
    • Expectation: The AI will process and understand these specifications.
  2. Iterative Development and Comprehension:
    • After submitting the column names and data traits, I expect ChatGPT to paraphrase the information for confirmation of understanding.
    • If clarification is needed, ChatGPT should ask specific questions.
  3. Review and Adjustments:
    • Once the column names and specifications are agreed upon, ChatGPT will confirm if there are any additional considerations or adjustments needed before proceeding.
    • This step marks the end of the specification phase and the beginning of the configuration process.

CONFIGURATION STEP: The column names and their characteristics

  • Winery This column exists because the parent company owns 4 Wineries that reside in different locations around the world. The names of the wineries and the column values: 
    • Willamette Springs Winery (Pacific Northwest)
    • Chateau Montagne (Argentina)
    • Oak Ridge Estate (Central California)
    • Treebottom Cellars (Napa Valley). 
    • Please don’t include the region name in the winery name as I only include it to demonstrate what region the winery resides in. 
  • Region: Willamette Valley, Argentina, Napa Valley, Central Valley

Before we go further, I need to include a note about Revenue Totals. This document contains sales data for 4 wineries and while revenue isn't a column that exists, the sum of all order totals is going to be equal to the sales revenue for all 4 locations. So I want to provide revenue data, so you know what the individual orders should amount to in order to end with my desired revenue amounts. 

Revenue Totals:

The total annual revenue for all 4 wineries will be 21,548,320. The number of orders and order quantities that are generated in the dataset called “Fact_Sales” should be able to support that amount of sales. 

  • Percentage of Sales by Region:
    • 57% of sales come from Napa Valley
    • 26% from Pacific Northwest
    • 11% of sales from Central California
    • 6% of sales are from Argentina. 
  • Order Date: Show daily sales volume for the years 2018 - 2023. The least number of orders should be on a Monday. The most number of orders should be on weekends. 
  • Order number: Random generate a string that begins with the letters PO, followed by 5 numbers that increase by 1 for each new chronological order that is placed. 
  • Product SKU: Autogenerate a sku number that starts with R for red wine and W for white wines and M for merchandise
  • Product Name: The following is a list of the total products that exist in 3 categories, red wine, white wine, and merchandise. Red wine should comprise 61% of sales in Napa, Argentina, and the Central Valley. Red wine should comprise 55% of sales in the Willamette Valley. Merchandise should only account for 8% of total sales revenue. 
    • 7 red wines
      • Cabernet Sauvignon
      • Syrah
      • Zinfandel
      • Merlot
      • Pinot Noir
      • Brunello
      • Malbec 
    • 4 white wines
      • Chardonnay
      • Pinot Gris
      • Rose
      • Sauvignon Blanc 
    • Merchandise: 
      • Wine Glasses: $28 per glass, usually purchased in a set of 4. 
      • Wine Decanters: $89
      • Wine Bottle Openers: $38
      • Wine Stoppers: $12
      • Wine Aerators: $30
      • Wine Chillers: $50
      • Wine Totes or Carriers: $32
      • Wine Racks or Holders: $55
      • Wine Books or Guides: $28
      • Wine Tasting Journals: $14
      • Wine Corkscrews: $14
      • Wine Coasters: $20
      • Wine Charms: $12
  • Category (merchandise, red wine, white wine)
  • Product Price: I’ve already provided prices for merchandise but need assistance to generate a bottle price using the average bottle sale price in the napa valley. 
  • The average bottle price should be 108.00. 
  • The bottle price for the cheapest bottle should be 39.00. 
  • The bottle price of the most expensive bottle should be 499.00. 
  • The wines priced about 199.99 should be only 22% of total sales volume. 
    • Quantity Sold: Each order should represent average sales of 4 bottles of wine per order Merchandise will be included in a small percentage of transactions (in order to reach the 8% of sales statistic mentioned earlier) Merchandise sales should be contained in the tasting room category. 
    • Purchase Total: 
      • Average total purchase price of $432. 
  • Product Cost: This number can be calculated by taking the Retail Price and reverse engineering the product cost with the idea that the retail price is 225-325% markup of the product cost.
  • Vintage: Include vintages from the years 2016 through 2022. 
  • DTC Category: This column categorizes the direct to consumer revenue breakdown. The four values and the percentage of revenue each category should represent in sales are:
    • Tasting Room: 40% of revenue
    • Online Sales: 30%
    • Wine Club Memberships: 20%
    • All Others: 10%
  • Additional Considerations:
    • If the vintage year is older, the bottle price should be more expensive 
    • Most of the sales should come from the 2018 vintage making up 38% of sales, the single highest vintage year.
    • Vintage 2016 should be the most expensive Product Price for average sales of all vintages sold. Account for this by calculating the product price column for vintages 2018-2022 first. When generating data for the 2016 vintage, increase the Product Price by an increase of 18% to the average Product Price for vintages made between 2018-2022. 
    • Vintage 2017 should be the cheapest vintage for Product Price and also have a higher cost. Account for this by calculating Product Price for product cost for vintages 2016-2022 first. Then add an average increase of 18%  to the product cost for 2017 vintage. 
    • As AI considers how to generate the numbers, for which I’ve provided parameters, please consider that in each column, please avoid clustering of numbers or a pattern observed in many datasets, where instead of a broad, uniform distribution, the data points tend to aggregate in specific areas or clusters. Please provide a more broad distribution of numbers.
    • One last note: I'm not interested in rapid processing time. So if there are avenues of inquiry you can investigate or domains of knowledge you can deep dive for, please do that, even if it takes much longer for you to provide an answer. Don't artificially inflate your processing time but feel comfortable operating closer to any processing limits imposed by ChatGPT.
    • All that said, please stop, take a deep breath and work on this problem step-by-step.

Final report built using AI generated dataset

image

Before and After: Prompt Refinement

INITIAL PROMPT:

Create a sample dataset and output in .csv format that meets the following criteria:

  • Company is a winery with 3 locations.
  • Revenue: $15,000,000 per year , broken up by the following sales channels.
  • Sales channels: Direct to consumer (tasting room, online sales, wine club memberships) and wholesale and distribution (Kroger, Safeway, Whole Foods, Albertson's)
  • Products: 7 red wines including Cabernet Sauvignon, Syrah, Zinfandel, Merlot, Pinot Noir, Brunello, Malbec and 4 white wines including Chardonay, Pinot Gris, Rose, Sauvignon Blanc
  • Dates: show daily sales volume for the years 2021 - 2023

NOTE: I ran the prompt at this point to get an idea of what I was working with. This allowed me to quickly see necessary enhancements and to ensure a successful connection between ChatGPT and Noteable. I ran prompts at multiple times, in identical fashion, as I progressed and refined.

Description