“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on June 24, 2017.

The version of the Instacart data that we will use in this class can be found here.

Context

Instacart is an online grocery service that allows you to shop online from local stores. In New York City, partner stores include Whole Foods, Fairway, and The Food Emporium. Instacart offers same-day delivery, and items that users purchase are delivered within 2 hours.

“The Instacart Online Grocery Shopping Dataset 2017” is an anonymized dataset with over 3 million online grocery orders from more than 200,000 Instacart users. However, the dataset is not a random sample of products, users, or purchases. Therefore, while the data allow examination of trends in online grocery purchasing, the results may not generalize to Instacart users more broadly.

“The Instacart Online Grocery Shopping Dataset 2017” website provides some summary results of interesting findings, including:

  • Fruits are reordered more frequently than vegetables
  • Soups and baking ingredients are least likely to be reordered
  • Healthier snacks and staples tend to be purchased earlier in the day
  • Ice cream and frozen pizza are the most frequently ordered products late at night
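
Findings like the first two above come down to comparing reorder rates across groups. The sketch below shows one way to compute the share of reordered items per aisle using the reordered and aisle variables. The function name reorder_rate_by and the sample rows are hypothetical and made up for illustration; they are not real Instacart records, and this is not necessarily how the website produced its results.

```python
from collections import defaultdict


def reorder_rate_by(rows, key):
    """Return {group: share of rows with reordered == 1} for the given column."""
    totals = defaultdict(int)
    reorders = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        reorders[group] += int(row["reordered"])
    return {group: reorders[group] / totals[group] for group in totals}


# Made-up rows for illustration only; these are not real Instacart records.
sample_rows = [
    {"aisle": "fresh fruits", "reordered": 1},
    {"aisle": "fresh fruits", "reordered": 1},
    {"aisle": "fresh fruits", "reordered": 0},
    {"aisle": "fresh vegetables", "reordered": 1},
    {"aisle": "fresh vegetables", "reordered": 0},
    {"aisle": "soup broth bouillon", "reordered": 0},
]

rates = reorder_rate_by(sample_rows, "aisle")

# Print groups from highest to lowest reorder rate.
for aisle, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{aisle}: {rate:.2f}")
```

Passing "department" instead of "aisle" as the key would give the same comparison at the department level.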

Data description

The original data are quite extensive; the version linked at the top of this page for use in the class is a cleaned and limited subset. It contains 1,384,617 observations from 131,209 unique users, where each row in the dataset is a product from an order. There is a single order per user in this dataset.
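
The structure described above (one row per product, one order per user) can be checked directly once the data are loaded. The following is a minimal sketch, assuming the rows are available as dictionaries keyed by variable name, for example from csv.DictReader. The function name summarize_structure and the sample rows are hypothetical.

```python
from collections import defaultdict


def summarize_structure(rows):
    """Count rows, unique users, and users who have more than one order."""
    orders_by_user = defaultdict(set)
    n_rows = 0
    for row in rows:
        n_rows += 1
        orders_by_user[row["user_id"]].add(row["order_id"])
    n_multi = sum(1 for orders in orders_by_user.values() if len(orders) > 1)
    return {
        "n_rows": n_rows,
        "n_users": len(orders_by_user),
        "users_with_multiple_orders": n_multi,
    }


# Made-up rows for illustration only: two users, one order each.
sample_rows = [
    {"user_id": 1, "order_id": 10, "product_id": 501},
    {"user_id": 1, "order_id": 10, "product_id": 502},
    {"user_id": 2, "order_id": 20, "product_id": 501},
]

summary = summarize_structure(sample_rows)
print(summary)
```

If the description above holds, running this on the full class dataset should report 1,384,617 rows, 131,209 users, and no users with multiple orders.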

There are 15 variables in this dataset:

  • order_id: order identifier
  • product_id: product identifier
  • add_to_cart_order: order in which each product was added to cart
  • reordered: 1 if this product has been ordered by this user in the past, 0 otherwise
  • user_id: customer identifier
  • eval_set: which evaluation set this order belongs to (note that the data for use in this class is exclusively from the “train” eval_set)
  • order_number: the order sequence number for this user (1=first, n=nth)
  • order_dow: the day of the week on which the order was placed
  • order_hour_of_day: the hour of the day on which the order was placed
  • days_since_prior_order: days since the last order, capped at 30, NA if order_number=1
  • product_name: name of the product
  • aisle_id: aisle identifier
  • department_id: department identifier
  • aisle: the name of the aisle
  • department: the name of the department
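
Some of the variable definitions above imply simple consistency rules, such as days_since_prior_order being NA for a first order and never exceeding 30. The sketch below parses a few rows and checks those two rules. It assumes the data are stored as CSV with the column names listed above and with missing values written as NA. The CSV excerpt, parse_row, and check_row are hypothetical and for illustration only.

```python
import csv
import io

# Made-up CSV excerpt with a subset of the columns; not real records.
SAMPLE_CSV = """order_id,user_id,order_number,days_since_prior_order,reordered
1,101,1,NA,0
2,102,4,30,1
3,103,7,5,1
"""


def parse_row(raw):
    """Convert string fields to numbers; NA or empty becomes None."""
    days = raw["days_since_prior_order"]
    return {
        "order_id": int(raw["order_id"]),
        "user_id": int(raw["user_id"]),
        "order_number": int(raw["order_number"]),
        "days_since_prior_order": None if days in ("", "NA") else float(days),
        "reordered": int(raw["reordered"]),
    }


def check_row(row):
    """Return a list of problems found in one parsed row (empty if none)."""
    problems = []
    days = row["days_since_prior_order"]
    if row["order_number"] == 1 and days is not None:
        problems.append("first order should have NA days_since_prior_order")
    if days is not None and not 0 <= days <= 30:
        problems.append("days_since_prior_order should be between 0 and 30")
    if row["reordered"] not in (0, 1):
        problems.append("reordered should be 0 or 1")
    return problems


rows = [parse_row(raw) for raw in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for row in rows:
    print(row["order_id"], check_row(row))
```

Returning a list of problems rather than raising an error makes it easy to tally how many rows break each rule in a large file.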