These data were accessed from Inside Airbnb on September 2, 2017. The version of the data that we will use in this class can be found here.
Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world.
By analyzing publicly available information about a city’s Airbnb listings, Inside Airbnb provides filters and key metrics so you can see how Airbnb is being used to compete with the residential housing market.
Inside Airbnb provides some visualizations of the NYC Airbnb data here, where you can see maps showing type of room, activity, availability, and listings per host for all NYC Airbnb listings.
After downloading both the “listings.csv.gz” and “listings.csv” files, the following code was used to create the provided dataset:
library(tidyverse)

# Load the detailed NYC Airbnb data, keeping only the 2 variables of interest.
# read_csv() decompresses .gz files automatically and treats the first row
# as column names by default.
airbnb_location_df =
  read_csv("listings.csv.gz") %>%
  select(id, review_scores_location)

# Read in the summary data for NYC
airbnb_listings_df =
  read_csv("listings.csv") %>%
  mutate(last_review = as.Date(last_review, format = "%Y-%m-%d"))

# Combine the two datasets, keeping listings that appear in both
nyc_airbnb = inner_join(airbnb_location_df, airbnb_listings_df, by = "id")

# Save the data
save(nyc_airbnb, file = "nyc_airbnb.RData")
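The join step keeps only listings whose id appears in both files. Here is a minimal sketch of that behaviour, using two small made-up tibbles; the ids and values are hypothetical and are not taken from the Airbnb data:

```r
library(dplyr)

# Hypothetical stand-ins for the detailed and summary files
toy_location = tibble(id = c(1, 2, 3), review_scores_location = c(9, 10, NA))
toy_listings = tibble(id = c(2, 3, 4), price = c(150, 80, 200))

# inner_join() keeps only ids present in both tables (here, 2 and 3);
# id 1 and id 4 are dropped because each appears in only one table
toy_joined = inner_join(toy_location, toy_listings, by = "id")
toy_joined
```

A missing value in a non-key column, such as the NA score for id 3, does not affect whether a row is kept; only the key column id is matched.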
The resulting R data file nyc_airbnb.RData contains a single dataframe nyc_airbnb with 40,753 rows of data on 17 variables:
id: listing id
review_scores_location: 0-5 stars converted into a 0-10 scale
name: listing name
host_id: host id
host_name: host name
neighbourhood_group: NYC borough
neighbourhood: NYC neighborhood
latitude: listing latitude
longitude: listing longitude
room_type: type of listing (Entire home/apt, Private room, Shared room)
price: listing price
minimum_nights: required minimum nights stay
number_of_reviews: total number of reviews
last_review: date of last review
reviews_per_month: average number of reviews per month
calculated_host_listings_count: total number of listings for this host
availability_365: number of days listing is available out of 365
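An .RData file stores objects together with their names, so load() restores them under the names they were saved with. Here is a minimal sketch of that round trip, using a small made-up data frame and a temporary file; toy_df and toy_path are hypothetical names used only for illustration:

```r
# Hypothetical toy data frame standing in for nyc_airbnb
toy_df = data.frame(id = 1:3, price = c(150, 80, 200))

# Save it to a temporary .RData file, then remove it from the workspace
toy_path = tempfile(fileext = ".RData")
save(toy_df, file = toy_path)
rm(toy_df)

# load() recreates the object under its saved name and
# returns the names of the restored objects (invisibly)
loaded_names = load(toy_path)
print(loaded_names)
print(dim(toy_df))
```

Assuming nyc_airbnb.RData is in the working directory, load("nyc_airbnb.RData") should restore the nyc_airbnb dataframe in the same way, after which dim(nyc_airbnb) and names(nyc_airbnb) can be compared with the description above.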