The Optimal Portland Pub Crawl
Overview
The premise of a Pub Crawl is quite simple: visit several bars in an afternoon or evening without a clear plan of where you’ll go next. While this sort of spontaneous, unstructured approach may work for some people, I’ve always been a fan of having a plan – in this case, an optimal plan. If we want to maximize the number of places visited (and beers tasted) in a finite period of time, then there is simply no room for shoddy planning. Accordingly, this post provides a framework for designing the optimal Portland Pub Crawl by working through the following steps:
🍺 Web scrape the top 100 Portland bars from The Oregonian's list
🍺 Geocode each bar’s location
🍺 Find the optimal route between a subsample of the bars, because visiting 100 in a day would make the following day very bad
🍺 Determine a walking path between the bars
🍺 Create a map of the walking path, which can be used as a field guide to impress your friends once the pub crawl is under way
🍺 Promptly spill beer on the map at the 2nd bar, rendering it unreadable, followed by a game of darts and some popcorn
If that sounds like a plan, let’s get started!
Defining the Top 100 Bars
First, let's identify the stops during our tour of Portland's pub scene. In the section below, we'll load the R libraries, specify which version of Python we'd like to use, and then do some web scraping. Note that all of the Python modules and R scripts are contained in the same directory for simplicity.
# Core package
library(tidyverse)
# Mapping
library(leaflet)
library(widgetframe)
library(leaflet.extras)
# Calling python functions from R
library(reticulate)
# Route optimization
library(tspmeta)
# Making nice tables
library(gt)
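One note on dependencies: tspmeta has at times been unavailable from CRAN directly. If install.packages('tspmeta') fails for your R version, one option is installing from the CRAN archive via the remotes package (the version number below is illustrative; check the archive for the latest):
# one option if tspmeta isn't on CRAN for your R version
# (version number is illustrative):
remotes::install_version("tspmeta", version = "1.2")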
The geocode_best_portland_bars function below is responsible for collecting the information.
# best_bars.py
import os
import re
import urllib.request
from typing import List

import googlemaps
import pandas as pd
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from tqdm import tqdm

load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")


def find_best_bars() -> List[str]:
    """Scrape the raw list of bar names from the article."""
    base_url = "http://www.oregonlive.com/dining/index.ssf/2014/10/portlands_100_best_bars_bar_ta.html"
    page = urllib.request.urlopen(base_url)
    soup = BeautifulSoup(page, "html.parser")
    bar_descriptors = soup.find_all("div", class_="entry-content")
    bar_descriptors = str(bar_descriptors).split("<p>")[0]
    best_bars_raw_lst = re.findall(r"<strong>(.*?)</strong>", bar_descriptors)
    return best_bars_raw_lst


def clean_bar_names(raw_bar_lst: List[str]) -> List[str]:
    # exclude emphasis tags
    best_bars = [re.sub(r"<em> (.*?)</em>", "", x) for x in raw_bar_lst]
    # exclude the ranking number included in each bar name
    best_bars = [re.sub(r"No. \d+ --", "", x).strip() for x in best_bars]
    # exclude headers in all caps
    best_bars = [x for x in best_bars if not x.isupper()]
    # exclude all-lowercase tags
    best_bars = [x for x in best_bars if not x.islower()]
    # decode escaped ampersands
    best_bars = [x.replace("&amp;", "&") for x in best_bars]
    # exclude other emphasis tags
    best_bars = [re.sub(r": <em>(.*?)</em>", "", x) for x in best_bars]
    # strip colons
    best_bars = [x.replace(":", "") for x in best_bars]
    # exclude blanks
    best_bars = [x for x in best_bars if x]
    return best_bars


def geocode_best_portland_bars() -> pd.DataFrame:
    """Scrape, clean, and geocode the top-100 bar list."""
    best_bars_lst = find_best_bars()
    bar_names = clean_bar_names(raw_bar_lst=best_bars_lst)
    bar_names = [f"{x}, Portland, OR" for x in bar_names]
    gmaps = googlemaps.Client(key=API_KEY)
    geocoded_bars_lst = []
    for name in tqdm(bar_names):
        geocode_result = gmaps.geocode(name)
        lat_lng = geocode_result[0].get("geometry").get("location")
        lat, lng = lat_lng.get("lat"), lat_lng.get("lng")
        geocoded_bars_lst.append([name, lat, lng])
    geocoded_bars_df = pd.DataFrame(geocoded_bars_lst)
    geocoded_bars_df.columns = ["name", "lat", "lng"]
    return geocoded_bars_df
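One setup note before running this: the module reads GOOGLE_API_KEY from the environment via python-dotenv, so you'll need your own Geocoding API key from the Google Maps Platform in a .env file alongside the scripts. A minimal example (the key value here is just a placeholder):
# .env -- loaded by load_dotenv(); substitute your own key
GOOGLE_API_KEY=your-api-key-here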
Historically, this operation would require executing a Python script, writing the results out to a .txt or .csv file, and then reading the output back into R. However, with the advent of reticulate, we can execute a Python function and pull the output back without ever having to leave the cozy confines of R (or RStudio, in this case). Recall that the best_bars.py module, which contains the three functions defined above, is located in the same directory as our R script. We'll bring the module into the R environment via source_python, then execute the main function. Note that we aren't passing in any arguments; this function is designed for a single purpose: finding the best watering holes in Portland.
# specify which version of Python to use
reticulate::use_python('//anaconda/bin/python', required = TRUE)
# brings our function into the R Environment
reticulate::source_python('best_bars.py')
# executes and stores the output in our variable 'best_bars'
best_bars = geocode_best_portland_bars()
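A nice side effect of source_python is that every function in the module -- not just geocode_best_portland_bars -- is now callable from R. That makes it easy to spot-check the scraping steps (which fetch the page but skip the paid geocoding calls):
# the intermediate helpers are exposed too, handy for spot checks
raw_names = find_best_bars()
head(clean_bar_names(raw_names))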
best_bars %>%
tail(10) %>%
mutate(name = str_replace(name, ', Portland, OR', '')) %>%
gt() %>%
tab_header(title = gt::md('**Data Sample of Best Portland Bars**')) %>%
cols_align(
align = "center",
columns = everything()) %>%
cols_width(
everything() ~ px(155)
)
Data Sample of Best Portland Bars

name | lat | lng
---|---|---
Swine | 45.51827 | -122.6818
Jackknife | 45.52051 | -122.6828
Stammtisch | 45.52586 | -122.6375
Cooper's Hall | 45.51988 | -122.6596
The Knock Back | 45.55923 | -122.6416
Multnomah Whiskey Library | 45.52098 | -122.6835
Trifecta Tavern & Bakery | 45.51756 | -122.6596
Angel Face | 45.52321 | -122.6371
Pepe Le Moko | 45.52187 | -122.6813
Expatriate | 45.56240 | -122.6346
We have successfully scraped the best bars and geocoded their locations. Note that you’ll need access to the Geocoding API through the Google Maps Platform to follow along with the example above. In the following section, we’ll solve a classic routing optimization problem: The Traveling Salesman Problem (TSP). The goal is to find the most direct route between all of the bars we want to visit during the pub crawl.
Route Optimization
The goal of any routing optimization problem is simple: minimize the total distance traveled between different nodes (locations) in space while ensuring that each node is visited exactly once. There are many algorithms for this type of problem, but we'll leverage the 2-optimization (or 2-opt) method due to its simplicity. This algorithm searches for the lowest-cost route (i.e., the shortest route that visits every node once) by swapping the 'edges' (the paths that connect pairs of nodes) in the current tour. If a swap reduces the total length of the tour, it is kept; otherwise, the swap is reversed, and we try again with different edges. Note that each swap must leave a single connected route through all of the nodes. The algorithm stops when it reaches a tour that cannot be improved by further swaps.
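To make the edge-swapping idea concrete, here is a minimal 2-opt loop in base R. This is an illustrative sketch of the general technique, not tspmeta's actual implementation: given a tour (a permutation of node indices) and a full pairwise distance matrix, it reverses segments of the tour and keeps any reversal that shortens the total length.
# illustrative 2-opt sketch (not the tspmeta implementation)
tour_length = function(tour, dmat) {
  # total length of the closed tour, including the return leg
  sum(dmat[cbind(tour, c(tour[-1], tour[1]))])
}
two_opt = function(tour, dmat) {
  improved = TRUE
  while (improved) {
    improved = FALSE
    for (i in 2:(length(tour) - 1)) {
      for (j in (i + 1):length(tour)) {
        candidate = tour
        # the 2-opt 'swap': reverse the segment between positions i and j
        candidate[i:j] = rev(candidate[i:j])
        if (tour_length(candidate, dmat) < tour_length(tour, dmat)) {
          tour = candidate
          improved = TRUE
        }
      }
    }
  }
  tour
}
# usage sketch: two_opt(1:nrow(m), m), where m = as.matrix(dist(coords))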
Before going any further, let’s plot out our bar locations. We’ll also define our starting point, which is referred to as the “depot.”
depot_lat = 45.525915
depot_lng = -122.684957
bar_map = leaflet(data = best_bars) %>%
setView(lng = depot_lng + 0.05, lat = depot_lat, zoom = 13) %>%
addProviderTiles("CartoDB.Positron") %>%
addMarkers(lng=depot_lng, lat=depot_lat) %>%
addCircleMarkers(lat=~lat,
lng=~lng,
color = "orange",
radius = 4,
weight = 10,
stroke = FALSE,
opacity = 1,
fillOpacity = 1
)
Each orange dot is a bar, and the pointer indicates our starting position (the depot). Given that we are walking, let's limit the potential distance to a maximum of three miles from our starting location. The function below calculates the total feet between two points defined by latitude/longitude coordinates.
# haversine distance between two lat/lng points, returned in feet
earth_dist = function (lat1, lng1, lat2, lng2)
{
  # convert degrees to radians
  rad = pi/180
  a1 = lat1 * rad
  a2 = lng1 * rad
  b1 = lat2 * rad
  b2 = lng2 * rad
  dlon = b2 - a2
  dlat = b1 - a1
  # haversine formula
  a = (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1 - a))
  # Earth's radius in kilometers
  R = 6378.145
  d = R * c
  # convert kilometers to feet
  return(d * 3280.8)
}
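As a quick sanity check, plugging in the depot and Multnomah Whiskey Library's coordinates from the table above should return something in the neighborhood of 1,800 feet (about a third of a mile):
# depot to Multnomah Whiskey Library: roughly 1,800 feet
earth_dist(45.525915, -122.684957, 45.52098, -122.6835)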
Below, we'll keep only the locations within the maximum distance we're willing to travel, and then randomly sample the 24 bars we'll visit.
feet_in_mile = 5280
# maximum distance is 3 miles
max_miles_away = 3
bar_locations_nearby = best_bars %>%
mutate(distance_from_depot = earth_dist(depot_lat,
depot_lng,
lat,
lng
)
) %>%
filter(distance_from_depot <= feet_in_mile * max_miles_away)
set.seed(1)
# we'll visit 24 bars
n_bars = 24
# randomly select 24 bars to visit
bar_locations_nearby = bar_locations_nearby %>%
sample_n(n_bars)
Next, we'll transform the lat/long locations into a distance matrix, which specifies the Euclidean distance (computed on the raw latitude/longitude coordinates -- a reasonable approximation over such a small area) between each pair of bars.
# now find optimal route
coordinates = bar_locations_nearby %>%
dplyr::select(lat, lng, name) %>%
mutate(location_index = 2:(n() + 1)) %>%
bind_rows(data.frame(lat = depot_lat,
lng = depot_lng,
address = 'depot',
name = 'depot',
location_index = 1
)
) %>%
arrange(location_index)
coords_matrix = coordinates %>%
dplyr::select(lat, lng) %>%
as.matrix()
dist_matrix = dist(coords_matrix)
The two functions below, tsp_instance and run_solver, do the heavy lifting and find the optimal route between the bars.
# create tsp instance
tsp_ins = tspmeta::tsp_instance(coords_matrix, dist_matrix)
# find optimal route based on 2-opt method
opt_tour = as.integer(run_solver(tsp_ins, method="2-opt"))
# sort to start at depot
sorted_tour = c(opt_tour[which(opt_tour == 1):length(opt_tour)],
opt_tour[1:(which(opt_tour == 1) - 1)]
)
# join route order back to original data
coordinates = coordinates %>%
dplyr::inner_join(data.frame(location_index = sorted_tour,
route_order = 1:length(sorted_tour)
)
) %>%
dplyr::arrange(route_order)
# reformat so each row has a starting lat/lng and ending lat/lng
route_df = coordinates %>%
dplyr::select(-address) %>%
dplyr::rename(start_lat = lat,
start_lng = lng
) %>%
dplyr::mutate(end_lat = c(start_lat[2:n()], NA),
end_lng = c(start_lng[2:n()], NA)
) %>%
na.omit()
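As a rough sanity check before converting to real sidewalk paths, we can reuse the earth_dist helper to total the straight-line legs. Since streets don't run point-to-point, this will understate the actual walking distance we compute in the next section.
# straight-line (as-the-crow-flies) tour length in miles
route_df %>%
  dplyr::mutate(leg_feet = earth_dist(start_lat, start_lng, end_lat, end_lng)) %>%
  dplyr::summarise(total_miles = sum(leg_feet) / feet_in_mile)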
Let's take a peek at our data to see how everything turned out.
Route

name | route_order | start_lat | start_lng | end_lat | end_lng
---|---|---|---|---|---
depot | 1 | 45.52591 | -122.6850 | 45.52627 | -122.6784
Park Kitchen | 2 | 45.52627 | -122.6784 | 45.52531 | -122.6783
Remedy Wine Bar | 3 | 45.52531 | -122.6783 | 45.52201 | -122.6816
Clyde Common | 4 | 45.52201 | -122.6816 | 45.52246 | -122.6852
Cassidy's | 5 | 45.52246 | -122.6852 | 45.52098 | -122.6835
Multnomah Whiskey Library | 6 | 45.52098 | -122.6835 | 45.51487 | -122.6824
The Rookery at Raven & Rose | 7 | 45.51487 | -122.6824 | 45.51400 | -122.6753
Veritable Quandary | 8 | 45.51400 | -122.6753 | 45.51904 | -122.6781
Departure | 9 | 45.51904 | -122.6781 | 45.52403 | -122.6756
Ground Kontrol | 10 | 45.52403 | -122.6756 | 45.51900 | -122.6641
Sweet! Almost there. The final step is to convert these points into an actual travel path.
Creating a Walking Path
Currently, the paths between different nodes (i.e., bars) are straight lines. We'll be walking this tour, so a sidewalk travel path is required. We'll call on the Google Maps API one last time to convert each straight-line edge into an actual walking path via the convert_route_to_path.py module. This module consists of two functions: create_travel_path and extract_polyline. create_travel_path takes our route DataFrame (each row holding a starting and ending lat/long) plus a method of travel (walking, in our case) and queries the Directions API for step-by-step lat/long coordinates along with distance and time estimates for each leg. extract_polyline is a helper function that decodes each leg's encoded polyline into a pandas DataFrame. Thanks to reticulate, the output comes back to us as an R data frame. We'll specify the Python module below.
# convert_route_to_path.py
import os

import googlemaps
import pandas as pd
import polyline
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("GOOGLE_API_KEY")


def extract_polyline(coords: dict) -> pd.DataFrame:
    """Decode a Google-encoded polyline into a lat/lng DataFrame."""
    gmaps_polyline = coords["overview_polyline"]["points"]
    polyline_df = pd.DataFrame(polyline.decode(gmaps_polyline))
    polyline_df.columns = ["lat", "lng"]
    polyline_df["path_order"] = range(1, polyline_df.shape[0] + 1)
    return polyline_df


def create_travel_path(
    route_df: pd.DataFrame, travel_mode: str = "walking"
) -> pd.DataFrame:
    """Query the Directions API for each leg and stack the decoded paths."""
    gmaps = googlemaps.Client(key=API_KEY)
    leg_dfs = []
    for row in route_df.itertuples():
        coords = gmaps.directions(
            origin=[row.start_lat, row.start_lng],
            destination=[row.end_lat, row.end_lng],
            mode=travel_mode,
        )
        coords_df = extract_polyline(coords=coords[0])
        coords_df["location_index"] = row.location_index
        coords_df["travel_time"] = coords[0]["legs"][0]["duration"]["value"]
        coords_df["miles"] = coords[0]["legs"][0]["distance"]["text"]
        coords_df["route_order"] = row.route_order
        leg_dfs.append(coords_df)
    # DataFrame.append is deprecated in recent pandas; concatenate once instead
    out_route_df = pd.concat(leg_dfs).reset_index(drop=True)
    return out_route_df
Next, we'll read the convert_route_to_path.py module into R and pass our route DataFrame to the create_travel_path function.
reticulate::source_python('convert_route_to_path.py')
path_df = create_travel_path(route_df)
The data indicating the path between our depot and the first bar should look like this (travel_time is in seconds):
Sample Travel Path

location_index | route_order | travel_time | miles | path_order | lat | lng
---|---|---|---|---|---|---
1 | 1 | 344 | 0.3 mi | 1 | 45.52576 | -122.6849
1 | 1 | 344 | 0.3 mi | 2 | 45.52575 | -122.6853
1 | 1 | 344 | 0.3 mi | 3 | 45.52508 | -122.6853
1 | 1 | 344 | 0.3 mi | 4 | 45.52449 | -122.6852
1 | 1 | 344 | 0.3 mi | 5 | 45.52443 | -122.6852
1 | 1 | 344 | 0.3 mi | 6 | 45.52362 | -122.6852
1 | 1 | 344 | 0.3 mi | 7 | 45.52312 | -122.6852
1 | 1 | 344 | 0.3 mi | 8 | 45.52304 | -122.6852
1 | 1 | 344 | 0.3 mi | 9 | 45.52282 | -122.6852
1 | 1 | 344 | 0.3 mi | 10 | 45.52263 | -122.6853
1 | 1 | 344 | 0.3 mi | 11 | 45.52247 | -122.6854
1 | 1 | 344 | 0.3 mi | 12 | 45.52243 | -122.6852
Note the small changes between each of the successive lat/long coordinates. This is the path we’ll be walking to obtain our first frosty mug of beer. Before mapping our data, let’s get a general idea of total walking time and distance.
travel_time_in_hours = round(path_df %>%
dplyr::select(location_index, travel_time) %>%
dplyr::distinct() %>%
dplyr::pull(travel_time) %>%
sum() / 3600, 1)
print(glue::glue("Total Travel Time Is: ",
travel_time_in_hours,
" Hours"
)
)
## Total Travel Time Is: 6.6 Hours
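Dividing that total across the tour's legs (using the same distinct leg-level times) gives the average walk between stops, which should land somewhere around 15 minutes:
# average walking time per leg, in minutes
path_df %>%
  dplyr::distinct(location_index, travel_time) %>%
  dplyr::summarise(avg_minutes = mean(travel_time) / 60)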
It looks like this walk will take around six and a half hours, so we'll need to bring some comfy shoes. What about distance (we'll need some way to work off those calories)?
travel_distance_in_miles = round(path_df %>%
  # convert the text distances ("450 ft", "0.3 mi") to feet
  dplyr::mutate(feet_numeric = dplyr::case_when(
    stringr::str_detect(miles, 'ft') ~
      as.numeric(stringr::str_replace(miles, " ft", "")),
    stringr::str_detect(miles, " mi") ~
      as.numeric(stringr::str_replace(miles, " mi", "")) * feet_in_mile
    )
  ) %>%
  dplyr::select(location_index, feet_numeric) %>%
  dplyr::distinct() %>%
  dplyr::pull(feet_numeric) %>%
  sum() / feet_in_mile, 1)
print(glue::glue("Total Travel Distance Is: ",
travel_distance_in_miles,
" Miles"
)
)
## Total Travel Distance Is: 19 Miles
OK, this is more of a Pub Crawl marathon. That’s some serious distance to cover. Let’s bring it all together with some visualization.
Mapping the Route
The last step is to bring this analysis to life with everyone’s favorite visualization: MAPS! Indeed, we’ll plot the walking path across downtown Portland to see the Pub Crawl route.
# We'll use this to identify the labels for each stop
label_df = path_df %>%
dplyr::filter(path_order == 1)
# Bar crawl visualization
final_route = leaflet(data = path_df) %>%
setView(lng = depot_lng + 0.04, lat = depot_lat + 0.01, zoom = 13) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolylines(data = path_df %>%
filter(route_order < 24),
lng = ~lng,
lat = ~lat,
color = "orange",
opacity = 1
) %>%
addMarkers(lng = depot_lng,
lat = depot_lat
) %>%
addCircleMarkers(data = label_df,
lng = ~lng,
lat = ~lat,
radius = 4,
label = ~as.character(route_order),
labelOptions = labelOptions(noHide = TRUE,
                            textOnly = TRUE,
                            direction = 'top',
                            textsize = "14px",
                            offset = c(0, -5)
                            )
)
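One last practical touch: since leaflet maps are htmlwidgets, the finished map can be saved as a standalone HTML file for phones that lose signal between bars (one option, using the htmlwidgets package that leaflet builds on):
# save the interactive map as a self-contained HTML file
htmlwidgets::saveWidget(final_route, "portland_pub_crawl.html")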
Whew! That was a lot of work, but it looks like we have a reasonable solution. We’ll start in downtown Portland, take a quick tour over to the city’s Eastside, and eventually return to the Northwest. So, there you have it – a real-world application of optimization that supports an efficient Pub Crawl. Prost!