Web scraping of a historic race

Jesse Owens was one of the greatest and most famous athletes in history. He won four gold medals at the 1936 Olympic Games in Berlin. I'd like to show how to explore the results of the 100 meters race, where Owens won one of those medals. We're going to use the rvest library for web scraping and then ggplot2 to visualize the data. We also need the plyr library to combine a list of data frames into a single one.


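library(rvest)    # read_html, html_nodes, html_table
library(plyr)     # ldply
library(ggplot2)  # plotting

phases <- c("round-one","quarter-finals","semi-finals","final")

df <- data.frame()
for(phase in phases){
  # Download the results page for this phase
  olympic <- read_html(paste0("http://www.sports-reference.com/olympics/summer/1936/ATH/mens-100-metres-",phase,".html"))
  # Select every results table on the page
  nodes <- html_nodes(olympic,"#page_content .table_container table")
  tables <- html_table(nodes)
  # Keep the first six columns of each table and stack them into one data frame
  df_ <- ldply(tables,function(x) subset(x,select=1:6))
  names(df_)[6] <- "time"
  # Drop rows without a recorded time
  df_ <- na.omit(df_)
  df_ <- df_[df_$time!="",]
  # Replace the decimal comma with a decimal point
  df_$time <- gsub(",",".",df_$time)
  df_$phase <- phase
  df <- rbind(df_, df)
}

# Remove the "w" suffix that some times carry
df$time <- gsub("w","",df$time)
# Keep the phases in chronological order
df$phase <- factor(df$phase, levels=phases)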
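
Since the loop above depends on a live page, here is a minimal offline sketch of the same extraction steps. It assumes rvest 1.0 or later (for minimal_html) and uses a hypothetical inline page with made-up rows and only three columns; the runner names and times are placeholders, not the 1936 results. Converting the times with as.numeric is an extra step added here for illustration.

```r
library(rvest)
library(plyr)

# Hypothetical inline page standing in for one results page (made-up rows)
page <- minimal_html('
  <div id="page_content">
    <div class="table_container">
      <table>
        <tr><th>Rank</th><th>Athlete</th><th>T</th></tr>
        <tr><td>1</td><td>Runner A</td><td>10,3</td></tr>
        <tr><td>2</td><td>Runner B</td><td>10,5w</td></tr>
      </table>
    </div>
    <div class="table_container">
      <table>
        <tr><th>Rank</th><th>Athlete</th><th>T</th></tr>
        <tr><td>1</td><td>Runner C</td><td>10,4</td></tr>
      </table>
    </div>
  </div>')

# Same selector as in the loop: one node per table
nodes  <- html_nodes(page, "#page_content .table_container table")
tables <- html_table(nodes)   # list with one data frame per table

# Stack the tables into a single data frame
heats <- ldply(tables, function(x) subset(x, select = 1:3))
names(heats)[3] <- "time"

# Clean the time strings, then convert them to numbers
heats$time <- gsub(",", ".", heats$time)   # decimal comma to decimal point
heats$time <- gsub("w", "", heats$time)    # drop the "w" suffix
heats$time <- as.numeric(heats$time)

print(heats)
```

Working with numeric times is optional for the plot below, but it makes later arithmetic on the times straightforward.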

Finally, we plot the results, in every phase, of the six athletes who reached the final.

finalists <- df$Athlete[df$phase=="final"]
dfFinal <- df[df$Athlete %in% finalists,]

ggplot(data=dfFinal, aes(y=Athlete, x=time, col=Athlete, group=Athlete, label=Athlete)) +
  geom_label(aes(fill = Athlete), colour = "white", fontface = "bold") +
  theme(plot.title = element_text(face="bold", size=14),
        axis.ticks.y=element_blank()) +
  ggtitle("Berlin 1936")


Although Metcalfe ran his slowest time in round one, he improved markedly and went on to win the silver medal in the final. Osendarp was very consistent in his performance, as was Wykoff. The most erratic athlete, however, was Strandberg, whose times varied by four tenths of a second across the phases. As you can see, Jesse Owens was the best not only in the final but throughout the whole competition.
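
The variation per athlete can also be computed rather than read off the plot. The following is a rough sketch using base R only; the data frame toy and its values are made up for illustration (placeholder runners, not the 1936 results), and it assumes the times have already been converted to numbers.

```r
# Made-up times for placeholder runners; not the 1936 results
toy <- data.frame(
  Athlete = rep(c("Runner A", "Runner B"), each = 4),
  phase   = rep(c("round-one", "quarter-finals", "semi-finals", "final"), times = 2),
  time    = c(10.7, 10.5, 10.5, 10.4,
              10.6, 10.6, 10.7, 10.6),
  stringsAsFactors = FALSE
)

# Spread = slowest time minus fastest time across the phases
spread <- aggregate(time ~ Athlete, data = toy,
                    FUN = function(t) max(t) - min(t))
names(spread)[2] <- "spread"

# Most variable athlete first
spread <- spread[order(-spread$spread), ]
print(spread)
```

Applied to dfFinal with numeric times, the same aggregate call would rank the finalists by how much their times varied between phases.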

