Web Scraping of a historic race

Jesse Owens was one of the greatest and most famous athletes in history. He won four gold medals at the 1936 Olympic Games in Berlin. I'd like to show how to explore the results of the 100 metres race, where Owens won one of those medals. We're going to use the rvest library for web scraping and then ggplot2 to visualize the data. We'll also need the plyr library to join a list of data frames into one.


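library(rvest)    # read_html(), html_nodes(), html_table()
library(plyr)     # ldply()
library(ggplot2)  # ggplot()

phases <- c("round-one", "quarter-finals", "semi-finals", "final")

df <- data.frame()
for (phase in phases) {
  # One page per round; each heat is a separate table on the page
  olympic <- read_html(paste0("http://www.sports-reference.com/olympics/summer/1936/ATH/mens-100-metres-", phase, ".html"))
  nodes <- html_nodes(olympic, "#page_content .table_container table")
  tables <- html_table(nodes)
  # Keep the first six columns of every heat and stack them into one data frame
  df_ <- ldply(tables, function(x) subset(x, select = 1:6))
  names(df_)[6] <- "time"
  df_ <- na.omit(df_)
  df_ <- df_[df_$time != "", ]
  df_$time <- gsub(",", ".", df_$time)  # decimal comma -> decimal point
  df_$phase <- phase
  df <- rbind(df_, df)
}

# Strip the "w" suffix and convert times to numbers so the x axis is continuous
df$time <- as.numeric(gsub("w", "", df$time))
df$phase <- factor(df$phase, levels = phases)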
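The cleaning steps inside the loop can be checked without hitting the website. The sketch below wraps them in a hypothetical helper, clean_times(), which is not part of the original code; the input strings are invented to mimic the decimal commas, the "w" suffix and the empty cells the loop deals with.

```r
# Hypothetical helper (not from the original post) that mirrors the loop's
# cleaning steps on a plain character vector of marks.
clean_times <- function(x) {
  x <- x[!is.na(x) & x != ""]        # drop missing or empty marks
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  x <- gsub("w", "", x, fixed = TRUE)   # strip the "w" suffix
  as.numeric(x)                         # seconds as numbers
}

clean_times(c("10,3", "10,4w", "", NA))
# [1] 10.3 10.4
```

Converting to numeric at the end matters for the plot: with character times, ggplot2 treats the x axis as discrete and the gaps between marks no longer reflect real differences in seconds.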

Finally, we plot the results of the six athletes who reached the final, across every round they ran.

# Keep every round for the athletes who reached the final
finalists <- df$Athlete[df$phase == "final"]
dfFinal <- df[df$Athlete %in% finalists, ]

# One label per athlete and round, placed at the time (in seconds) they ran
ggplot(data = dfFinal, aes(y = Athlete, x = time, col = Athlete, group = Athlete, label = Athlete)) +
  geom_label(aes(fill = Athlete), colour = "white", fontface = "bold") +
  theme(plot.title = element_text(face = "bold", size = 14),
        axis.ticks.y = element_blank()) +
  ggtitle("Berlin 1936")


Although Metcalfe recorded his slowest time in round one, he improved markedly from round to round, right up to his silver medal in the final. Osendarp, like Wykoff, was very consistent throughout. By contrast, the most variable athlete was Strandberg, whose times spread over four tenths of a second across the rounds. As you can see, Jesse Owens was the best not only in the final but throughout the whole competition.
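To put numbers on these observations instead of reading them off the plot, a minimal base-R sketch like the following could be run on dfFinal, assuming time has already been converted to numeric. The helpers time_spread() and fastest_by_phase() are hypothetical names, and the toy data uses placeholder runners with invented times purely to show the shape of the output; they are not the 1936 results.

```r
# Hypothetical helpers (not from the original post). Both assume a data frame
# like dfFinal: one row per athlete and round, with columns Athlete, phase
# and a numeric time in seconds.

# Spread (slowest minus fastest) of each athlete's times, largest first
time_spread <- function(d) {
  rng <- tapply(d$time, d$Athlete, function(t) diff(range(t, na.rm = TRUE)))
  rng <- setNames(as.numeric(rng), names(rng))  # plain named numeric vector
  sort(round(rng, 2), decreasing = TRUE)
}

# Fastest athlete in each round, in the order of the phase factor levels
fastest_by_phase <- function(d) {
  per_phase <- lapply(split(d, d$phase, drop = TRUE), function(p) {
    p[which.min(p$time), c("phase", "Athlete", "time")]
  })
  do.call(rbind, per_phase)
}

# Placeholder runners and invented times, only to show the output shape
toy <- data.frame(
  Athlete = c("Runner A", "Runner B", "Runner A", "Runner B"),
  phase   = factor(c("round-one", "round-one", "final", "final"),
                   levels = c("round-one", "final")),
  time    = c(10.6, 10.5, 10.2, 10.4)
)

time_spread(toy)
fastest_by_phase(toy)
```

On the real data, time_spread(dfFinal) would rank the finalists by how much their times varied, and fastest_by_phase(df) would show who ran the quickest time in each round.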