Babak Shahian Jahromi: My Data Science Blog

Prediction of Marathon Time

Introduction

The main goal of this project is to predict marathon race times for male and female runners.

Data Source

The data is sourced from the websites of the six World Marathon Majors: Tokyo, Boston, London, Berlin, Chicago, and New York City (NYC). It was scraped from these websites using web-scraping libraries such as Beautiful Soup and Selenium.
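
As a rough illustration of the scraping step, the sketch below parses a results table and converts finish times to minutes, the unit used for race time in the models. It is a minimal sketch under assumptions: the HTML snippet, its column layout (name, age, H:MM:SS finish time) and the helper names are hypothetical, and it swaps Python's built-in html.parser in for Beautiful Soup so it runs without extra packages. Pages rendered with JavaScript would still need a browser driver such as Selenium to fetch the HTML first.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # list of rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def finish_time_to_minutes(hms):
    """Convert an 'H:MM:SS' finish time to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


# Hypothetical results-page snippet; real marathon sites use their own markup.
page = """
<table>
  <tr><th>Name</th><th>Age</th><th>Finish</th></tr>
  <tr><td>Runner A</td><td>34</td><td>3:12:30</td></tr>
  <tr><td>Runner B</td><td>41</td><td>4:05:00</td></tr>
</table>
"""

parser = ResultsTableParser()
parser.feed(page)
records = [
    {"name": name, "age": int(age), "time_min": finish_time_to_minutes(finish)}
    for name, age, finish in parser.rows
]
print(records)
```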

Feature Selection

The most relevant features selected for predicting marathon times are the athlete's age, body fat percentage, average running speed, weather, terrain, and running footwear.
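
The post does not say how these features were chosen. One common screening step, shown here as an assumption rather than the author's method, is to rank numeric candidates by their Pearson correlation with race time. The column names and numbers below are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, r) pairs sorted by absolute correlation with the target."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy numbers for illustration only; not the project's data.
race_time_min = [200, 230, 250, 270, 300]
features = {
    "running_speed_kph": [13.0, 12.0, 11.0, 10.5, 9.5],
    "body_fat_pct": [12, 18, 15, 22, 20],
    "age": [28, 45, 33, 39, 50],
}
ranked = rank_features(features, race_time_min)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Correlation only captures linear relationships between numeric columns; categorical features such as terrain or footwear would need to be encoded (for example one-hot) before a check like this applies.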

Modeling

I fit two Ordinary Least Squares (OLS) models, one for male and one for female athletes. I also tried Lasso and Ridge regression with cross-validation and hyperparameter tuning. Lasso regression with alpha = 0.0001 was my best model. The adjusted R-squared was 0.44 for the male model and 0.71 for the female model. The final models for both genders are:
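
Adjusted R-squared penalizes R-squared for the number of predictors p relative to the number of samples n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). scikit-learn's r2_score (in sklearn.metrics) reports the unadjusted value, so a small helper like the one below is one way to get the adjusted figure. The function name and the example inputs are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical example: R^2 = 0.45 from 4 predictors fit on 200 runners.
print(round(adjusted_r2(0.45, n_samples=200, n_features=4), 4))
```

With more predictors and the same R², the adjusted value drops, which is why it is a fairer basis for comparing models with different feature counts.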

Male race time (min) = 285.4 + 2.316 × Body Fat (%) − 10.446 × Running Speed (kph) + 1.38 × Age − 1.02 × Running Shoe

Female race time (min) = 337.4 + 2.154 × Body Fat (%) − 15.112 × Running Speed (kph) + 1.27 × Age − 1.04 × Running Shoe
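
The two final models can be evaluated with a short function. This is a sketch: the coefficients come from the equations above, while two points are assumptions the post does not state. The body-fat coefficient is taken as positive (more body fat, longer race time), and running_shoe is whatever numeric shoe encoding was used during training (0 is used here purely as a placeholder). The function and dictionary names are hypothetical.

```python
# Coefficients from the final models. Assumptions: the body-fat term is added,
# and "shoe" multiplies the (undocumented) numeric shoe encoding from training.
MODELS = {
    "male": {"intercept": 285.4, "body_fat": 2.316, "speed": -10.446, "age": 1.38, "shoe": -1.02},
    "female": {"intercept": 337.4, "body_fat": 2.154, "speed": -15.112, "age": 1.27, "shoe": -1.04},
}


def predict_race_time(sex, body_fat_pct, speed_kph, age, running_shoe):
    """Predicted marathon time in minutes from the linear models."""
    c = MODELS[sex]
    return (
        c["intercept"]
        + c["body_fat"] * body_fat_pct
        + c["speed"] * speed_kph
        + c["age"] * age
        + c["shoe"] * running_shoe
    )


def format_minutes(total_min):
    """Format a duration in minutes as H:MM (rounded to the nearest minute)."""
    hours, minutes = divmod(round(total_min), 60)
    return f"{hours}:{minutes:02d}"


t = predict_race_time("male", body_fat_pct=15, speed_kph=12, age=30, running_shoe=0)
print(f"{t:.1f} min ({format_minutes(t)})")
```

For these example inputs the male model predicts about 236 minutes (roughly 3:56).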

Contributing

Please feel free to submit pull requests for development. The following instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need the following software for testing and development:

  • Python (version 3)
  • Command line interface
  • Web browser (Microsoft Edge, Firefox, Chrome and Safari supported)
  • Source code editor such as Atom or Sublime Text
  • Git source control manager
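
A quick way to confirm some of these prerequisites from Python itself is sketched below. It checks that the interpreter is Python 3 and uses shutil.which to see whether command-line tools are on the PATH. The function name and default tool list are illustrative assumptions, and editors and browsers are not checked.

```python
import shutil
import sys


def check_prerequisites(tools=("git",)):
    """Report whether Python 3 is running and which command-line tools are on PATH."""
    report = {"python3": sys.version_info.major == 3}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8} {'OK' if ok else 'missing'}")
```

Passing tools=("git", "jupyter") would also check for the Jupyter launcher used for notebooks.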

Cloning Repository

Start by opening the command line and cloning the repository:

git clone https://github.com/BabakShah/...

Change the directory to the project folder, or to the subfolder holding the desired source files (Python, HTML, CSS, JS):

cd ./DS-Saf

From the command line, install all the Python library dependencies:

pip install -r /path/to/requirements.txt
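
After installing, a short check can confirm that the libraries listed under Libraries used are importable. The mapping from package names to import names (for example, scikit-learn imports as sklearn and Jupyter Notebook as notebook) is an assumption, since the contents of requirements.txt are not shown; the function name is hypothetical.

```python
import importlib.util

# Package name -> import name for the libraries the project lists.
LIBRARIES = {
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "jupyter notebook": "notebook",
}


def missing_libraries(libraries=LIBRARIES):
    """Return the package names whose import name cannot be found."""
    return [
        package
        for package, module in libraries.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_libraries()
print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```

importlib.util.find_spec only locates a module without importing it, so the check is fast and has no side effects.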

For further development, open the source files in a source code editor. For Python scripts on macOS:

open -a "Sublime Text" file-name

For IPython (Jupyter) notebooks:

jupyter notebook file-name

Built With

  • Python
  • Git
  • Command Line Interface

Libraries used

  • Scikit-learn
  • Matplotlib
  • NumPy
  • pandas
  • Jupyter Notebook

Contact me

Babak - email