# How to use the dataset and the modeling code

## Requirements:

 - Python 3
 - scikit-learn, imported as the `sklearn` Python package (http://scikit-learn.org/)

## How to run (in Linux):

    $ ./socc.py Linear

Executing this command reproduces the linear model result in Table 2 of the SoCC'17 paper.
To see the results of the other modeling techniques, pass `SVR` or `Bagging` instead of `Linear` to `socc.py`.
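
To compare all three techniques in one go, the runs can be scripted. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_all` are made up for illustration, and it assumes `socc.py` is executable and located in the current directory.

```python
import os
import subprocess

# Model names accepted by socc.py according to this README.
MODELS = ["Linear", "SVR", "Bagging"]


def build_command(model, script="./socc.py"):
    """Return the argument list for running socc.py with one model name."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    return [script, model]


def run_all(script="./socc.py"):
    """Run socc.py once per model; its output goes straight to the terminal."""
    for model in MODELS:
        print(f"== {model} ==", flush=True)
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(model, script), check=True)


# Only launch the runs when the script is present in the current directory.
if os.path.exists("./socc.py"):
    run_all()
```

Save the sketch under any name next to `socc.py` and run it with `python3`; it does nothing if `socc.py` is not found.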

## About the 512 migrations data

The `dataset` folder also includes `2017.socc.dataset.512.migrations.csv` in addition to the original 40,000-migration dataset.

The only difference in the 512-migration dataset is an additional column, `workload_seed`, which holds the seed value of the migration workload (512 unique seeds in total).
Each unique workload is migrated with each of the five migration techniques from the paper, so this dataset allows a fair comparison of the techniques.
This dataset was also used for Section 8 of the SoCC'17 paper.
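
Since rows that share a `workload_seed` describe the same workload, grouping by that column is a natural first step for a per-workload comparison. The sketch below uses only the standard library and relies on nothing but the `workload_seed` column. The in-memory sample, its `technique` column, and the helper name `group_by_seed` are placeholders for illustration and do not reflect the real file's columns.

```python
import csv
import io
from collections import defaultdict


def group_by_seed(csv_file):
    """Group the rows of an open CSV file by their workload_seed value."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["workload_seed"]].append(row)
    return dict(groups)


# Tiny in-memory stand-in for the real file; "technique" and its values
# are placeholders, not actual column names from the dataset.
sample = io.StringIO(
    "workload_seed,technique\n"
    "1,A\n"
    "1,B\n"
    "2,A\n"
    "2,B\n"
)
groups = group_by_seed(sample)

# Number of rows found for each seed.
print({seed: len(rows) for seed, rows in groups.items()})
```

For the real data, pass a file opened with `open("dataset/2017.socc.dataset.512.migrations.csv", newline="")` instead of `sample`. If the file is organized as described above, each seed should appear once per migration technique.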

Finally, we provide `2017.socc.dataset.exclude.512.csv`, which is built by excluding the 512 migrations from the full dataset.
This dataset was used to train the model for the guided migration experiment in the paper.

# Citing the dataset and the model

Please reference the following paper if you want to cite our work.

    @conference{Jo:2017:SoCC,
      author = {Jo, Changyeon and Cho, Youngsu and Egger, Bernhard},
      title = {A Machine Learning Approach to Live Migration Modeling},
      booktitle = {ACM Symposium on Cloud Computing},
      series = {SoCC'17},
      year = {2017},
      month = {September},
      location = {Santa Clara, CA, USA},
    }

# Contact

 - Homepage: https://csap.snu.ac.kr/software/lmdataset
 - E-mail: changyeon@csap.snu.ac.kr, bernhard@csap.snu.ac.kr
