Opendoor Scientist challenge

  1. Use the provided dataset to build a k-NN model that avoids time leakage.
  2. What is the performance of the model, measured in Median Relative Absolute Error?
  3. What would be an appropriate methodology to determine the optimal k?

Note:

The dataset has four columns: [latitude, longitude, close_date, close_price]

  1. To prevent time leakage, a home j should be considered a neighbor of home i only if the
    close date of j occurred prior to the close date of i. When predicting the price of home i,
    use only the information that was available at the close date of i. One way of doing this
    is to restrict the candidate neighbors to homes that closed prior to the close date of i.
  2. The Median Relative Absolute Error (MRAE) is defined as median(|P_predict - P_true|/P_true)
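
The two notes above can be sketched in plain Python. This is a minimal illustration under several assumptions, not a reference solution. Rows are assumed to be dicts keyed by the four column names. Distance is assumed to be haversine (great-circle) distance. The prediction is assumed to be the unweighted mean close_price of the k nearest earlier sales. Homes with no earlier sale are assumed to be skipped when computing the MRAE. The names haversine_km, predict_price and mrae are hypothetical, and the four rows in homes are made-up toy values, not taken from the dataset.

```python
import math
from datetime import date
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def predict_price(home, homes, k):
    """Mean close_price of the k nearest homes that closed strictly before `home`.

    Returns None when no home closed earlier (assumption: such homes are skipped).
    """
    # Time-leakage guard: keep only sales with an earlier close date.
    earlier = [h for h in homes if h["close_date"] < home["close_date"]]
    if not earlier:
        return None
    earlier.sort(
        key=lambda h: haversine_km(
            home["latitude"], home["longitude"], h["latitude"], h["longitude"]
        )
    )
    nearest = earlier[:k]  # fewer than k neighbors are used if fewer exist
    return sum(h["close_price"] for h in nearest) / len(nearest)


def mrae(homes, k):
    """median(|P_predict - P_true| / P_true) over homes that have a prediction."""
    errors = []
    for home in homes:
        predicted = predict_price(home, homes, k)
        if predicted is not None:
            errors.append(abs(predicted - home["close_price"]) / home["close_price"])
    return median(errors)


# Toy rows for illustration only (not from the provided dataset).
homes = [
    {"latitude": 0.0, "longitude": 0.00, "close_date": date(2020, 1, 1), "close_price": 100.0},
    {"latitude": 0.0, "longitude": 0.01, "close_date": date(2020, 2, 1), "close_price": 110.0},
    {"latitude": 0.0, "longitude": 0.02, "close_date": date(2020, 3, 1), "close_price": 120.0},
    {"latitude": 0.0, "longitude": 0.03, "close_date": date(2020, 4, 1), "close_price": 150.0},
]

for k in (1, 2):
    print(k, mrae(homes, k))
```

This brute-force version compares every pair of homes, so it is only practical for small inputs. A spatial index would likely be needed for a large dataset. Computing mrae for several candidate values of k, as the final loop does, is one possible starting point for question 3, because every prediction already uses only earlier sales.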