Conversion Probability Prediction in Retargeting

(Provided and Sponsored by RECOBELL)

 

Winners

You can download the answer data in the Dataset section below.

 

Introduction: Retargeting

Retargeting (also known as retarget advertising or remarketing) is a form of online display advertising that lets advertisers show ads to people who have previously visited their websites. It works as follows. Whenever a user visits the advertiser’s website, the user's behavior logs, such as views and orders, are collected with an anonymized user ID. When the user later visits other online ad media, such as websites and apps, the items he or she previously viewed, or related items, are shown in the ad inventory (this is called an impression). If the user clicks the ad, he or she is led to the advertiser’s website and may purchase items, which is called a conversion.

In the field of online retargeting, retargeting ads are designed to maximize clicks, not conversions. However, the ultimate goal of advertisers is to maximize conversions with a limited marketing budget. Therefore, an ad server needs to deliver retargeting ads to those users who are likely to convert. From this point of view, RECOBELL believes that conversion probability prediction is a core technical part of retarget advertising.

Useful Link

https://en.wikipedia.org/wiki/Behavioral_retargeting

Task Description

The task is to develop an algorithm that predicts the probability of conversion when retargeting ads are shown to users. We provide view/order logs and product metadata collected from an e-commerce website for user behavior analysis (2016/08/01 ~ 2016/10/01). We also provide training data and test data from the same website’s retargeting ad campaign. The training data contains impression logs from 2016/09/01 ~ 2016/10/01 along with two labels that indicate 1) whether the ad was clicked, and 2) whether it led to a conversion. To evaluate your algorithm, we will provide as test data the impression logs from 2016/10/02 ~ 2016/10/04, where the impressions carry no click or conversion label. Contestants will develop models to predict the conversion probability for each test impression log (click probability is not required). We will evaluate performance using logarithmic loss. The method that yields the smallest logarithmic loss will win.
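
As a point of reference for the task above, the following is a minimal sketch of a constant baseline: it predicts the same smoothed conversion rate, estimated from the training labels, for every test impression. It is not the organizers' method. The function name and the smoothing parameters `prior_rate` and `prior_weight` are hypothetical, and the labels shown are toy values, not real data.

```python
def baseline_conversion_rate(labels, prior_rate=0.01, prior_weight=100.0):
    """Return a smoothed fraction of impressions whose label is 1.

    prior_rate and prior_weight are hypothetical smoothing parameters
    (an assumption of this sketch). They keep the estimate strictly
    between 0 and 1 even when the labels are all 0 or the list is empty.
    """
    n = 0
    positives = 0
    for y in labels:
        n += 1
        positives += int(y)
    return (positives + prior_rate * prior_weight) / (n + prior_weight)


# Toy is_conversion labels standing in for the training impressions.
train_labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
rate = baseline_conversion_rate(train_labels)

# Every test impression receives the same predicted probability.
test_predictions = [rate] * 3
```

Any model that uses the behavior logs should be able to beat this constant prediction on logarithmic loss, which makes it a useful sanity check.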

Dataset

The dataset for this competition is provided by RECOBELL and FUTURESTREAM NETWORKS.

All files are gzipped CSV files. Column information is shown below.

Retargeting Advertisement Data (provided by FUTURESTREAM NETWORKS)


Field | Valid Value | Note
impression_id | int | Ad impression ID
impression_datetime | timestamp(yyyy-MM-dd HH:mm:ss) | Timestamp of log (Timezone: KST)
uid | char(7) | User ID
platform | char(1) | Platform of user (1: iPhone, 2: Android, 3: iPad)
inventory_type | char(1) | Type of inventory (A, B)
app_code | varchar(10) | Media code where the ad was exposed
os_version | varchar(10) | Version of operating system
model | varchar(255) | Model of mobile phone
network | varchar(10) | Type of connectivity (3G, 4G, WIFI ...)
is_click | int | Whether the user clicked the ad (1) or not (0)
is_conversion | int | Whether the user bought item(s) after clicking this ad
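
The columns above can be read with the Python standard library. The sketch below assumes that each file has a header row using the field names from the table and that the files are UTF-8 encoded; neither is stated on this page. The file name in the usage note is hypothetical, and the sample row is made up for illustration.

```python
import csv
import gzip
import io
from datetime import datetime, timedelta, timezone

# Timestamps are documented as KST, which is UTC+9.
KST = timezone(timedelta(hours=9))


def open_gzipped_csv(path):
    """Open a gzipped CSV file in text mode (UTF-8 is an assumption)."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")


def read_impressions(text_stream):
    """Yield one dict per impression, parsing the timestamp and labels.

    Assumes a header row with the field names listed in the table.
    Label columns are optional because test data carries no labels.
    """
    for row in csv.DictReader(text_stream):
        row["impression_datetime"] = datetime.strptime(
            row["impression_datetime"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=KST)
        for key in ("is_click", "is_conversion"):
            if row.get(key) not in (None, ""):
                row[key] = int(row[key])
        yield row


# Made-up sample with a subset of the columns, for illustration only.
sample = io.StringIO(
    "impression_id,impression_datetime,uid,platform,is_click,is_conversion\n"
    "1,2016-09-01 00:00:05,u000001,2,1,0\n"
)
rows = list(read_impressions(sample))
```

With a real file, the same reader could be used as `with open_gzipped_csv("train.csv.gz") as f: rows = list(read_impressions(f))`, where `train.csv.gz` stands in for whatever name the downloaded file has.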

Site User Behavior Data (provided by RECOBELL)

Field | Valid Value | Note | View Log | Order Log
server_time | timestamp(yyyy-MM-dd HH:mm:ss.S) | Timestamp of log (Timezone: KST) | O | O
device | char(2) | Device of user (MW: Mobile Web, MI: iPhone/iPad App, MA: Android App) | O | O
session_id | char(10) | Browser session ID (using session cookie) | O | O
uid | char(7) | User ID | O | O
item_id | char(7) | Item ID | O | O
order_id | char(7) | Order ID | - | O
quantity | int | Quantity of item | - | O
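
One way to use these logs for conversion prediction is to count how many views and orders a user had before a given impression. The sketch below is an illustrative feature, not a method prescribed by the organizers. It assumes the view and order logs have already been read into lists of dicts keyed by the field names above, and that `server_time` and the impression time are both in KST so they can be compared as naive timestamps. The helper names and toy rows are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime

# server_time is documented as yyyy-MM-dd HH:mm:ss.S (fractional seconds).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def index_events_by_user(rows):
    """Map each uid to a sorted list of its event timestamps."""
    events = defaultdict(list)
    for row in rows:
        events[row["uid"]].append(
            datetime.strptime(row["server_time"], TIME_FORMAT)
        )
    for times in events.values():
        times.sort()
    return events


def count_before(events, uid, cutoff):
    """Count the events of uid that happened strictly before cutoff."""
    return bisect_left(events.get(uid, []), cutoff)


# Toy rows for illustration only.
view_rows = [
    {"uid": "u000001", "server_time": "2016-08-01 10:00:00.0"},
    {"uid": "u000001", "server_time": "2016-09-02 09:30:00.5"},
    {"uid": "u000002", "server_time": "2016-08-15 21:10:00.0"},
]
order_rows = [
    {"uid": "u000001", "server_time": "2016-08-01 10:05:00.0"},
]

view_index = index_events_by_user(view_rows)
order_index = index_events_by_user(order_rows)

impression_time = datetime(2016, 9, 1, 0, 0, 0)
views_before = count_before(view_index, "u000001", impression_time)
orders_before = count_before(order_index, "u000001", impression_time)
```

Counting only events before the impression time avoids leaking future behavior into the features, which matters because the behavior logs overlap the training period.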

Site Product Meta Data (provided by RECOBELL)

Field | Valid Value | Note
item_id | char(7) | Product ID
price | int | Product price (currency: KRW)
category1 | char(7) | Category depth 1
category2 | char(7) | Category depth 2
category3 | char(7) | Category depth 3
category4 | char(7) | Category depth 4
brand | char(7) | Brand ID

Evaluation Metric

Given the i-th impression in the test data (1 <= i <= N), the conversion probability prediction algorithm must produce a probability of conversion p_i, with 0 < p_i < 1. Let y_i be a binary variable indicating whether the i-th impression leads to a conversion. Submissions are evaluated using the logarithmic loss (smaller is better).

An application will compare each solution file with the retarget_test.csv file, which contains the answers to the test set, and the results will be presented on an online scoreboard.
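
The page does not spell out the formula, so the sketch below assumes the standard binary logarithmic loss with the natural logarithm: LogLoss = -(1/N) * sum over i of [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]. The clipping constant `eps` is an assumption added to guard against log(0); the organizers' scoring application may differ in such details.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logarithmic loss (natural log); smaller is better.

    eps is a hypothetical clipping constant that keeps every
    probability strictly inside (0, 1) before taking logarithms.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Toy labels and predictions for illustration only.
labels = [1, 0]
uninformed = log_loss(labels, [0.5, 0.5])
confident = log_loss(labels, [0.9, 0.1])
```

Because the loss grows without bound as a wrong prediction approaches 0 or 1, overconfident probabilities are penalized heavily, so well-calibrated predictions matter more than ranking alone.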

Contact

If you have any questions, please send an email to Jinwoo Park at pakdd2017@recobell.com

About RECOBELL (Main Sponsor)

Please visit Korea’s no. 1 recommendation company:

About FUTURESTREAM NETWORKS (Data Provider)

Please visit the homepage of Korea’s no. 1 mobile ad network company: