Requirements¶
Python 3.6+
For this project, we will be leveraging poetry for our dependency manager. To get started, create a virtual environment and install poetry:
pip install poetry==1.1.7
Next, run the following command to install all dependencies from
pyproject.toml:
poetry install
Note: A quick step to configure a virtual environment and handling Python versioning within this project is to simply run
make penv. For more details on using pyenv, check out the docs here. Assuming pyenv is now configured, you can then install the required dependencies into the activated virtual environment by running:
make build-env
Data¶
The dataset we will be working with can be accessed from the data directory
within the project root. Here’s a breakdown of the available data:
user_interests:
Keys |
Description |
|---|---|
user_handle |
a unique identifier for each user |
interest_tag |
the he date the user viewed the course |
date_followed |
the date user followed/expressed interest in the tag field |
user_assessment_scores:
Keys |
Description |
|---|---|
user_handle |
a unique identifier for each user |
user_assessment_date |
the date the user completed this assessment |
assessment_tag |
the assessment tag |
user_assessment_score |
the user’s score |
user_course_views:
Keys |
Description |
|---|---|
user_handle |
a unique identifier for each user |
view_date |
the he date the user viewed the course |
course_name |
name of course |
author_handle |
a unique identifier for each course’s author |
level |
difficulty level assigned to this course |
course_view_time_seconds |
number of seconds this user spent watching per day |
course_tags:
Keys |
Description |
|---|---|
course_id |
course identifier for joining to user_course_views |
course_tags |
author-applied tags to define what topics this course covers |
RESTful API¶
For this application, we will leverage Flask-RESTful, which is an extension for Flask that adds support for quickly building REST APIs in Python. Before running the Flask server, the following environment variables for configuring the database environment and results table will need to be set:
export DATABASE_ENV=dev
export RESULTS_TABLE=user_ranking
Once the above variables are set, we can then start Flask server by running the following command:
python -m server.run
Note: the default settings is to host the Flask server on localhost and listening on port 5000. To change this setting, you can modify
runserver.pyby setting the host and port arguments in theapp.runmethod, respectively.
To make a API call to the hosted server and return a JSON object containing the
top 5 most similar users to the user of interest, the following CURL command
can be run:
curl -X GET -H "Content-type: application/json"
-d "{\"user_handle\":\"110\"}" \
"http://0.0.0.0:5000/api/similarity/"
Note: the user id for this example is
110. Simply change the value of the user_handle key the above body field-dwith a user id of interest. Keep in mind that the range of user ids is between 1 and 10000.
Alternatively to using curl for making an API call, a Python utility function can also be used to facilitate returning similar users to the provided input user id. The following command can be executed to return the same JSON object:
python -m server.api --user_id 110
Output JSON for Matrix Factorization Approach
[
{
"similar":"5537",
"score":0.8957
},
{
"similar":"7771",
"score":0.8567
},
{
"similar":"5843",
"score":0.8519
},
{
"similar":"239",
"score":0.8512
},
{
"similar":"9530",
"score":0.8363
},
{
"similar":"687",
"score":0.7750
},
{
"similar":"854",
"score":0.7691
},
{
"similar":"1199",
"score":0.7563
},
{
"similar":"9046",
"score":0.7409
},
{
"similar":"7341",
"score":0.7401
}
]