Dataset: ydata-ymovies-user-movie-ratings-content-v1_0

Yahoo! Movies user ratings of movies, and movie descriptive content
information, version 1.0

=====================================================================
This dataset is provided as part of the Yahoo! Research Alliance
Webscope program, to be used for approved non-commercial research
purposes by recipients who have signed a Data Sharing Agreement with
Yahoo!. This dataset is not to be redistributed. No personally
identifying information is available in this dataset. More information
about the Yahoo! Research Alliance Webscope program is available at
http://research.yahoo.com
=====================================================================

Full description:

This dataset contains six files:

ydata-ymovies-user-movie-ratings-train-v1_0.txt
ydata-ymovies-user-movie-ratings-test-v1_0.txt
ydata-ymovies-user-demographics-v1_0.txt
ydata-ymovies-movie-content-descr-v1_0.txt
ydata-ymovies-mapping-to-movielens-v1_0.txt
ydata-ymovies-mapping-to-eachmovie-v1_0.txt

each based on data generated by Yahoo! Movies on or before November
2003, with some modifications and additions by Yahoo! Research. All
user ids are anonymized. Fields in each data file are delimited with
tab ("\t") characters.

The contents of the six files are as follows:

=====================================================================
(1) "ydata-ymovies-user-movie-ratings-train-v1_0.txt" contains a small
    sample of Yahoo! users' ratings of movies, with the following
    fields:

    0 anonymized user_id
    1 movie_id
    2 rating (from 1 (F) to 13 (A+))
    3 converted rating (from 1 to 5; A-, A, and A+ are converted to 5)

    The training data contains 7,642 users (|U|), 11,915 movies/items
    (|I|), and 211,231 ratings (|R|). The average user rating
    ($\overline{R_u} = \frac{\sum_u \overline{r_u}}{|U|}$,
    macro-averaged) is 9.64 and the average item rating
    (macro-averaged) is 9.32. The average number of ratings per user
    is 27.64 and the average number of ratings per item is 17.73. All
    users have rated at least 10 items and all items are rated by at
    least one user. The density ratio ($\delta = \frac{|R|}{|U| \cdot |I|}$)
    is 0.0023, meaning that only 0.23% of entries in the user-item
    matrix are filled.
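
    The statistics above can be recomputed from (user, movie, rating)
    triples. The following Python sketch is not part of the original
    distribution; the function names rating_stats and read_ratings and
    the toy rows are assumptions made for illustration only.

```python
from collections import defaultdict


def read_ratings(path):
    """Yield (user_id, movie_id, rating) from a tab-delimited ratings file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\r\n").split("\t")
            yield fields[0], fields[1], int(fields[2])


def rating_stats(rows):
    """Compute counts, macro-averaged ratings, and density from rating triples."""
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    n_ratings = 0
    for user, movie, rating in rows:
        by_user[user].append(rating)
        by_item[movie].append(rating)
        n_ratings += 1

    def mean(xs):
        return sum(xs) / len(xs)

    n_users, n_items = len(by_user), len(by_item)
    return {
        "users": n_users,
        "items": n_items,
        "ratings": n_ratings,
        # Macro-average: mean of the per-user (or per-item) means.
        "avg_user_rating": mean([mean(r) for r in by_user.values()]),
        "avg_item_rating": mean([mean(r) for r in by_item.values()]),
        "ratings_per_user": n_ratings / n_users,
        "ratings_per_item": n_ratings / n_items,
        "density": n_ratings / (n_users * n_items),
    }


# Toy rows (made up, not taken from the dataset).
toy = [("u1", "m1", 12), ("u1", "m2", 8), ("u2", "m1", 10)]
print(rating_stats(toy))

# Density from the totals quoted above: 211231 / (7642 * 11915) ~= 0.0023
print(round(211231 / (7642 * 11915), 4))
```

    On the full training file, rating_stats(read_ratings(path)) should
    give figures comparable to those quoted above, assuming the file
    is read with a compatible encoding.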

Snippet:

1       1800029049      12      5
1       1804857429      8       4
1       1800030906      13      5
1       1800018548      11      5
1       1800256362      9       4
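
    The relationship between fields 2 and 3 can be sketched as below.
    This README only states that 1 is F, 13 is A+, and that A-, A, and
    A+ convert to 5. The intermediate grade order and the rule "one
    converted value per letter, ignoring +/-" are assumptions; they
    agree with the snippet rows above but are not confirmed for the
    whole file.

```python
# Assumed grade order; only the endpoints 1 = F and 13 = A+ are documented.
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+",
          "B-", "B", "B+", "A-", "A", "A+"]

# Assumed mapping from base letter to the 5-point scale.
LETTER_TO_FIVE = {"F": 1, "D": 2, "C": 3, "B": 4, "A": 5}


def to_five_point(rating):
    """Map a 13-point rating to the 5-point scale under the assumed rule."""
    if not 1 <= rating <= 13:
        raise ValueError("rating must be in 1..13, got %r" % (rating,))
    base_letter = GRADES[rating - 1][0]
    return LETTER_TO_FIVE[base_letter]


# (field 2, field 3) pairs from the snippet rows above.
snippet_pairs = [(12, 5), (8, 4), (13, 5), (11, 5), (9, 4)]
for raw, converted in snippet_pairs:
    assert to_five_point(raw) == converted
```

    Since field 3 is already present in the file, a function like this
    is mainly useful as a consistency check.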

=====================================================================
(2) "ydata-ymovies-user-movie-ratings-test-v1_0.txt" contains a small
    sample of Yahoo! users' ratings of movies. This test data was
    gathered chronologically after the training data. The file
    contains the following fields:

    0 anonymized user_id
    1 movie_id
    2 rating (from 1 (F) to 13 (A+))
    3 converted rating (from 1 to 5)

    The test data contains 2,309 users, 2,380 items, and 10,136
    ratings. Every test user and every test item also appears in the
    training data. The average user rating is 9.66 and the average
    item rating is 9.54. The average number of ratings per user is
    4.39 and the average number of ratings per item is 4.26. All users
    have rated at least one item and all items have been rated by at
    least one user.

Snippet:

5       1808405757      9       4
6       1800247298      12      5
6       1805540029      11      5
6       1804090611      12      5
6       1800019304      12      5
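
    The statement that all test users and items also appear in the
    training data can be checked with a sketch like the following. The
    function name unseen_in_training and the toy ids are hypothetical;
    rows are assumed to be sequences whose first two entries are
    user_id and movie_id.

```python
def unseen_in_training(train_rows, test_rows):
    """Return (users, items) that occur in test_rows but not in train_rows."""
    train_users, train_items = set(), set()
    for row in train_rows:
        train_users.add(row[0])
        train_items.add(row[1])

    new_users, new_items = set(), set()
    for row in test_rows:
        if row[0] not in train_users:
            new_users.add(row[0])
        if row[1] not in train_items:
            new_items.add(row[1])
    return new_users, new_items


# Toy data (made up): user "u3" never appears in the training rows.
train = [("u1", "m1"), ("u2", "m2")]
test = [("u2", "m1"), ("u3", "m2")]
print(unseen_in_training(train, test))
```

    For the real files, both returned sets are expected to be empty.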

=====================================================================
(3) "ydata-ymovies-user-demographics-v1_0.txt" contains user
    demographic information, with the following fields:

    0 anonymized user_id
    1 birthyear
    2 gender

Snippet:

1       1979    f
2       1987    m
3       1988    f
4       1983    m
5       1988    m

=====================================================================
(4) "ydata-ymovies-movie-content-descr-v1_0.txt" contains movie
    descriptive content information. If a field contains multiple
    items (e.g., the "list of actors" field contains multiple actors),
    then each item (e.g., actor) is separated by a "|" character.
    "\N" means that the field is empty. The file contains the
    following fields:

    0 Yahoo! movie id. To construct a URL to the Yahoo! Movies Main
      Page for the movie, insert the movie id into
      "http://movies.yahoo.com/movie/MOVIE_ID/info",
      e.g. http://movies.yahoo.com/movie/1807428853/info

    1 title
    2 synopsis   
    3 running time
    4 MPAA rating  
    5 reasons for the MPAA rating
    6 release date (yyyymmdd)
    7 distributor

    8 URL of poster. To construct a valid URL, append field to
      "http://us.movies1.yimg.com/movies.yahoo.com/images/hv/"

    9 list of genres
    10 list of directors
    11 list of director ids
    12 list of crew members
    13 list of crew ids
    14 list of types of crew

    15 list of actors (name [character]). Character name information
       may be incomplete or incorrect.

    16 list of actor ids. To construct a URL to the Yahoo! Movies Main
       Page for an actor, append the actor id to
       "http://movies.yahoo.com/movie/contributor/",
       e.g. http://movies.yahoo.com/movie/contributor/1800019596

    17 average critic rating
    18 the number of critic ratings
    19 the number of awards won
    20 the number of awards nominated
    21 list of awards won
    22 list of awards nominated

    23 rating from The Movie Mom. More information about The Movie Mom
       is available at http://movies.yahoo.com/mv/moviemom/about.html

    24 review from The Movie Mom. More information about The Movie Mom
       is available at http://movies.yahoo.com/mv/moviemom/about.html

    25 list of review summaries by critics and users
    26 list of anonymized review owners
    27 list of captions from trailers/clips

    28 URL of Greg's Preview. To construct a valid URL, append field
       to "http://movies.yahoo.com/movie/preview/". Greg's Previews of
       Upcoming Movies are compiled by Greg Dean Schmitz (from
       UpcomingMovies.com). Greg's Previews are available at
       http://movies.yahoo.com/mv/upcoming/ . More information about
       Greg's Previews is available at
       http://movies.yahoo.com/feature/aboutgreg.html

    29 URL of DVD review. To construct a valid URL, append field to
       "http://movies.yahoo.com/mv/dvd/reviews/"

    30 global non-personalized popularity (GNPP)

       GNPP was generated by Yahoo! Research as follows:
       GNPP = 1/k *
       (avg(i) + log_2(n(i)) + log_10(#awards_won*10 + #awards_nominated*5))
       where avg(i) is field 31, n(i) is field 32, and k is a
       normalization factor chosen such that the maximum GNPP value is 13.

    31 average rating of this item among users in the training data
    32 the number of users in the training data who rated this item
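
    The GNPP formula for field 30 can be sketched as below. This is an
    illustration, not the code Yahoo! Research used. In particular,
    this README does not say how a logarithm of zero (no raters, or no
    awards and nominations) was handled; the sketch assumes such a
    term contributes 0. The function names are hypothetical.

```python
import math


def raw_popularity(avg_rating, n_raters, awards_won, awards_nominated):
    """Unnormalized score: avg + log2(n) + log10(10*won + 5*nominated)."""
    def log_or_zero(x, log_fn):
        # Assumption: an undefined logarithm (x <= 0) contributes 0.
        return log_fn(x) if x > 0 else 0.0

    award_term = awards_won * 10 + awards_nominated * 5
    return (avg_rating
            + log_or_zero(n_raters, math.log2)
            + log_or_zero(award_term, math.log10))


def gnpp(movies):
    """Scale raw scores so that the largest one equals 13.

    movies: list of (avg_rating, n_raters, awards_won, awards_nominated).
    """
    raw = [raw_popularity(*m) for m in movies]
    k = max(raw) / 13.0  # normalization factor from the formula
    return [r / k for r in raw]


# Toy input (made up): raw scores are 10 + 3 + 1 = 14 and 9 + 2 + 0 = 11.
print(gnpp([(10.0, 8, 1, 0), (9.0, 4, 0, 0)]))
```

    Because k depends on the maximum over all movies, GNPP values can
    only be reproduced when the whole catalogue is scored together.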

Snippet:

1800010969      The 1985 Admiral's Cup (1997)   Small boats vs. big winds in this the official "Champagne Mumm" video that captures not only the tactical inshore races, but the sheer spectacle of the gale swept offshore races.      \N     \N       \N              \N      \N      \N      Special Interest        \N     \N       \N      \N      \N      \N      \N      \N      \N      \N      \N     \N       \N      \N      \N      \N      \N      \N      \N      \N      \N     \N       \N
1800011786      984 - Prisoner of the Future (1984)     984, a man of the future, is imprisoned without a trial, tortured, and beaten. A shocking tale of gross inhumanity.     \N      \N      \N              \N      \N      \N      Science Fiction/Fantasy \N      \N      \N      \N      \N      \N      \N      \N     \N       \N      \N      \N      \N      \N      \N      \N      \N      \N     \N       \N      \N      \N      \N
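
    A line of the movie content file can be parsed along these lines.
    The key names in FIELD_NAMES are labels invented for this sketch
    (the file itself has no header row), and the example record is
    made up apart from the movie id, which is the one used in the URL
    example for field 0.

```python
NULL = "\\N"  # the two characters backslash and N mark an empty field

# Hypothetical names for fields 0..32, in file order.
FIELD_NAMES = [
    "movie_id", "title", "synopsis", "running_time", "mpaa_rating",
    "mpaa_reasons", "release_date", "distributor", "poster_url", "genres",
    "directors", "director_ids", "crew", "crew_ids", "crew_types",
    "actors", "actor_ids", "avg_critic_rating", "num_critic_ratings",
    "num_awards_won", "num_awards_nominated", "awards_won",
    "awards_nominated", "movie_mom_rating", "movie_mom_review",
    "review_summaries", "review_owners", "captions", "gregs_preview_url",
    "dvd_review_url", "gnpp", "avg_user_rating", "num_user_ratings",
]

# Fields described as "list of ..." are split on the "|" character.
LIST_FIELDS = {
    "genres", "directors", "director_ids", "crew", "crew_ids", "crew_types",
    "actors", "actor_ids", "awards_won", "awards_nominated",
    "review_summaries", "review_owners", "captions",
}


def parse_movie(line):
    """Turn one tab-delimited line into a dict; empty fields become None."""
    values = line.rstrip("\r\n").split("\t")
    record = {}
    for name, value in zip(FIELD_NAMES, values):
        if value == NULL:
            record[name] = None
        elif name in LIST_FIELDS:
            record[name] = value.split("|")
        else:
            record[name] = value
    return record


def movie_url(movie_id):
    """Build the Yahoo! Movies main page URL described for field 0."""
    return "http://movies.yahoo.com/movie/%s/info" % movie_id


# Made-up record with two genres and every other optional field empty.
fields = ["1807428853", "Example Title", "A synopsis."]
fields += [NULL] * 6 + ["Comedy|Drama"] + [NULL] * 23
record = parse_movie("\t".join(fields) + "\n")
print(record["genres"], movie_url(record["movie_id"]))
```

    All values are kept as strings; converting numeric fields such as
    the running time or the release date is left to the caller.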

=====================================================================
(5) "ydata-ymovies-mapping-to-movielens-v1_0.txt" contains a mapping
    from the movie ids used in this Yahoo! Movies dataset to the
    corresponding movie ids and titles used in the MovieLens
    dataset. The mapping may be incomplete or incorrect. The MovieLens
    dataset was created by the GroupLens research group at the
    University of Minnesota, and is not associated with Yahoo! or
    available via Yahoo!. More information about the MovieLens dataset
    is available at
    http://www.cs.umn.edu/Research/GroupLens/index.html . The file
    contains the following fields:

    0 yahoo_movie_id
    1 movie title
    2 movielens_movie_id

Snippet:

1800247298      Toy Story (1995)        1
1800022746      Jumanji (1995)  2
1800250021      Grumpier Old Men (1995) 3
1800249828      Waiting to Exhale (1995)        4
1800249488      Father of the Bride Part II (1995)      5

=====================================================================
(6) "ydata-ymovies-mapping-to-eachmovie-v1_0.txt" contains a mapping
    from the movie ids used in this Yahoo! Movies dataset to the
    corresponding movie ids and titles used in the EachMovie
    dataset. The mapping may be incomplete or incorrect. The EachMovie
    dataset was created by the Digital Equipment Corporation's Systems
    Research Center and is not associated with Yahoo! or available via
    Yahoo!. The file contains the following fields:

    0 yahoo_movie_id
    1 movie title
    2 eachmovie_movie_id

Snippet:

1800247298      Toy Story       1
1802820491      Jumanji 2
1800250021      Grumpier Old Men        3
1800249828      Waiting to Exhale       4
1800249488      Father of the Bride Part II     5
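
    Because both mapping files are keyed by the Yahoo! movie id, they
    can be combined to relate MovieLens ids to EachMovie ids. The
    sketch below uses hypothetical function names and assumes each
    Yahoo! movie id occurs at most once per file; the sample lines are
    the first three rows of each snippet.

```python
def load_mapping(lines):
    """Map yahoo_movie_id -> (title, other_id) from tab-delimited lines."""
    mapping = {}
    for line in lines:
        yahoo_id, title, other_id = line.rstrip("\r\n").split("\t")
        mapping[yahoo_id] = (title, other_id)
    return mapping


def movielens_to_eachmovie(ml_map, em_map):
    """Join the two mappings on the shared Yahoo! movie id."""
    return {
        ml_id: em_map[yahoo_id][1]
        for yahoo_id, (_title, ml_id) in ml_map.items()
        if yahoo_id in em_map
    }


movielens_lines = [
    "1800247298\tToy Story (1995)\t1",
    "1800022746\tJumanji (1995)\t2",
    "1800250021\tGrumpier Old Men (1995)\t3",
]
eachmovie_lines = [
    "1800247298\tToy Story\t1",
    "1802820491\tJumanji\t2",
    "1800250021\tGrumpier Old Men\t3",
]
joined = movielens_to_eachmovie(load_mapping(movielens_lines),
                                load_mapping(eachmovie_lines))
print(joined)
```

    In these rows Jumanji carries a different Yahoo! movie id in each
    file, so it drops out of the join. This is one concrete way the
    mappings can be incomplete or incorrect, as noted above.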
