
Dataset Name: reddit_investing_comments
Group: social_media
Vendor: Reddit powered by CloudQuant
Data Starts at: 2021-01-01 00:00:00
Symbol Set: Meme Securities
Asset Class: Equity
Data Update Time(s): live
Data Update Frequency: intraday
Comments from Reddit's r/investing, enriched with multiple NLP sentiment scores and mapped to cashtags and stock trading symbols. This dataset is free of charge to licensed CloudQuant users. This subreddit covers losing money with friends.
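
To make the idea of a cashtag concrete, here is a minimal sketch of pulling cashtags out of comment text. It is only an illustration: the function name `extract_cashtags` and the pattern (a `$` followed by one to five uppercase letters) are assumptions, not the mapping logic CloudQuant actually uses to populate the `symbols` and `cashtags` columns.

```python
import re

# Assumed pattern: "$" followed by 1-5 uppercase letters, not preceded by
# a letter or digit. This is a hypothetical rule for illustration only.
CASHTAG_PATTERN = re.compile(r"(?<![A-Za-z0-9])\$([A-Z]{1,5})\b")


def extract_cashtags(text):
    """Return the ticker part of each cashtag found in text, in order."""
    return CASHTAG_PATTERN.findall(text)


print(extract_cashtags("Bought $GME and $AMC, sold for $20"))
```

Dollar amounts such as `$20` are skipped because the pattern only accepts letters after the `$`.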
Data Contained in this Dataset
Column | Type | Description |
---|---|---|
_seq | uint | Internal sequence number used to keep data rows in order |
timestamp | string | Timestamp of the Data - America/New York Time. |
muts | uint64 | Microseconds Unix Timestamp. An integer representation of a timestamp with microsecond precision that can be compared directly to other timestamps. |
symbol | string | Trading Symbol or Ticker |
total_awards_received | int | The number of awards on the submission |
approved_at_utc | string | Timestamp of approval. Null if the comment has not been approved or if you are not a moderator. |
comment_type | string | Reddit comment type |
awarders | string | List of users who gave this post/comment an award |
mod_reason_by | string | The moderator who added the removal reason if applicable. |
banned_by | string | moderator that banned this post/comment |
author_flair_type | string | The type of flair used by the submission’s author. |
removal_reason | string | A removal reason set by moderators if applicable. |
link_id | string | ID of the link this comment is in |
author_flair_template_id | string | The comment author’s flair template ID if applicable. |
likes | string | how the logged-in user has voted on the link - True = upvoted, False = downvoted, null = no vote |
user_reports | string | A list of the user reports on the submission |
saved | bool | true if this post is saved by the logged in user |
id | string | unique id for post/comment |
banned_at_utc | string | The UTC timestamp at which the author was banned. |
mod_reason_title | string | The mod reason’s title if applicable. |
gilded | int | the number of times this comment received reddit gold |
archived | bool | Whether the submission has been archived by Reddit. |
no_follow | bool | Bool - No Follow indicator |
author | string | User information who wrote the post/comment |
edited | string | Whether or not the submission has been edited |
can_mod_post | bool | Whether the logged-in user can modify the post. |
created_utc | string | The time of creation in UTC epoch-second format. Neither this value nor created ever has a non-zero fractional part. |
send_replies | string | Whether the author of the submission will receive reply notifications |
parent_id | string | Link to either the comment or the post that is the immediate parent of this comment |
score | int | the net-score of the comment |
author_fullname | string | The comment author’s ID prepended with t2_. |
treatment_tags | string | Community content tags are tags that moderators add to their communities to let redditors know what kind of mature content is in that community. In the past, Reddit used a Not Safe for Work (NSFW) tag to distinguish communities and content most people wou |
approved_by | string | The moderator who approved this comment. Null if nobody has approved it or if you are not a moderator. |
mod_note | string | Moderator notes added to the submission. |
all_awardings | string | A list of awards added to the submission/comment |
subreddit_id | string | The subreddit’s ID prepended with t5_. |
body | string | comment raw text |
author_flair_css_class | string | the CSS class of the author's flair. subreddit specific |
name | string | Full name of the submission. |
author_patreon_flair | string | The comment author’s Patreon flair if applicable. |
downs | int | the number of downvotes. (includes own) |
author_flair_richtext | string | The comment author’s flair text if applicable |
is_submitter | bool | Whether the logged-in user is the submitter of this comment. |
body_html | string | the formatted HTML text as displayed on reddit. |
gildings | string | The gild awards the submission has received. |
collapsed_reason | string | reason comment is minimized |
distinguished | string | to allow determining whether they have been distinguished by moderators/admins. null = not distinguished. moderator = the green [M]. admin = the red [A]. special = various other special distinguishes |
associated_award | string | Associated Award |
stickied | bool | true if the post is set as the sticky in its subreddit. |
author_premium | bool | bool - Author Premium |
can_gild | bool | Whether the logged-in user can gild the submission |
top_awarded_type | string | post/comment highest (most expensive) reward |
author_flair_text_color | string | The submission/comment author flair text color if applicable. |
score_hidden | bool | Whether the comment's score is currently hidden. |
permalink | string | The collection’s permalink (to view on the web). |
num_reports | string | how many times this comment has been reported, null if not a mod |
report_reasons | string | A list of report reasons on the submission. |
subreddit | string | subreddit the post/comment belongs to |
author_flair_text | string | the text of the author's flair. subreddit specific |
created | string | the time of creation in local epoch-second format. ex: 1331042771.0 |
collapsed | bool | Whether the comment should be collapsed by clients |
subreddit_name_prefixed | string | The name of the subreddit the submission was posted on, prefixed with “r/”. |
controversiality | int | A score on the comment’s controversiality based on its up- and downvotes. |
locked | bool | whether the link is locked (closed to new comments) or not. |
author_flair_background_color | string | The submission/comment author’s flair background color. |
collapsed_because_crowd_control | string | thread collapsed because of volume of replies (comments replying to post/comment) |
mod_reports | string | A list of moderator reports on the submission/comment |
subreddit_type | string | the subreddit's type - one of public, private, restricted, or in very special cases gold_restricted or archived |
ups | int | the number of upvotes. (includes own) |
symbols | string | Symbols found in title and body and parent of a post or a comment which includes cashtags |
cashtags | string | Cashtags found in title and body of a post or a comment as well as a comment's parents |
symbol_src | string | where the symbol was found. Either: title, body, or parent |
vader_body_sentiment_neg | double | Percentage (%) of the body text that is negative |
vader_body_sentiment_neu | double | Percentage (%) of the body text that is neutral |
vader_body_sentiment_pos | double | Percentage (%) of the body text that is positive |
vader_body_sentiment_compound | double | Normalized sum of all the body text sentiment ratings, ranging from -1 (most negative) to +1 (most positive) |
vader_title_sentiment_neg | double | Percentage (%) of the title text that is negative (always 0 for a comment) |
vader_title_sentiment_neu | double | Percentage (%) of the title text that is neutral (always 0 for a comment) |
vader_title_sentiment_pos | double | Percentage (%) of the title text that is positive (always 0 for a comment) |
vader_title_sentiment_compound | double | Normalized sum of all the title text sentiment ratings, ranging from -1 (most negative) to +1 (most positive) |
textblob_body_sentiment_polarity | double | The polarity of the body text: a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive. |
textblob_body_sentiment_subjectivity | double | The subjectivity of the body text: a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. |
textblob_title_sentiment_polarity | double | The polarity of the title text: a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive. |
textblob_title_sentiment_subjectivity | double | The subjectivity of the title text: a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. |
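
As a rough sketch of how some of these columns might be used together, the Python below converts `muts` values to timezone-aware datetimes and averages `vader_body_sentiment_compound` per `symbol`. The rows are made-up sample values, not real dataset records, and the helper names `muts_to_datetime` and `mean_compound_by_symbol` are hypothetical. It also assumes rows are already available as plain dictionaries, since how the data is loaded depends on your CloudQuant environment.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def muts_to_datetime(muts):
    """Convert a microsecond Unix timestamp to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=muts)


def mean_compound_by_symbol(rows):
    """Average the VADER body compound score for each trading symbol."""
    totals = {}
    for row in rows:
        running_sum, count = totals.get(row["symbol"], (0.0, 0))
        totals[row["symbol"]] = (
            running_sum + row["vader_body_sentiment_compound"],
            count + 1,
        )
    return {symbol: s / n for symbol, (s, n) in totals.items()}


# Invented sample rows for illustration only.
rows = [
    {"muts": 1609459200000000, "symbol": "GME", "vader_body_sentiment_compound": 0.6},
    {"muts": 1609459260500000, "symbol": "AMC", "vader_body_sentiment_compound": -0.4},
    {"muts": 1609459320000000, "symbol": "GME", "vader_body_sentiment_compound": 0.2},
]

for row in rows:
    print(muts_to_datetime(row["muts"]).isoformat(), row["symbol"])

print(mean_compound_by_symbol(rows))
```

Because `muts` is a plain integer, rows can be sorted or compared on it directly. Converting to a datetime is only needed for display or calendar-based grouping.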