
Dataset Name: brain_blmcf_ext_10-k_metrics
Group: sec_filing
Vendor: Brain
Asset Class: Equity
Data Update Time(s): 7:05 AM EST
Data Update Frequency: day
The Brain Language Metrics on Company Filings (BLMCF) dataset monitors several language metrics on 10-Ks and 10-Qs for 6000+ US stocks. This EXTENDED version provides additional language metrics covering not only the report as a whole but also specific report sections (e.g. the Risk Factors and MD&A sections).
Data Contained in this Dataset
Column | Type | Description |
---|---|---|
_seq | uint | Internal sequence number used to keep data rows in order |
timestamp | string | Timestamp of the Data - America/New York Time. |
muts | uint64 | Microseconds Unix Timestamp. An integer representation of a timestamp with microsecond precision that can be compared directly to other timestamps. |
symbol | string | Trading Symbol or Ticker |
COMPOSITE_FIGI | string | The FIGI composite code (https://www.openfigi.com) that identifies the stock across related exchanges in the same country. |
DATE | string | Date |
LAST_REPORT_CATEGORY | string | Category of last report (with respect to DATE) issued by the company (10-Q or 10-K) |
LAST_REPORT_DATE | string | The date of last report (with respect to DATE) issued by the company in YYYY-MM-DD format |
N_SENTENCES | uint | Number of sentences extracted from the last available report. 1 - inf |
MEAN_SENTENCE_LENGTH | double | The mean sentence length measured in terms of the mean number of words per sentence for the last available report. 1 - inf |
SENTIMENT | double | The financial sentiment of the last available report. -1.0 to +1.0 |
SCORE_UNCERTAINTY | double | The percentage of financial domain “uncertainty” language present in the last report. 0.0 - 1.0 |
SCORE_LITIGIOUS | double | The percentage of financial domain “litigious” language present in the last report. 0.0 - 1.0 |
SCORE_CONSTRAINING | double | The percentage of financial domain “constraining” language present in the last report. 0.0 - 1.0 |
SCORE_INTERESTING | double | The percentage of financial domain “interesting” language present in the last report. 0.0 - 1.0 |
READABILITY | double | Reading grade level for the report, expressed as a number corresponding to a US education grade. The score is the average of various readability tests (e.g. Gunning Fog Index) that measure how difficult the text is to understand. 0,inf |
LEXICAL_RICHNESS | double | Lexical richness measured in terms of the Type-Token Ratio (TTR), which divides the number of types (unique words) by the number of tokens (total number of words). The basic logic behind this measure is that if the text is more complex, the author uses a more varied vocabulary, so the number of types is larger. 0.0 - 1.0 |
LEXICAL_DENSITY | double | Lexical density, a measure of text complexity computed as the number of lexical words (nouns, adjectives, lexical verbs, adverbs) divided by the total number of words in the document. 0.0 - 1.0 |
SPECIFIC_DENSITY | double | Percentage of words belonging to the specific dictionary used for company filings analysis present in the last available report. 0.0 - 1.0 |
RF_N_SENTENCES | double | Number of sentences extracted from the section “Risk Factors” of the last available report. |
RF_MEAN_SENTENCE_LENGTH | double | The mean sentence length measured in terms of the mean number of words per sentence for the section “Risk Factors” of the last available report. 1,inf |
RF_SENTIMENT | double | The financial sentiment for the section “Risk Factors” of the last available report. -1.0 to +1.0 |
RF_SCORE_UNCERTAINTY | double | The percentage of financial domain “uncertainty” language present in the section “Risk Factors” of the last report. 0.0 - 1.0 |
RF_SCORE_LITIGIOUS | double | The percentage of financial domain “litigious” language present in the section “Risk Factors” of the last report. 0.0 - 1.0 |
RF_SCORE_CONSTRAINING | double | The percentage of financial domain “constraining” language present in the section “Risk Factors” of the last report. 0.0 - 1.0 |
RF_SCORE_INTERESTING | double | The percentage of financial domain “interesting” language present in the section “Risk Factors” of the last report. 0.0 - 1.0 |
RF_READABILITY | double | Reading grade level for the section “Risk Factors” of the last available report. 1,inf |
RF_LEXICAL_RICHNESS | double | Lexical richness for the section “Risk Factors” of the last available report. 0.0 - 1.0 |
RF_LEXICAL_DENSITY | double | Lexical density for the section “Risk Factors” of the last available report. 0.0 - 1.0 |
RF_SPECIFIC_DENSITY | double | Percentage of words belonging to the specific dictionary used for company filings analysis present in the section “Risk Factors” of the last available report. 0.0 - 1.0 |
MD_N_SENTENCES | double | Number of sentences extracted from the “MD&A” sections of the last available report. 1,inf |
MD_MEAN_SENTENCE_LENGTH | double | The mean sentence length measured in terms of the mean number of words per sentence for the “MD&A” sections of the last available report. 1,inf |
MD_SENTIMENT | double | The financial sentiment for the “MD&A” sections of the last available report. -1.0 to +1.0 |
MD_SCORE_UNCERTAINTY | double | The percentage of financial domain “uncertainty” language present in the “MD&A” sections of the last report. 0.0 - 1.0 |
MD_SCORE_LITIGIOUS | double | The percentage of financial domain “litigious” language present in the “MD&A” sections of the last report. 0.0 - 1.0 |
MD_SCORE_CONSTRAINING | double | The percentage of financial domain “constraining” language present in the “MD&A” sections of the last report. 0.0 - 1.0 |
MD_SCORE_INTERESTING | double | The percentage of financial domain “interesting” language present in the “MD&A” sections of the last report. 0.0 - 1.0 |
MD_READABILITY | double | Reading grade level for the “MD&A” sections of the last available report. 0,inf |
MD_LEXICAL_RICHNESS | double | Lexical richness for the “MD&A” sections of the last available report. 0.0 - 1.0 |
MD_LEXICAL_DENSITY | double | Lexical density for the “MD&A” sections of the last available report. 0.0 - 1.0 |
MD_SPECIFIC_DENSITY | double | Percentage of words belonging to the specific dictionary used for company filings analysis present in the “MD&A” sections of the last available report. 0.0 - 1.0 |
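
To make a few of the column definitions above concrete, here is a minimal Python sketch of the Type-Token Ratio behind LEXICAL_RICHNESS, a rough stand-in for LEXICAL_DENSITY, and a conversion of a `muts` value to a datetime. It is not the vendor's implementation. The regex tokenizer, the `FUNCTION_WORDS` stoplist (used here in place of real part-of-speech tagging), and all function names are assumptions made only for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical stoplist standing in for part-of-speech tagging (an assumption):
# any token NOT in this set is treated as a "lexical" word.
FUNCTION_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (simplified, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Types (unique words) divided by tokens (total words)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_density(text: str) -> float:
    """Share of tokens that are lexical words, approximated via the stoplist."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


def muts_to_utc(muts: int) -> datetime:
    """Convert a microseconds Unix timestamp to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer microsecond arithmetic avoids float rounding of large timestamps.
    return epoch + timedelta(microseconds=muts)


if __name__ == "__main__":
    sample = "the risk is the risk"
    print(type_token_ratio(sample))   # 3 unique words over 5 total words
    print(lexical_density(sample))    # 2 lexical words over 5 total words
    print(muts_to_utc(1_700_000_000_000_000).isoformat())
```

Both ratios fall in the 0.0 - 1.0 range listed in the table. Because `muts` values are plain integers, they can also be compared directly without any conversion.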