Individual Psychographs & Activity

Naveen Ashish, PhD

An individual’s attributes, such as his or her interests, preferences, views, hobbies, inclinations, passions, aversions, and political leanings, are of significant interest for a variety of commercial, research, and community applications. Marketers and market developers, as just one example, stand to gain significantly from understanding such attributes for individuals or for a given population as a whole. There is also interest in information about an individual’s “current state”, such as their current location and/or activity. For instance, mobile health-intervention applications can significantly optimize and personalize their capabilities by leveraging activity details such as where an individual currently is and what they are doing.

It is only in social-media that we have realistic prospects of finding such information. Beyond availability, there is also the issue of access, and Twitter is the most promising platform in this regard.

It is important, however, to understand a few key limitations of this capability, given that we rely solely on open social-media to glean such information:

  • For a given population, i.e., a set of individuals of interest, we can obtain such information for only a small subset of those individuals.
  • The attributes or aspects of an individual will not be comprehensively synthesized. In other words, we have to work with what we get.

That said, such psychographic profile or activity information has to be usable in practical predictive analytics solutions. The profile information needs to be as accurate as possible. The activity information also needs to be accurate and up to date, as in many cases it serves real-time applications.

The TeraCrunch Socratez Profiler™ is an AI tool for synthesizing psychographic and activity information from social-media. It builds upon the TeraCrunch Socratez™ Unstructured Data Understanding Platform, with the following capabilities:

  • Data access. The system has real-time, highly scalable access to data from social-media feeds, obtained via APIs as well as data stream providers.
  • Individual resolution. Many individuals share the same name, including on social-media, and psychographic aspects are sought for a specific individual, not for his or her namesakes. The challenge is to identify the social-media identity of that specific individual, given additional context such as location, profession, or any other available information. Socratez Profiler employs state-of-the-art entity resolution and disambiguation algorithms and technologies to identify individuals from amongst namesakes, with high probabilistic confidence.
  • Confident information synthesis. The aspects in an individual’s profile typically have to be derived. For instance, the fact that “outdoor hiking” is one of a user’s interests is not something that may be explicitly stated, but is rather derived from multiple, regular posts about hiking activities. As another example, we would have more confidence in the derived aspect of an individual being vegetarian from multiple posts expressing positive sentiment about vegetarianism than from a one-off post about the same. Socratez Profiler determines a probabilistic confidence for every derived aspect using machine-learning models.
  • Privacy preserving. Socratez Profiler accesses and derives profile aspects and activity information only from social-media content that is open. Moreover, individual-level information is provided to an application only with user consent. In all other cases the tool provides only aggregated, non-personally-identifying aspects and activity.
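
To make the idea of “confident information synthesis” concrete, here is a minimal sketch of how confidence in a derived aspect could grow with the number of supporting posts. It is not the Socratez Profiler method, which the text only describes as using machine-learning models. The function name `aspect_confidence`, the noisy-OR combination rule, and the per-post scores are all assumptions chosen for illustration.

```python
from math import prod


def aspect_confidence(evidence_scores):
    """Combine per-post evidence scores (each in [0, 1]) with a noisy-OR rule.

    Hypothetical helper: each score is the assumed probability that a single
    post, taken alone, supports the aspect. The posts are treated as
    independent, which is a simplifying assumption.
    """
    if not evidence_scores:
        return 0.0
    for score in evidence_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("evidence scores must lie in [0, 1]")
    # The aspect is rejected only if every post fails to support it.
    return 1.0 - prod(1.0 - score for score in evidence_scores)


# Illustrative, made-up scores: one post versus four similar posts.
one_off = aspect_confidence([0.3])
regular = aspect_confidence([0.3, 0.3, 0.3, 0.3])
print(f"one-off post: {one_off:.4f}")   # 0.3000
print(f"regular posts: {regular:.4f}")  # 0.7599
```

Under these assumptions, a single weakly supportive post yields low confidence, while several such posts together yield a much higher one, which mirrors the vegetarianism example above.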

Dr. Naveen Ashish is the Co-founder & Data Scientist Advisor to TeraCrunch for AI technologies. He is also the Head of Data Sciences at Fred Hutch (the Fred Hutchinson Cancer Research Center in Seattle, WA). The views and information in this article have no relationship with, and are not in any way representative of, the work at Fred Hutch.

Lifestyle Attributes Database & Analysis for Cancer Research

Naveen Ashish, PhD

Lifestyle and Cancer

In cancer research, researchers have traditionally analyzed genetic data, patient medical record data, and other data such as patient MRI images or pathology reports. Cancer has largely been considered as emanating “from genetic factors”. It is becoming increasingly clear, however, that the primary factors behind cancers of various kinds are not solely genetic, and that environment and lifestyle are key contributing factors as well.

By ‘lifestyle’ factors, we mean aspects such as diet, exercise habits, other physical activity, weight and body mass index (BMI), alcohol, tobacco, or other substance use, sleep habits, stress and anxiety, etc. Many interesting studies evaluating the impact of such lifestyle factors have appeared in recent years. For instance, in breast cancer, a number of risk factors have been identified in the pathogenesis of breast tumors, and a great number of these are linked to nutrition and lifestyle, such as alcohol consumption, obesity, and eating patterns (Study Link). A number of epidemiological studies (Study Link, Study Link) have provided convincing evidence that alcohol consumption is an important risk factor for the incidence and mortality of breast cancer. On the other hand, soybean products have been shown to act as cancer-preventive agents in rodents and other animals (Study Link).

Interestingly, soy products have been a staple of the Asian diet for centuries (they are the predominant source of isoflavones, which belong to the family of phytoestrogens). Studies investigating the relationship between soy food intake after a breast cancer diagnosis and health status have reported a slightly protective effect, especially among the Asian population (Study Link). There have been almost 200 publications in the last 10 years, with the scope of research spanning genetic risk factors, late effects from treatment, comorbidities, second malignant neoplasms, reproductive health, psychosocial outcomes, long-term health, and lifestyle behaviors.

Getting Lifestyle Data: Social-Media

While lifestyle attributes are important, this data is typically hard to obtain, as it is usually not part of patient medical records or other data. Many attributes of a particular individual are also dynamic and evolving: exercise schedules, diet, and other habits can and do change with time, place, and other context. A promising source for gathering such information is social-media. A person’s posts, conversations, comments, likes and dislikes, and other expressions often provide valuable information from which lifestyle attributes can be derived. However, gathering such information in a scalable fashion is a data science challenge.

TC-PAD: Lifestyle Data in a Database

TeraCrunch has developed TC-PAD (for Personal Attributes Database), a comprehensive solution that provides a structured database of key personal attributes of an individual’s lifestyle. TC-PAD requires authorized access from individuals to their social-media feeds and profiles, which it then accesses for content. It then applies sophisticated natural language and semantic understanding algorithms (part of the TeraCrunch Socratez text understanding suite) to synthesize a variety of lifestyle attributes per individual and deliver this data as a structured database. This is an “on-demand” solution in which such a lifestyle database can be created virtually instantly for a cohort of individuals, once authorization credentials have been provided.
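
As a rough illustration of what “lifestyle data as a structured database” might look like, the sketch below loads derived attributes into an in-memory SQLite table and keeps only individuals who have granted authorization. The actual TC-PAD schema is not described here, so the table name, columns, function name `build_database`, identifiers, and sample values are all hypothetical. The `observed_on` column is one assumed way to accommodate attributes that change over time.

```python
import sqlite3

# Hypothetical schema: one row per derived attribute, per individual, per date.
SCHEMA = """
CREATE TABLE lifestyle_attribute (
    individual_id TEXT NOT NULL,
    attribute     TEXT NOT NULL,
    value         TEXT NOT NULL,
    confidence    REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    observed_on   TEXT NOT NULL,
    PRIMARY KEY (individual_id, attribute, observed_on)
)
"""


def build_database(derived, authorized_ids):
    """Load derived attributes, keeping only rows for authorized individuals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [row for row in derived if row[0] in authorized_ids]
    conn.executemany(
        "INSERT INTO lifestyle_attribute VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


# Made-up example records; "p002" has not authorized access and is excluded.
derived = [
    ("p001", "diet", "vegetarian", 0.82, "2018-03-01"),
    ("p001", "exercise", "hiking, weekly", 0.76, "2018-03-01"),
    ("p002", "diet", "high soy intake", 0.64, "2018-03-01"),
]
conn = build_database(derived, authorized_ids={"p001"})
rows = conn.execute(
    "SELECT attribute, value FROM lifestyle_attribute ORDER BY attribute"
).fetchall()
print(rows)
```

A researcher could then query such a table by cohort, attribute, or date range alongside existing clinical data.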