Naveen Ashish, PhD
An individual’s attributes, such as his or her interests, preferences, views, hobbies, passions, aversions, political inclinations and the like, are of significant interest for a variety of commercial, research or community applications. Marketers or market developers, as just one example, stand to gain significantly from understanding such attributes for individuals or for a given population as a whole. There is also interest in information about an individual’s “current state”, such as their current location and/or activity. For instance, mobile health intervention applications can significantly optimize and personalize their capabilities by leveraging activity details such as where an individual currently is and what they are doing.
Social media is the only source where we have realistic prospects of obtaining such information. Beyond availability, there is also the issue of access, and Twitter is the most promising platform in this regard.
It is important, however, to understand a few key limitations of this capability, given that we rely solely on open social media to glean such information:
- For a given population, i.e., a set of individuals of interest, we can obtain such information for only a small subset of those individuals.
- The attributes or aspects of an individual will not be comprehensively synthesized. In other words, we have to work with what we get.
That said, such psychographic profile or activity information has to be usable in practical predictive analytics solutions. The profile information needs to be accurate, to the extent possible. The activity information also needs to be accurate and up to date, as in many cases it serves real-time applications.
The TeraCrunch Socratez Profiler™ is an AI tool for synthesizing psychographic and activity information from social media. It builds upon the TeraCrunch Socratez™ Unstructured Data Understanding Platform, with the following capabilities:
- Data access. The system has real-time, highly scalable access to data from social-media feeds, obtained via APIs as well as data stream providers.
- Individual resolution. Multiple individuals exist with the same name, including on social media, and psychographic aspects are sought for specific individuals, not (just) their namesakes. The challenge, then, is to identify the social-media identity of a specific individual, given some additional context such as their location, profession or any other available information. Socratez Profiler employs state-of-the-art entity resolution and disambiguation algorithms and technologies to identify individuals from amongst namesakes, with high probabilistic confidence.
- Confident information synthesis. The aspects in an individual’s profile typically have to be derived. For instance, the fact that “outdoor hiking” is one of a user’s interests is not something that may be explicitly stated, but is rather derived from multiple, regular posts about hiking activities. As another example, we would have more confidence in the derived aspect of an individual being vegetarian from multiple posts expressing positive sentiment about vegetarianism than from a one-off post about the same. Socratez Profiler determines a probabilistic confidence for every derived aspect using machine-learning models.
- Privacy preserving. Socratez Profiler accesses and derives profile aspects and activity information only from social-media content that is open. However, individual-level information is provided to applications only with user consent. In all other cases the tool provides only aggregated, non-personally-identifying aspects and activity.
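To make the idea of confidence in a derived aspect concrete, here is a minimal sketch. It is not the Socratez Profiler implementation, which is described above only as using machine-learning models. It assumes instead a simple noisy-OR rule, in which each post contributes a hypothetical evidence score between 0 and 1 for an aspect such as “vegetarian”. The function name `aspect_confidence` and the scores used are illustrative assumptions.

```python
from math import prod


def aspect_confidence(post_evidence):
    """Combine per-post evidence scores into one confidence value.

    Assumption: each score in post_evidence is a hypothetical probability
    (0 to 1) that a single post supports the aspect. The noisy-OR rule used
    here is a stand-in for a trained model, not the product's actual method.
    """
    if any(not 0.0 <= p <= 1.0 for p in post_evidence):
        raise ValueError("evidence scores must lie in [0, 1]")
    # The aspect is rejected only if every post fails to support it.
    return 1.0 - prod(1.0 - p for p in post_evidence)


# A one-off post versus several regular posts with the same per-post score.
one_off = aspect_confidence([0.3])
regular = aspect_confidence([0.3, 0.3, 0.3, 0.3])

print(f"one-off post:  {one_off:.4f}")
print(f"regular posts: {regular:.4f}")
```

Under these assumptions, four posts that each carry modest evidence yield a noticeably higher confidence than a single such post, which mirrors the vegetarianism example above. A real system would also have to account for posts that contradict the aspect, which this sketch ignores.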
Dr. Naveen Ashish is the Co-founder of, and Data Scientist Advisor to, TeraCrunch for AI technologies. He is also the Head of Data Sciences at Fred Hutch (the Fred Hutchinson Cancer Research Center in Seattle, WA). The views and information in this article have no relationship with, and are not in any way representative of, the work at Fred Hutch.