Discovering and Analyzing Important Real-Time Trends in Noisy Twitter Streams
Khalid N. Alhayyan, Imran Ahmad
We present StreamSensing, an approach for processing real-time data in noisy streams. The approach consists of six stages: (1) tokenization, (2) stop-word removal, (3) stemming, (4) filtering, (5) conversion into a Term Document Matrix (TDM), and (6) pattern analysis. We implemented and experimentally tested the approach on Spark, a fast in-memory processing system, and we report and analyze the results. The contributions of this paper are both theoretical and practical: theoretically, it introduces the StreamSensing approach; practically, the approach can be employed to perform trend analysis on any real-time text data stream.
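To make the six stages concrete, the following is a minimal sketch in plain Python rather than Spark, run over a small in-memory batch of tweets. It is not the authors' implementation. All function names, the tiny stop-word list, the naive suffix-stripping stemmer, the filtering rule (dropping mentions, numbers, and very short tokens), and the use of total term frequency as the "pattern analysis" step are assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real list is much larger).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for", "at", "this"}


def tokenize(text):
    # Stage 1: lowercase and split into word-like tokens, keeping # and @ prefixes.
    return re.findall(r"[#@]?\w+", text.lower())


def remove_stop_words(tokens):
    # Stage 2: drop common function words.
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    # Stage 3: naive suffix stripping, a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def keep(token):
    # Stage 4: assumed noise filter: drop mentions, pure numbers, and very short tokens.
    return not token.startswith("@") and not token.isdigit() and len(token) > 2


def build_tdm(docs):
    # Stage 5: term-document matrix as {term: [count in doc 0, count in doc 1, ...]}.
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*docs))
    return {term: [c[term] for c in counts] for term in vocab}


def top_terms(tdm, k):
    # Stage 6: assumed pattern analysis: rank terms by total frequency, ties broken alphabetically.
    totals = {term: sum(row) for term, row in tdm.items()}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def preprocess(tweet):
    # Stages 1 to 4 applied to a single tweet.
    tokens = remove_stop_words(tokenize(tweet))
    return [t for t in (stem(tok) for tok in tokens) if keep(t)]


def stream_sensing(tweets, k=3):
    # Full pipeline over one batch of tweets.
    docs = [preprocess(t) for t in tweets]
    return top_terms(build_tdm(docs), k)


if __name__ == "__main__":
    batch = [
        "Storm warning in the city",
        "Storms are flooding the city streets",
        "@bob the storm is over",
    ]
    print(stream_sensing(batch, k=2))
```

In a streaming setting, `stream_sensing` would be called once per micro-batch or window. Stages 1 to 4 operate on each tweet independently, so they parallelize naturally, while the matrix construction and ranking require aggregation across the batch.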