Sindhuri Mamidi

Data Scientist
Computer Science, Stony Brook University

Predicting the Super Bowl and College Football Champions of 2015

The goal of this project is to build machine learning models that predict the winners of the 2015 Super Bowl and the College Football Championship from historical data.

We predict the outcome of football games using only statistics from previous games. We used three different models to do this:

  1. Baseline model: "Point Score Difference Model". In this model we use the score difference to predict winners of future games.
  2. Linear Regression Model: In this model, we use linear regression to predict the point difference for each game.
  3. PageRank Model: Here, we model the game data as a graph with nodes as teams and edges as score differences between the teams. We then use PageRank on this game graph to rank all the teams. This ranking is used to predict winners of future games.
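As a rough illustration of the PageRank model, the sketch below ranks teams with a weighted power iteration in plain Python. The input format (`winner, loser, margin` tuples), the edge direction (loser points to winner, weighted by margin of victory), and the damping factor of 0.85 are all assumptions made for this example; the project write-up does not state these details.

```python
from collections import defaultdict


def pagerank(games, damping=0.85, iterations=100):
    """Rank teams from a list of (winner, loser, margin) tuples.

    The tuple format and the loser -> winner edge direction are
    assumptions for this sketch, not details taken from the project.
    """
    teams = sorted({team for w, l, _ in games for team in (w, l)})
    n = len(teams)

    # Each loss sends "credit" from the loser to the winner,
    # weighted by the margin of victory.
    out_edges = defaultdict(dict)
    for winner, loser, margin in games:
        out_edges[loser][winner] = out_edges[loser].get(winner, 0) + margin

    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        new_rank = {team: (1.0 - damping) / n for team in teams}
        for team in teams:
            edges = out_edges.get(team)
            if edges:
                total = sum(edges.values())
                for target, weight in edges.items():
                    new_rank[target] += damping * rank[team] * weight / total
            else:
                # A team with no losses has no outgoing edges (a dangling
                # node), so its rank is spread evenly over all teams.
                share = damping * rank[team] / n
                for other in teams:
                    new_rank[other] += share
        rank = new_rank
    return rank


def predict_winner(rank, team_a, team_b):
    """Predict the higher-ranked team as the winner."""
    return team_a if rank[team_a] >= rank[team_b] else team_b
```

With hypothetical results such as `[("A", "B", 7), ("B", "C", 3), ("A", "C", 10)]`, the undefeated team A ends up ranked above B, and B above C, so `predict_winner` picks A in any matchup.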

More information about the project can be found here.

Online Fraud detection in Yelp Review data

The goal of this project is to detect fake reviews by looking at connections among fake reviewers using network effects and relational analysis. The techniques aim to capture the relationships among reviewers, reviews, and stores, as well as among the reviewers and reviews themselves.

For behavioral analysis, we analyzed reviewer profiles on Yelp based on the following factors:
  1. Maximum Number of Reviews
  2. Percentage of Positive Reviews
  3. Review Length
  4. Review Deviation
  5. Maximum Content Similarity

We achieved 76% accuracy in detecting fake Yelp reviews using the behavioral features listed above.
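The five behavioral features could be computed per reviewer along the lines of the sketch below. The review schema (`date`, `rating`, `text`, `business`), the "4 stars or more counts as positive" threshold, the use of per-day counts for the maximum number of reviews, and bag-of-words cosine similarity for content similarity are all assumptions chosen for illustration; the project's exact definitions are not given here.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def behavioral_features(reviews, business_avg_rating):
    """Compute behavioral features for one reviewer.

    reviews: non-empty list of dicts with 'date', 'rating', 'text',
             'business' keys (hypothetical schema).
    business_avg_rating: dict mapping business id -> average rating.
    """
    per_day = Counter(r["date"] for r in reviews)
    positive = sum(1 for r in reviews if r["rating"] >= 4)  # assumed threshold
    deviations = [
        abs(r["rating"] - business_avg_rating[r["business"]]) for r in reviews
    ]
    similarities = [
        cosine_similarity(a["text"], b["text"])
        for a, b in combinations(reviews, 2)
    ]
    return {
        "max_reviews_per_day": max(per_day.values()),
        "pct_positive": 100.0 * positive / len(reviews),
        "avg_review_length": sum(len(r["text"].split()) for r in reviews)
        / len(reviews),
        "avg_rating_deviation": sum(deviations) / len(deviations),
        "max_content_similarity": max(similarities, default=0.0),
    }
```

The resulting dictionary can be fed to any standard classifier as a feature vector. A reviewer who posts near-identical text for several businesses will show a `max_content_similarity` close to 1.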

Chicago crime analysis

The aim of this project is to analyze the Chicago crime rate over the years. We configured a 4-node Hadoop cluster and loaded 5 GB of data using HDFS and HBase, then ran several types of analysis and multiple MapReduce jobs using Pig, Hive, and Spark. We implemented Mahout's k-means clustering algorithm to help estimate future crime rates.
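The project used Mahout's k-means implementation on Hadoop. To show the idea behind the algorithm itself, here is a minimal single-machine sketch in plain Python. It is not the Mahout API, and the choice of numeric feature tuples as input is an assumption, since the features used in the project are not stated.

```python
import math


def kmeans(points, k, iterations=50):
    """Cluster numeric tuples into k groups with Lloyd's algorithm.

    Uses the first k points as initial centroids so the result is
    deterministic (a simplification made for this sketch).
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters
```

Called on four hypothetical points forming two tight groups, `kmeans(points, 2)` returns one centroid per group. On real data the points would be whatever per-area or per-period features the analysis extracts.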


Twitter Sentiment analysis

Textual Information can be broadly categorized into two main types:

  • Facts
  • Opinions

In this project, we focus only on expressions that convey people's negative or positive sentiments. We perform sentiment analysis on Twitter data and look for correlations between the sentiment of tweets and other factors such as the number of retweets, geographic location, number of followers, and friends count. We also examine how a user's personality or social status affects the retweet count.
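One simple way to look for such a correlation is to score each tweet's sentiment and compute the Pearson coefficient against a numeric factor such as retweet count. The sketch below does this in plain Python. The tiny word lists and the word-counting scorer are placeholders invented for this example, not the sentiment method used in the project.

```python
import math

# Hypothetical mini-lexicon; a real analysis would use a full
# sentiment lexicon or a trained classifier.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def sentiment_retweet_correlation(tweets):
    """tweets: list of (text, retweet_count) pairs (hypothetical format)."""
    scores = [sentiment_score(text) for text, _ in tweets]
    retweets = [count for _, count in tweets]
    return pearson(scores, retweets)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. The same `pearson` helper can be reused for follower or friend counts.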

More information about the project can be found here.


Wi-Fi Traffic analysis

The aim of this project is to sniff traffic in one or more busy environments around the campus, collect all control and management packets such as beacons, RTS/CTS, and ACKs, and analyze the data.

More information about the project can be found here.
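Once frames are captured, they can be grouped by kind using the type and subtype bits in the first byte of the 802.11 Frame Control field. The sketch below assumes each frame is available as raw bytes starting at the 802.11 header (any radiotap or capture header already stripped). The function names are made up for this example and are not from the project.

```python
from collections import Counter

# (type, subtype) pairs from the 802.11 Frame Control field.
# Type 0 = management, type 1 = control.
FRAME_NAMES = {
    (0, 8): "beacon",
    (1, 11): "RTS",
    (1, 12): "CTS",
    (1, 13): "ACK",
}


def classify_frame(frame: bytes) -> str:
    """Name a frame from the first byte of its Frame Control field."""
    if not frame:
        return "other"
    first_byte = frame[0]
    frame_type = (first_byte >> 2) & 0b11    # bits 2-3
    subtype = (first_byte >> 4) & 0b1111     # bits 4-7
    return FRAME_NAMES.get((frame_type, subtype), "other")


def count_frames(frames):
    """Count how many frames of each kind appear in a capture."""
    return Counter(classify_frame(f) for f in frames)
```

For instance, a frame whose first byte is `0x80` is classified as a beacon and one starting with `0xD4` as an ACK. Counting these per location or per time window gives a first picture of how busy the channel is.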


Sinter: Low-Bandwidth Remote Access for the Visually-Impaired

I worked on part of the implementation for this paper. The paper describes a framework, called Sinter, that efficiently and seamlessly supports remote, cross-platform screen reading without modifying the application or the screen reader. Sinter does this with a platform-independent intermediate representation (IR) of a remote application's user interface (UI). The Sinter IR encapsulates platform-specific accessibility code on the remote system, facilitates development of additional accessibility features, and is simple enough to be reconstructed and read on any client platform, including in a web browser. For example, Sinter allows a user to read remote Windows applications with a Mac-only reader. Sinter supports low-bandwidth, remote access to a wide range of rich applications, including Microsoft Word and Apple Mail, with both Windows and Apple OS X clients and servers, as well as a web browser client. The paper also demonstrates the utility of IR-level programming by implementing several accessibility features transparently to the remote application and reader. The network overheads of Sinter are orders of magnitude lower than the current state of the art for remote desktop access.

More information about the project can be found here.