Archive | Misc. RSS for this section

Working with Twitter Streams with python

So here’s how to work with Twitter streams using python (in Windows). Note that I’ll be working on python 2.7.8 here:

  1. Install python: I’d already done this, so I can’t describe this step-by-step. But as far as I remember, it was pretty straightforward, installing it like any other application is installed after downloading the python executable from the Download section of this webpage.

  2. Install pip: pip is used to install python packages (a normally tedious process) extremely easily. Here’s how to install it:

    1. Download pip from here. If you’re working on Chrome, rightclick the “” link, and select “Save Link As”, as opposed to just clikcing on it. Ensure it is saved as a .py file.
    2. Just opening the installed file will cause a python script to run, and pip will be installed (along with setuptools, if setuptools has not already been installed).
    3. Most people don’t seem to need this, but I had to add the folder containing python scripts to my Path variable. To do this in Windows 8.1, go to the start menu, type in “System Environment”, clikc on “Environment Variables”, edit the “Path” field under system variables and append “;C:\Python27\Scripts” (or “;whatever\your\python\path\is\Scripts”).
    4. Now to test this, type “pip” in the command prompt terminal (Windows+R > cmd).

  3. Install tweepy: in the command prompt terminal, type “pip install tweepy”
  4. Setup Twitter account to access the live 1% stream: To access the live 1% stream, do the following (steps taken from Coursera course “Introduction to Data Science” ‘s first offering’s first assignment):

    • Create a twitter account if you do not already have one.
    • Go to and log in with your twitter credentials.
    • Click “create an application”
    • Fill out the form and agree to the terms. Put in a dummy website if you don’t have one you want to use.
    • On the next page, scroll down and click “Create my access token”
    • Click “Create my access token.”
  5. Test things out! : Follow a few users, and wait until a few tweets are visible on your home timeline. Then run the “Hello Tweepy” program, and if it works, and you see your home timeline’s tweets, you’re all set! (Be sure to replace the consumer_key, consumer_secret, access_token and access_token_secret fields with their appropriate values first).

Improving the Setwise Stream Classification Problem

Srishti and I worked on Setwise Stream Classification Problem. Here are a few quick details.

A few of the drawbacks that we found, and the improvement strategy that we proposed to address these drawback, were as follows:

  1. Selection of min_stat parameter has not been discussed

     The original algorithm uses min_stat to determine when to classify an entity (it classifies an entity only if the more than min_stat number of points have been received exceeds min_stat number of points). However, it doesn’t mention anything about how to obtain min_stat. In our proposed method, min_stat is determined in a way so that it returns a class profile as soon as possible, but with likelihood of it having predicted the final class profile that it will predict, given all points in the entity, being beyond a certain amount (say, 80%). This is done by using the entire training data, and determining after how many points are used to obtain the fingerprint of an entity, the class profile the said entity is nearest to doesn’t change very frequently. In other words, after using how many points to determine the fingerprint can the class profile be determine with a certainty more than a certain amount.

  2. k-means was used for clustering the initial sample, which is affected by choice of initial seeds

    To reduce the dependency on seeds chosen initially, we proposed using bisected k-means

  3. The anchors remain constant throughout

    Anchors play a crucial role in determining fingerprints, which in term determines which class profile an entity is assigned to. However, in the original algorithm, anchors remain constant with time. In our algorithm, we propose updating the anchors incrementally with time, offline. While this would cause some overhead, it would make the system less prone to concept drift. This improvement would be of particular interest when parallelizing the algorithm, since the parallelized implementation of the update of anchors would reduce the overhead significantly.

  4. The problem of concept drift has not been dealt with

    Our improved implementation proposes using a distance-based method to measure concept drift, and those entities which cross the distance threshold are classified into a separate “concept drifted” class (with a possibility of being classified in one of the classes when sufficient data becomes available, although we haven’t looked into this in detail).

Our strategy for parallelizing the original algorithm is as follows:

  1. Parallelize the K-Means run on init_sample
  2. Parallelize the process of determining the closest anchor (Required only for a large number of anchors, and/or for high dimensional data)
  3. Parallelize updating the fingerprint of the appropriate entity (this is required if there are a very large number of anchors)
  4. Determine class profile that is closest to the fingerprint in question (this is required only if there are a very large number of class profiles)

And finally, here is a quick few steps on how to get OpenMPI running:

  1. Install cygwin for windows
  2. Select “openMPI”, “libopenmpi”, “libopenmpicxx1” from lib, “gdb”, “gcc-core”, “gcc-g++” from Devel, and “openmpi-debuginfo” from Debug option (1.8.2 at the time of writing)

Building a SLAM bot with a Kinect

As our project for the semester, Kapil, Tanmay and I will be building a bot that performs Simultaneous Localization and Mapping (or SLAM, in short) under Dr. J. L. Raheja at CEERI (Central Electronics Engineering Research Institute) , Pilani. Here, I’ll be writing about the difficulties we faced, what we did, a few good resources which helped us out, etc. All the code that we’ve written can be found here.

The entire SLAM project would be done using MATLAB. The first thing we decided to do was to build an obstacle avoider bot using the kinect, as a warm-up task of sorts. This would be a first step in several things: getting ourselves a working bot, controlling this bot using MATLAB, getting data from the kinect and analyzing it in real time with MATLAB, and finally, combinning all these steps (namely, analyzing images from the kinect depth sensor in real time, and using the information obtained from them to make our obstacle avoider).  Here’s how each of these steps panned out in detail:

  • Assembling the bot

    CEERI had purchased a bot from (Streak), which could carry a laptop and kinect on it comfortably. However, the software which was to burn the code onto the microcontroller (and to be used for serial communication with it) failed to run on any of our laptops (although it worked fine on an old Windows XP 32-bit system). Thus, we decided to use an Arduino instead, and purchased separate motor drivers (the original motor drivers was integrated with the microcontroller board). We also purchased new LiPo batteries, since the original Li-Ion batteries we had received was, well, non-functional. Oh well. We now have a fully assembled, working bot 😀

    A list of the parts we used is as follows:

    Component Specification Number
    Microcontroller Freeduino Mega 2560 (link) 1
    Chassis High Strength PVC alloy Unbreakable body (of Streak) 1
    Wheels Tracked wheel Big 10cm (of Streak) 4
    Motor 300 RPM, 30kgcm DC geared motor (of Streak) 4
    Motor driver 20A Dual DC Motor Driver (link) 1
    Battery Lithium Polymer 3 Cell, 11.1V, 5000mAh (link) 1
    Battery protection circuit Protection Circuit for 3 Cell Li-Po Battery (link) 1
    Battery charger Lithium Polymer Balance Charger 110-240V AC (link) 1
    RGBD sensor Microsoft Kinect for Xbox 1
  • Controlling the bot via MATLAB using an Arduino

    The obstacle avoider would be controlled using the Arduino board, via serial communication with MATLAB, which would be processing images taken from the kinect’s depth sensor to do the obstacle avoiding.Thus,  we needed to setup MATLAB to Arduino communication (which I had already worked on before, though). The code can be found here. The code requires MATLAB, the Arduino IDE and Arduino I/O for MATLAB. Note that the Serial port takes a very long time to show in the Arduino IDE, and this can be solved by following this set of instructions.

  • Obtaining data from the kinect, and processing the kinect’s images in real time

    This was fairly straightforward, thanks to MATLAB’s Image Acquisition Toolbox. All that was needed in addition to this was installing the Kinect for Windows SDK (I have v1.8 installed). The code can be found here. Of course, this is a continuous video input, and we’ll be using individual images later on to draw onto them easily.

  • Making the bot

    To make the bot itself, we used the depth image to locate where obstacles were located. We divided the screen into 3 parts, and took into consideration how many obstacles lying within a certain distance from the kinect were present in each of the 3 parts. The bot would then take the path with the least number of obstacles. The number of obstacles were counted by counting the number of centroids of each connected component. The pseudocode is as follows:

    initialize serial connection with Arduino
    setup the required pins on the Arduino
    start the kinect’s video input
    while (time elapsed < total run time required)

    acquire the depth and RGB image from the kinect
    threshold the depth image, so that only objects nearby are considered
    remove noise (by removing components whose connected area is less than a certain
    obtain the individual connected components, each representing an object
    obtain the centroid of each connected component
    divide the image into 3 parts, and make the bot take the direction which has fewest
    end while

    Here’s an image of what the laptop placed on top of the bot shows when the bot is moving:

    Screenshot of obstacle avoider

    Screenshot of obstacle avoider

  • Problems

    The main problem that we faced was that the kinect has can’t detect objects that are closer than 80cm. Thus, if an obstacle appears in front of the bot when the bot turns, and the obstacle is closer than 80cm, the bot can’t detect it. The kinect just returns a zero value for nearby objects, and for both objects that are too close to the kinect (< 80cm away) and too far away from the kinect (> 4m away), the kinect’s depth sensor returns a zero value. There were a few possible solutions we could think of to this (we’d read 2 of these up somewhere online, each from a different site, but I don’t remember the resources, I’m afraid):

    1. Incline the kinect at an angle from a height, so that an obstacle at the foot of the bot is slightly more than 80cm away.
    2. Take the closest pixel’s depth value that is non-zero as the undefined value.
    3. Take the closest pixel’s depth value that is non-zero, and use some sort of threshold to determine whether the zero point represents an object closer than 80cm, or one further than 4m. For example, if the neighboring pixels to a connetced component are, say, on an average, 1m away, the entire connected component is likely to represent an object closer than 80cm, while they are, on an average, 3m away, the connected component likely represents a distant object (like a wall) farther than 4m away. Note that in the examples just mentioned, I’ve used the nearest neighbouring pixels’ values’ averages, since a single undefined point is unlikely.
    4. Use 2 ultrasonic sensors to get information about when an obstacle closer than 80cm to the bot exists.
  • Possible Improvements

    A few possible improvements to the obstacle avoider are:

    1. Account for the size of the object, and use are instead of centroids, so that the bot takes the path with the smallest clutter.
    2. Account for distance of objects: at present, the bot merely considers all objects within a threshhold, and ignores objects further away. However, a possible improvement would be that if there are several objects in one direction, and very few in another, the bot would take the direction of fewer obstacles (provided, of course, that there are no obstacles it has to worry about in its immediate vicinity). However, at present, the bot would just continue going in a straight line, even if that would mean more obstacles in the future.
    3. Divide the screen into more parts, so that the turning of the bot is more accurate

Now, onto SLAM!

Here are a few sub-tasks that are involved:

  • Visual Odometry We decided to use the kinect itself for odometry. For this, we would need to estimate movement about the x, y and z directions, and the rotation about the 3 axes. For this, we used the paper “Fast 6D Odometry Based on Visual Features and Depth” as reference. The first step to this would be to use a feature detector.

    Initially, we had planned to use the SIFT algorithm. For this, we tried using VLFeat. Here‘s some quick test code that we wrote to test it out (it assumes that VLFeat has been setup, as described here). Here’s what it looks like:

    Applying SIFT on 2 RGB images using VLFeat

    Applying SIFT on 2 RGB images using VLFeat

    However, SIFT is a little too slow, and it a little over a second for it to be run on an image.

    So, we tried implementing it with Matlab’s SURF functions. It seems to be much, well, cleaner, and way more efficient (taking only a little under a tenth of a second). Here‘s a MATLAB function to take 2 input images, and here’s the result:

    Applying SURF on 2 RGB images using MATLAB

    Applying SURF on 2 RGB images using MATLAB

Work Hard, plAI Harder!

Gokul and I came first for plAI this Apogee. Here are a few details:

plAI was a competition where we had to create artificially intelligent bots to collect as many resources as possible in a given environment. In addition, we had to compete against another AI bot in this task. There could be two possible ways of achieving victory:
  1. Based on the number of resources (“fish”) collected: Whichever bot collects the maximum number of fish before the time runs out would win. In the case of a tie, the bot which collected the first fish would win.
  2. Based on health: Each bot was equipped with a cannon, which could be fired to attack another bot. Getting hit by a cannon ball would cost the bot health. If one of the bots loses all its health, then it loses the game, irrespective of how many fish it has collected. Hitting land would also cost the bot/raft to lose health.
Thus, there were two aspects to the game: who can traverse the given terrain optimally and collect as many fish as possible, and an adversarial aspect: who would win in an encounter.
The bot is not given knowledge about its surrounding, but only about its immediate terrain- parameters like its current position in the map, whether any canon balls within its visibility range are flying towards it, whether there any any obstacles, enemies or fish in its immediate field of view, its current health, etc.
There were 4 primary parameters we could control: the direction of acceleration of the bot, whether or not its brakes are applied, whether or not its canon is firing and the direction at which its canon fires.
There were several other factors that we had to keep in mind, though, such as mathematical equations representing the viscosity of the water, which meant that we would always have to apply an acceleration, and the equation that gave the braking force.
Our strategy was two-fold:
  1. With regards to collecting the fish, we would make the bot traverse the terrain in a random fashion for a certain interval of time (say, 5 seconds), and if the bot remained in approximately the same region, we would then make it choose a direction at random, and continue in that direction. In case it no longer remains stuck (i.e., it was stuck in that location purely due to the random nature of traversal), then the random traversal of the terrain is resumed. In case it remains stuck however, i.e., it encounters an obstacle, it would travel parallel to that obstacle with a left arm on the wall approach, checking for cycles by keeping its starting position in mind. Thus, in a terrain with sparsely distributed obstacles, this algorithm would, in general, cause a randomized traversal of the map such that the bot doesn’t remain stuck in one location, while in the case of a map with a large number of obstacles, the bot would effectively take a left-arm-on-the-wall approach, which would be reasonably efficient in a maze-like environment, for example.
  2. With regards to firing the canon, at the start of the game, we randomly fired the canon to the diagonally opposite corner of the screen, where our opponents would likely start, hoping for some stray hits. Then, after we encountered the opponent, and they left our field of view, we would fire in the direction in which we last saw them, hoping for a few more stray hits.

There were several other minor details we had to take care of as well, such as ensuring that the bot takes the fish as soon as it sees it, irrespective of where any incoming canon balls are headed, ensuring that the bot ignore the fish if there’s an obstacle in between it and the fish, and ensuring that it ignores the enemy if there’s an obstacle between it and the enemy.

All in all, it was a ridiculously fun experience.

Software for Accessibility for the Blind- Meet 1

Dr. Sai Jagan Mohan had emailed us last week describing something he had in mind- a project which entailed discovering programs to enable the blind, especially students to efficiently use a PC, both for academic (say, reading a text or having it read to them), as well as recreational activities (Eg: Listening to music, playing games, etc), and possibly to design a program to help them learn how to use it. I got in touch with him expressing my interest, and he asked me and a few others to meet him today. There, we discussed a number of points:

    1.  OpenSource vs. Proprietary software: We discussed our perceptions of various advantages and disadvantages of both types, which include:
        • Proprietary software is generally more popular to use and is easier.
        • OpenSource software will be available much more easily, and free of cost
        • Most good OpenSource software is meant for Linux, while most PCs come with Windows by default, and Windows is generally accepted as being easier to use for a first timer

      At this point, I proposed a solution somewhere inbetween these: To use Windows Ease of Access options, Magnifier and Narrator: Not exactly OpenSource, but already bundled with Windows anyway (i.e., effectively free). We agreed on the idea, and none of us having much experience on these features, decided to try them out for ourselves.

    2. Why such software is hardly used in India : The main reason we came up with is that there is almost no awareness about them, and very few organisations and training programmes exist for the same. Further, many students come from a fairly uneducated background, and their parents don’t know how to find such information out, nor are the teachers of a visually impaired child able to guide him/her, as they themselves have little experience with such children. The solution: Talk to NGOs, in due time and have a tie up with them, that we may be able to reach out to these children with their (and possibly the government’s) help.
    1. Vernacular Languages: India, being the amazingly diverse nation that it is, has countless languages and dialects. Therefore, a difficulty a blind child is likely to face is actually understanding what the screen reader reads, not only as almost all readers are tailored to read in Western Languages (again, something we weren’t very sure of, and needed to research), but also the accent in which the narrator narrates, which might be difficult for a non-native Englih speaker to decipher. The solution? We decided that the best way of dealing with this was to teach the child English from a young age, although how the child will be able to understand the very different accent, we couldn’t decide.
    1. The difficulties they were likely to face: The main difficulty was likely to be actually get blind students accustomed to the PC, as well as to teach them English enough to omfortably ue the PC.
    1. What the students can use it for: We had in mind 2 primary goals: Recreation (something that can be enjoyed even without the sense of vision, like music, which also has songs in almost all Indian languages, and which crosses the barriers of language), as well as Education (for example, through audio books, either in English, or in their mother tongue), and perhaps something that melds the 2 together (like an educational game, perhaps).

All in all, it was a very good start, and we all agreed that our first focus should be on finding out the most convenient software for these students, in terms of both ease of access and availability, starting with the inbuilt features in Windows.