Saturday, February 18, 2012

Scraping Data: Getting only what you need

     Several times I have visited the Wikipedia home page just to look at today's history. But there is so much information there that you simply get distracted, don't you? So I decided: how about getting just what I needed? I had once stumbled upon a CS50 video on scraping data from the internet, so I had a basic idea of what to do.
    Still, I googled a bit, because I wanted something more recent that would make my work very easy. Thankfully I stumbled upon the simplehtmldom parser. It made scraping data so much easier. You don't even need to worry about completeness of the code! I just had to go through the HTML of the page and identify where the information was placed. Here is the code that I used to get my information.


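<?php
  include_once('simple_html_dom.php');

  // Load and parse the Wikipedia main page
  $html = file_get_html("http://en.wikipedia.org/wiki/Main_Page");

  // Event happening today, e.g. Independence Day ('p' tag at index 4; indexes start at 0)
  echo $html->find('p', 4)->innertext;

  // Events that happened on this day in the past ('ul' tag at index 5)
  $ret['All'] = $html->find('ul', 5)->innertext;
  echo $ret['All'];

  // Free the memory held by the parsed page
  $html->clear();
  unset($html);
?>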


I have used $ret['All'] just to show you that the output can be stored in an array.
      There is one fault in this code. If there is no event happening today, then the 'p' tag at index 4 (the fifth one, since indexes start at 0) will contain data from some other section of the page. So, to scrape data in this manner, you have to hope that the page's HTML does not change. I have used only some of the features of the simplehtmldom parser; this manual will show you more applications of the library. Happy Hacking!
      If I get a chance I will make a Twitter bot that gets this info and tweets it every day. I will need to set up some cron jobs for that and learn several other things, but that will be done later.
      


  

Tuesday, February 7, 2012

TT (newer version of timetable)

  Last year I created a shell script that showed me my academic timetable for the day. I was a newbie at shell scripting at that time, so the script was not so good. So here is tt, or should I say timetable2.0?
  This time I have stored my timetable in a timetable.txt file, so that whenever I need to modify the timetable I don't need to modify the script. Line N of the file holds the timetable for day N of the week, with line 1 being Monday. Also, the code is much shorter this time, with far fewer if statements. Here is the new code:

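#!/bin/bash
# Usage: tt [day-number]   (1 = Monday ... 5 = Friday; defaults to today)
days=( Monday Tuesday Wednesday Thursday Friday )

if [ "$#" -eq 0 ]
then
  echo -e "\nToday's Timetable:"
  # No argument given: use today's weekday number (1 = Monday ... 7 = Sunday) as $1
  set -- "$(date +%u)"
else
  echo -e "\n${days[$1-1]}'s timetable:"
fi
# Print line number $1 of the timetable file
sed -n "${1}p" timetable.txt
echo -e "\n\n"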


  Still to be added: a feature to change the timetable. But that's all for now. The next modification will come when I am in the mood for it :)

  

Tuesday, January 24, 2012

Facebook Hacker Cup

    The Facebook Hacker Cup qualification round results came out last night. I had attempted 2 out of 3 questions and both of them turned out to be correct. Yay! I am actually lucky to have solved 2 questions, as I was procrastinating about solving another question after I submitted the first one. You need to submit only one correct solution to qualify for the next round. This procrastination cost me a time penalty, but we can ignore that.
    Here is the list of questions: https://www.facebook.com/hackercup/problems.php?pid=215823855164332&round=146094915502528
   As only 22 people in the whole world solved all 3 questions, I am cool with my result :)
 
   Now round one starts at 10 am PT on 28th Jan, which means it will start at 11:30 pm India time :'( I can never work at night :(

  

Wednesday, January 11, 2012

Installing Linux (Ubuntu 11.10) on Virtual Box

   Hey all! Here's a post on how to install Linux on Virtual Box. I will be giving the demo on a Windows 7 machine, using Virtual Box version 4.0.10, and will be installing Ubuntu 11.10 on it. So, let's start!

First step: Getting Virtual Box
You can download it from here: https://www.virtualbox.org/wiki/Downloads It is a simple exe; just run it and you have your Virtual Box ready!

Next: Get Ubuntu

The direct download link is this: http://www.ubuntu.com/download/ubuntu/download . Make sure you select the 32-bit version for a 32-bit computer and the 64-bit version for a 64-bit computer.

Next: Installation
Start Virtual Box and click on the New machine icon.

                                     
It will open up a "Create new Virtual Machine" wizard.
Click Next and it will ask you to give the machine a name. I entered Ubuntu and it automatically set the Operating System to Linux and the version to Ubuntu. You can set them manually as well.
                         

Click on Next and it will ask you to set the base memory. I gave 1GB, but it is up to you. For Linux, 512MB to 1GB is good enough. Click Next.
Now you are at "create a boot hard disk". As this is your first installation, you do not need to make any changes; just click on Next.



Now it will open up a "Create New Virtual Disk" wizard. Click Next.
Here you have the option to create dynamically expanding storage or fixed-size storage. It is best to use dynamically expanding storage. Moving on,
it will ask for a location to store your new virtual disk. Select the location of your choice. Then, in the size part, select the disk size of your choice. The default 8 GB will be enough if you do not want to install many applications on your Ubuntu; otherwise, select 4 or so GB extra. Click Next.
Click on Finish and it will give you a summary of your new machine. Click on Finish again, or go back to make any necessary changes.
                                            
Now the installation part is half complete.

Next: Configuration
This part takes just a little time.
                                         
Select the new machine you created and go to Settings. Now go to Storage and click on IDE Controller. Then, in the Attributes section, click on the small disk icon. Select the copy of Ubuntu that you downloaded and click on OK. That is all the configuration you will need.


Now click on the Start machine icon.
On the first run, the virtual machine may ask a few configuration questions. Just read them and take the necessary steps. It is very easy to do so!
Once Ubuntu has booted, it will ask whether you want to install Ubuntu or just try it. The choice is yours. However, if you do not install Ubuntu, all the work you do on the machine will be lost the next time you boot it. Hence, it is better to install it.

Again, the installer will ask you some questions, and again, they are very simple. It will ask you whether to install updates during the installation or after it, and so on. The installation will take some time. After that, you are all set to play with your virtual machine!
  

Monday, January 9, 2012

Infinite Redirect and effects on browser

   It's always great to do something that messes with your computer! How about doing something that makes your browser take up 50% of your CPU?
  JavaScript has the window.location object, generally used for redirecting your page.
 e.g. window.location = "http://www.google.com" redirects you to the Google homepage.
 What would happen if you redirected the page to itself? An infinite reload! Just to see how browsers behave when this happens, I created an HTML page with JavaScript code redirecting it to itself. As expected, the result was an infinite reload, and the browsers gulped 50% of my processor! I was hoping the usage would go even higher, but it didn't :(
  I was wondering if there was any way that browsers could stop this from happening. I found 2 solutions:

1. The browsers can keep a count of the number of reloads of a page and, on reaching a certain number, simply kill the page or stop it from refreshing. But then, sites showing live scores tend to keep on refreshing, and you wouldn't like it if the site stopped loading at a crucial moment in the game! The solution to this problem could be for browsers to track the average time between 2 reloads. If that time is smaller than some threshold, the page can be stopped from reloading.

2. Web browsers get the JavaScript code, right? So they can simply ignore the line where the page redirects to itself, if that line is not within some function. Maybe the
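
The first idea, stopping a page when the average time between reloads gets too small, can be sketched roughly like this. This is only an illustration of the heuristic, not how any real browser works; the function names createReloadGuard and shouldBlockReload and both parameters are made up for this example.

```javascript
// Hypothetical sketch of solution 1: judge a page by the average gap
// between its most recent reloads. Not taken from any real browser.
// windowSize should be at least 2 so that an average gap can be computed.
function createReloadGuard(minAverageGapMs, windowSize) {
  var timestamps = []; // times (in ms) of the most recent reloads

  return function shouldBlockReload(nowMs) {
    timestamps.push(nowMs);
    if (timestamps.length > windowSize) {
      timestamps.shift(); // keep only the last `windowSize` reloads
    }
    if (timestamps.length < windowSize) {
      return false; // not enough history yet to judge
    }
    // Average gap = total time spanned / number of gaps between reloads
    var span = timestamps[timestamps.length - 1] - timestamps[0];
    var averageGap = span / (timestamps.length - 1);
    return averageGap < minAverageGapMs;
  };
}
```

With a threshold of 1000 ms and a window of 5 reloads, a page reloading itself every 10 ms gets stopped on its fifth reload, while a live score page refreshing every 30 seconds is never stopped.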

  

Thursday, December 15, 2011

5 hours @ ICPC Asia Regionals Amritapuri

  At 8:30 am on 12th December, Aditya Prajapati, Arth Patel and I, in other words Team Zion, were sitting in the Computer Lab of Amrita University waiting for the ACM ICPC Asia Regionals to start.
  Going back 2 days, on the morning of 10th December, we were completing our final examination of the 5th semester. Just after the exam we rushed to the printers for our tickets, then went to the railway station to catch a 4 pm train to Mumbai, then a 5 am flight to Kochi and finally a car ride to the campus. We were exhausted, but we had a night to partially recharge our batteries.
  Anyway, on to the contest. We idiotically did not get Kevin sir's "notebook" printed, and that was the first of several mistakes to come. We were out of practice anyway. Then the questions arrived. I read question A, Aditya read question B and Arth read question E. I realised question A was the 2D version of one of the online round questions and hence found it easy to code. By that time Arth had already started coding question E. He wrote the code and ran it on some test cases: working! We thought, yeah! He submitted it. Wrong Answer! There it was, staring us in the face! Then Aditya jumped in and realised Arth had read the question wrong! I then saw that other teams were submitting question G pretty quickly, so we went for it and submitted it at 27 or so minutes. Relief!
  Then I returned to testing the algorithm for A, while Arth and Aditya tried to find a proper algorithm for E. We all spent more than an hour just trying to figure out the algorithm! Then I finally showed Aditya my algorithm; we coded and submitted it. Wrong Answer! Aw man! Arth then read question E again and realised that they had been reading it wrong entirely! The reaction we had after Arth said it! We had spent 2.5 hours trying to solve a question that we had read wrong! We finally coded it properly and submitted it, and it was finally correct! Meanwhile I had found the trivial case for my algorithm in A and had moved to problem J while Arth was coding E.
     Problem J seemed very simple to solve, but it turned out all our optimization attempts were failing, and we had 5 Time Limit Exceeded (TLE) submissions! The last one was 2 seconds before the end! In the end we finished 94th after solving 2 questions. We were not happy, but the bigger realization dawned next! We had not submitted question A! I had found the trivial case and made the necessary change! But since we were too occupied with E we made A wait, and the rest is history! Angry at ourselves, we got out of the room and sulked for some time, but then the closing ceremony started, in came C J Hwang, and we forgot about our performance!
    All in all we had a fun time at ICPC Amritapuri, but next time we may go to Coimbatore or Kanpur for a change!

  

Monday, November 7, 2011

An afternoon with canvas

   Today I returned to my hostel after a 2-week-long Diwali break. Well, since it was Diwali it was obvious that I would not be doing a single bit of coding, and I am not complaining, but the last 3 days at home were free, so I thought of getting back in the mood slowly. So I picked up an old painting application that I had developed.
   The previous one was a simple jQuery-based application and did not have a feature to save your work, so all the Da Vincis and Picassos out there could not show others their creations. So I thought of making the same app on the HTML canvas element, as that makes it very easy to save what you did!
   This time again it was very easy to write the jQuery code used to paint, while setting up the positioning of the objects took me a long time. Let me share some code needed to get the canvas up and running.

HTML: 
<canvas id="can" width="500" height="500"></canvas>

CSS:
#can
{
   border:1px solid black;
}


jQuery/JavaScript:
var dodraw = false;   // true while the mouse button is held down
var size = 15;        // brush width in pixels

var canvas = document.getElementById("can");
var ctx = canvas.getContext('2d');
ctx.strokeStyle = "blue";
ctx.lineCap = "round";

// Start drawing on mouse press and paint a dot at that point
$("#can").mousedown(function(e){
   dodraw = true;
   ctx.lineWidth = size;
   ctx.beginPath();
   ctx.moveTo(e.pageX, e.pageY);
   ctx.lineTo(e.pageX-1, e.pageY-1);
   ctx.stroke();
});

// Stop drawing on mouse release
$("#can").mouseup(function(){
   dodraw = false;
});

// While the button is held, paint a dot wherever the mouse moves
$("#can").mousemove(function(e){
   if(dodraw==true)
   {
      ctx.lineWidth = size;
      ctx.beginPath();
      ctx.moveTo(e.pageX, e.pageY);
      ctx.lineTo(e.pageX-1, e.pageY-1);
      ctx.stroke();
   }
});

This code should be enough to get your canvas drawing. You can add further features and modify the code as required.
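
One thing to watch for with the positioning: the mouse event's pageX and pageY are measured from the top-left corner of the page, while moveTo and lineTo expect coordinates measured from the top-left corner of the canvas. The two only match when the canvas sits at the very corner of the page. Here is a small sketch of the conversion, assuming the canvas position comes from jQuery's offset(); the helper names toCanvasPoint and drawDot are made up for this example, and the sketch ignores the 1px border.

```javascript
// Hypothetical helper: convert page coordinates into canvas coordinates.
// `offset` is the position of the canvas on the page, in the shape that
// jQuery's $("#can").offset() returns: { left: ..., top: ... }.
function toCanvasPoint(pageX, pageY, offset) {
  return { x: pageX - offset.left, y: pageY - offset.top };
}

// Hypothetical helper: paint one brush dot at a page position,
// using the same moveTo/lineTo/stroke sequence as the handlers above.
function drawDot(ctx, pageX, pageY, offset, size) {
  var p = toCanvasPoint(pageX, pageY, offset);
  ctx.lineWidth = size;
  ctx.beginPath();
  ctx.moveTo(p.x, p.y);
  ctx.lineTo(p.x - 1, p.y - 1);
  ctx.stroke();
  return p;
}
```

Inside a mouse handler this could be called as drawDot(ctx, e.pageX, e.pageY, $("#can").offset(), size).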
I have currently added eraser, brush, save, new and size +/- buttons to the page.
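
I have not shown the code behind the save button here, so the following is only a sketch of one common way to do it, using the standard toDataURL method of the canvas element. The function name exportAsPng and the "preview" image id in the note below are made up for this example.

```javascript
// Hypothetical helper (not the app's actual code): turn the current
// contents of a canvas into a PNG data URL via the standard toDataURL method.
function exportAsPng(canvas) {
  return canvas.toDataURL("image/png");
}
```

The returned string can, for instance, be set as the src of an image on the page, e.g. document.getElementById("preview").src = exportAsPng(canvas), so that the drawing can be saved like any other image.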
The demo is available here: Open World