Tuesday, July 26, 2011



Getting Real-Time Quotes from Yahoo Using the Command Line



Look up the format tags you need, send them in the request from the command line, and then parse the output.
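
A minimal sketch of the request-and-parse step, assuming the quote service returns one CSV line per symbol with the fields in the order the tags were sent. The QUOTE_URL value, the "s" and "f" parameter names, the function names, and the sample line are placeholders for illustration and are not taken from the original post.

```shell
#!/bin/sh
# Placeholder endpoint (hypothetical): substitute the real quote URL.
QUOTE_URL="${QUOTE_URL:-http://quotes.example.invalid/quotes.csv}"

# Fetch one CSV line for a symbol ($1) and a tag string ($2). Needs network.
# The "s" and "f" parameter names are assumptions about the service.
fetch_quote() {
    wget -qO- "${QUOTE_URL}?s=$1&f=$2"
}

# Print field number $2 of the CSV line $1, with quotes and CRs removed.
# Plain split on commas: breaks if a quoted field itself contains a comma.
csv_field() {
    printf '%s\n' "$1" | cut -d, -f"$2" | tr -d '"\r'
}

# Made-up sample line standing in for real output (symbol, price, date, time).
sample='"XYZ",12.34,"7/26/2011","4:00pm"'
echo "symbol: $(csv_field "$sample" 1)"
echo "price:  $(csv_field "$sample" 2)"
```

With a real endpoint, the two functions would be chained as csv_field "$(fetch_quote XYZ sl1d1t1)" 2, where the tag string is again only an example.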



Wednesday, July 6, 2011

Lua is widely used as a scripting language by game programmers and can also be used on Apple iOS.

How to Manage an EC2 Instance Using PuTTY (SSH)

Launch the Instance Wizard and pick an AMI for a Small instance type (32-bit). Enter a name for the key you would like to create, and save the generated key in a secure place on your local machine; it will be needed for SSH login later. Next, set up basic firewall access for SSH (the web server and database firewall access can be fixed later). Note the Public DNS that was assigned. The Amazon key ends with the extension .pem; before using PuTTY, convert that key to PuTTY's format (.ppk) using PuTTYgen. Log in with the username ec2-user; no password is required.
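
On a machine with the command-line PuTTY tools or OpenSSH installed, the key conversion and login could be sketched as below. The key file name, the host name, and the function names are placeholders, and using OpenSSH instead of the PuTTY GUI is a substitution, not what the post describes.

```shell
#!/bin/sh
# Hypothetical values: replace with your key file and the instance's Public DNS.
KEY_FILE="${KEY_FILE:-mykey.pem}"
EC2_HOST="${EC2_HOST:-ec2-host.example.invalid}"

# Name of the .ppk file that corresponds to a .pem file.
ppk_name() {
    printf '%s\n' "${1%.pem}.ppk"
}

# Convert the Amazon .pem key to PuTTY's .ppk format with command-line puttygen.
convert_key() {
    puttygen "$1" -O private -o "$(ppk_name "$1")"
}

# Log in with OpenSSH instead of PuTTY, using the .pem key directly.
# The key is made owner-read-only first, because ssh rejects keys others can read.
ec2_login() {
    chmod 400 "$1" && ssh -i "$1" "ec2-user@$2"
}

echo "would convert $KEY_FILE to $(ppk_name "$KEY_FILE")"
```

Calling convert_key "$KEY_FILE" or ec2_login "$KEY_FILE" "$EC2_HOST" performs the actual step; the script itself only prints the target file name.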

Some Hadoop Topics

  • Top 10 Big Data Applications running on Hadoop Cloud Computing
  • Image processing with Hadoop
  • Understanding the Shuffle Sort
  • Predictive Analytics
  • Map Reduce algorithms, the state of the art
  • Map Reduce vs Parallel Databases
  • Fully Utilizing your Hadoop Cluster
  • Mahout
  • HBase schema design and optimization
  • Big memory computing for data intensive scientific applications
  • Reasoning - When Hadoop Meets the Semantic Web
  • Hadoop 2.0 - impact of emerging new Hadoop distros. Is Cloudera still relevant?
  • Using databases as input to big data processing jobs
  • Innovation needed in Hadoop to drive greater adoption
  • EMC's Big Data Stack
  • Which NoSQL DB to choose?
  • Social Entrepreneurs and Impact investors: Triple Bottom Line Assessments
  • Data Integration
  • Real Time Analytics using Hadoop
  • Revolutionary Big Data Insight Engine
  • Hack proofing methods. Going beyond encryption.
  • HBase schema design
  • Analysis of social activity using both network and content
  • Testing Big Data Technologies
  • Marrying Big Data with Advanced Analytics (not to be given by me! I want to learn about this)
  • Converging analytics and search using Big Data technologies.
  • Hadoop pipes w/Cloudera
  • Security issues with Big Data.
  • Data Analytics in Hadoop Ecosystem
  • Hive integration with HBase.
  • High Performance Virtual Database System using Hadoop/MapReduce: Extending MapReduce to RDBMS
  • Using Mahout and NoSQL DB over Hadoop or Amazon EMR
  • How can we use Hadoop with confidential/encrypted data?
  • Moving file(s) and file system legacy constructs to key/value stores to serialize unstructured pattern data and perform analytics.
  • Data Integration with Hadoop.
  • Virtual Business Ecosystem & Virtual Expo data integration
  • Data Gravity and its effect on Public Cloud Providers
  • Analyzing customer behavior
  • Data collection with Flume
  • Use cases around Hadoop and EDW integration
  • Toughest part of building a reliable Hadoop cluster.

Thursday, March 10, 2011

YouTube to MP3

Two easy steps:
  1. wget "http://www.youtube.com/watch?v=cQRytgGffV4" -qO- | awk '/fmt_url_map/{gsub(/[\|\"]/,"\n");print}' | sed -n "/^fmt_url_map/,/videoplayback/p" | sed -e :a -e '$q;N;2,$D;ba' | tr -d '\n' | sed -e "s/\(.*\),\(.\)\{1,3\}/\1/;s/\\\//g" | wget -i - -O surprise.flv
  2. ffmpeg -i surprise.flv /mnt/hgfs/Downloads/surprise.mp3

Tuesday, August 3, 2010

Installing OBIEE Server on CentOS 5.5 and Admin + Presentation Services on Windows
  1. Install VMware Player on Windows (VMware-player-3.1.0-261024.exe)
  2. Install VMware Tools (VMwareTools-8.4.2-261024.tar)
  3. Create a shared folder on Windows for CentOS (available as /mnt/hgfs/ on CentOS)
  4. Install CentOS 5.5 from the ISO file (CentOS-5.5-i386-bin-DVD.iso). Verify that you can ssh from Windows to CentOS using PuTTY
  5. Install JDK v1.5.0_22 (as root) from Sun (jdk-1_5_0_22-linux-i586-rpm.bin) and set JAVA_HOME
  6. Create the user oracle in the groups oinstall, dba and oracle on CentOS
  7. Increase the open-file limit (ulimit -n 10240 or ulimit -n unlimited)
  8. Run the UnixChk.sh script to verify the prerequisites for the install
  9. Install the Oracle 10g client (10201_client_linux32.zip) as the oracle user. Create all environment variables and verify that tnsping works. SSH port forwarding of ports 1521 and 9703 is required with PuTTY/ssh because of firewalls.
  10. Edit the firewall rules (# vi /etc/sysconfig/iptables) to open ports. (Apparently this did not work.)
    Simply add (after all of the existing entries):
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 1521 -j ACCEPT
    Lastly, restart iptables.
  11. libstdc++ was missing, so installed compat-libstdc++-33 using yum
  12. [oracle@localhost client]$ yum install libXp.i386, because the installer complained:
    Exception java.lang.UnsatisfiedLinkError: /tmp/OraInstall2010-07-29_07-26-16PM/jre/1.4.2/lib/i386/libawt.so: libXp.so.6: cannot open shared object file: No such file or directory occurred..

    The install of libXp.i386 replaced xorg-x11-deprecated-libs:
    Installing:
    libXp i386 1.0.0-8.1.el5 base 23 k
    replacing xorg-x11-deprecated-libs.i386 6.8.2-1.EL.13.20
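
A note on step 10: one likely reason the added rule had no effect is its position. In the stock CentOS 5 rules file, the RH-Firewall-1-INPUT chain ends with a catch-all REJECT rule, and a rule listed after it is never reached. The sketch below places the ACCEPT rule just before that REJECT line. It assumes the stock layout; the function name is made up, and the demo runs on a two-line fragment, not on the live file.

```shell
#!/bin/sh
# Print an iptables rules file (stdin) with an ACCEPT rule for TCP port $1
# inserted just before the first REJECT rule of the RH-Firewall-1-INPUT chain.
# If no such REJECT rule exists, the input is printed unchanged.
add_port_rule() {
    awk -v port="$1" '
        /^-A RH-Firewall-1-INPUT -j REJECT/ && !inserted {
            print "-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport " port " -j ACCEPT"
            inserted = 1
        }
        { print }
    '
}

# Demo on a made-up fragment of a rules file.
printf '%s\n' \
    '-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT' \
    '-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited' |
    add_port_rule 1521
```

On the real system, the output of add_port_rule 1521 < /etc/sysconfig/iptables would be reviewed and copied back as root, followed by service iptables restart.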

Environment variables for the oracle user:
ORACLE_HOME=/home/oracle/orclient
export ORACLE_HOME
ORACLE_SID=ORCL
export ORACLE_SID
TNS_ADMIN=$ORACLE_HOME/network/admin
export TNS_ADMIN
PATH=$ORACLE_HOME/bin:/home/oracle/obiee/OracleBI/server/Bin:/opt/bin:$PATH
export PATH
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/home/oracle/obiee/OracleBI/server/Bin:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
JAVA_HOME=/usr/java/jdk1.5.0_22
export JAVA_HOME


Create tnsnames.ora on CentOS with the following content. Note the use of localhost on CentOS:
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
    )
    (CONNECT_DATA = (SID = ORCL))
  )
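
Unbalanced parentheses are an easy mistake to make when typing a tnsnames.ora entry by hand. The sketch below is a quick sanity check and not an Oracle tool: it only tracks parenthesis nesting and knows nothing else about the file format. The function name and the default file location are assumptions.

```shell
#!/bin/sh
# Succeed if every "(" in file $1 is closed by a later ")" and no ")"
# appears before its matching "(".
parens_balanced() {
    awk '
        {
            for (i = 1; i <= length($0); i++) {
                ch = substr($0, i, 1)
                if (ch == "(") depth++
                else if (ch == ")" && --depth < 0) bad = 1
            }
        }
        END { exit (bad || depth != 0) }
    ' "$1"
}

# Assumed location: tnsnames.ora under TNS_ADMIN, else the current directory.
TNS_FILE="${TNS_ADMIN:-.}/tnsnames.ora"

if [ -f "$TNS_FILE" ]; then
    if parens_balanced "$TNS_FILE"; then
        echo "$TNS_FILE: parentheses balanced"
    else
        echo "$TNS_FILE: unbalanced parentheses" >&2
    fi
fi
```

Once the file passes this check, tnsping ORCL remains the real test of the entry.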



  1. Install OBIEE (biee_linux_x86_redhat_101341.zip) as the oracle user. Install using the standalone container option, which will not install the Oracle Application Server and Enterprise Manager. Install only the BI Server, OC4J and Job Scheduler.
  2. OC4J can be manually started by running the command:
    /home/oracle/obiee/OracleBI/setup/oc4j -start. Verify localhost:9704 using Firefox.
  3. Execute run-sa.sh start as the oracle user to start the BI Server. Before that, source the user.sh file (for environment variables) and run setenforce 0 (as root, see later).
  4. Run netstat -an | grep 9703 to check whether the BI Server is running. As an additional test, go to OracleBI_HOME/setup and run the shell command:
    . sa-cli.sh. To test the client/server connectivity, run the command: nqcmd. If the test is successful, press the Enter key several times to quit nqcmd.
  5. Install Oracle RDBMS on Windows and start the ORCL service and the TNS listener. Create a System DSN called ORCL using the Oracle in OraDB11g_home driver. Connect to the DB using the SH/SH schema.
  6. Install OBIEE Presentation Services and OBI Server Admin on Windows (using Windows Binaries)
  7. On Linux, connecting the repository to the database using OCI (recommended) failed at first.
    It was fixed as follows:
    If you are using OBIEE on a Linux machine that is running Security-Enhanced Linux (SELinux), you may have problems starting the Oracle client library. This is because the SELinux feature does not allow the libnnz10.so Oracle library to be loaded. You will see this error when you try to use the Linux Repository RPD file to access the Oracle DB using OCI.

    To work around this problem, you will need to disable SELinux on your system:

    * To disable SELinux temporarily on a running system, log in as root and execute the following command:

    /usr/sbin/setenforce 0

    * To disable SELinux permanently, edit the file /etc/selinux/config and change "enforcing" to "disabled".
  8. BI server can be started as /home/oracle/obiee/OracleBI/setup/run-sa.sh start
  9. On Windows, bring up the OBIEE Server Admin and log in as Administrator (blank password). Port 9703 must be forwarded using ssh/PuTTY. Work on an offline RPD and, in the Connection Pool on Windows, connect using OCI 10g/11g. Use the ORCL SID that is running on Windows (not sure whether the ORCL DSN we created is used). Do the Oracle by Example tutorial, save the RPD file, and transfer it to CentOS under the Repository directory. Update the /home/oracle/obiee/OracleBI/server/Config/NQSConfig.INI file to use this new RPD. Restart the OBIEE Server.
  10. Now try to load that file using OCI in online mode and update row counts. This is where I originally got the libnnz10.so error that was fixed earlier.
  11. Install the Presentation Services & OC4J on Windows
  12. Start OC4J on Windows, then start the Presentation Services under the Services tab
  13. Rest to follow
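
The port check and the port forwarding used in the steps above can be sketched as follows. The function names and the sample netstat line are made up, and the ssh -L tunnel is the OpenSSH equivalent of a PuTTY tunnel, not the PuTTY setup itself.

```shell
#!/bin/sh
# Succeed if `netstat -an` output on stdin shows a TCP listener on port $1.
port_listening() {
    grep -Eq "[.:]$1[[:space:]].*LISTEN"
}

# Forward local port $1 to the same port on remote host $2 with OpenSSH.
# -N keeps the session open for forwarding only, without a remote shell.
forward_port() {
    ssh -N -L "$1:localhost:$1" "$2"
}

# Made-up netstat line standing in for real output.
sample='tcp        0      0 0.0.0.0:9703        0.0.0.0:*        LISTEN'
if printf '%s\n' "$sample" | port_listening 9703; then
    echo "BI Server port 9703 is listening"
fi
```

On the CentOS machine the check would be run as netstat -an | port_listening 9703. The forward_port function takes the port and a user@host argument for the CentOS VM.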

Wednesday, July 7, 2010

Cornucopia of Technologies


Here is a list of all technologies that I heard of at the Hadoop Conference. Now I need to figure out how they all fit together.



Hypertable, Cassandra, HBase, BigTable
Mahout, ZooKeeper, ElephantBird
Cascading
Flume, Thrift, Crane, Scribe, Sqoop
Pig, Hive
Workflow = Oozie or Cloudera Desktop
HDFS, GFS
Lamport Papers
Protocol Buffers for data serialization format
Social Graph Analysis

Companies: Greenplum, Aster Data, Cloudera, C-Store, MonetDB

Distributed Hash tables
Complex Event Processing
Consistency vs Availability vs Partition Tolerance (CAP)
A/B Analysis