Saturday, April 5, 2014

Book Review: Agile Data Science



Agile Data Science is an instant starter guide for all things related to big data and is aimed at beginners in this field. It attempts to explain the technologies associated with big data in a very simplified manner, which can be helpful to first-timers but quite confusing at the implementation level, mainly because the book does not have enough space for the topics it deals with.
The first half of the book covers theory and the setup of tools that can easily be put in place for a data analysis project, and the remaining half goes into building one.

My main point of contention was the way the book shifted gears: the first section was comprehensible enough, but in the second section the level of detail ramped up quickly, making the contents hard to follow, and I ended up falling back on the available source code.

It starts by explaining the big data and cloud technologies prevalent today and then introduces the reader to data, the cloud and various handy tools for working with them.
The next part contains an end-to-end application that mines emails and provides analytics on them - a very real-world implementation which is covered from various angles.

However, as open source projects keep changing rapidly, the relevance of the example and of the best practices it follows is questionable. This matters all the more because various startups use their own individual app stacks when addressing big data challenges.

As a personal note, I would not recommend this to a learner, as a person is better off developing in bits and pieces, and the various tutorials on the internet serve this better. This text is useful only if you need an insight into how big data challenges are met using open source technologies.

Note: I was provided a copy of this book under O'Reilly's blogger review program.

Saturday, March 22, 2014

2000th Title Campaign by Packt Publishing


Great news for book readers!
 
Packt Publishing has launched an exciting campaign to coincide with the release of its 2000th title. During this offer Packt is giving its readers a chance to dive into its comprehensive catalog and Buy One, Get One Free across its entire range of eBooks.

The campaign begins on 18th-Mar-2014 and will continue until 26th-Mar-2014. Readers get the following benefits during this campaign:

  •     Unlimited purchases during the offer period
  •     Offer is automatically applied at checkout


You can get started by visiting this page:  bit.ly/1j26nPN

Tuesday, March 18, 2014

8 new features for Java 8

JDK 1.8, aka Java 8, launched today, meaning that the General Availability release is out in the open and developers can switch from early access builds to a tested release for production use. But what does it mean for you, the busy Java developer? Well, here are some points that I condensed to mark this release:

1. Lambda Expressions

I start with lambda expressions as this is probably the most sought-after feature in the language since Generics/Annotations in Java 5.
Here's the syntax:

(argtype arg...) -> { return some expression.. probably using these arguments }

What it does is cut down the code where the intent is obvious, such as in an anonymous inner class. (Swing action handlers just got sexy, yay!)
So, a Runnable for a thread can be rewritten as:
Runnable oldRunner = new Runnable(){
    public void run(){
        System.out.println("I am running");
    }
};
Runnable java8Runner = () ->{
    System.out.println("I am running");
};

Similar to Scala, type inference is also possible in lambdas. Consider the following example:
Comparator<String> c = (a, b) -> Integer.compare(a.length(), b.length());

Here, the types of a and b (in this case String, from the Comparator<String> declaration) are inferred when the compare method is implemented.

The symbol used to separate the arguments from the block, ->, is quite similar to the => already used in Scala. If you are good at Scala, there is not much reason to switch, as the way lambdas are implemented in Java will feel inadequate (and verbose) to you, but for a good ol' Java programmer, this is the way to go.
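
To put the two ideas above together, here is a minimal sketch (the class name LambdaDemo and the helper sortByLength are my own, made up for illustration) that uses a lambda as a Runnable and a lambda with inferred parameter types as a Comparator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not taken from any library.
class LambdaDemo {
    // Returns a copy of the input sorted by string length. The parameter
    // types of (a, b) are inferred as String from Comparator<String>.
    static List<String> sortByLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Comparator<String> byLength = (a, b) -> Integer.compare(a.length(), b.length());
        copy.sort(byLength);
        return copy;
    }

    public static void main(String[] args) {
        // A single-expression lambda body needs neither braces nor return.
        Runnable runner = () -> System.out.println("I am running");
        runner.run();
        System.out.println(sortByLength(Arrays.asList("lambda", "is", "here")));
    }
}
```

Running main should print "I am running" followed by [is, here, lambda].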

2. Generic Type changes and improvements

Taking a cue from lambdas, generic method calls can now infer their type arguments in more places. For instance, a call to a generic method need not spell out its generic types. Hence, the following call
SomeClass.<SomeType>method();
can be written without the type information:
SomeClass.method();
The type is inferred from the method signature and the context of the call, which is helpful in nested or chained calls like myCollection.sort().removeUseless().beautify();
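
A small, concrete case of this that I am sure about is passing a generic method call as an argument. The sketch below (InferenceDemo and countItems are hypothetical names of mine) compiles on Java 8, whereas Java 7 needed the explicit type argument shown in the comment:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical demo class illustrating Java 8's improved inference.
class InferenceDemo {
    static int countItems(List<String> items) {
        return items.size();
    }

    static int countEmpty() {
        // Java 7 required: countItems(Collections.<String>emptyList());
        // Java 8 infers String from the parameter type of countItems.
        return countItems(Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(countEmpty());
    }
}
```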


3. Stream Collection Types (java.util.stream)

A stream works like an iterator in that it allows a single pass over the elements of the collection it is called on. Along with lambdas, this is another noteworthy feature to watch out for. You can use streams to perform functional operations like filter or map/reduce over collections, whose elements are streamed individually through Stream objects. Streams can run sequentially or in parallel as desired. The parallel mode makes use of the fork/join framework and can leverage the power of multiple cores.
Example:
List guys = list.stream().collect(Collectors.toList());
can also be run in parallel as
List guys = list.stream().parallel().collect(Collectors.toList());

Another nice example is the reduce operation, which reduces the collection to a single value.
int sum = numberList.stream().reduce(0, (x, y) -> x+y);
or,
int sum = numberList.stream().reduce(0, Integer::sum);
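
For a slightly fuller picture, here is a sketch that chains filter, map and collect, and runs a reduce in parallel. The class and method names (StreamDemo, shortNamesUpper, sumParallel) and the sample data are assumptions of mine, not from any real project:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class for the stream operations described above.
class StreamDemo {
    // Keeps names of at most four characters and upper-cases them.
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Sums the numbers using a parallel stream; Integer::sum is
    // associative, so the result matches the sequential version.
    static int sumParallel(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(shortNamesUpper(Arrays.asList("Anna", "Bartholomew", "Raj")));
        System.out.println(sumParallel(Arrays.asList(1, 2, 3, 4)));
    }
}
```

This should print [ANNA, RAJ] and then 10.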


4. Functional Interfaces (java.util.function)

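A functional interface declares exactly one abstract method, which is what makes it compatible with lambdas; the java.util.function package ships ready-made ones such as Function, Predicate and Supplier. Alongside that single abstract method, interfaces can now contain default methods, which need not be implemented and can run directly from the interface. This helps with existing code - adding a method to an interface need not force all the classes implementing it to implement the new method. This is similar to Traits in Scala.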
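
As a rough sketch of both ideas, the Greeter interface below (a made-up name, as are FunctionalDemo and its methods) has one abstract method plus a default method, and the second helper uses two of the standard interfaces from java.util.function:

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical demo class; Greeter is invented for illustration.
class FunctionalDemo {
    @FunctionalInterface
    interface Greeter {
        // The single abstract method, which a lambda implements.
        String greet(String name);

        // A default method: implementations inherit it for free.
        default String greetLoudly(String name) {
            return greet(name).toUpperCase() + "!";
        }
    }

    static String demoGreeting() {
        Greeter g = name -> "hello " + name;
        return g.greetLoudly("java");
    }

    static boolean demoStandardInterfaces() {
        Function<String, Integer> length = String::length;
        Predicate<Integer> isEven = n -> n % 2 == 0;
        return isEven.test(length.apply("lambda"));
    }

    public static void main(String[] args) {
        System.out.println(demoGreeting());
        System.out.println(demoStandardInterfaces());
    }
}
```

This should print HELLO JAVA! and then true ("lambda" has six characters).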

5. Nashorn - The Node.js on JVM

This is the JavaScript engine that lets us run JavaScript on the JVM. It plays a role similar to the V8 engine from Chrome, on which Node.js runs. Nashorn does not provide the Node.js APIs by itself (Oracle's separate Avatar.js project aims to run Node.js applications on top of it), but it does allow actual Java libraries to be called by JavaScript code running on the server. This is exciting, to say the least, as it opens the way to marrying the scalability and asynchronous nature of Node.js with safe and widespread server-side Java middleware directly.


6. Date/Time changes (java.time)

http://download.java.net/jdk8/docs/api/java/time/package-summary.html
The new Date/Time API lives in the java.time package and is modeled on Joda-Time (the older java.util.Date and Calendar classes remain available). Another goodie is that most classes are thread-safe and immutable.
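
To illustrate the immutability point, here is a minimal sketch (TimeDemo and its methods are hypothetical names of mine) in which plusWeeks returns a new object and leaves the original date untouched:

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.ChronoUnit;

// Hypothetical demo class for the java.time package.
class TimeDemo {
    static LocalDate releaseDate() {
        return LocalDate.of(2014, Month.MARCH, 18);
    }

    // Returns a new LocalDate; the argument itself is never modified.
    static LocalDate oneWeekAfter(LocalDate date) {
        return date.plusWeeks(1);
    }

    public static void main(String[] args) {
        LocalDate release = releaseDate();
        LocalDate later = oneWeekAfter(release);
        System.out.println(release);  // still the original date
        System.out.println(later);
        System.out.println(ChronoUnit.DAYS.between(release, later));
    }
}
```

This should print 2014-03-18, 2014-03-25 and 7.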

7. Type Annotations

Annotations can now be used to decorate the generic types themselves.
E.g. List<@Nullable String>, which is not always desirable but can prove useful in certain circumstances. Apart from decorating generic types, annotations can also be used in constructor calls and casts.

  new @NonEmpty @Readonly List<String>(myNonEmptyStringSet)
  new @Interned MyObject()

  myString = (@NonNull String) myObject;
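Even array types can be annotated:
  @NotNull String[] arr;
(Written this way the annotation applies to the String element type; String @NotNull [] arr annotates the array itself.)

The class file format also gains the RuntimeVisibleTypeAnnotations and RuntimeInvisibleTypeAnnotations attributes, which make the .class file save the type annotation information.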
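
The JDK itself does not ship annotations like @NonNull, so the sketch below declares its own (the NonNull annotation, the TypeAnnotationDemo class and the names field are all made up for illustration). It marks the annotation with ElementType.TYPE_USE so it can sit on a type argument, and then reads it back through reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedParameterizedType;
import java.lang.reflect.AnnotatedType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; NonNull here is our own annotation.
class TypeAnnotationDemo {
    @Retention(RetentionPolicy.RUNTIME)   // keep it visible to reflection
    @Target(ElementType.TYPE_USE)         // allow it wherever a type is used
    @interface NonNull {}

    static List<@NonNull String> names = new ArrayList<>();

    // Checks whether the type argument of the 'names' field carries @NonNull.
    static boolean elementTypeIsAnnotated() {
        try {
            AnnotatedType fieldType =
                    TypeAnnotationDemo.class.getDeclaredField("names").getAnnotatedType();
            AnnotatedParameterizedType parameterized = (AnnotatedParameterizedType) fieldType;
            return parameterized.getAnnotatedActualTypeArguments()[0]
                    .isAnnotationPresent(NonNull.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementTypeIsAnnotated());
    }
}
```

Note that the compiler does not enforce anything for such an annotation on its own; a separate checker or framework has to give it meaning.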

8. Other (nice to have) Changes

The Reflection API has grown slightly, with additions such as getTypeName() and toGenericString().
The String.join() method is a welcome addition, as a lot of self-made utility classes have been written to do this job. So, the following example
String abc = String.join(" ", "Java", "8");
will get evaluated as "Java 8".
In java.util, the Comparator interface is revamped, and methods like reversed, comparing and thenComparing have been added, which allow easy customization of comparison over multiple fields. Other libraries, such as the concurrency and NIO packages, have also been updated, but mostly to keep in step with the changes in the rest of the API, so there is nothing there that needs special follow-up.
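
Here is a small sketch of the Comparator additions together with String.join (ComparatorDemo and its method names are my own invention). It sorts by length first and alphabetically second, then joins the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class for the Comparator and String.join additions.
class ComparatorDemo {
    // Sorts by length, breaking ties alphabetically.
    static List<String> sortByLengthThenAlpha(List<String> words) {
        Comparator<String> byLength = Comparator.comparing(String::length);
        Comparator<String> order = byLength.thenComparing(Comparator.naturalOrder());
        List<String> copy = new ArrayList<>(words);
        copy.sort(order);
        return copy;
    }

    static String joined(List<String> words) {
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        List<String> sorted = sortByLengthThenAlpha(Arrays.asList("pear", "fig", "apple", "kiwi"));
        System.out.println(joined(sorted));
    }
}
```

This should print: fig kiwi pear apple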


Overall, Java 8 is well thought out: it makes mainstream Java more concise and picks up some good parts of Scala/Clojure to improve its syntax and address much sought-after features.

Monday, March 17, 2014

Mocking Express.js servers in Node.js applications

Today I came across Shmock, an Express.js-based HTTP server mocking library which is handy for various Node.js applications.
It is better than other mocking libraries out there because it relies on creating an actual Express server instead of mocking the HTTP client, an approach which easily leads to brittle tests.
Its usage is fully documented on the tool's GitHub page, and it can easily be used with testing tools like Mocha.

When polyglot persistence saves the day

I hope you find this read interesting and that you are in a good mood like me, as today was Holi, the great Indian festival of colors. At work, however, the same fun was not there, as there was a rush job to fix an error in some codebase, pufff. It was a Python project that called a RETS server to fetch information on different properties, looping over them. Underneath, temporary MongoDB collections were created, as the data was first stored in them before being saved to the actual MongoDB data collection, and this was causing the entire server disk space to be used up. My optimization was to use Redis lists in place of this arrangement, which reduced the space requirements; removing data from Redis also has fewer side effects than doing so in a temporary collection, which gets saved to the hard disk on every write operation.
However, this means that if I use polyglot persistence (as this approach is known) in my project, the code complexity of dealing with the different data stores increases.
My work on the project remains unfinished, as I still have to nail down fetching the proper RETS server data (using their customized querying mechanism) and also find out why and how the existing data was wrongly extracted.