Tuesday, December 30, 2014

Book Review: Learning Python Testing by Daniel Arbuckle

This is a review of Learning Python Testing by Daniel Arbuckle, which, as its name suggests, delves into testing with Python. The frameworks it covers also appear in comprehensive Python texts, but where those give only a high-level overview, this book covers the various testing frameworks and approaches in detail.
The first chapter covers the various forms of testing. This could have been condensed into a few subtopics, leaving room to discuss test-driven development in greater detail. It then moves on to doctest, an approach distinctive to Python, which is discussed in depth. I had known it only vaguely before, and the details provided opened up a sea of possibilities for me. For instance, the AVL tree example is an excellent demonstration of doctest and is worth quoting. The next chapter covers unit testing with doctest and correctly explains the essence of unit testing. The PID controller example, however, was confusing and had me frequently eyeballing the code: the logic was complicated and, consequently, the test cases were larger. Next, mock objects are detailed, a topic missing from other Python texts, with the unittest.mock library in Python 3 explained in depth. In the following chapter, the Nose test runner is discussed along with its various customizations.
Test Driven Development was introduced relatively late in the book, after all the testing frameworks had been covered, and the chapter dealt with writing specifications and tests before code. For someone new to the TDD philosophy, this book offers an excellent introduction that covers all its facets and challenges the reader to do more, which I found a pleasant surprise. This was followed by integration and system testing. The final chapter, covering the role of testing alongside code coverage, version control and continuous integration, wrapped up this handy textbook well.

Apart from some places where I found the text opinionated in its choice of examples, I enjoyed the rest of the book, especially the places where the reader is asked to stop reading and try things out to test his/her knowledge of the topic.

Disclaimer: I received a copy of this book [http://bit.ly/1HKQaIj] in exchange for reviewing it.

Thursday, December 25, 2014

Book Review: Think Stats

Think Stats by Allen B. Downey

Think Stats presents statistical analysis techniques with a twist: it shows how to code the solution in Python rather than solving it mathematically. There may be other books on data analysis using various techniques and Python libraries, but this book is unique in that it teaches the reader to apply statistics to real-world data and answer common questions about it, and one can feel the author guiding the reader through the various exercises. Another takeaway is its attention to real-world practice: for instance, having empathy and gratitude for the people who provided the personal information used to create the dataset, and understanding the context of the data before applying any algorithm.
As with other books I have reviewed, I assumed that the potential reader would be new to statistics and to computing statistical methods in general, and this book justifies itself by encouraging the reader to identify problems and apply the relevant algorithms to them. While I was able to follow the chapters, the exclusive use of Python was a bit of a concern: if you are comfortable with Python, you will feel right at home; otherwise, the choice of a specific technology might cause slight disruptions. I prefer JavaScript-based examples over NumPy and SciPy these days, but as these libraries are more mature, they are an excellent choice.

Aiming to replace your regular statistics class, the book presents all the key topics in a refreshing manner. After discussing exploratory data analysis, it demonstrates representing data with basic methods before proceeding to more complex ones. Advanced topics like regression and analytic functions are introduced towards the end.
One issue I faced was that the book went into theory while actual code analysis was missing in places. For instance, I was unfamiliar with the pandas DataFrame, and I had to look into the code over the course of the book. This book will definitely stay in my reference list for times to come.

Disclaimer: I have been provided a free copy of this book by O'Reilly under their Blogger review program.

Saturday, December 20, 2014

Packt Ebook Bonanza Campaign

I came across this promotional offer from Packt Publishing for this festive season:

Following the success of last year’s festive offer, Packt Publishing will be celebrating the Holiday season with an even bigger $5 offer.
From Thursday 18th December, every eBook and video will be available on the publisher’s website for just $5. Customers are invited to purchase as many as they like before the offer ends on Tuesday January 6th, making it the perfect opportunity to try something new or to take your skills to the next level as 2015 begins.
With all $5 products available in a range of formats and DRM-free, customers will find great value content delivered exactly how they want it across Packt’s website this Xmas and New Year.

To avail of this offer, visit: http://bit.ly/p5dlr
or see the Twitter hashtag #packt5dollar.

Sunday, December 7, 2014

D3.js Tips and Tricks: Using Plunker for development and hosting

D3.js Tips and Tricks: Using Plunker for development and hosting your D3.js visualizations 

I've just been finishing up a D3 visualization pet project and found this free and amazing hosting service for showcasing your work to the world. I hope this helps others too.


Friday, November 21, 2014

Custom Java Metrics in Eclipse

Recently, a relative asked whether I could provide a tool to calculate custom metrics for Java code, so I began searching and found an old but open-source Eclipse plugin that does exactly that: it calculates metrics without any fluff.
I took its source code and put it on GitHub.


I had to make minor changes to get it working quickly, namely to use only the metrics table perspective and hide the views not in use.

Here, each metric needs to be provided with two essential things:
  1. The name of the metric in the plugin.xml file (which the plugin reads to build its list of metrics to calculate)
  2. The name of the calculator file, which contains the logic used to calculate that metric's value

The required custom metrics were outlined in an academic paper about the quality of object-oriented design, so the metrics created are quite simple.

For instance, the metric 'Inheritance Ratio' took the form of the following tag in plugin.xml:

    <metric
        name="Inheritance Ratio"
        id="INHR"
        level="type">
    </metric>


and its logic was mapped via the calculator tag:
    <calculator
        name="Inheritance Ratio"
        calculatorClass="net.sourceforge.metrics.calculators.InheritanceRatio"
        level="type">
    </calculator>



Next, one simply creates this calculator class and overrides its calculate method. The other step is to create constants like INHR (in this example), through which the metric and its calculator communicate at runtime.
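To make the metric/calculator wiring concrete, here is a minimal, self-contained sketch of the pattern described above: a shared id constant, a calculator base class with an overridable calculate method, and a concrete calculator. Everything in it is hypothetical. `MetricIds`, `TypeInfo`, `Calculator` and `MetricsDemo` are not the Metrics plugin's real API, and defining inheritance ratio as inherited methods divided by all available methods is an assumption, not necessarily the paper's formula.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared constants: the id declared in plugin.xml ("INHR")
// is what links a metric to its calculator at runtime.
interface MetricIds {
    String INHR = "INHR";
}

// Hypothetical, simplified view of a Java type being measured.
class TypeInfo {
    final int declaredMethods;
    final int inheritedMethods;

    TypeInfo(int declaredMethods, int inheritedMethods) {
        this.declaredMethods = declaredMethods;
        this.inheritedMethods = inheritedMethods;
    }
}

// Hypothetical stand-in for the plugin's calculator base class.
abstract class Calculator {
    private final String metricId;

    protected Calculator(String metricId) {
        this.metricId = metricId;
    }

    String getMetricId() {
        return metricId;
    }

    // Each concrete calculator overrides this with its metric's logic.
    abstract double calculate(TypeInfo type);
}

// Assumed definition: inherited methods / (inherited + declared methods).
class InheritanceRatio extends Calculator {
    InheritanceRatio() {
        super(MetricIds.INHR);
    }

    @Override
    double calculate(TypeInfo type) {
        int total = type.inheritedMethods + type.declaredMethods;
        return total == 0 ? 0.0 : (double) type.inheritedMethods / total;
    }
}

class MetricsDemo {
    // Runs every given calculator and stores the results by metric id.
    static Map<String, Double> measure(TypeInfo type, Calculator... calculators) {
        Map<String, Double> results = new HashMap<>();
        for (Calculator c : calculators) {
            results.put(c.getMetricId(), c.calculate(type));
        }
        return results;
    }

    public static void main(String[] args) {
        // 6 declared + 2 inherited methods -> 2 / 8
        Map<String, Double> results = measure(new TypeInfo(6, 2), new InheritanceRatio());
        System.out.println(results); // {INHR=0.25}
    }
}
```

Keying results by the id constant keeps the calculator independent of any UI code, which matches the piggybacking approach of reusing the plugin's existing views.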

So, by piggybacking on this application, I was easily able to provide the metrics tool without worrying about the UI or source code management. I plan to finish this quickly and then improve it based on suggestions, which are always welcome!
https://github.com/SumitBisht/Metrics

Tuesday, November 11, 2014

Book Review: High Performance Python

High Performance Python by Micha Gorelick and Ian Ozsvald

To a casual user of Python, this book might not seem interesting or handy, but if one is working on an application that demands maximum efficiency from the computer, structuring the code well is not enough. One needs to go deeper and find out how the code actually works, and that starts with knowing how to write code that uses the minimum of CPU cycles and RAM.
The book starts with an overview of the various data structures and libraries that Python offers, as well as the internal structure of various computer architectures. It then moves on to profiling and detecting bottlenecks in both memory and CPU usage. Successive chapters expand on this as different data structures are discussed. Similarly, the performance gains offered by third-party libraries like NumPy are discussed in detail, which is a learning experience. The chapter on compilers is also informative, showing how the benefits of compilation depend on which compiler is chosen. Important concepts like concurrency and multiprocessing are discussed in detail for I/O-bound and CPU-bound problems.
Overall, this is a book you will want to keep handy as a reference while designing a performant architecture or improving a Python application's response time. For me, it was an eye-opener and a ready reckoner for improving the design and efficiency of a Python application. Many concepts were beyond me at times, but I hope to learn more about Python internals, and this is a book I will revisit in the future to confirm my findings and keep learning deeper concepts.
Disclaimer: I have been provided a free copy of this book by O'Reilly under their Blogger review program.

Wednesday, October 29, 2014

Using Hybris for e-commerce

Recently, I had a chance to work with Hybris, a private company (now acquired by SAP) offering an application framework of the same name: a customizable platform for handling B2B and B2C e-commerce needs. I was initially skeptical of the framework, as it is not immediately clear how to create an online store the way one would with Magento or Spree, but it is much more than a CMS that creates a shopping front-end for the user.
Technically, Hybris is a framework built on top of Spring that runs on a customized Tomcat or SpringSource DM Server and uses Maven for build automation. While this architecture is carefully thought out, the problem lies with its openness: the community is quite limited, and learners are restricted to the Hybris wiki and forums. Given the evolving nature of the application (new versions arrive frequently), opening it up would make a lot of sense. As SAP recently acquired Hybris, it may provide integration with its tools and databases in the future, as well as forums and community support like other mainstream software.

Coming back to what is unique about this framework, here are a few notable observations:
Open-Close model: While the software is not open source, building an application on top of the framework comes with few surprises, and there is a lot of flexibility.
User oriented: Cockpits are specialized interfaces that power users/admins can use to quickly access the information in their application.
Thought out: Like a mature product, it covers the entire gamut of an e-commerce application, and one can not only provide an HTTP-based web solution but also plug into an existing application.
Scalable: As a JVM-based approach is followed, it is quite scalable. Although it requires upfront resource allocation during development, it pays back by ensuring the scalability of the website.
Performance: The customization comes at the cost of computing resources, as well as slow, complex process workflows. The overall performance of the application is worse than that of other e-commerce solutions.
XML based: The configuration is mainly in XML, and parts like the UI are built on a custom framework, which is quite restrictive. These might not be issues for everyone, but they stuck out like a sore thumb to me, which brings us to the next point.
Old standard Architecture: Application development feels archaic on this framework, as the technologies it uses and provides seem to be a decade old, and regardless of what the website claims, integration with popular technologies is quite hard.

As I continue exploring this framework while working on real-world projects, what I have learned so far will help me compare it against both open-source and commercial alternatives and contrast the differences more accurately.


Thursday, September 4, 2014

Is it really 'Team Agility' ?

Today, enterprises face increasing competition from freelancers and startups, as growing cloud adoption creates disruptive business opportunities and challenges for everyone. It is not surprising to see enterprises moving to more agile processes.
However, there is a world of difference between large enterprises and the small, tightly cohesive teams that set out to achieve software agility. At a recent conference I attended, this was described as 'culture', something no tool or technology could change, as it comes from within. Many agile projects have not reached their promised outcomes because culture got in the way of their implementation. My personal observation is that, apart from team culture, individual behavior and understanding also play no small part in the outcome. Culture is merely the result, or the sum, of individual behavior, and just as the micro ecosystem influences the macro one, individual behavior plays a compounding role in being agile and, at the end of the day, in delivering quality software.
I have worked across different teams as a software craftsman, and what facilitated output was not only the tools and automation in place, but also how the team got on together as a unit, without egos or finger pointing. In a larger enterprise that is rarely the case, and my main point here is to reflect on team composition: in most projects, a management team handles teams working across different time zones and also manages bench strength. Unfortunately, this leaves a lot of room for miscommunication, which is resolved through process (aka documentation), and that kind of defeats the purpose of being agile in the first place.
What is needed is small teams oriented towards individual results rather than team efforts (this might contradict the management view of a team, but it leads to transparency) instead of large monolithic units ('Lines of Business', as they are called), with cross-functional lines of management between them (to facilitate agility across sprints) instead of a top-heavy structure. It is easy for major software tool providers to sell tools for agile processes, DevOps, automation, etc., but at the end of the day these are just buzzwords that may satisfy your client about agility and give the appearance of a leaner process when there isn't one.
So instead of merely claiming to have a startup culture, both software providers and clients need to own up to the fact that a faster and cheaper software delivery model needs smaller teams and more transparent work rather than a heavy process, which allows both incidental and malicious efforts to cause inefficiency.

Thursday, July 17, 2014

Book Review: 'Real Time Communication with WebRTC'

I am presenting my review of the book 'Real Time Communication with WebRTC' by Salvatore Loreto and Simon Pietro Romano.
This book is an interesting mix of the theory and practice of WebRTC, which is best explained to a layman as VoIP or Skype within a browser.
One thing that could really affect you as a reader is the way this book is written: the theoretical fundamentals are interspersed with code and practical advice. At times this makes a seemingly straightforward topic like socket.io painful to understand, but it is quite handy if you are stuck on a specific problem and need to go deeper into it.
As I already have experience developing applications that use WebRTC, it was a refreshing read that also explained much of the theory behind this technology and the various ways in which peer-to-peer audio and video can be shared in real time.
To give you an overview, the book begins with a long introduction to handling user media (mic and webcam) in an HTML5 browser, before discussing the different design strategies used in peer-to-peer connections. It then walks the reader through building an application from scratch to build confidence in the topics discussed, and finishes with a discussion of advanced features of the WebRTC API.
My greatest peeve with this book was the lack of reliability in the examples: some examples failed to execute in the Firefox browser. Also, some of the routinely occurring errors could have been covered, as this technology is constantly evolving and it is not unexpected to find code that might not be supported by future browser versions.
However, browser-based peer-to-peer communication is discussed thoroughly, and this book is one of the most comprehensive texts on it at the moment.
Disclaimer: I have been provided a free copy of this book by O'Reilly under their Blogger review program.

Tuesday, July 1, 2014

Packt Ten Year Celebration Campaign - Packt Publishing

Packt Publishing has launched an exciting campaign to celebrate 10 years and is offering all eBooks and Videos at just $10 each for 10 days (until the 5th of July).

This publisher has been a boon for open-source frameworks, providing well-formed additional documentation and how-tos for specific technologies.


Press Release:

Packt celebrates 10 years with a special $10 offer

This month marks 10 years since Packt Publishing embarked on its mission to deliver effective learning and information services to IT professionals. In that time it’s published over 2000 titles and helped projects become household names, awarding over $400,000 through its Open Source Project Royalty Scheme.

To celebrate this huge milestone, from June 26th every eBook and video is available for $10 each for 10 days – this promotion covers every title and customers can stock up on as many copies as they like until July 5th. Dave Maclean, Managing Director, explains: ‘From our very first book published back in 2004, we’ve always focused on giving IT professionals the actionable knowledge they need to get the job done.

As we look forward to the next 10 years, everything we do here at Packt will focus on helping those IT professionals, and the wider world, put software to work in innovative new ways. 
We’re very excited to take our customers on this new journey with us, and we would like to thank them for coming this far with this special 10-day celebration, when we’ll be opening up our comprehensive range of titles for $10 each. 

If you’ve already tried a Packt title in the past, you’ll know this is a great opportunity to explore what’s new and maintain your personal and professional development. If you’re new to Packt, then now is the time to try our extensive range – we’re confident that in our 2000+ titles you’ll find the knowledge you really need, whether that’s specific learning on an emerging technology or the key skills to keep you ahead of the competition in more established tech.’

More information is available at www.packtpub.com/10years 


bit.ly/1ohwJwx

Wednesday, June 25, 2014

When fast is merely not good enough

In the pursuit of ever faster applications, I am constantly looking for new ideas and have come across an interesting term, reactive programming, which addresses many concepts and brings possible answers under a single umbrella; or, quite simply, gives them a name.
Basically, any web application under this umbrella term is people-first: it informs its client about what is happening instead of serving a delayed page load that can take anywhere from 2 seconds to hours. It always responds to the client in real time and reacts to users, events, load and failure.

This is typically done by giving the application the following characteristics:
  1. Responsive
  2. Scalable
  3. Resilient
  4. Event-Driven

When these characteristics are applied cohesively, the common pattern that emerges is labelled reactive programming, as promoted by the Reactive Manifesto at http://www.reactivemanifesto.org/
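As one illustration of these characteristics, and not something the manifesto itself prescribes, here is a small Java 8 sketch using CompletableFuture. The caller gets a future immediately instead of blocking (responsive, event-driven), the work runs on a bounded thread pool (a simple nod to scalability), and a failure is turned into a fallback value instead of an error page (resilient). `loadProfile` and the fallback string are hypothetical placeholders for a real backend call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReactiveSketch {
    // Hypothetical backend call; a negative id simulates an outage.
    static String loadProfile(int userId) {
        if (userId < 0) {
            throw new IllegalArgumentException("backend unavailable");
        }
        return "profile-" + userId;
    }

    // Non-blocking: returns a future at once. On failure, the future
    // completes with a fallback value instead of propagating the error.
    static CompletableFuture<String> fetchProfile(int userId, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> loadProfile(userId), pool)
                .exceptionally(ex -> "cached-default-profile");
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> ok = fetchProfile(7, pool);
            CompletableFuture<String> failed = fetchProfile(-1, pool);
            // A real handler would register callbacks (thenAccept, etc.);
            // join() is used here only to print the demo results.
            System.out.println(ok.join());     // profile-7
            System.out.println(failed.join()); // cached-default-profile
        } finally {
            pool.shutdown();
        }
    }
}
```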

Thursday, June 5, 2014

Book Review: Client Server Web Apps with Javascript and Java



The book 'Client-Server Web Apps with JavaScript and Java' by Casimir Saternos aptly lives up to its punchline, 'Rich, Scalable, and RESTful'. These words not only capture the essence of this book but also describe the adoption of JavaScript-based frameworks and technologies on the user-interface/frontend of today's enterprise Java applications.

A new term, Client-Server, is used to introduce readers to the idea that the client side of an application is as important as its server side, and that the programming effort required on the client side is as large as on the server. Like the other topics in this book, it is introduced completely from scratch, which makes relearning familiar concepts like JavaScript refreshing.
Even for an experienced developer, there are lots of things to watch out for, as in chapter 2, where excerpts from 'JavaScript: The Good Parts' by Douglas Crockford are cited for concise learning. Similarly, the next chapter, detailing REST and JSON, explains the absence of URLs/syndication in JSON and the related debate surrounding HATEOAS (Hypermedia As The Engine Of Application State). The JVM-specific languages are mentioned by highlighting their related build tools, which is potentially confusing if the reader is not familiar with build, version control and test tools.
The next part of the book, starting with Chapter 5, deals with the client-side web application and quickly introduces the reader to finer points like asset pipelining; the following chapter introduces different JVM-based servers to run and deploy the web application. Lightweight Java servers and developer productivity tools are listed in the next couple of chapters, which I think do not add much value to the overall premise of the book. The chapter after that covers the design and principles of RESTful web services and demonstrates one created in Jersey, followed by jQuery.

However, Chapter 10 covers Angular and Sinatra (a mini web framework in Ruby), which is a letdown: Java 8 ships with a native JavaScript engine, Nashorn (the successor to Rhino), and it would have been interesting to see Angular used in the full MEAN stack (MongoDB, Express.js, Angular.js and Node.js), as Express.js is the de facto framework in Node.js land and Angular and Express have quite a lot in common. That said, beginners to Angular should use this chapter to get a feel for Angular and not worry much about the choice of server-side framework used to provide the RESTful service to the client-side application. This chapter covers Angular.js in sufficient detail and addresses the actual theme of the book, but at the end of the chapter I was left wanting more, especially given the multitude of client-side frameworks available today. I will definitely keep an eye out for improvements to this chapter in future revisions of this book.

The final three chapters deal with packaging and deployment and touch on these areas only briefly, giving just the starting pointers so that readers can choose which tools to learn further as they need them.
Another plus I found with this book was a well-balanced Appendix: on one hand, it gives practical examples of using different lightweight databases, and on the other, various facts and trivia regarding REST.

Overall, this book is a gem of knowledge for existing and new programmers who are starting to look into the exciting world of client-side JavaScript web apps that interact mainly with lightweight web services.
In a few sections where a simple Java-based RESTful service is demonstrated, Sinatra running on JRuby is used instead, which is fine but can potentially confuse Java programmers who are not familiar with the Ruby/JRuby landscape. A Sinatra-inspired library written in Java could have been used instead. This, along with the short chapter 10 and the absence of any other client-side framework or of tools like Grunt/Bower, is my main grouse with an otherwise stellar book that deserves a read from those starting out with new-age web apps.

Disclaimer: I have been provided a free copy of this book by O'Reilly under their Blogger review program.

Tuesday, May 20, 2014

Recording both Audio and Video Streams from browser


This post is about a recent project I am working on and details the approach I took to circumvent an apparent problem.
I have been trying to create a web-based audio+video capture tool that uses HTML5 WebRTC-based libraries to capture input from the user if they have a webcam and a mic. For this task, the most robust library is RecordRTC (https://github.com/muaz-khan/WebRTC-Experiment) by Muaz Khan, which contains various examples and is actively maintained and supported by its creator.
For my requirements, I needed both audio and video streams to be recorded at the same time, which the user media object was not providing, and the following code provided only a single stream of information:



I thought that setting both the audio and video parameters to true would result in multiple streams, or even one stream containing both audio and video, but I was wrong, and after a detailed search I ended up selecting both streams in the following function:



So, in RecordRTC we have to use the same stream object but pass the appropriate stream type (audio, video, gif, canvas) to the library constructor, which invokes the appropriate recorder, such as WhammyRecorder to record video.

When stopping the recorder, similar steps are required to get the captured stream, and we have to use a callback in the audio recording to retrieve the video stream within it.


Currently this is supported in Chrome and Firefox Aurora and works correctly out of the box on both desktop and mobile devices. But at times the recorded audio is not in sync with the video (at least for the first recording after each page refresh). Hopefully this will be sorted out and the spec will be implemented in all browsers, saving us from having to install plugins like Flash to use the hardware for capturing data from the user.

Tuesday, May 6, 2014

Packt celebrates International Day Against DRM today, May 6th 2014

Packt Publishing is celebrating International Day Against DRM today, and this is a welcome sign for all users of this ebook publisher. Personally, I've never been a fan of DRM and have always felt that it only encourages piracy by harassing legitimate users.

To mark this occasion, Packt Publishing is showing its support for the cause by giving worthwhile discounts, as mentioned in their press release:

Packt Publishing firmly believes that you should be able to read and interact with your content when you want, where you want, and how you want – to that end they have been advocates of DRM-free content since their very first eBook was published back in 2004. 
To show their continuing support for Day Against DRM, Packt Publishing is offering all its DRM-free eBooks and Videos at $10 for 24 hours only on May 6th at www.packtpub.com.

“Our top priority at Packt has always been to meet the evolving needs of developers in the most practical way possible, while at the same time protecting the hard work of our authors. DRM-free content continues to be instrumental in making that happen, providing the flexibility and freedom that is essential for an efficient and enhanced learning experience. That’s why we’ve been DRM-free from the beginning – we’ll never put limits on the innovation of our users.”
– Dave Maclean, Managing Director

Advocates of Day Against DRM are invited to spread the word and celebrate on May 6th by exploring the full range of DRM-free content at www.packtpub.com, where all eBooks and Videos will be $10 for 24 hours.


Saturday, April 5, 2014

Book Review: Agile Data Science



Agile Data Science is a quick-start guide to all things big data and is aimed at beginners in this technological field. It attempts to explain the technologies associated with big data in a very simplified manner, which can help first timers but is quite confusing at the implementation level, mainly because of the limited space given to the topics it deals with.
The first half of the book covers theory and the setup of tools for a data analysis project, and the second half goes into building one.

The main point of contention for me was the way the book shifted gears: the first section was comprehensible enough, but in the second section the level of detail ramped up quickly, making the content hard to follow, and I ended up relying on the available source code.

It starts by explaining the big data and cloud technologies prevalent today and then introduces the reader to data, the cloud and various handy tools to use them.
The next part contains an end-to-end application that mines and provides analytics for emails, a very real-world implementation covered from various aspects.

However, as open-source projects change rapidly, the relevance of the practices followed in the example is questionable. This matters all the more because different startups follow their own app stacks while addressing big data challenges.

As a personal note, I would not recommend this to a learner, who is better off developing in bits and pieces; various tutorials on the internet serve this better. This text is useful only if you need insight into how big data challenges are met using open-source technologies.

Note: I've been provided a copy of this book under O'Reilly's blogger review program.

Saturday, March 22, 2014

2000th Title Campaign by Packt Publishing


Great news for book readers!
 
Packt Publishing has launched an exciting campaign to coincide with the release of its 2000th title. During this offer, Packt is giving its readers a chance to dive into its comprehensive catalog and Buy One, Get One Free across its entire range of eBooks.

The campaign began on 18th-Mar-2014 and will continue until 26th-Mar-2014. The following benefits are available to readers during this campaign:

  •     Unlimited purchases during the offer period
  •     Offer is automatically applied at checkout


You can start by visiting this page: bit.ly/1j26nPN

Tuesday, March 18, 2014

8 new features for Java 8

JDK 1.8, aka Java 8, launched today, meaning that its General Availability release is out in the open and developers can switch from early access releases to a tested release for production use. But what does it mean for you, the busy Java developer? Well, here are some points I condensed to mark this release:

1. Lambda Expressions

I started with lambda expressions, as this is probably the most sought-after language feature since Generics/Annotations in Java 5.
Here's the syntax:

(ArgType arg, ...) -> { return some expression, probably using these arguments; }

What it does is reduce code where the intent is obvious, such as in an anonymous inner class. (Swing action handlers just got sexy, yay!)
So, a thread's Runnable can be changed as follows:
Runnable oldRunner = new Runnable(){
    public void run(){
        System.out.println("I am running");
    }
};
Runnable java8Runner = () ->{
    System.out.println("I am running");
};

As in Scala, type inference is also possible in lambdas. Consider the following example:
Comparator<String> c = (a, b) -> Integer.compare(a.length(), b.length());

Here, the types of a and b (in this case String, from the Comparator<String> target type) are inferred from the compare method being implemented.

The symbol used to separate the arguments from the body, ->, is quite similar to the => already used in Scala. If you are good at Scala, there is not much reason to switch, as you may feel the way lambdas are implemented in Java is inadequate (and verbose), but for a good 'ol Java programmer, this is the way to go.
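To try the lambda examples above as one complete program, here is a minimal sketch. The class name LambdaDemo, the sortByLength helper and the sample words are hypothetical, chosen only for illustration. It runs the old and new Runnable styles side by side and sorts strings by length with a lambda whose parameter types are inferred from Comparator<String>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LambdaDemo {

    // Sorts a copy of the input by length; a and b are inferred as String
    // because the target type is Comparator<String>.
    static List<String> sortByLength(List<String> words) {
        Comparator<String> byLength = (a, b) -> Integer.compare(a.length(), b.length());
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(byLength);
        return sorted;
    }

    public static void main(String[] args) throws InterruptedException {
        // Pre-Java 8 style: an anonymous inner class
        Runnable oldRunner = new Runnable() {
            @Override
            public void run() {
                System.out.println("I am running (anonymous class)");
            }
        };
        // Java 8 style: the same Runnable written as a lambda
        Runnable java8Runner = () -> System.out.println("I am running (lambda)");

        Thread t1 = new Thread(oldRunner);
        Thread t2 = new Thread(java8Runner);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(sortByLength(Arrays.asList("banana", "fig", "apple")));
    }
}
```

The two threads may print in either order; the sort prints [fig, apple, banana].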

2. Generic Type changes and improvements

Taking cues from lambdas, generic method calls can also infer the types to be used, to an extent. For instance, a call to a generic method need not specify its type arguments explicitly. Hence, the following call
SomeClass.<Type>method();
can be written simply by omitting the type information:
SomeClass.method();
The type is inferred from the target type, such as the parameter type of the method the result is passed to, which is helpful in nested calls where one generic call is used directly as an argument to another.
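To make the inference change concrete, here is a minimal sketch using Collections.emptyList(); the countStrings method and the class name are hypothetical. In Java 7, passing Collections.emptyList() directly as an argument did not compile without an explicit type witness, because T was inferred as Object; Java 8 infers T = String from the parameter type.

```java
import java.util.Collections;
import java.util.List;

public class TargetTypingDemo {

    // Hypothetical method that expects a List<String> argument.
    static int countStrings(List<String> strings) {
        return strings.size();
    }

    public static void main(String[] args) {
        // Java 7 style: explicit type witness in argument position
        int before = countStrings(Collections.<String>emptyList());
        // Java 8 style: T = String is inferred from countStrings' parameter type
        int after = countStrings(Collections.emptyList());
        System.out.println(before + " " + after); // prints "0 0"
    }
}
```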


3. Stream Collection Types (java.util.stream)

A stream is similar to an iterator in that it allows a single pass over the collection it is created from. Along with lambdas, this is another noteworthy feature to watch out for. You can use streams to perform functional operations like filter or map/reduce over collections, whose individual elements are processed through Stream objects. Streams can run sequentially or in parallel as desired. The parallel mode makes use of the fork/join framework and can leverage the power of multiple cores.
Example:
List guys = list.stream().collect(Collectors.toList());
can also be run in parallel as
List guys = list.stream().parallel().collect(Collectors.toList());

Another nice example reduces the collection to a single value by calling the reduce operation:
int sum = numberList.stream().reduce(0, (x, y) -> x+y);
or,
int sum = numberList.stream().reduce(0, Integer::sum);
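Putting the stream operations together, here is a small runnable sketch; the class name, method names and sample data are hypothetical. It chains filter and map before collecting, and shows the same reduce running on a sequential and on a parallel stream. The identity value 0 is what makes the parallel reduction safe to split across cores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {

    // Keeps names longer than three characters and upper-cases them.
    static List<String> longNamesUpper(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Sequential reduce with a method reference.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // Same reduction on a parallel stream (backed by the fork/join common pool).
    static int parallelSum(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Carol", "Dave");
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(longNamesUpper(names)); // [CAROL, DAVE]
        System.out.println(sum(numbers));          // 15
        System.out.println(parallelSum(numbers));  // 15
    }
}
```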


4. Functional Interfaces (java.util.function)

Interfaces can now contain default methods, which need not be implemented by classes and can run directly from the interface. This helps with existing code: adding methods to an interface no longer forces every class implementing it to implement them. This is similar to traits in Scala. The java.util.function package provides ready-made functional interfaces (interfaces with a single abstract method, such as Function and Predicate), which are the target types for lambdas.
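A minimal sketch of both ideas; the Greeter interface and WorldGreeter class are hypothetical. The implementing class only provides the abstract method and inherits greet() from the interface. Because Greeter has a single abstract method, it can also be implemented with a lambda, and the java.util.function types compose through their own default methods.

```java
import java.util.function.Function;
import java.util.function.Predicate;

public class DefaultMethodDemo {

    // Hypothetical interface: adding greet() later would not break implementors.
    interface Greeter {
        String name();

        default String greet() {
            return "Hello, " + name();
        }
    }

    // Implements only the abstract method; greet() comes from the interface.
    static class WorldGreeter implements Greeter {
        @Override
        public String name() {
            return "World";
        }
    }

    public static void main(String[] args) {
        System.out.println(new WorldGreeter().greet()); // Hello, World

        // One abstract method, so a lambda can implement Greeter as well.
        Greeter lambdaGreeter = () -> "Lambda";
        System.out.println(lambdaGreeter.greet()); // Hello, Lambda

        // java.util.function interfaces combine via default methods.
        Function<Integer, Integer> doubler = x -> x * 2;
        Function<Integer, Integer> doubleThenAddOne = doubler.andThen(x -> x + 1);
        Predicate<String> isEmpty = String::isEmpty;
        System.out.println(doubleThenAddOne.apply(5));      // 11
        System.out.println(isEmpty.negate().test("java")); // true
    }
}
```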

5. Nashorn - The Node.js on JVM

This is the JavaScript engine that enables us to run JavaScript on a JVM. It plays a role similar to Chrome's V8 engine, on which Node.js runs. Together with projects such as Avatar.js, it aims to run Node.js applications on the JVM while also allowing actual Java libraries to be called by the JavaScript code running on the server. This is exciting, to say the least, as it marries the scalability and asynchronous nature of Node.js directly with safe and widespread server-side Java middleware.


6. Date/Time changes (java.time)

http://download.java.net/jdk8/docs/api/java/time/package-summary.html
The new Date/Time API lives in the java.time package and follows the design of Joda-Time. Another goodie is that most of its classes are thread-safe and immutable.
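A small sketch of the immutable style, using only the java.time package; the class name, helper and dates are illustrative. Every "modifying" call returns a new object, which is why these values can be shared between threads without locking.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.Period;
import java.time.temporal.ChronoUnit;

public class TimeDemo {

    // Whole days between two dates.
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        LocalDate release = LocalDate.of(2014, Month.MARCH, 18);

        // plusDays returns a new LocalDate; 'release' itself is never modified.
        LocalDate nextWeek = release.plusDays(7);
        System.out.println(release + " -> " + nextWeek); // 2014-03-18 -> 2014-03-25

        System.out.println(daysBetween(release, LocalDate.of(2014, 12, 30))); // 287

        // Calendar-based difference, printed as an ISO-8601 period.
        Period oneYear = Period.between(release, LocalDate.of(2015, 3, 18));
        System.out.println(oneYear); // P1Y
    }
}
```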

7. Type Annotations

Annotations can now be used to decorate generic type arguments themselves.
E.g.: List<@Nullable String>, which is not always desired but can prove useful in certain circumstances. Apart from decorating generic types, they can also be used in object creation expressions and casts:

  new @NonEmpty @Readonly List(myNonEmptyStringSet)
  new @Interned MyObject()

  myString = (@NonNull String) myObject;
Even array types can be annotated:
  @NotNull String[] arr;

Java 8 also adds the RuntimeVisibleTypeAnnotations and RuntimeInvisibleTypeAnnotations class file attributes, which cause the .class file to store this annotation information.
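A runnable sketch of a type annotation on a generic type argument, read back via reflection. The @NonNull annotation here is hypothetical and defined locally (it is not the Checker Framework's); ElementType.TYPE_USE lets it appear on any use of a type, and RUNTIME retention means it is recorded in RuntimeVisibleTypeAnnotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedParameterizedType;
import java.lang.reflect.AnnotatedType;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class TypeAnnotationDemo {

    // Hypothetical annotation usable on any type use, kept at runtime.
    @Target(ElementType.TYPE_USE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface NonNull {}

    // The annotation decorates the type argument String, not the field.
    List<@NonNull String> names = new ArrayList<>();

    // Reads the annotation back from the field's generic type argument.
    static boolean typeArgumentIsNonNull() {
        try {
            Field field = TypeAnnotationDemo.class.getDeclaredField("names");
            AnnotatedParameterizedType listType =
                    (AnnotatedParameterizedType) field.getAnnotatedType();
            AnnotatedType elementType = listType.getAnnotatedActualTypeArguments()[0];
            return elementType.isAnnotationPresent(NonNull.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeArgumentIsNonNull()); // true

        // A type annotation on a cast; compiles, but is not visible via reflection.
        Object obj = "text";
        String s = (@NonNull String) obj;
        System.out.println(s);
    }
}
```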

8. Other (nice to have) Changes

The Reflection API is slightly extended with methods such as Type.getTypeName() and Class.toGenericString(), among others.
String.join() is a welcome addition, as many projects previously wrote their own utility classes for this. For example,
String abc = String.join(" ", "Java", "8");
evaluates to "Java 8".
In the collections framework (java.util), the Comparator interface has been revamped, with methods like reversed, comparing and thenComparing added, which allow easy customization of comparisons over multiple fields. Other libraries such as concurrency and NIO have also been updated, but those changes are not especially noteworthy and mostly keep pace with the rest of the API.
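A short sketch combining the new Comparator methods with String.join; the Book class and its sample values are hypothetical. It sorts newest first and breaks ties alphabetically by title, then joins the titles into one string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorDemo {

    // Hypothetical value class used only for this example.
    static class Book {
        final String title;
        final int year;

        Book(String title, int year) {
            this.title = title;
            this.year = year;
        }

        String getTitle() { return title; }
        int getYear() { return year; }
    }

    // Newest first; books from the same year are ordered by title.
    static List<Book> sortNewestFirst(List<Book> books) {
        Comparator<Book> order = Comparator.comparingInt(Book::getYear)
                .reversed()
                .thenComparing(Book::getTitle);
        List<Book> sorted = new ArrayList<>(books);
        sorted.sort(order);
        return sorted;
    }

    // Joins the titles with String.join, new in Java 8.
    static String titles(List<Book> books) {
        List<String> names = new ArrayList<>();
        for (Book b : books) {
            names.add(b.getTitle());
        }
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("Gamma", 2013),
                new Book("Beta", 2014),
                new Book("Alpha", 2014));
        System.out.println(titles(sortNewestFirst(books))); // Alpha, Beta, Gamma
    }
}
```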


Overall, Java 8 is well thought out: it makes mainstream Java more concise, picking some good parts of Scala/Clojure to improve its syntax and address much sought-after features.

Monday, March 17, 2014

Mocking Express.js servers in Node.js applications

Today I came across Shmock, an Express.js HTTP server mocking library that is handy for testing various Node.js applications.
It is better than other mocking libraries out there because it creates an actual Express server instead of mocking the HTTP client, an approach that easily leads to brittle tests.
Its usage is fully documented on the tool's GitHub page, and it can easily be used with testing tools like Mocha.
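Shmock itself is a Node.js library, so the snippet below is not its API. It is a rough Java sketch of the same idea, using the JDK's built-in com.sun.net.httpserver.HttpServer: the test starts a real stub server on a free local port, and the code under test talks to it over real HTTP with no client-side mocking. The class and method names and the /users route are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RealServerStubDemo {

    // Starts a real HTTP server on an ephemeral port that returns a canned body.
    static HttpServer startStub(String path, int status, String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext(path, exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // The "code under test": an ordinary HTTP client that knows nothing about the stub.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Spins up the stub, makes one real request against it, then shuts it down.
    static String fetchFromStub(String cannedBody) {
        try {
            HttpServer stub = startStub("/users", 200, cannedBody);
            try {
                int port = stub.getAddress().getPort();
                return fetch("http://127.0.0.1:" + port + "/users");
            } finally {
                stub.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromStub("[\"alice\"]")); // ["alice"]
    }
}
```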

When polyglot persistence saves the day

I hope you find this read interesting and are in as good a mood as I am, since today was Holi, the great Indian festival of colors. At work, however, the fun was missing, as there was a rush job to fix an error in a codebase, pufff. It was a Python project that called a RETS server to fetch information about different properties and looped over the results. Temporary MongoDB collections were created in the process, as the data was first stored in them before being saved into the actual MongoDB collection, and this was eating up the entire disk space of the server. My optimization was to use Redis lists in place of this arrangement. This reduced the space requirements, and removing data from Redis has fewer side effects than removing it from a temporary collection that gets saved to disk on every write operation.
However, this means that if I use polyglot persistence in my project (as this approach is known), the code complexity needed to deal with the different data stores increases.
My work on the project remains unfinished, as I still have to nail down fetching the proper RETS server data (using their customized querying mechanism) and also find out why and how the existing data was wrongly extracted.

Saturday, February 15, 2014

Optical Character Recognition with Nodejs

Today, I was prototyping an OCR tool to use as a web-based API. My first intention was to develop a desktop version in Python and expose it via a Flask application, but using Node proved to be a lot easier.
Node.js has a library binding for Tesseract, which proved to be quite handy.
I simply installed the library first using npm:

npm install nodecr

Next, in a simple Node application, I processed a user-uploaded image:

ncr.process(filePath, function(error, text){ ....

The process call runs the image through Tesseract and passes the recognized text (or an error) to this callback.

I have uploaded it as a generic application at https://github.com/SumitBisht/node-ocr and hope you will find it helpful.
Note that this is a really dumb form of OCR: the image needs to be sanitized before it is passed in, which I am still working on.
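As one possible sanitation step, here is a sketch of binarization, a common preprocessing step before handing an image to an OCR engine such as Tesseract: every pixel is turned into pure black or pure white. It is written in Java like the other examples here, is not part of nodecr or the node-ocr repository, and the fixed threshold of 128 is an assumption; real scans often need adaptive thresholds.

```java
import java.awt.image.BufferedImage;

public class Binarize {

    // Luminance of an RGB pixel using the common Rec. 601 weights.
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    // Maps each pixel to black (ink) or white (paper) around a fixed threshold.
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int lum = luminance(src.getRGB(x, y));
                out.setRGB(x, y, lum < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x404040); // dark grey "ink"
        img.setRGB(1, 0, 0xD0D0D0); // light grey "paper"
        BufferedImage bw = binarize(img, 128);
        // getRGB includes an alpha byte, so mask it off before printing.
        System.out.printf("%06X %06X%n",
                bw.getRGB(0, 0) & 0xFFFFFF, bw.getRGB(1, 0) & 0xFFFFFF); // 000000 FFFFFF
    }
}
```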

Sunday, February 9, 2014

Book Review: Thinking With Data

Thinking With Data: How to Turn Information into Insights

During the past week, I read this book by Max Shron, which addresses the big data challenge from a different perspective: what questions to ask of data to gain the best possible, or most beneficial, answers for its owners. The book is relevant not just to people working in some niche segment but to everyone associated with big data, from students and researchers to data analysts and developers. If you are expecting some big data technology or implementation, you'll be disappointed. Instead, the book focuses essentially on the problem of deciding what to find in data. It breaks the scope of this problem down into four parts: context, needs, vision and outcome.

I am currently working full time on a big data project where similar problems crop up. For instance, we know how to run a predictive analysis algorithm, but the main challenge is to select a specific algorithm and fields to obtain mathematical results that actually solve business problems. As the author is a data strategy consultant and former data scientist, the tone of the book is quite practical, and it uses real-world examples to better explain its concepts where needed.

One area where I felt the book was weak is its assumption that the problem will be a greenfield one, where legacy or existing systems do not influence the decision-making process around proposed solutions. Existing big data strategies can act as guidelines for the future process. Another thing missing was a discussion of anti-patterns in big data formulation strategy, such as what not to do while tinkering with the data and algorithms to extract intelligence from data.

In spite of being shorter than other books on the same topic, this book does an overall good job of discussing the problem of what to extract for big data analysis. It is definitely a must-read and reference for anyone dealing with this problem who wants to avoid presenting unnecessary noise instead of meaningful data.

Disclaimer: This book was provided to me by O'Reilly under their Blogger Review program.

Sunday, January 26, 2014

Book Review: Ruby Under A Microscope

Author: Pat Shaughnessy


During the past few weeks I've been reading 'Ruby Under A Microscope' by Pat Shaughnessy, which covers the internals of the virtual machines used by different implementations of the Ruby programming language. I consider myself lucky to have had the opportunity to review this book under O'Reilly's Blogger Review program, as it is a must-read for any Ruby developer, and specifically for someone doing a Ruby vs Java comparison. While virtual machine internals and the algorithms they use are widely discussed and known in Java, the same cannot be said for Ruby, and Ruby developers generally shrug them off when confronted with interpreter or platform specific issues, seeking refuge in changing or scaling up resources.
Knowing the internals helps in understanding the behavior of the language, as the optimizations the compiler makes internally have a direct bearing on how a program behaves and can lead to unexpected behavior. What is of particular interest here is that this book goes all the way down to the compiler in order to explain the resulting behavior and performance.
The book starts with Ruby's tokenizer and parser and continues to the compilation of the script into YARV instructions. The next chapter then explains how these are executed, detailing the program call stack and the variables used internally.

Control branching and method dispatch are discussed before the discussion moves on to the class, object and method mechanisms. The hash mechanism Ruby uses to store objects is detailed next, followed by blocks, with lambdas and procs discussed together with stack vs heap memory. This is followed by a chapter on metaprogramming that is worth reading for those not familiar with metaprogramming Ruby; while it is not as detailed as 'Metaprogramming Ruby', it does a good job explaining various reflection-based constructs.
After covering the language internals, two of the leading alternative platforms, JRuby and Rubinius, are detailed and compared against MRI (YARV). Lastly, garbage collection is discussed, including a comparison between the Ruby and Java garbage collectors. Pat does really well in explaining the differences and also provides ideas for exploring further based on the GC profile report. I'd also advise checking out his blog for more such topics.

I'd heartily recommend this book to anyone who is interested, from students to expert pros alike, as anyone coding in or using Ruby ought to know its internals to gain more confidence in developing on and using this platform for non-trivial tasks.
Disclaimer: This book was provided to me by O'Reilly under their Blogger Review program.

Wednesday, January 22, 2014

User management in Node through Lockit.js

Lockit.js is a user management module for Express.js that handles user signup/login and creates the corresponding user entries in the database using standard best practices. In a nutshell, it is quite similar to Devise, a user authentication gem used in Rails projects. As Lockit is available as an Express module on npm, it can easily be added to any project with npm install:

npm install lockit

Although this project is currently in its infancy, it offers an uncluttered approach to user management at the cost of being opinionated, which is okay as long as you only need to save and manage standard user actions.

As the project's GitHub page and website explain, you can easily set it up in any Node application that uses either CouchDB or MongoDB. In a MongoDB-based application, you only need a config.js file that takes care of configuration, passed in from the main application file as:

  lockit(app, config);

Only lockit and the config file need to be required in the main module of the Express application.

Inside this config file, you can specify parameters like:

  • Database (URL) [Mandatory]
  • Application Details
  • Confirmation email sender's mail configuration
  • Login attempt, account lock and verification link settings
  • Routes to various actions
  • Verification and confirmation mail templates


Depending on your requirements, you may need to change its internals, which can easily be done in the libraries Lockit depends on. When listed from npm, its dependencies look like the following diagram, which makes the description of Lockit clearer.


Because it is opinionated, its views require Twitter Bootstrap CSS to display correctly, and apart from MongoDB and CouchDB, other databases are not supported at the time of this writing.
Update: The views are customizable. The config file provides various templates, which you can specify manually depending on your requirements:


It is worth mentioning that you can take the signup or password reset token and append it to the signup or reset password route, depending on the object's state, to keep exploring the library without an actual mail sending facility for authentication.
While exploring this library, I created a sample Node application, hosted on GitHub, that tries to cover this library using minimal code.
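Lockit's own token handling is not shown in this post. As a rough illustration of the mechanism the token trick above relies on, here is a minimal Java sketch of single-use, expiring signup tokens. The class SignupTokens, the /signup/<token> route shape and the 24-hour validity are assumptions for illustration, not Lockit's API, and a real implementation would persist tokens in the database rather than in memory.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class SignupTokens {

    // Hypothetical pending-signup record: the owner's email and an expiry time.
    static final class Pending {
        final String email;
        final Instant expiresAt;

        Pending(String email, Instant expiresAt) {
            this.email = email;
            this.expiresAt = expiresAt;
        }
    }

    private final SecureRandom random = new SecureRandom();
    private final Map<String, Pending> pending = new HashMap<>();
    private final Duration validity;

    SignupTokens(Duration validity) {
        this.validity = validity;
    }

    // Creates an unguessable, URL-safe token, e.g. for a route like /signup/<token>.
    String issue(String email, Instant now) {
        byte[] bytes = new byte[24];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        pending.put(token, new Pending(email, now.plus(validity)));
        return token;
    }

    // Returns the verified email, or null if the token is unknown or expired.
    // The token is removed on first use, so it cannot be replayed.
    String verify(String token, Instant now) {
        Pending p = pending.remove(token);
        if (p == null || now.isAfter(p.expiresAt)) {
            return null;
        }
        return p.email;
    }

    public static void main(String[] args) {
        SignupTokens tokens = new SignupTokens(Duration.ofHours(24));
        Instant now = Instant.now();
        String token = tokens.issue("user@example.com", now);
        System.out.println("/signup/" + token);
        System.out.println(tokens.verify(token, now.plusSeconds(60)));  // user@example.com
        System.out.println(tokens.verify(token, now.plusSeconds(120))); // null (already used)
    }
}
```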