Saturday, August 24, 2013

Recommendation - It's all about the possibilities

Recommendation engines play a crucial role in consumer websites and apps. Some applications, like Pandora, are driven entirely by recommendations.

Take consumer websites such as eCommerce stores or news sites: there are various factors that can be taken into account when developing a recommendation engine.

I'll list these possibilities for a better understanding and share some tips on implementation.

Collaborative Filtering and ML

A recommendation engine can leverage machine learning frameworks like Mahout for the following types of recommendations.

User Based - The engine recommends items by finding users similar to the current user and using their preferences.

Item Similarity - The similarity between items is calculated, and items similar to the ones a user already likes are recommended.

User-based and item-similarity recommendations both fall under the collaborative filtering technique.
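To make the two collaborative filtering ideas concrete, here is a minimal plain-Python sketch (not Mahout's API). It assumes a small, hypothetical in-memory `ratings` dict of user -> {item: rating} and uses cosine similarity with unrated items treated as zero: `user_based` scores items the user hasn't rated using the similarity-weighted ratings of other users, and `similar_items` ranks items by how similarly users rated them.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"shirt": 5, "jeans": 3, "hat": 4},
    "bob": {"shirt": 4, "jeans": 3, "shoes": 5},
    "carol": {"jeans": 2, "hat": 5, "shoes": 1},
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (missing keys count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def user_based(user, ratings, top_n=3):
    """Predict unrated items as the similarity-weighted average of other users' ratings."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


def similar_items(item, ratings, top_n=3):
    """Rank other items by the cosine similarity of their per-user rating vectors."""
    by_item = {}
    for user, user_ratings in ratings.items():
        for name, rating in user_ratings.items():
            by_item.setdefault(name, {})[user] = rating
    ranked = sorted(
        ((cosine(by_item[item], vec), other) for other, vec in by_item.items() if other != item),
        reverse=True,
    )
    return [other for _, other in ranked[:top_n]]
```

With this data, `user_based("alice", ratings)` returns `["shoes"]` (the only item Alice hasn't rated), and `similar_items("shirt", ratings)` ranks jeans first because Alice and Bob rated both. A production engine would precompute similarities over large, sparse data, which is where a framework like Mahout helps.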

Content Based - We can compare the attributes of items and users to compute similarity. This is especially useful when we don't have enough data to perform collaborative filtering. We can use a similarity algorithm from an ML library like Mahout, or a search system like Solr, for this.
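A minimal content-based sketch, assuming each item is described by a hypothetical set of attribute tags. The user profile is simply the union of the tags of the items the user liked, and the remaining items are ranked by Jaccard similarity (shared tags divided by total distinct tags). A search engine like Solr would typically score candidates with its own relevance model; this only illustrates the idea.

```python
# Hypothetical catalog: item -> set of attribute tags
catalog = {
    "rain_jacket": {"outerwear", "waterproof", "casual"},
    "umbrella": {"accessory", "waterproof"},
    "linen_shirt": {"shirt", "summer", "casual"},
    "formal_shirt": {"shirt", "formal"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based(liked_items, catalog, top_n=2):
    """Build a tag profile from liked items and rank the other items against it."""
    profile = set()
    for item in liked_items:
        profile |= catalog[item]
    candidates = [i for i in catalog if i not in liked_items]
    candidates.sort(key=lambda i: jaccard(profile, catalog[i]), reverse=True)
    return candidates[:top_n]
```

For a user who liked `rain_jacket`, the umbrella scores 1/4 and the linen shirt 1/5, so `content_based(["rain_jacket"], catalog)` returns `["umbrella", "linen_shirt"]` without needing any other user's data.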

Clustering - Clustering techniques can be used to group similar items.
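A minimal k-means sketch for grouping items, assuming each item has already been turned into a numeric feature vector (the two hypothetical features below could be, say, a sportiness score and a formality score). Centroids start at the first k items to keep the example deterministic; real implementations usually use random or k-means++ initialisation.

```python
# Hypothetical items with two features each, e.g. (sportiness, formality)
names = ["track_pants", "blazer", "running_shoes", "tie", "gym_bag"]
vectors = [(0.9, 0.2), (0.1, 0.9), (0.8, 0.3), (0.2, 0.8), (0.85, 0.1)]


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(vectors, k=2)
# Items sharing a cluster with track_pants are candidates to recommend alongside it
same_cluster = [n for n, label in zip(names, labels) if label == labels[0] and n != names[0]]
```

Here the sporty items land in one cluster and the formal ones in the other, so a user viewing `track_pants` could be shown `running_shoes` and `gym_bag`.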

Other Possibilities:

Below are other factors that can be considered for recommendation. I believe these are more intuitive, and we can identify them when we look from the audience's point of view.

Out of user boundary / Opposing Views - We can recommend something fresh from outside the user's usual boundary, something the user has never explored before. It can also be content from others with opposing views.
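One hypothetical way to recommend "out of boundary" items: find the categories the user has never touched and suggest the most popular item from each. The `catalog` (item -> category) and `popularity` (item -> score) maps are assumed inputs; surfacing opposing views would need extra signals, such as the stance of an article, which this sketch does not model.

```python
def outside_boundary(history, catalog, popularity, top_n=2):
    """Recommend the most popular item from each category the user has never explored."""
    explored = {catalog[item] for item in history}
    best = {}  # unexplored category -> its most popular item so far
    for item, category in catalog.items():
        if category in explored:
            continue
        if category not in best or popularity.get(item, 0) > popularity.get(best[category], 0):
            best[category] = item
    ranked = sorted(best.values(), key=lambda i: popularity.get(i, 0), reverse=True)
    return ranked[:top_n]


# Hypothetical news articles and view counts
catalog = {"a1": "politics", "a2": "politics", "a3": "science",
           "a4": "sports", "a5": "sports", "a6": "arts"}
popularity = {"a1": 100, "a3": 50, "a4": 30, "a5": 80, "a6": 10}
```

A reader who has only read politics (`a1`, `a2`) would get `["a5", "a3"]`: the top sports story and the top science story.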

History - Recommend an item by tracking the user's viewing history. Views can also be tracked with a heat map, or by the user's attention to particular areas of a web page.
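A minimal sketch of history-based scoring, assuming views are logged as hypothetical (category, timestamp) pairs. Each view's weight halves every `half_life_days`, so recent interest counts more than old interest; the seven-day half-life is an assumption to tune.

```python
from datetime import datetime, timedelta


def category_affinity(views, now, half_life_days=7.0):
    """Sum exponentially decayed view weights per category."""
    scores = {}
    for category, viewed_at in views:
        age_days = (now - viewed_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)  # 1.0 now, 0.5 after one half-life
        scores[category] = scores.get(category, 0.0) + weight
    return scores


now = datetime(2013, 8, 24)
views = [
    ("shoes", now - timedelta(days=14)),
    ("shoes", now - timedelta(days=14)),
    ("shirts", now - timedelta(days=1)),
]
scores = category_affinity(views, now)
```

Two shoe views from two weeks ago contribute 0.25 each (0.5 in total), while one shirt view from yesterday contributes about 0.91, so shirts rank first even with fewer views.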

Top selling Items - Recommend top-selling items, the top 10 items, etc.

Trending Items - Recommend trending or most-shared items on social media like Facebook, YouTube, and Twitter.
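Top-selling and trending lists differ mainly in the time window they look at. The sketch below assumes events are already collected as hypothetical (item, timestamp) pairs; for trending items these would be shares gathered from social networks, which is outside the sketch. `collections.Counter.most_common` does the ranking.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_selling(events, n=10):
    """All-time ranking: count events (e.g. sales) per item."""
    return [item for item, _ in Counter(item for item, _ in events).most_common(n)]


def trending(events, now, window=timedelta(hours=24), n=10):
    """Recent ranking: count only events (e.g. shares) inside the time window."""
    recent = Counter(item for item, at in events if now - at <= window)
    return [item for item, _ in recent.most_common(n)]


now = datetime(2013, 8, 24, 12, 0)
events = (
    [("video_a", now - timedelta(days=3))] * 5
    + [("video_b", now - timedelta(hours=2))] * 3
    + [("video_c", now - timedelta(hours=5))]
)
```

`video_a` leads the all-time list, but all its activity is three days old, so it drops out of `trending(events, now)`, which returns `["video_b", "video_c"]`.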

New Releases - New releases may trigger user interest. This is highly relevant in the fashion world.

Friends' actions - Recommend an item based on friends' actions by leveraging social media. But this is not always relevant: I may not want to buy the same shirt my friend wears, but I may like to read the news he commented on. The "Unique Approach" and "Weightage" tips in the "On Implementation" section talk about this.
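To reflect that a friend's action matters more in some categories than others (the news article versus the shirt), a sketch can scale social signals by a per-category weight, in the spirit of the Weightage tip. The `SOCIAL_WEIGHT` values and `default_weight` are made-up assumptions to be tuned with real feedback.

```python
# Hypothetical weights: how much a friend's action should count in each category
SOCIAL_WEIGHT = {"news": 1.0, "books": 0.6, "apparel": 0.1}


def social_scores(friend_actions, default_weight=0.3):
    """Score items from friends' actions, scaled by how 'social' their category is.

    friend_actions: list of (item, category) pairs the user's friends engaged with.
    """
    scores = {}
    for item, category in friend_actions:
        scores[item] = scores.get(item, 0.0) + SOCIAL_WEIGHT.get(category, default_weight)
    return scores


actions = [("shirt_42", "apparel"), ("shirt_42", "apparel"), ("article_7", "news")]
scores = social_scores(actions)
```

Even though two friends bought the shirt, the single commented news article scores higher (1.0 versus 0.2).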

Location based - Some purchase decisions are location based. When a user is located in Mumbai during the monsoon, we can recommend rain accessories suited to that location.
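Location-based recommendations can start as a simple rule table. The sketch below assumes a hypothetical `SEASONAL_RULES` list mapping a city and a set of months to categories to boost; June to September roughly covers Mumbai's monsoon, but the exact months and category names are assumptions.

```python
from datetime import date

# Hypothetical rules: (city, months when the rule applies, categories to boost)
SEASONAL_RULES = [
    ("Mumbai", {6, 7, 8, 9}, ["umbrellas", "raincoats", "waterproof_footwear"]),
]


def location_boosts(city, on_date, rules=SEASONAL_RULES):
    """Return the categories to boost for a user in `city` on `on_date`."""
    boosts = []
    for rule_city, months, categories in rules:
        if rule_city == city and on_date.month in months:
            boosts.extend(categories)
    return boosts
```

A Mumbai user in July gets rain accessories boosted; the same user in December, or a user in another city, gets no boost from this rule.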

Map user personality - We can map a user's personality and recommend items accordingly. We can pull information such as likes and favorites from social media sites to map the personality. We can also infer a personality from the user's past actions with the product. A hybrid of both approaches also works.

Example personalities: one who tries new things, one who spends on premium/expensive items, one who always buys on deals, etc.
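A minimal rule-based sketch for inferring the personalities above from past purchases. The purchase fields (`price`, `on_deal`, `new_category`), the `premium_price` cut-off, and the 50% threshold are all hypothetical; signals such as social media likes could be added as further rules.

```python
def infer_personality(purchases, premium_price=100.0, threshold=0.5):
    """Tag a user with traits when at least `threshold` of their purchases match.

    purchases: list of dicts with 'price', 'on_deal' (bool) and 'new_category' (bool).
    """
    if not purchases:
        return set()
    n = len(purchases)
    traits = set()
    if sum(p["on_deal"] for p in purchases) / n >= threshold:
        traits.add("deal_seeker")
    if sum(p["price"] >= premium_price for p in purchases) / n >= threshold:
        traits.add("premium_buyer")
    if sum(p["new_category"] for p in purchases) / n >= threshold:
        traits.add("explorer")  # tends to try new things
    return traits


purchases = [
    {"price": 20, "on_deal": True, "new_category": False},
    {"price": 150, "on_deal": True, "new_category": True},
    {"price": 30, "on_deal": True, "new_category": False},
]
```

This user bought everything on a deal but only one premium item, so `infer_personality(purchases)` returns `{"deal_seeker"}`, and the engine could favour discounted items for them.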

Time based - Decision making may differ at the start of the week, the end of the week, the start of the month, etc.
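Time-based effects can be modelled as multipliers applied to a base recommendation score. The rules below (groceries boosted in the first five days of the month, entertainment boosted at weekends) are purely hypothetical examples of the kind of pattern you would look for in your own data.

```python
from datetime import date


def time_multiplier(category, on_date):
    """Hypothetical time-of-month and day-of-week boosts for a category."""
    multiplier = 1.0
    if category == "groceries" and on_date.day <= 5:
        multiplier *= 1.5  # start of the month
    if category == "entertainment" and on_date.weekday() >= 5:  # Saturday=5, Sunday=6
        multiplier *= 1.3
    return multiplier
```

For example, entertainment items on Saturday, August 24, 2013 get a 1.3 multiplier, while groceries on that date stay at 1.0.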

Screens - Recommendations may differ across screens like laptop, mobile, and tablet. For example, if your customer is using your mobile app and you are able to access their location, you can recommend deals at the nearest store, show the distance to that store, etc.
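For the mobile case, once the app has the user's coordinates, finding the nearest store is a distance calculation. This sketch uses the haversine great-circle formula with a mean Earth radius of about 6371 km; the store names and coordinates are hypothetical, and `screen_widgets` is an illustrative helper, not an existing API.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_store(lat, lon, stores):
    """Return (store_name, distance_km) for the closest store."""
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, (s_lat, s_lon) in stores.items()),
        key=lambda pair: pair[1],
    )


def screen_widgets(device, location=None, stores=None):
    """Hypothetical helper: only mobile users with a known location get nearest-store deals."""
    if device == "mobile" and location and stores:
        name, km = nearest_store(location[0], location[1], stores)
        return [f"Deals at {name} store, {km:.1f} km away"]
    return []


# Hypothetical store coordinates in Mumbai
stores = {"Andheri": (19.1136, 72.8697), "Colaba": (18.9067, 72.8147)}
```

A mobile user at roughly (19.076, 72.878) would see the Andheri store, a little over 4 km away; the same request from a laptop returns no store widget.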

There may still be many more factors to consider, and more techniques and algorithms that can help give the user better recommendations.

On Implementation


How do we factor all these possibilities into the implementation? Here are some tips on the implementation front.

Unique Approach - Take a unique approach for each use case. In this recommendation example, recommending an article on a news website is different from recommending a shirt in an online apparel store.

Hybrid Models - Combining the various approaches/possibilities may work. A hybrid combines multiple ways of recommending, for example user based + location based + something else. It's not always yes or no; try to find alternative paths.
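A common way to build a hybrid is a weighted sum of the scores each approach produces. This sketch assumes every approach returns hypothetical per-item scores already normalised to [0, 1]; the approach names and weights are illustrative. Giving an approach a weight of 0 effectively switches it off, which fits the incremental rollout described under Weightage.

```python
def hybrid_scores(signals, weights):
    """Rank items by the weighted sum of per-approach scores.

    signals: approach name -> {item: score in [0, 1]}
    weights: approach name -> weight (approaches without a weight are ignored)
    """
    combined = {}
    for approach, scores in signals.items():
        w = weights.get(approach, 0.0)
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)


signals = {
    "user_based": {"umbrella": 0.4, "sunglasses": 0.9},
    "location": {"umbrella": 1.0},
}
weights = {"user_based": 0.6, "location": 0.4}
```

User-based filtering alone would favour sunglasses, but adding the location signal lifts the umbrella to 0.64 against 0.54.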

Weightage - Weigh the features, have a strategy, and build and release the features incrementally. For instance, you can build a basic recommendation engine using item-based or content-based recommendation, and incrementally add features based on user testing and feedback.

Right Metrics - Have the right metrics. For the above example, the metrics would be how consumers react to the recommendations, how the traffic pattern changes, etc.
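Two simple metrics for how consumers react to recommendations are click-through rate and precision at k (the share of the top k recommended items the user actually engaged with). The function names are illustrative; which engagement counts as "relevant" is a product decision.

```python
def click_through_rate(impressions, clicks):
    """Fraction of displayed recommendations that were clicked."""
    return clicks / impressions if impressions else 0.0


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k
```

For example, 10 clicks on 200 impressions is a 5% click-through rate, and if the user engaged with two of the top three recommendations, precision at 3 is 2/3.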

Place for Experimentation - Have a place for experimentation. Check and measure with labeled or test data. You can also use A/B testing to find out which approach actually works better.
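A minimal A/B testing sketch: users are bucketed deterministically by hashing the experiment name and user ID with `hashlib.sha256`, so each user always sees the same variant, and a two-proportion z-test compares the click-through rates of the two variants. The experiment and user names are hypothetical, and the z-test assumes reasonably large samples with a pooled rate strictly between 0 and 1.

```python
import hashlib
from math import erf, sqrt


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for the difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)
```

With 100 clicks out of 1000 for variant A and 130 out of 1000 for variant B, z is about 2.1 and the p-value about 0.035, which suggests B's higher click-through rate is unlikely to be chance alone.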


Wednesday, August 21, 2013

Going Lean will save you time in the Software Development Lifecycle

Eliminating waste is a Lean software principle. In the software product development lifecycle, we can avoid wasting time and resources in many ways, and in turn invest our time productively in building the product and giving customers a better experience. This post covers some basic practices we can implement in a project to save time and improve the overall quality of the product.

No Grepping


A good log analysis and monitoring mechanism saves overall project time. Don't waste time grepping or FTPing log files. Have a look at Splunk Log Management (http://www.splunk.com/view/log-management/SP-CAAAC6F/) to understand log collection, analysis, and monitoring. You can also build your own implementation.

Some technical pointers if you want to build your own implementation:

- Log everything with the same timestamp format and time zone, preferably UTC.
- Have a common log data format everywhere: server logs, audit logs, application logs, etc. Explore JSON logging.
- Log Processing - Check out Logstash (http://logstash.net/). Hadoop can be used in specific use cases.
- Searchable Log Repository - Elasticsearch can be a good choice.
- For Analytics and Insights - You can use Pig/Hive, HBase, Redshift, Impala, Google BigQuery, or any other relevant tool.
- Have an alert mechanism for important events, specifically error scenarios and suspicious events.
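A minimal sketch of JSON logging with UTC timestamps using Python's standard `logging` module. Setting the formatter's `converter` to `time.gmtime` makes `formatTime` render UTC; the field names in the JSON payload are an assumption, chosen so that each line is one JSON object that a tool like Logstash can parse.

```python
import json
import logging
import time


class JsonUtcFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object with an ISO-8601 UTC timestamp."""

    converter = time.gmtime  # formatTime() will use UTC instead of local time

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S") + "Z",
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonUtcFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Calling `make_logger("orders").info("order %s placed", "1001")` would emit a line such as `{"timestamp": "...Z", "level": "INFO", "logger": "orders", "message": "order 1001 placed"}`, the same shape whether it comes from a web server or a batch job.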

Searchable Product Repository/Wiki & moderate it


Have a product repository, such as a wiki, with a good search feature (Example: https://www.atlassian.com/software/confluence/features). Don't maintain project documents in the file system or in version control systems like SVN; a lot of time is wasted searching for and maintaining them.

What can go into this repository?

- Project functional/requirement documents
- Technical/architectural documents
- Specific problems encountered during development and ways to resolve them
- Reasons for technical/functional design decisions
- Important project/business insights
- Research documents
- Problems encountered at the system level (examples: memory/space issues in the environment) and their resolutions
- Environment-related information like server details, background processes, etc.
- Code workflow

and any other relevant documents.

Having this repository without moderation is of no use. Keep unwanted articles from being dumped into it and maintain the overall quality.

Don't be a human keyboard


Every project has tasks like these:

- FTPing files
- Clearing temporary files to reclaim disk space
- Monitoring CPU/memory
- Applying software patches
- Installing software
- Replicating the environment
- Looking for something specific in the logs and raising an alert

Build some handy tools to accomplish these types of tasks, and automate the process where necessary.
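A small example of such a handy tool, for the last task in the list: scan log lines for a pattern and raise an alert when the match count reaches a threshold. The `alert` callback is a placeholder; in practice it might send an email or a chat message.

```python
import re


def scan_and_alert(lines, pattern, threshold, alert):
    """Count lines matching `pattern`; call `alert` once if the count reaches `threshold`."""
    regex = re.compile(pattern)
    matches = [line for line in lines if regex.search(line)]
    if matches and len(matches) >= threshold:
        alert(f"{len(matches)} lines matched {pattern!r}; first: {matches[0].strip()}")
    return len(matches)
```

Because it accepts any iterable of lines, the same function works on an open file, for example `with open("app.log") as f: scan_and_alert(f, r"\bERROR\b", 10, print)`, where the file name is hypothetical and `print` stands in for a real notifier.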

Continuous Integration, Deployments - with automated tests


Automate the whole deployment process and set up continuous integration. Use tools like Bamboo (https://www.atlassian.com/software/bamboo) for continuous integration and to create build plans. No build or deployment process should require manual intervention. Also integrate your tests into the build process.

Test Suites and Test Plans - Do wonders


  • Have a clear strategy when creating test suites, test plans, and unit tests.
  • Create specific test plans for different kinds of tests, such as smoke tests, sanity tests, etc.
  • Schedule the test plans to run automatically. This helps you catch problems proactively and saves considerable time in the long run.

You can use Bamboo, mentioned earlier, to maintain and schedule the test plans.
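A minimal sketch of organising tests into separate plans with Python's `unittest`: smoke tests cover the core path, sanity tests cover specific behaviour, and `build_suite` groups `TestCase` classes into one suite. The `add_to_cart` function is a hypothetical stand-in for real application code.

```python
import unittest


def add_to_cart(cart, item, qty=1):
    """Hypothetical application code under test."""
    if qty <= 0:
        raise ValueError("quantity must be positive")
    cart[item] = cart.get(item, 0) + qty
    return cart


class SmokeTests(unittest.TestCase):
    """Fast checks that the core path works at all; run on every build."""

    def test_add_item(self):
        self.assertEqual(add_to_cart({}, "shirt"), {"shirt": 1})


class SanityTests(unittest.TestCase):
    """Narrower checks on specific behaviour after a change."""

    def test_quantities_accumulate(self):
        self.assertEqual(add_to_cart({"shirt": 1}, "shirt", 2), {"shirt": 3})

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            add_to_cart({}, "shirt", 0)


def build_suite(*test_cases):
    """Group TestCase classes into a single suite, i.e. one 'test plan'."""
    loader = unittest.TestLoader()
    return unittest.TestSuite(loader.loadTestsFromTestCase(tc) for tc in test_cases)
```

A scheduler such as Bamboo or cron could then run only the smoke plan with `python -m unittest test_cart.SmokeTests` (assuming the file is saved as `test_cart.py`), and the full plan less often.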

Code Review - A must-have, and better with a tool


Code review should be part of development. Have a proper tool to review code collaboratively and add comments (Example: https://www.atlassian.com/software/fisheye/overview). It saves considerable time in the long run and also increases the overall quality of the product.

Think beyond open source tools


Invest in tools like Jira, a wiki, etc., or whichever tools you can afford. Compare the licensing cost (mostly pay as you go) with the overall productivity you are going to gain. Also weigh a developer's hourly cost against the hours the tool will eventually save you.
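The comparison above is simple arithmetic: a tool pays for itself when the value of the developer hours it saves exceeds its licence cost. The per-user monthly pricing model and the numbers in the example are hypothetical.

```python
def monthly_net_benefit(licence_per_user, users, hours_saved_per_user, dev_cost_per_hour):
    """Value of developer hours saved per month minus the monthly licence cost."""
    licence_cost = licence_per_user * users
    value_of_time_saved = hours_saved_per_user * users * dev_cost_per_hour
    return value_of_time_saved - licence_cost


def break_even_hours(licence_per_user, dev_cost_per_hour):
    """Hours each developer must save per month for the tool to pay for itself."""
    return licence_per_user / dev_cost_per_hour
```

With a hypothetical $10 per user per month licence, 10 developers each saving 2 hours a month, and a $50 hourly developer cost, the net benefit is 10 * 2 * 50 - 10 * 10 = $900 per month, and the tool breaks even once each developer saves just 0.2 hours a month.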