Career Profile
I am a technology leader with deep expertise in Big Data and in creating scalable software products, having worked at both large and small companies. I have always been involved in building innovative products, whether at a big company like Yahoo!, mid-size companies like StumbleUpon and Badgeville, or early-stage startups like Promote and my current startup, InfiniChains. I strongly believe software and data will be the key drivers of innovation across all companies and sectors, and I am very excited to play my role in moving the industry forward.
Experience
- Led a team of very talented people to create a traceability product for the following verticals: pharma, textile, mining and recycled plastics.
- Helped secure $2.3M in funding at the outset.
- Won large textile manufacturers such as Welspun and PSL as traceability customers.
- Created an organic farm data management and certificate filing system. Onboarded 500k certified organic farmers in India.
- Got InfiniChains accepted into the reputed Fashion for Good accelerator program.
Realtime ETL & Reporting system:
- Designed and implemented a real-time, high-volume event collection platform in Scala that transforms the data and persists it into Google BigQuery.
- Built a highly scalable reporting system in Scala on top of Google BigQuery that supports flexible schemas.
Ad Server Platform:
- In charge of running the ad system, including the ad server, reporting and the entire campaign management interface used by advertisers.
- Single-handedly designed and implemented the ad server in Scala. For a specific requirement, also implemented it in Elixir (Erlang VM).
- Designed and implemented a highly scalable reporting system for the ad system, built on ClickHouse with Scala for the API layer.
Recommendation Engine for SUN (StumbleUpon Network):
- Designed and implemented the recommendation service in Scala & Twitter Finagle
- Achieved extremely low latencies (under 3ms for P95)
- Devised a compact way to store which content users have already seen, using Bloom filters (details in my blog article: http://parthpatil.com/2015/06/18/optimizing-dedup-cache-using-bloom-filters/).
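The blog article has the details of the actual approach. As a generic illustration only, here is a minimal Python sketch of a Bloom-filter "already seen" check, assuming one filter per user; the sizes (8192 bits, 4 hash positions) and the names `BloomFilter` and `filter_unseen` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd stride, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def filter_unseen(seen: BloomFilter, candidates):
    """Keep candidates the user has (probably) not seen yet and mark them as seen."""
    fresh = []
    for url in candidates:
        if not seen.might_contain(url):
            seen.add(url)
            fresh.append(url)
    return fresh
```

The trade-off is that a false positive occasionally hides an item the user has not actually seen, but an item that was seen is never shown again, and each user's history costs a fixed number of bits regardless of how many URLs it holds.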
Reporting API Service:
- Rewrote the aging PHP-based reporting API for SU Ads using Scala and Spray.
- The reporting data is stored in HBase
Payment gateway migration from PayPal to Stripe:
- Moved the whole payment infrastructure from PayPal to Stripe
- Completed the migration quickly with zero downtime.
Analytics Platform:
- Worked on building an advanced analytics platform using Hadoop 2.2, Impala (with Parquet) and HBase (with Phoenix).
- The platform lets customers write advanced queries in our user-friendly query language. These queries run periodically and generate reports and graphs for customers to view.
Smart Message Sender:
- Built with Scala and Akka actors; scales to several thousand messages/sec.
- A distributed, fault-tolerant system that consumes messages from a Kafka queue and delivers them to remote endpoints with a flexible, configurable retry policy.
- Uses a Redis queue with server-side Lua scripting to implement the retry policy for messages that could not be delivered because the customer endpoint was down or slow.
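The sender keeps its retry state in Redis with server-side Lua. As a hedged, in-memory illustration of one possible retry policy (exponential backoff with a cap and a maximum attempt count), the Python sketch below uses a `heapq` priority queue ordered by next-attempt time as a stand-in for a Redis sorted set; `RetryQueue`, its parameters and the dead-letter list are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingMessage:
    next_attempt_at: float                # heap ordering uses only this field
    attempts: int = field(compare=False)
    payload: str = field(compare=False)


class RetryQueue:
    """In-memory stand-in for a Redis sorted set scored by next-attempt time."""

    def __init__(self, base_delay=1.0, max_delay=300.0, max_attempts=5):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.max_attempts = max_attempts
        self._heap = []
        self.dead_letters = []

    def backoff(self, attempts: int) -> float:
        # Exponential backoff: base * 2^(attempts - 1), capped at max_delay.
        return min(self.max_delay, self.base_delay * 2 ** (attempts - 1))

    def record_failure(self, payload: str, attempts: int, now: float) -> None:
        """Schedule a retry after a failed delivery, or give up after max_attempts."""
        if attempts >= self.max_attempts:
            self.dead_letters.append(payload)  # park for manual inspection
        else:
            retry_at = now + self.backoff(attempts)
            heapq.heappush(self._heap, PendingMessage(retry_at, attempts, payload))

    def due(self, now: float):
        """Pop every message whose retry time has arrived."""
        ready = []
        while self._heap and self._heap[0].next_attempt_at <= now:
            ready.append(heapq.heappop(self._heap))
        return ready
```

Doubling the delay gives a slow or down endpoint room to recover instead of being hammered, while the cap keeps the wait bounded.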
Ad Server:
- Worked in a team of 3 developers to design and implement, from scratch, an ad server platform written in Scala and built using Finagle and Scala actors.
- Was responsible for implementing the ad server's data service, which fetches the data the ad server needs to serve ads.
- The data service uses several actors working in parallel.
- Data is fetched from various sources (Kafka, log files, MySQL, HBase), transformed and then loaded into Redis, where the ad server can use it.
Advanced Analytics Platform for Ad System:
- Designed and implemented an advanced analytics solution that is used by ad system customers.
- The reporting system allows for flexible grouping and filtering of massive amounts of data in real time.
- The persistent store for the reporting system is HBase.
- The report data is generated via a MapReduce job written in Pig.
- The report data is queried through a PHP library that talks to HBase over Thrift.
Real-time Notification Platform:
- Led a group of 6 engineers to build a functional prototype of a real-time notification platform in 5 days.
- The platform surfaces notifications to users as toolbar messages in response to certain events.
- The events generated in different parts of the system were fed into a Kafka queue.
- Created a scalable event consumption and notification generation system using Gearman.
- Used Redis to store notification queues for each user.
- Implemented client-side smart polling with exponential backoff to check for new notifications in the toolbar.
Inventory Estimation Tool:
- Designed the core engine of the inventory estimation tool, which estimates the traffic for a given set of targeting criteria (user attributes such as age, gender and location). The system was implemented by other members of the team.
- The engine uses Java Bit Vectors to represent users who visited the site in a given time range and combines vectors in real time to compute the traffic for the selected set of targeting criteria.
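A minimal sketch of the bit-vector idea, using Python integers as bit vectors in place of Java bit vectors. It assumes each user maps to an integer index and that there is one vector per attribute value; this layout and the names `bitset` and `estimate_traffic` are illustrative, not the tool's actual design. Values within one criterion are OR-ed, criteria are AND-ed, and the result is intersected with the visitors for the time range before counting.

```python
from functools import reduce


def bitset(user_ids):
    """Pack user indices into an int used as a bit vector (bit i set = user i)."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def estimate_traffic(visitors, index, criteria):
    """visitors: bit vector of users seen in the time range.
    index: {dimension: {value: bit vector}}; criteria: {dimension: [allowed values]}.
    OR the allowed values within a dimension, AND across dimensions, then count bits."""
    per_dimension = [
        reduce(lambda acc, v: acc | index[dim].get(v, 0), values, 0)
        for dim, values in criteria.items()
    ]
    matched = reduce(lambda a, b: a & b, per_dimension, visitors)
    return bin(matched).count("1")  # population count
```

For example, with visitors 0 to 5, women {0, 2, 4} and ages 18-24 {0, 1} or 25-34 {2, 3, 6}, targeting women aged 18-34 matches users 0 and 2, so the estimate is 2. Each combination is a handful of word-wide bitwise operations, which is what makes real-time recombination cheap.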
Single Sign-On for su.pr:
- Designed and implemented a single sign-on (SSO) facility for the sister site http://su.pr.
- It is implemented in object-oriented PHP and uses short-lived, cookie-based tokens to enable SSO across different domains.
- SSO let publishers use their existing stumbleupon.com login on su.pr, lowering the barrier to entry.
- The support team also uses SSO to log into publishers' accounts and help them with their problems, without needing to know the publisher's account password.
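One common way to build a short-lived cross-domain token is to sign the user id and an expiry time with a secret shared by both sites. The Python sketch below illustrates that idea under stated assumptions: the secret, the 60-second lifetime, the `user:expiry:signature` format and the function names are hypothetical, and the original PHP implementation may differ.

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret-between-sites"  # hypothetical key known to both domains
TOKEN_TTL_SECONDS = 60                   # hypothetical "short-lived" window


def _sign(message: str) -> str:
    return hmac.new(SECRET, message.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user_id: str, now=None) -> str:
    """Issued by the login domain and handed to the sister domain (e.g. via redirect)."""
    expires = int(now if now is not None else time.time()) + TOKEN_TTL_SECONDS
    message = f"{user_id}:{expires}"
    return f"{message}:{_sign(message)}"


def verify_token(token: str, now=None):
    """Return the user id if the signature is valid and unexpired, else None."""
    try:
        user_id, expires, sig = token.rsplit(":", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    if not hmac.compare_digest(sig, _sign(f"{user_id}:{expires}")):
        return None  # tampered or signed with a different key
    current = now if now is not None else time.time()
    return user_id if current <= expires_at else None
```

The short lifetime limits the damage if a token leaks, and `hmac.compare_digest` avoids leaking signature information through comparison timing.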
StumbleUpon Badge & Partner API:
- Originally conceived the idea of StumbleUpon badge, built a prototype and convinced the product and business development people that we had an attractive product to sell.
- Single-handedly designed the whole end-to-end system (except the CSS) to support the StumbleUpon badge.
- Wrote the badge's JavaScript and made the badge iframe-based so that it is strongly isolated from the parent page that embeds it.
- Built the security infrastructure to prevent gaming and data scraping, and implemented rate limiting as a defense against DoS attacks.
- Created a highly scalable backend on top of Memcache that acts as a "perfect" cache: entries are not invalidated by a timeout, only when the badge's underlying data actually changes.
- Built a REST API for partners to query our system for information about their URLs.
- Worked closely with partners (HowStuffWorks.com, HuffingtonPost.com, NationalGeographic.com) to test and deploy the badge on their sites. Provided full pre-deployment and post-deployment support to partners launching the StumbleUpon badge.
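The "perfect cache" behavior described above comes down to storing entries with no TTL and invalidating an entry from the write path whenever the data it summarizes changes. A minimal sketch, assuming a Python dict as a stand-in for Memcache and a hypothetical `load_fn` that reads badge data from the database:

```python
class BadgeCache:
    """Cache with no TTL: an entry stays valid until its underlying data changes.
    A dict stands in for Memcache; load_fn is a hypothetical badge-data loader."""

    def __init__(self, load_fn):
        self._store = {}
        self._load = load_fn
        self.misses = 0

    def get(self, url):
        # Only a miss (first request, or after an invalidation) touches the database.
        if url not in self._store:
            self.misses += 1
            self._store[url] = self._load(url)
        return self._store[url]

    def on_data_changed(self, url):
        # Called by the write path (e.g. a new vote on this URL): drop exactly this entry.
        self._store.pop(url, None)
```

The design only stays correct if every code path that changes badge data calls `on_data_changed`; in exchange, unchanged badges are served from cache indefinitely instead of being reloaded on a timer.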
Analytics & Tracking:
- Designed and implemented custom page instrumentation and user tracking infrastructure.
- Implemented cookie based user entry point tracking and goal state (signup) tracking. Created reports to analyze the conversion quality of various user acquisition channels (referrers).
- Created reports to analyze the probability of a user creating an account in response to a join invitation from a friend vs. an interesting URL shared by a friend.
- Wrote tools in Perl and PHP, using SQLite, to process and summarize huge log files.
- Wrote an object-oriented reporting framework with modular, highly reusable widgets such as tables, date pickers and pie charts that can be composed in an object-oriented way to build reports.
- Convinced management to switch to Omniture Analytics Suite from Google Analytics.
- Single-handedly transitioned all analytics and tracking from Google to Omniture.
- Worked with Product people and instrumented critical flows (e.g. signup flow) in the system and helped do the funnel and drop off analysis.
- Ran A/B tests to measure their effect on the funnel and drop-off reports, and used the learnings from these experiments to optimize the signup flow for maximum conversions.
Metrics Suite:
- Designed and built a metrics and reporting suite that helped us identify and understand various trends in the system, such as variation in user activity around weekends and holidays, spam activity and content going viral.
- Created a report to measure the effectiveness of our recommendation algorithms over time. The results of this report are being used to fine tune the recommendation engine.
- Due to the massive size of the datasets involved in these reports, I investigated various high performance computing (HPC) and reporting infrastructures, such as Hadoop (an open source framework for running MapReduce jobs), as well as open source business intelligence and OLAP offerings. Convinced the backend infrastructure team and the recommendation engine team to set up and use Hadoop for high performance data analysis.
Set up advanced LAMP dev environment:
- Soon after I started at StumbleUpon I created a highly efficient dev environment.
- Set up PDT for PHP development.
- Configured Xdebug on the shared dev server to allow remote debugging from personal dev workstations through the PDT Eclipse IDE. Trained people on using Xdebug for remote debugging, function tracing and profiling.
- Wrote a custom file system event listener to automate the syncing of files from personal dev workstations to shared dev environment that is triggered by the file save event. Wrote this tool for both Mac OS X and Linux.
- Wrote extensive documentation on setting up and optimizing the dev environment.
Set up test infrastructure:
- Helped set up test infrastructure for various kinds of testing.
- Set up PHPUnit for unit testing.
- Wrote test suites in Perl using the Mechanize and web test packages for functional testing and these tests are also used for site monitoring.
- Demonstrated the use of Selenium and CubicTest for functional and acceptance testing.
- Used ApacheBench for load testing.
- Used the Live HTTP Headers, Tamper Data and Firebug add-ons for Firefox to debug HTTP and AJAX issues.
New Employee Onboarding:
- Played a mentor’s role for new employees.
- Created wiki pages (cookbooks, FAQs etc) for getting new engineers up to speed with our environment and systems.
Email bounce processor:
- Wrote a script to process the Postfix mail server logs and extract bounced emails along with their bounce messages.
- Extended PHPMailer so that it checks an email against the bounced email table before sending a mail.
Performance, Scalability and Monitoring:
Helped adopt SCRUM process:
- Played a vital role in setting up the SCRUM process for developers.
- Tailored the process to suit the needs and environment at StumbleUpon.
- Evangelized the benefits of the process and convinced people of its effectiveness.
- Set up plugins in Trac to make the SCRUM process easy to adopt and use.
Ad System Tools:
- Designed and implemented the auto campaign approval system. It lets the business team whitelist trusted advertisers so that their submitted campaigns start running immediately instead of waiting in the approval queue.
- Wrote an advanced campaign approval interface using jQuery that dramatically improved the campaign review efficiency of people who review submitted ad campaigns.
- Wrote tools in Perl (using Expect) to automate reconciling transactions between PayPal and our system. The tool logs on to PayPal’s FTP report server and fetches the previous day's transaction reports for reconciliation.
StumbleUpon website redesign:
- Co-created an MVC framework that made it easy to rewrite the core pieces of the StumbleUpon website.
- Rewrote critical parts of the site using the new framework.
- Handled complete release & configuration management for the launch of rewritten website.
- Created and maintained the staging environment.
- Created a session-based traffic segmentation scheme that let us test the new website on a small percentage of users before releasing it to everyone.
Local News:
- Completely rewrote the local news section of Yahoo! News using an in-house PHP MVC framework.
- Designed and implemented the front end system.
- Implemented user customization for source ordering.
- Rewrote many modules to use AJAX.
- Implemented efficient caching scheme to minimize the load on the news search backend and made it easy to tune the caching duration.
- Performed stress testing for the system.
Yahoo! Weather Migration:
- Migrated the weather backend from an older version of ULM (Universal Location Manager) to ULM 3.0. ULM is Yahoo!’s infrastructure for centrally managing users’ location preferences across all Yahoo! properties.
- Created an index of all weather locations using the Internet locality data pack to map every location in the Where On Earth (WOE) database to a location in the weather database.
- Integrated this index with the ULM system and the front end to get the weather for the location closest to the user.
Yahoo! News Week in Photos:
- Created a backend tool for producers to replace the manual procedure of creating the Week in Photos feature of Yahoo! News.
- The web-based tool helps them create the RSS feed of photos, which is then consumed by a Flash module to display a slideshow of the photos.
Maple parallel fetcher module:
- Developed a core module for the Maple web application framework, which is used widely throughout Yahoo! media properties to build websites.
- Worked with the Maple team for 2 months and wrote the parallel fetcher module in PHP 5.
- The module makes it easy to create a Maple component that fetches data from several web services in parallel.
Domain Name Ranking (DNR) Tool:
- Developed the DNR Tool in Perl.
- Implemented a Domain Ranking Algorithm for prioritizing the domain registration process.
- Implemented a Similarity Ranking Algorithm (n-gram) to filter out domains containing trademark names.
- Implemented a Fuzzy Logic Engine to capture the expert knowledge in choosing good domains.
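The DNR tool's exact similarity measure is not described beyond n-grams. As an assumption-laden sketch (in Python rather than the tool's Perl), the snippet below compares character trigrams of a candidate domain label with a list of trademarks using the Dice coefficient and a hypothetical 0.5 threshold; the function names are illustrative.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams of a lower-cased string (the whole string if shorter than n)."""
    s = text.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram sets: 2|A & B| / (|A| + |B|), in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def flag_trademark_conflicts(domain_label, trademarks, threshold=0.5):
    """Return the trademarks that the candidate domain label is too similar to."""
    return [tm for tm in trademarks if dice_similarity(domain_label, tm) >= threshold]
```

For instance, "yahooo" shares all three trigrams of "yahoo" (yah, aho, hoo) and adds one more (ooo), giving 2·3/(3+4) ≈ 0.86, so it is flagged, while it shares no trigrams with "google". Trigram overlap catches misspellings and padded variants that an exact-match filter would miss.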
Keyword Clustering Tool:
- Designed and developed the clustering tool in Perl.
- Implemented a Spell Check and Offensive-Keyword filtering algorithm using a standard English dictionary and custom dictionaries.
- Implemented a Clustering Engine to filter keywords based on Syntactic Rules specified as Regular Expressions by the user.
Performance Monitoring and Alert System:
- Designed and implemented a system in Perl to monitor the performance of the various advertisements the company runs.
- The system is based on a statistical model and inference rules that decide whether a fluctuation in stats qualifies as an alert and sends alerts to concerned people.
Reporting System:
- Implemented a daily performance reporting system for our customers using Perl & Mason.
- The application has a front end in Mason, and the backend is a Perl module run by a cron job.
Performance Forecasting System:
- Built a forecasting system in Perl that predicts performance parameters for our advertisers based on how much money they are willing to spend.
- The system forecasts numbers by analyzing the advertisement queues to the ad-server and other influencing data.