When did it start?

Necessity is the Mother of Invention

Way back in 1998-99, my crystallization experiments were not working: either 'No Crystals' or 'No Diffraction' or 'Unable to Process' or 'Bad Resolution' or blah, blah, blah. There was uncertainty at every step, and I concluded that my hands are NOT green. So I started analyzing the crystallographic structures that were available at the time.

BTW, I was working on my doctoral thesis on the Structure and Function of Aspartic Proteinases. By then, ~300 crystal structures belonging to the aspartic proteinase family were available in the PDB.

I started asking some basic questions, like:

  1. How similar are the structures at the sequence level and at the structure level?
  2. How does pH influence the structural changes, which in turn regulate the function/activity of these proteinases?
  3. Are there any conserved and invariant water molecules?
  4. What is the possible role of these water molecules?
  5. It is known that aspartic proteinases are active at low pH. Do water molecules play any specific role in that activity?

To answer these questions, I followed the standard protocol of any researcher:

I scanned the net, checked with my group/lab members and friends, and joined a couple of mailing lists/forums, looking for:

  1. Pair-wise structure alignment programs, if any
  2. Pair-wise sequence alignment programs, if any
  3. Multiple structure alignment and multiple sequence alignment programs, if any
  4. Dendrogram programs, if any
  5. Programs to calculate buried and accessible surface area, if any
  6. Programs to analyze hydrogen bond patterns, especially with water molecules, if any
  7. Programs for making pictures, if any

After a lot of effort, I could bring all the tools together in one place.
After a lot of source code tweaking/hacking, I could make them work.
These tools were each written for one task and one data set, for example only 2 PDB files in a weird format, and for that format I had to write some more code. How could I run 300 x 300 alignments, and when would they finish? These thoughts always pestered me.
I could not locate a good parser for reading the PDB format then. Those I found were not written to access all the information I intended to use in my analysis. For example, I needed a PDB parser which can:

  1. get me Number of Water Molecules in the PDB
  2. get me Number of Residues (from SEQRES and ATOM Records)
  3. get me Number of Chains in the PDB
  4. get me a Single Desired Chain from the PDB based on the Chain ID
  5. get me the X, Y, Z Coordinates of an Atom, of a residue of a Chain, of a Protein, of a Subunit, of a complex.
  6. get me a Portion of the Chain, i.e. a Chain Segment, given the starting and ending residue numbers
  7. get me the Coordinates of "Other than Hydrogen Atoms" from the PDB file
  8. get me the Rotation Matrix and symmetry Operators and space group to generate the Symmetry molecules
  9. get me the distance between any two points, given arbitrary x, y, z coordinates, atoms, residue centroids etc.
  10. get me the BFactor information to plot and understand the stability of the protein atoms

.... and many such requirements.
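
To make a few of these requirements concrete, here is a minimal sketch in Python. It is not BioPdb and says nothing about how BioPdb is implemented; the `Atom` class and all function names are hypothetical, chosen only for illustration. It assumes the standard fixed-column layout of ATOM/HETATM records, that water residues are named HOH, and that the element column is filled in (otherwise hydrogens cannot be recognized this way).

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    record: str      # "ATOM" or "HETATM"
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "ASP", or "HOH" for water
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float
    element: str     # may be empty if the file omits the element column


def parse_atoms(lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, SEQRES, CRYST1 and all other records
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain_id=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            b_factor=float(line[60:66]),
            element=line[76:78].strip(),
        ))
    return atoms


def count_waters(atoms):
    """Requirement 1: count distinct HOH residues (insertion codes ignored)."""
    return len({(a.chain_id, a.res_seq) for a in atoms if a.res_name == "HOH"})


def chain_ids(atoms):
    """Requirement 3: the sorted list of chain IDs present."""
    return sorted({a.chain_id for a in atoms})


def get_segment(atoms, chain_id, start, end):
    """Requirements 4 and 6: atoms of one chain within a residue range."""
    return [a for a in atoms
            if a.chain_id == chain_id and start <= a.res_seq <= end]


def heavy_atoms(atoms):
    """Requirement 7: everything except hydrogen (and deuterium) atoms."""
    return [a for a in atoms if a.element.upper() not in ("H", "D")]


def distance(a, b):
    """Requirement 9: Euclidean distance between two atoms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def mean_b_factor(atoms):
    """Requirement 10: average B-factor over a non-empty set of atoms."""
    if not atoms:
        raise ValueError("no atoms given")
    return sum(a.b_factor for a in atoms) / len(atoms)
```

With a PDB file opened as a text file, `parse_atoms` can be given the file handle directly, and every other function works on the list it returns. Symmetry operators, SEQRES counts and residue centroids are left out to keep the sketch short.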

Similarly, I had to write my own glue code to run my analysis steps and provide continuity between them, which we can call our Bio-Flowchart.

This was a really painful job for me, and I think it is a painful job for any researcher even today. My senior colleague spent almost 3.5 years on a similar bio-flowchart; I spent a little less than 2 years on mine.

Of all this, I considered 90% of the time to be 'unproductive': I felt that, as a biology researcher, I could have spent my time analyzing the results rather than writing 'non-biological and/or irrelevant' glue code. I started looking for a more permanent solution to these types of problems. The encouragement was a simple fact: biological data will never reduce in the coming years.

I felt this frustration could be common to many more researchers out there analyzing biological data. Today, I feel that the 16-20 year time-to-market of an NCE (new chemical entity) is due to the lack of such biologically intelligent, flexible and reusable biological programming environments.

I had just completed a part-time course, Object Oriented Programming Using C++, and a BrainBench certification exam on C. I realized that reusability, flexibility and maintainability are key to any software, and that OOP is the only paradigm to "reduce the unproductive time I spent".

So I decided to take the initiative of writing a complete parser, in my own capacity, in a reusable way. That is when the first Biological Abstract Datatype (BioADT) was defined and implemented as BioPdb. BioGenBank, BioFasta, BioSwissProt, BioEmbl and many other parsers followed.

After some further thought on design, I conceived and initiated 'BioBhasha®': the 'language paradigm' in biological research, rather than the conventional 'tool paradigm', which I always felt was very restrictive. This helped me reach a one-stop, one-language bio-software solution to every biological query.

After completing my PhD, my friends and I started this company and filed patents on the 'language paradigm'. I desire and intend to spend more of my time and energy on making 'BOS®' and 'BioBhasha®' well-sought-after biological software.

As with any language, BioBhasha® has been evolving, imbibing more and more algorithms, parsers, utilities and many other features. The LOC++ of BioBhasha® has now reached ~300K and is still growing.
Yes, today, if you ask me, I cannot work without BOS® or BioBhasha®. The analysis which took ~2 years then, I can now complete in 15 days. That's the time and cost you too can save; I can guarantee that.

Soon I will add some projects to describe the enormity of the work undertaken and the duration in which they were completed.