THE UNIVERSITY OF YORK
The design of a Java bioWidget primarily intended for
the display of genetic comparative mapping data.
Jeremy S. Dickson
Submitted in accordance with the requirements for the award of the M.Sc. degree in Biological Computation by The University of York.
John Innes Centre, Norwich Research Park, Colney NR2 7UH, UK.
October 1997
ABSTRACT
A program was written for the display of comparative mapping data. It was written to the specification of a bioWidget, a concept introduced by a recently formed group called the bioWidget Consortium. The main aim of this group is to produce a library of reusable software components (bioWidgets) suitable for displaying genetic data. bioWidgets should be written generically so that practically any sort of data (genetic or not) can be input into them. The ultimate aim is to be able to produce complex customised bioinformatics applications easily by integrating these bioWidgets via a drag and drop API.
The bioWidget which was written displays data as a grid and is similar to the Oxford Grid suite of programs. The placement of the divisions along each axis that form the grid is specified by axis objects (e.g. the chromosomes of a different species along each axis). Particular attributes that the objects along one axis have in common with those along the other axis (homology objects) are highlighted within the cells of the grid (e.g. homologous loci). The grid cells can be shown as equally sized along each axis or scaled in relation to the size of the objects along each axis.
The bioWidget offers many functions to the user: the axes can be swapped and the grid lines can be turned on and off. The homology objects can be displayed in four different ways: randomly plotted within the cells; scaled according to their true positions along each axis object; in a binarised fashion, in which a cell is coloured if it contains one or more objects; or as a number plot, in which a number relating to the number of homology objects in each cell is placed at the centre of the cell. Furthermore, sequences (e.g. nucleotide) can also be input for graphical comparison.
In addition, the user can zoom in and out of different areas of the grid, show the axis object names (or just numbers) along the axes, specify different graphical attributes of the grid (e.g. the background) and print the grid. Help files are also available. Indexes of the two sets of axis objects (showing their names and sizes) are given on the menu bar. Double-clicking on a cell produces a dialogue box giving the user more information about the contents of the cell.
The bioWidget has been written so that any programmer wanting to incorporate it in their own bioWidget application can customise any input events very easily. The database layer is also easily replaceable.
SECTION ONE
Introduction
1.1 Comparative mapping and the need for interactive graphical displays.
Over the past few years the amount of genetic data produced by the different genome projects has increased exponentially. Some of these projects involve the study of model organisms which may be used to infer knowledge about related organisms via comparative genome mapping. Mapping data from map-rich organisms like human and Drosophila melanogaster are compared with data from map-poor species like pig. By identifying loci which are located in both, it is sometimes possible to infer locus order in one species from the other. The evolutionary history of different organisms may be examined by studying the rearrangements of loci. It is possible to classify organisms into different taxa by relating the similarities alone. Comparative genome mapping is also important in the study of disease. Mouse models of human genetic disorders (and others, such as sheep for cystic fibrosis) are created. This is very useful for aiding in the cloning of human genetic disease loci and for the deduction of effective treatment strategies.
It has become apparent that diagrams are very important in the interpretation of comparative mapping data. The phrase "a picture speaks a thousand words" is very appropriate: a diagram illustrating the homologous genes of the human and mouse genomes and their chromosomal positions is much more striking than simple alphanumeric lists of the same data. In such a case the human brain can absorb much more information from illustrations than from text.
Until recently, comparative mapping diagrams found in journals were all created by hand using graphical packages. Due to the complexity and volume of the data being represented this has become time-consuming and will become more so in the future, as the amount of genetic information increases even further. For example, one particular comparative mapping diagram of cereal data (see ref. 7) shows a circular map of six different cereals. The genome of each species is represented by its homologous relationships to a number of rice linkage segments. These segments correspond to blocks of loci in rice for which the order is conserved across the other cereal species. Each species is then represented on the map as a concentric circle, with rice at the centre. In such a map, if a line is drawn outwards from the centre so that it crosses all of the circles, the loci that lie on the line are homologues of one another. In the past this diagram has been produced each time by hand, but the amount of information is now so large that it takes several days to draw.
This is just one example where it seems sensible to automate the process so that these displays are as accurate and as accessible as possible to the scientific community. Automation would not only help in the production of stand-alone paper diagrams; it would also be a great benefit to have computerised genetic displays connected to biological databases so that they could process the data on the fly and present it in a number of different ways on the screen.
1.2 ACEDB and the Oxford Grid.
A major contribution to the organisation, presentation and interpretation of genetic data came in 1991 with the first release of ACEDB. This is a pseudo-object-oriented database management system which was written by Richard Durbin and Jean Thierry-Mieg (ref. 4). The first ACEDB release enabled the management of Caenorhabditis elegans genomic data, but since then it has been adapted to make the creation of any customised biological (or indeed non-biological) database very simple. As a result of this, together with its ease of use, its intuitive feel and the fact that the software is free, ACEDB has become one of the most popular biological database management systems in the world. One of the features which makes it so attractive to researchers is its graphical user interface (GUI). It employs a system of windows to display the data held in the database via diagrams and text boxes. One can cross-reference and explore a whole database using just a series of mouse clicks.
One particular ACEDB database that has recently received considerable attention from the comparative mapping research community is the comparative mapping database of human and mouse genes originally located at the Mammalian Genetics Unit at Harwell in the UK. In order to maximise the usefulness of the comparative data, a suite of tools has been developed to display the data interactively as a number of different grids. The best known of these is the Oxford Grid (ref. 3). This displays chromosomal data on a grid, with one species' chromosomes represented on one axis and another species' chromosomes represented on the other (e.g. human versus mouse chromosomes). A matrix of cells is formed by drawing lines between the different chromosomes along each axis so that there is a cell for every pairing of a chromosome from one axis with a chromosome from the other. Drawn within each cell are small symbols representing the homologous loci that the two chromosomes of that cell share. The Oxford Grid is very useful for examining particular subsets of data, e.g. displaying a map of the different loci involved in eye diseases. It is also useful for giving a general impression of the relatedness of two species. For example, if a human/mouse and a human/chimpanzee Oxford Grid are compared, it is immediately apparent that the human is much more closely related to the chimpanzee than it is to the mouse, as the scatter of dots in the human/chimpanzee grid shows a linear correlation. An example of an Oxford Grid is shown in Figure 1.1.
Figure 1.1
An Oxford Grid showing all the human chromosomes on the horizontal axis and all the murine chromosomes plotted on the vertical axis with homologous loci plotted within the grid.
A number of tools exist which examine the Oxford Grid in more detail and follow the same grid theme. The Pairwise Chromosome Map shows a single cell of the Oxford Grid (a 1 x 1 grid) which is expanded to fill the whole screen. This is useful for identifying potentially conserved segments, which appear as diagonal lines of homologous loci across the cell. Inversions also appear as diagonals but are oriented in the opposite direction. The One-to-Many Chromosome Map (a 1 x n grid) is used to compare a single chromosome along one axis with all the chromosomes of another species along the second axis. The Species Grid is used to compare the loci of one chromosome of a species with all homologous loci in all other species which have been mapped (again a 1 x n grid).
A single grid map program capable of generating all of these grids, plus a few more, was proposed but never written. The advantage of a more generic system would be that there would be fewer limitations on the kind of grids shown, as long as the data is available. The Oxford Grid series of programs was originally written in C for a UNIX platform. Recently Windows and Apple Macintosh versions have been developed. The recent popularisation of Java, with its particular forte for data visualisation, has led to the resurgence of the Grid Map idea in such a way that it could also transcend system-dependent language limitations and be immediately executable on any machine.
1.3 The need for software componentry and bioWidgets.
Over the past few years, the progress in the development of software for the display of genetic data has been impaired by the fact that researchers need very specialised software to perform the exact functions that they require. Two solutions which have been used to overcome this problem are:
1. Use existing available software and make do with its limitations.
2. Develop new software from scratch.
The former solution is an easy and cheap option but is not ideal if the software does not closely suit the researcher's needs. The latter solution provides a more practical answer to this dilemma, but a lot of time, resources and effort are used in developing new software products de novo.
It would seem that a better answer to this problem would lie in having available a library of software components, each performing a certain function, that could be easily connected together like building blocks to form a user-customised software package. Such an idea is by no means new. The highly popular programming language Visual Basic works by dragging and dropping software components (or widgets) around the window of an easy-to-use integrated development environment. For example, a developer might add a button to a calculator program by just dragging it off a palette of different widgets and then simply programming the associated variables, e.g. the text label on the button or its response when pressed. If widgets exist for all of the components of the software package that a developer wishes to create, the program can be created with minimal input of programming code.
The advantage of this form of programming is that it builds on work that others have already done, i.e. it avoids reinventing the wheel each time one is needed. As long as the widgets that are being used are robust and useful enough to be applied to many different kinds of situations, there is no reason not to use this form of programming. This idea of incremental programming has recently become quite popular with the biological community due to the sudden expansion of the field of bioinformatics and the repetition of code that the associated programmers are experiencing. The concept of these reusable widgets has been extended to form the bioinformatics version of widgets, called bioWidgets, and recently a group called the bioWidget Consortium has formed. The bioWidget Consortium has a number of aims. These are:
The sudden interest in bioWidgets has been spurred by two main factors: the general shift in software architecture trends towards software componentry and the success of a recently developed programming language called Java. When Java was first released, Sun Microsystems (its creators) claimed that it had the potential to become the dominant network language of the next few years. Judging by its mushrooming popularity, this is very possible. One of the main features of the language that leads to its popularity is its portability between machines and platforms. Once written and compiled, a Java program should work just as well on any UNIX workstation as it does on an Apple Macintosh or Windows 95 personal computer. Therefore, with Java's powerful Internet capabilities, it is possible to write a program that can be downloaded from a server via the world-wide web and displayed as an applet in a Java-enabled browser on any machine. This has obvious and substantial advantages for the world of bioinformatics, a subject based on the distribution and analysis, on many different types of machines, of information derived from networked databases.
1.4 The design of bioWidgets.
For a bioWidget to be successful it has to be written in such a way that it does not try to model too specifically the data for which it was originally conceived. It should be capable of taking in data in a specified format and processing and presenting it in the way which is characteristic of that bioWidget. David Searls, in an early attempt to popularise biological componentry, said that a widget "should segregate the graphical nature of objects from the code concerned with their underlying meaning" (ref. 9). To illustrate this with the example of the Oxford Grid, the objects responsible for creating lines or rectangles on the screen should know nothing about chromosomes, only how they should be drawn. This generalises the program so that it can create a grid even when the data does not relate to chromosomes.
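The segregation that Searls describes can be illustrated with a minimal sketch. It is written in present-day Java rather than the Java 1.02 of the draft API, and the class name `AxisLayout` and its methods are hypothetical, invented here for illustration; they are not part of the bioWidget API or of the program described in this thesis. The class turns a list of sizes into the pixel positions of grid lines and has no knowledge of what the sizes measure.

```java
import java.util.Arrays;

/** Hypothetical helper: knows only about sizes and pixels, nothing about chromosomes. */
final class AxisLayout {
    // Pixel offsets of the grid lines; one more entry than there are divisions.
    private final int[] boundaries;

    /** Scales the division sizes so that together they span totalPixels. */
    AxisLayout(double[] sizes, int totalPixels) {
        double sum = 0;
        for (double s : sizes) {
            sum += s;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("sizes must sum to a positive value");
        }
        boundaries = new int[sizes.length + 1];
        double running = 0;
        for (int i = 0; i < sizes.length; i++) {
            running += sizes[i];
            boundaries[i + 1] = (int) Math.round(running / sum * totalPixels);
        }
    }

    /** Returns a copy so that callers cannot alter the internal state. */
    int[] getBoundaries() {
        return boundaries.clone();
    }
}

final class AxisLayoutDemo {
    public static void main(String[] args) {
        // Invented figures: the caller decides that these are chromosome lengths.
        double[] chromosomeLengths = {100, 50, 50};
        AxisLayout scaled = new AxisLayout(chromosomeLengths, 400);
        System.out.println(Arrays.toString(scaled.getBoundaries()));

        // Equal sizes give equally sized cells from the same drawing code.
        AxisLayout equal = new AxisLayout(new double[] {1, 1, 1, 1}, 400);
        System.out.println(Arrays.toString(equal.getBoundaries()));
    }
}
```

Because the layout class receives only numbers, the same code would serve for any other kind of axis object, provided the calling code can supply a size for each division.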
When this project began, the design strategy for bioWidgets was in a developing state. A draft API (application programming interface) based on the older version of Java (1.02) had been proposed and some guidelines had been issued. From this, it is highly apparent that one of the key attributes of a bioWidget is the separation of the program GUI from the application which creates an instance of the GUI. The GUI is the part of the program that the user interacts with, e.g. the window frame, the buttons and anything which is displayed on the screen. The responsibilities of the GUI also extend to processing any user input and processing information for presentation on the screen.
The advantage of separating the GUI from the application which runs it is that the internal implementation of either can be changed very easily without affecting the other. It has the same effect as drawing a black box around the devices (encapsulation): as long as the inputs and the outputs remain the same, it does not matter what happens within. So, for instance, a method in the GUI could be replaced by one that has some novel processing effect on the data. This works well as long as the data is output in the appropriate format. For encapsulation to work it is essential that all variables be defined as private to their own class and that public accessor methods be provided to set and read them. This ensures that they cannot be manipulated directly by other classes, which would corrupt the underlying componentry concept.
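The rule of private variables with public accessor methods might look as follows in practice. This is a sketch only: the class `GridSettings` and its two properties are assumptions made for illustration and are not taken from the draft bioWidget API.

```java
/** Hypothetical settings holder: all state is private and reached only through accessors. */
final class GridSettings {
    private boolean gridLinesVisible = true;
    private int zoomPercent = 100;

    public boolean isGridLinesVisible() {
        return gridLinesVisible;
    }

    public void setGridLinesVisible(boolean visible) {
        gridLinesVisible = visible;
    }

    public int getZoomPercent() {
        return zoomPercent;
    }

    /** The setter is the single place where an invalid value can be refused. */
    public void setZoomPercent(int percent) {
        if (percent <= 0) {
            throw new IllegalArgumentException("zoom must be positive");
        }
        zoomPercent = percent;
    }
}

final class GridSettingsDemo {
    public static void main(String[] args) {
        GridSettings settings = new GridSettings();
        settings.setZoomPercent(200);
        settings.setGridLinesVisible(false);
        System.out.println(settings.getZoomPercent() + "% zoom, grid lines visible: "
                + settings.isGridLinesVisible());
    }
}
```

Since other classes see only the accessor methods, the way the values are stored could later be changed without any calling code needing to be altered, which is the black-box property described above.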
1.5 The packaging of bioWidgets.
Recently a Java sideline has been released called Java Beans. This is essentially a Java version of Visual Basic except the programmer manipulates beans instead of widgets. It is thought that this will promote the popularity of Java even further as it makes Java applications much easier to assemble. The market has lately become flooded with a multitude of different Java Beans IDEs each offering their own custom beans and functions.
Although it has not yet been done, the packaging and distribution of bioWidgets as Java Beans appears to be a practical idea. It would mean that researchers creating customised applications from bioWidgets would just need to drag and drop the desired beans on to the programming window and arrange them in the preferred way. This would also be a good way of promoting the use of bioWidgets and would add to their likelihood of success. One reason that people tend to shy away from Java is its unfriendly Abstract Window Toolkit (AWT). Java Beans has the potential to overcome this.
1.6 Aims of project.
This project aims to encompass all that has been discussed in this section so far. The proposed generalised grid widget will be written, with the display of comparative mapping data in mind. This grid will be written as a bioWidget in Java and hence will be able to display many different types of data including data which was never considered during the design. For example the grid bioWidget should be able to display econometric statistics or train timetable information in some form as long as the data is available and can be packaged into the correct format.
To illustrate the train timetable example, one axis could represent trains departing London and the other could represent trains departing Norwich on a particular day. The divisions along each axis could represent destinations from the particular city, and highlighted within the cells could be homologous destinations. There could be more than one symbol in each cell (as there could be more than one train in a day to the same place) and the symbols could be placed within the cells according to what time the train arrived. This example may seem trivial but illustrates how generic such a grid has to be. It is intended that the bioWidget will eventually be packaged as a Java Bean.
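The timetable example can be sketched in code to show how little the grid itself needs to know. The classes `HomologyObject` and `GenericGrid` below are hypothetical and written in present-day Java; they are not the classes of the bioWidget described in this thesis, and the destinations and times are invented purely for illustration. The grid stores division names and the cell in which each shared item falls, and can report how many items lie in each cell, as a number plot would require.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared item placed in cell (column, row); the label means nothing to the grid. */
final class HomologyObject {
    private final int column;
    private final int row;
    private final String label;

    HomologyObject(int column, int row, String label) {
        this.column = column;
        this.row = row;
        this.label = label;
    }

    int getColumn() { return column; }
    int getRow() { return row; }
    String getLabel() { return label; }
}

/** Hypothetical grid that works for any pair of named axes. */
final class GenericGrid {
    private final String[] columnNames;
    private final String[] rowNames;
    private final List<HomologyObject> objects = new ArrayList<>();

    GenericGrid(String[] columnNames, String[] rowNames) {
        this.columnNames = columnNames.clone();
        this.rowNames = rowNames.clone();
    }

    void add(HomologyObject object) {
        if (object.getColumn() < 0 || object.getColumn() >= columnNames.length
                || object.getRow() < 0 || object.getRow() >= rowNames.length) {
            throw new IllegalArgumentException("object lies outside the grid");
        }
        objects.add(object);
    }

    /** Number of homology objects in each cell, indexed as [row][column]. */
    int[][] countPerCell() {
        int[][] counts = new int[rowNames.length][columnNames.length];
        for (HomologyObject object : objects) {
            counts[object.getRow()][object.getColumn()]++;
        }
        return counts;
    }
}

final class TimetableDemo {
    public static void main(String[] args) {
        // Invented timetable data, for illustration only.
        String[] fromLondon = {"Cambridge", "York"};
        String[] fromNorwich = {"Cambridge", "Ely"};
        GenericGrid grid = new GenericGrid(fromLondon, fromNorwich);

        // Two services a day reach Cambridge from both cities, so two
        // symbols fall in the Cambridge/Cambridge cell.
        grid.add(new HomologyObject(0, 0, "09:15"));
        grid.add(new HomologyObject(0, 0, "17:40"));

        System.out.println(grid.countPerCell()[0][0]); // prints 2
    }
}
```

Replacing the destination names with chromosome names and the times with locus names would require no change to `GenericGrid`, which is the sense in which such a grid has to be generic.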
In particular, the aims of the project are as follows: