SciDocBrowser 2005-03-03
This work was done as a semester project at EPFL, which means that I worked on it for about 12 x 45 minutes per week during 14 weeks. The goal was to enable semantic browsing of a set of scientific documents. To achieve this, the Java application (named SciDocBrowser) computes a semantic distance between the documents using a list of tags embedded in them. These tags define the topics of the documents and are taken from the ACM classification tree. A spring layout algorithm is used to place the nodes in the graphical view.
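To make the idea of a tag-based distance more concrete, here is a minimal sketch in Java. It is not the code of SciDocBrowser: the class name `AcmDistance`, the path-length metric between two codes and the way tag distances are averaged into a document distance are all assumptions chosen for illustration. It only assumes that tags look like ACM codes such as `H.5.2`, where each dot-separated part is one level of the tree.

```java
import java.util.List;

// Hypothetical helper, not taken from SciDocBrowser.
final class AcmDistance {

    // Number of tree edges between two ACM codes such as "H.5.2" and "H.3.3".
    // Each dot-separated component is treated as one level of the tree.
    static int tagDistance(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        // Go up from a to the common ancestor, then down to b.
        return (pa.length - common) + (pb.length - common);
    }

    // Assumed document distance: every tag is matched with the nearest tag of
    // the other document, and the two directed averages are combined.
    static double documentDistance(List<String> x, List<String> y) {
        if (x.isEmpty() || y.isEmpty()) {
            throw new IllegalArgumentException("both documents need at least one tag");
        }
        return (meanNearest(x, y) + meanNearest(y, x)) / 2.0;
    }

    private static double meanNearest(List<String> from, List<String> to) {
        double sum = 0.0;
        for (String f : from) {
            int best = Integer.MAX_VALUE;
            for (String t : to) {
                best = Math.min(best, tagDistance(f, t));
            }
            sum += best;
        }
        return sum / from.size();
    }
}
```

With this metric, `tagDistance("H.5.2", "H.3.3")` is 4, because the two codes only share the top-level category `H`, while two documents carrying exactly the same tags are at distance 0.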
Note that the placement of the nodes is determined only by the tags; no text is extracted from the documents to guess their subject.
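The following sketch shows one simple way a spring layout can turn pairwise distances into 2D positions. It is an illustration under assumptions, not the algorithm actually implemented in the application: the class `SpringLayout`, the parameters `step` and `iterations`, and the choice of joining every pair of nodes by a spring whose rest length equals the target distance are all hypothetical.

```java
// Hypothetical helper, not taken from SciDocBrowser.
final class SpringLayout {

    // One relaxation step. Every pair of nodes is joined by a spring whose
    // rest length is target[i][j]; each node then moves a fraction `step`
    // along the sum of the spring forces acting on it.
    static void relax(double[][] pos, double[][] target, double step) {
        int n = pos.length;
        double[][] force = new double[n][2];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                // Guard against division by zero; coincident nodes stay put.
                double len = Math.max(Math.hypot(dx, dy), 1e-9);
                // Positive when the spring is stretched, negative when compressed.
                double stretch = (len - target[i][j]) / len;
                force[i][0] += stretch * dx;
                force[i][1] += stretch * dy;
                force[j][0] -= stretch * dx;
                force[j][1] -= stretch * dy;
            }
        }
        for (int i = 0; i < n; i++) {
            pos[i][0] += step * force[i][0];
            pos[i][1] += step * force[i][1];
        }
    }

    // Repeats the relaxation step a fixed number of times.
    static void layout(double[][] pos, double[][] target, int iterations, double step) {
        for (int k = 0; k < iterations; k++) {
            relax(pos, target, step);
        }
    }
}
```

In such a scheme, `target` would hold the semantic distances between documents, so that documents with similar tags end up close together on screen. The step size has to stay small relative to the number of nodes, otherwise the positions can oscillate instead of settling.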
The project also required designing a parser that extracts the ACM tags from a set of PDF files and builds an XML library.
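As a rough illustration of that parsing step, the sketch below pulls ACM codes out of text and writes them as an XML fragment. Several things here are assumptions: the text is supposed to have already been extracted from the PDF by some other tool, the regular expression only approximates the shape of ACM codes (a letter from A to K followed by one or two levels), and the class `AcmTagExtractor` and the element names `document` and `tag` are invented for the example rather than taken from the real library format.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from SciDocBrowser.
final class AcmTagExtractor {

    // Approximate shape of an ACM code: "H.5", "H.5.2" or "H.5.m".
    // This is an assumption and may also match unrelated text such as "B.1".
    private static final Pattern ACM_CODE =
            Pattern.compile("\\b[A-K]\\.[0-9m](?:\\.[0-9m])?\\b");

    // Returns the distinct codes found in the text, in order of first appearance.
    static Set<String> extractTags(String text) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = ACM_CODE.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    // Builds one XML entry for a document; the element names are illustrative.
    static String toXml(String fileName, Set<String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append("<document file=\"").append(escape(fileName)).append("\">\n");
        for (String tag : tags) {
            sb.append("  <tag>").append(escape(tag)).append("</tag>\n");
        }
        sb.append("</document>\n");
        return sb.toString();
    }

    // Minimal escaping of the characters that are special in XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

A full library file would wrap one such entry per PDF inside a root element; a real parser would also need to restrict the search to the part of the paper where the categories are listed, to avoid false matches.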
This work is mainly based on Paul Janecek's thesis, which uses the Corbis image database and WordNet to compute the semantic distance between images that have very few keywords, in order to enable semantic browsing of those images.
Here are some screenshots of the application.
What I have taken away from this work is that using a graph to browse and visualize the information was perhaps not the most judicious choice. The project was rather large and would have required much more time to design a really usable solution that lets users navigate the documents efficiently. I have chosen a similar project for the next semester, which should allow me to dig deeper into the field of information visualisation. I have already thought of a concept of semantic magnets that could be explored during that project. But a more conventional kind of visualisation may well be the way to go to achieve a really usable solution.
You can download my report here (10 MB!).