Wille for Orange

Wille for Orange is a set of widgets for developing data mining applications based on the Orange library for Python (see http://www.ailab.si/orange).

Figure: Interactive Orange widgets for analysing data and visualisations.

Requirements

Suitable Python version (see http://www.python.org/)

Orange library (with dependencies, see http://www.ailab.si/orange)

Wille 2 (see this site for downloads)

Note: The current Wille for Orange widgets were designed for Orange 1.0. (See the installation instructions below for a more complete list of requirements.)

Introduction

In short, the WilleTools widgets enable adding custom program logic, service requests, rich content (e.g. XML) management, and third-party applications to Orange applications. The motivation is using Orange Canvas as an exploratory, visual programming environment in developing relatively general-purpose data processing, semantic computing, and visualisation applications. Note that since Orange Canvas is not designed as a service platform (and does not scale as such), it is expected that mature applications are designed so that they can be re-factored in terms of the underlying Orange and Wille libraries, to be deployed (perhaps as service components) without the visual Canvas application.

The design of the Wille Orange widgets follows the idea of scripting. Further, it is assumed that developers of scripts know what they are doing. As a consequence, Wille widgets in principle support processing and transmitting arbitrary Python objects. While this allows implementing nearly anything within the Canvas application, the drawback is that the signal type checking paradigm of Orange Canvas has been weakened within the WilleTools widgets. This means that Script widgets can be connected in the Canvas even if they do not support each other's signal types. In most cases, however, Wille widgets behave as expected.

To support working with folder-like objects, which are typically required in complex pipeline applications, many of the Wille2 widgets are based on a utility class called wille.zipblob.Zipblob, an application of the common Python Zipfile object. This allows manipulating and transmitting collections of data as compressed ZIP files. For compatibility reasons, using Unicode file names is not recommended. Further, due to the limitations of different ZIP versions, a Zipfile can only include data up to 2 GB. However, it should be possible to pack and unpack Zipblobs using common ZIP tools.
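The idea of treating a ZIP archive as a folder-like collection that can be passed around as plain bytes can be sketched with the standard zipfile module alone. This is only an illustration written for current Python 3; the helper names pack_files and unpack_files are hypothetical and are not part of the wille.zipblob API.

```python
import io
import zipfile


def pack_files(files):
    """Pack a {name: bytes} mapping into ZIP-formatted bytes held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def unpack_files(zip_bytes):
    """Return the {name: bytes} mapping stored in ZIP-formatted bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Plain ASCII names, following the compatibility advice above.
collection = {
    "table.xml": b"<table/>",
    "notes/readme.txt": b"a folder-like entry",
}
blob = pack_files(collection)
restored = unpack_files(blob)
```

Because the result is an ordinary ZIP byte string, it could also be written to disk and opened with common ZIP tools.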

Important security note: It is expected that developers share the Orange Canvas schemas (and perhaps new widgets) they have created. In this setting, introducing widgets capable of executing arbitrary Python (and therefore other) programs raises an obvious security risk. For this reason, the Wille Orange widgets with potentially harmful computing capabilities are clearly marked with a little bomb icon (a small bomb in a white frame in the top right corner of the widget icon). However, this is merely a reminder that downloaded third-party schemas, including related widget code, may contain malware. Thus, when using Wille Orange Canvas widget extensions, never run untrusted Canvas schemas! Obviously, the developers of the Wille Orange widgets take no responsibility whatsoever for damage resulting from running malicious code from an untrusted source. (Of course, widgets may also contain other errors or caveats that open additional, undocumented security threats.)

The Figure in the sidebar illustrates the end-user view of a simple Wille2 Orange application. In practice, the user is able to access full details of the data, and to navigate within the visualisation interactively. Other kinds of applications may be easily constructed with the help of the built-in Orange data mining and visualisation widgets and the customised Wille2 widgets.

Installation

The installation procedure of Wille2 for Orange Canvas is straightforward (details may vary slightly between operating systems; these instructions were written for the Wille 2.0 Bundle):
  1. If necessary (e.g. not already installed), install Python 2.5(+) (see http://www.python.org/download/). An alternative is to use, e.g., ActivePython (see http://www.activestate.com/activepython/).
  2. If necessary, install the latest copy of the Orange Canvas (see http://www.ailab.si/orange/downloads.asp) and the Python libraries it requires (Python version 2.5 was used during Wille development). In many cases, this can be achieved in a single step with an appropriate Orange bundle.
  3. Download a copy of the Wille distribution archive (see http://www.tut.fi/hypermedia/en/publications/software/wille/). It includes three directories, wille, WilleTools and icons. (The two latter directories can be found in the apps/orange directory.) The first is the general-purpose Wille2 Python library, the second is the Wille2 Orange widget library, and the third includes the icons of the Wille2 widgets.
    1. If necessary, copy the directory wille (i.e. including the directory itself, not just the contents) into the site-packages directory of your Python installation. (In many cases, the destination folder looks something like ...\Python\Lib\site-packages .)
    2. Copy the directory WilleTools (i.e. including the directory itself, not just the contents) into the OrangeWidgets directory of your Python Orange Canvas installation. (In many cases, the destination folder looks something like ...\Python\Lib\site-packages\orange\OrangeWidgets .)
    3. Copy the contents of the directory icons (i.e. just the files, not creating the directory) into the icons directory of your Python Orange Canvas installation. (In many cases, the destination folder looks something like ...\Python\Lib\site-packages\orange\OrangeWidgets\icons .)
  4. Install the Python libraries required by some of the Wille2 widgets: 4Suite XML tools (see http://4suite.org/?xslt=downloads.xslt or http://sourceforge.net/projects/foursuite/). Note: On some platforms (including Windows), using 4Suite requires running Python (and thus Orange Canvas) in a case-sensitive imports mode (in Windows, this may be done with set PYTHONCASEOK=).
  5. Install any additional (Python or other) libraries and components required by your scripts.
  6. Finally, registering the new WilleTools widgets is required. This is done within the Orange Canvas application: Launch the Orange Canvas application and choose Options/Rebuild Widget Registry from the menu. The WilleTools tab appears now in the Canvas application.
The above procedure needs to be done once per installation. If you have several Python installations, take care not to mix them up.

Once everything has been successfully installed, the Python Orange Canvas application is run like any Python GUI application. (Note, however, that using 4Suite may require setting the PYTHONCASEOK environment variable. Since many Python applications do not require this, it may be convenient to launch the Canvas application via a shell script that sets the variable only for the session.)
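As an alternative to a shell script, a small Python launcher can change an environment variable for a single child process only. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the script path in the final comment is a placeholder that depends on the local Orange installation, and whether PYTHONCASEOK has to be set or cleared on a given platform must be checked separately (the helper supports both).

```python
import os
import subprocess


def session_env(set_vars=None, unset_vars=(), base=None):
    """Return a copy of an environment with per-session changes applied."""
    env = dict(os.environ if base is None else base)
    for name in unset_vars:
        env.pop(name, None)       # Remove the variable if it is present
    env.update(set_vars or {})    # Add or override variables
    return env


def launch(command, env):
    """Start a child process that sees only the modified environment."""
    return subprocess.Popen(command, env=env)


# Demonstration on a small, explicit base environment.
base = {"PATH": "/usr/bin", "PYTHONCASEOK": "1"}
with_flag = session_env(set_vars={"PYTHONCASEOK": "1"}, base={"PATH": "/usr/bin"})
without_flag = session_env(unset_vars=["PYTHONCASEOK"], base=base)

# Hypothetical launch (placeholder path, not a documented location):
# launch(["python", "path/to/canvas_script.py"], with_flag)
```

The calling shell and other Python applications are unaffected, since only the copy of the environment is modified.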

We shall next introduce the new Wille2 widgets, now accessible from the Orange WilleTools tab.

Wille2 Widgets

The Wille2 Widgets include the Script Widget, the URLRead Widget, the ZipblobMerge Widget, the XSLT Widget, and the Browser Widget. We shall next introduce these from an application developer's perspective.

Script Widget

The Wille Script widget allows adding arbitrary Python scripts to Orange applications.

Additionally required Python libraries: By default none, i.e. depends on the script.

A Script can both input and output arbitrary data as a Python object. In addition, Scripts may output native Orange types, including ExampleTable, AttributeList, and SymMatrix. This allows using Scripts not only for ad hoc computing, but also for gluing different things together. Note that Scripts are the Swiss Army knives of Wille Orange: When a script becomes generally useful, it is usually not too difficult to use it as a basis for implementing a new kind of widget. (For writing Python scripts, see http://www.python.org/doc/ .)

Controls of the Script dialog include:
  • Controls displaying the script name, status, and error information.
  • A field for using an external script from a file, instead of the provided script edit box. (Usually it makes sense to keep Scripts in the edit box, since the script is then saved along with the schema by the Canvas File/Save operation.)
  • A checkbox for stating that a script is runnable.
  • A checkbox for setting the script to be run automatically, when new input is signaled.
  • A button to execute the script manually (using the last stored input).
A Script receives its input Data signal via the global input variable. It signals output data via the global output dictionary using output signal types as keys (i.e. Data, Examples, Attributes, or Distances). Note that in order to simplify script development, Scripts accept only a single input. When multiple input signals are required, using ZipblobMerge widgets is recommended.
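The input/output convention can be tried outside Orange Canvas with a tiny stand-in for the widget. This is a sketch only: run_script is a hypothetical helper written for current Python 3, and it merely mimics the documented convention (a global input value and a global output dictionary); it is not how the Script widget itself is implemented.

```python
def run_script(source, input_value):
    """Execute script text with a global `input` and return its `output` dict."""
    namespace = {"input": input_value, "output": {}}
    exec(source, namespace)  # Executes arbitrary code: trusted scripts only
    return namespace["output"]


# A script written in the style described above.
SCRIPT = '''
global input, output
output = dict()
output["Data"] = [value * 2 for value in input]
'''

result = run_script(SCRIPT, [1, 2, 3])
```

A harness like this can be handy for testing script logic in an external editor before pasting the script into the widget.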

The following simple Script example accepts a Zipblob object that contains an appropriate XML table data file, and outputs the data from the file as an Orange ExampleTable (and can hence be connected with other built-in Orange widgets). The comments are not necessary but may help other users in using and modifying the script.
######################################################################
# XMLTable to Orange ExampleTable Converter
#
# Input:  Zipblob.read()able data wrapped using the 
#         wille.zipblob.Wrapper class
# Output: Orange ExampleTable
#
# Arguments: INFILE: Name of the source XML file in the input Zipblob

INFILE = "table.xml"

#
# Input example:
#<table>
#  <tr>
#    <th class="StringVariable">Item</th>
#    <th class="FloatVariable">Weight</th>
#  </tr>
#  <tr>
#    <td>Duck</td> <td>3.5</td> 
#  </tr>
#  <tr>
#    <td>Witch</td> <td>3</td> 
#  </tr>
#</table>
#
# The format resembles simple HTML tables. However, it
# is assumed that the first row introduces the variables, 
# the other rows include the instances, and that 
# each row includes the same number of cells.
#
# Script by Ossi, 2009
######################################################################


######################################################################
# Do not make modifications below unless you want to edit the script.

__title__ = 'XMLTable to ExampleTable Converter'

import orange
from wille import zipblob
import xml.dom.minidom 

global input, output # Every Wille2 IO script includes these two lines:
output = dict()      # Important! Forgetting to include this gives a hard-to-track type error!

b = zipblob.Zipblob()
b.write(input.get_value()) # Input Zipblob data from the global input variable

dir = b.check_out()
doc = xml.dom.minidom.parse(str(dir) +'/' + INFILE)
b.check_in()

root = doc.documentElement
trlist = root.getElementsByTagName("tr")

# Create orange table
vars = []
vartypes = []

data = None  # Stays None (no output table) if the input has no rows
if len(trlist) > 0:
  first = 1
  for tr in trlist:
    if first == 1:
      first = 0
      thlist = tr.getElementsByTagName("th")
      for th in thlist:
        vartype = th.getAttribute("class")  # Variable type is given by the class attribute
        name = th.firstChild.nodeValue
        if vartype == "FloatVariable":
          vars.append(orange.FloatVariable(str(name)))
          vartypes.append("float")
        elif vartype == "StringVariable":
          vars.append(orange.StringVariable(str(name)))
          vartypes.append("string")
      domain = orange.Domain(vars)
      data = orange.ExampleTable(domain)
    else:
      tdlist = tr.getElementsByTagName("td")
      instance = []  
      ind = 0
      for td in tdlist:
        value = td.firstChild.nodeValue
        if vartypes[ind] == "float":
          instance.append(float(value))
        if vartypes[ind] == "string":
          instance.append(str(value))
        ind = ind + 1
      ex = orange.Example(domain, instance) 
      data.append(ex)

output["Examples"] = data  # Every Wille2 script includes this line. Now output data using the Examples signal type as ExampleTable.

The next simple example Script creates a Zipblob object and signals it forward:
from wille import zipblob

__title__ = "Create Zipblob example"

global input, output
output = dict()

b = zipblob.Zipblob()
b.add_file('somefile.txt', 'file.txt', 'a') # add file somefile.txt as file.txt

output["Data"] = b.wrapper_read() # Important: Pass Zipblob data via Wrapper class, not as such (for garbage collection!)

For simplicity, the editing and debugging capabilities of Scripts are very limited. It might thus be useful to develop and test complex scripts in an appropriate external development environment, and simply copy and paste them into Wille Orange Scripts when ready.

Developer note: Since Orange Canvas seems to exploit signals for internal bookkeeping, passing, e.g., flat string data is in general not safe and may yield strange error messages. Arbitrary data can instead be wrapped into classes, which seems to work fine. (The wille.zipblob.Wrapper class is provided for this purpose. This allows doing output["Data"] = zipblob.Wrapper("some text") ... input.get_value().) However, the drawback is that in some application/platform combinations, Python garbage collection does not call __del__ on objects passed using signals in Orange Canvas. This means that classes requesting system resources might not be able to release them as expected. In particular, if signalled data is not properly wrapped, this may lead to resource leaks and, in a worst-case scenario, to a security risk. Of course, this may also happen when a program exits unexpectedly (e.g. crashes). For instance, if signalled as such, a wille.zipblob.Zipblob object may leave processed zip files (e.g. 1234567_zipblob.zip) in the user's temporary system folders. For this reason, the proper way of passing Zipblobs is via wrappers. In other words, instead of passing a Zipblob object directly, its data is passed via a Wrapper object provided by Zipblob.wrapper_read(). The drawback of this is that instead of passing files, the data gets signalled via the computer's (virtual) memory.
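The wrapping pattern can be illustrated with a few lines of plain Python. ValueWrapper below is a hypothetical stand-in, not the actual wille.zipblob.Wrapper class; it only mirrors the get_value() accessor mentioned in this document. The point of the sketch is that once the bytes have been read into memory, the temporary file can be removed immediately and nothing depends on __del__ being called later.

```python
import os
import tempfile


class ValueWrapper(object):
    """Hypothetical stand-in: carries a plain value inside an object."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value


def wrap_file_contents(path):
    """Read a file fully into memory and return it wrapped."""
    with open(path, "rb") as handle:
        return ValueWrapper(handle.read())


# Create a temporary file, wrap its contents, then clean up at once.
fd, temp_path = tempfile.mkstemp(suffix=".zip")
os.write(fd, b"PK demo bytes")
os.close(fd)
wrapped = wrap_file_contents(temp_path)
os.remove(temp_path)  # Safe: the wrapper holds the data, not the file
```

The cost is the one named above: the whole payload lives in memory while it is being signalled.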

Security warning: Scripts can implement genuine Python computer programs. Thus, Scripts of untrusted origin may include malware!

URLRead Widget

The Wille URLRead widget allows reading data from an HTTP server.

Additionally required Python libraries: wille.

Controls of the URLRead dialog include:
  • Controls displaying the URLRead status and invoke/error information.
  • Three fields for inputting the URL address of the service, and username/password, if required.
  • A checkbox for setting the URLRead to be executed automatically, when new input is signaled.
  • A button to invoke the URLRead request manually (using the last stored input).
  • A radio button for HTTP request method type (GET or POST).
URLRead receives its input via the input Data signal, typically prepared by a Script. This data should contain the request parameters (in the example below, a dict() encoded with urllib.urlencode), wrapped using the wille.zipblob.Wrapper class. The following Script example demonstrates creating parameters:
from wille import zipblob
import urllib

global input, output
output = dict()

params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
output["Data"] = zipblob.Wrapper(params)

If present and properly formed, the input data will be passed as parameters to the service.
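How encoded parameters are conventionally attached to a GET or a POST request can be sketched with the standard library. Note that this example targets current Python 3, where urlencode lives in urllib.parse and Request in urllib.request, whereas the Script above uses the Python 2 urllib module. The helper build_request and the example.org URL are hypothetical, and the sketch shows the usual HTTP convention rather than the documented internals of URLRead. No network access takes place.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params, method="GET"):
    """Attach parameters to the URL (GET) or to the request body (POST)."""
    query = urlencode(params)
    if method == "POST":
        # Supplying a body makes urllib issue a POST request.
        return Request(url, data=query.encode("ascii"))
    separator = "&" if "?" in url else "?"
    return Request(url + separator + query)


params = {"spam": 1, "eggs": 2, "bacon": 0}
get_request = build_request("http://example.org/service", params, "GET")
post_request = build_request("http://example.org/service", params, "POST")
```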

URLRead outputs the data it receives from the service using the Data signal. The data is packed as a file inside a zipblob.Zipblob object, whose data is then wrapped using the wille.zipblob.Wrapper class.

Actual data is stored in a single file, called results.

If the service returns a single file, the file results may be useful as such. However, depending on the requested service, the output data may require additional parsing (e.g. when multipart or zipped documents were returned).
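A follow-up Script might, for instance, check whether the returned bytes are themselves a ZIP archive before deciding how to parse them. The sketch below uses only the standard zipfile module on in-memory bytes and is written for current Python 3; interpret_results is a hypothetical helper, and reading the results file out of the incoming Zipblob is left out.

```python
import io
import zipfile


def interpret_results(payload):
    """Return {name: bytes}; a ZIP payload is expanded, anything else kept whole."""
    stream = io.BytesIO(payload)
    if zipfile.is_zipfile(stream):
        with zipfile.ZipFile(stream) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    return {"results": payload}


# Build one zipped and one plain sample response for the demonstration.
zipped = io.BytesIO()
with zipfile.ZipFile(zipped, "w") as archive:
    archive.writestr("part1.xml", b"<data/>")
zipped_parts = interpret_results(zipped.getvalue())
plain_parts = interpret_results(b"<html><body>plain response</body></html>")
```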

In any case, the result from the service consists of the data returned by the standard read() method of the object opened via an instance of urllib.FancyURLopener (or of a class derived from it). In other words, the results file conceptually contains the bytes held in the output_from_service variable of the following informative example:

import urllib
opener = urllib.FancyURLopener({})
f = opener.open("http://www.python.org/")
output_from_service = f.read()

It is the task of the next widget, typically a Script widget, to process the output data from URLRead. However, several other Wille Orange widgets work with wrapped Zipblob data by default.

Developer note: The Wille2 library includes more classes and methods for invoking and publishing services. This functionality may be exploited using Wille Scripts. Future Wille versions may include Orange widget representations of these.

ZipblobMerge Widget

The Wille ZipblobMerge widget allows listing the contents of a Zipblob file, and merging Zipblobs.

Additionally required Python libraries: wille.

Controls of the ZipblobMerge dialog include:
  • Controls displaying the status and information.
  • Contents listing, showing the files and file sizes.
Note that all controls are informative only -- the widget is completely operated via the Orange Canvas interface, by connecting and disconnecting signals.

ZipblobMerge receives its input via wrapped Zipblob Data signal(s), and returns its output(s) via wrapped Zipblob Data signals. If input Zipblobs include files with identical names, the files will be repeated in the merged Zipblob, due to the underlying Python Zipfile functionality.
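The behaviour with identical names can be reproduced with the standard zipfile module, which stores a second entry under an existing name (and issues a warning) rather than replacing the first. The following sketch for current Python 3 merges two in-memory archives and lists names and sizes; make_zip, merge_zips and list_contents are hypothetical helpers, not the widget's code.

```python
import io
import warnings
import zipfile


def make_zip(entries):
    """Build ZIP bytes from a list of (name, bytes) pairs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in entries:
            archive.writestr(name, payload)
    return buffer.getvalue()


def merge_zips(*payloads):
    """Copy every entry of every input archive into one new archive."""
    merged = io.BytesIO()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)  # "Duplicate name" warnings
        with zipfile.ZipFile(merged, "w") as target:
            for payload in payloads:
                with zipfile.ZipFile(io.BytesIO(payload)) as source:
                    for info in source.infolist():
                        target.writestr(info.filename, source.read(info))
    return merged.getvalue()


def list_contents(payload):
    """Return (name, size) pairs, similar to the widget's contents listing."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


first = make_zip([("a.txt", b"one"), ("b.txt", b"two")])
second = make_zip([("a.txt", b"three!")])
listing = list_contents(merge_zips(first, second))
```

Here a.txt appears twice in the listing. Since many tools resolve a name to only one of the duplicates, giving files distinct names before merging is the safer choice.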

XSLT Widget

The Wille XSLT widget allows transforming XML data with Extensible Stylesheet Language Transformations (version 1.0; see http://www.w3.org/TR/xslt). This allows, e.g., data transformations, visualisation transformations, and user interface transformations.

Additionally required Python libraries: wille, Ft.Xml (from the 4Suite XML Tools).

Controls of the XSLT dialog include:
  • Controls displaying the XSLT status and error information.
  • Two fields for inputting the relative name of the source file, and the relative name of the output file of the transform (within the Zipblob).
  • A field for using external transformation from a file, instead of the provided edit box.
  • A checkbox for stating that XSLT code is runnable.
  • A checkbox for setting the transformation to be executed automatically, when new input is signaled.
  • A button to execute the transformation manually (using the last stored input).
The following simple transformation example demonstrates creating an HTML 4 document based on the given XML source (with a root element called data, including item elements with textual content).
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"
              indent="yes"
              encoding="iso-8859-1"
              doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN" />

  <xsl:template match="/data">
    <html lang="en">
      <head>
        <title>Example</title>
      </head>      
      <body>     
        <h1>List of items</h1>      
        <ol>
          <xsl:apply-templates/>
        </ol>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="item">
    <li>
      <xsl:value-of select="."/>
    </li>          
  </xsl:template>
  
</xsl:stylesheet>
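For readers without an XSLT processor at hand, the effect of the stylesheet above can be approximated in plain Python with xml.etree.ElementTree. This is not an XSLT implementation and does not use 4Suite: it is a hand-written equivalent for the simple case of a data root with item children, it ignores details such as the doctype declaration and whitespace text nodes, and items_to_html is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def items_to_html(xml_text):
    """Build the HTML list that the example stylesheet would roughly produce."""
    root = ET.fromstring(xml_text)
    if root.tag != "data":
        raise ValueError("expected a <data> root element")
    html = ET.Element("html", {"lang": "en"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Example"
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "List of items"
    ol = ET.SubElement(body, "ol")
    for item in root.findall("item"):
        # Like <xsl:value-of select="."/>: the text content of the element
        ET.SubElement(ol, "li").text = "".join(item.itertext())
    return ET.tostring(html, encoding="unicode", method="html")


page = items_to_html("<data><item>Duck</item><item>Witch</item></data>")
```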

Security warning: In principle, XSLT 1 transformations may read any files the user has access to (and call XSLT-processor-specific extension functions). Further, the XSLT widget supports writing to relative filenames. However, since global filenames are useful in some applications, using them is not disabled.

Browser Widget

The Wille Browser widget allows opening a file from a Zipblob in the default browser. In some operating systems it is also possible to open the Zipblob as a file folder.

By default, the folder is placed in the (system-dependent) temp file directory of the user. The name of this folder changes between pipeline executions. To enable viewing the contents of the Zipfile, the temporary directory is maintained until the Browser widget dialog is closed manually. Note: If the dialog is not closed manually, the Zipfile directory remains even after exiting the Orange Canvas application!

It is also possible to force the data into a fixed directory, specified in the Browser Widget dialog. In this case, the same directory will be overwritten as new data signals are available. (This is particularly useful e.g., in refreshing and publishing content to external applications such as in visualisation interface testing or file service deployment.)

Additionally required Python libraries: wille.

Controls of the Browser dialog include:
  • Text field for information.
  • A checkbox for using a fixed base output directory. (For safety reasons, data will be (over)written to a subfolder called output.)
  • A button to activate a find directory dialog.
  • A button to manually deploy received data in the fixed output directory.
  • A checkbox for deploying data automatically.
  • Input text field for specifying the file inside the Zipblob. (If empty, opening the folder is attempted.)
  • A checkbox for using the fixed directory instead of the temporary one.
  • A checkbox for launching the default browser automatically.
  • A button to launch the default browser.
Security warning: When used in a complex pipeline, it is in principle possible to use the Browser widget for opening harmful applications and/or sending local user data to outsiders. Further, by specifying a fixed output directory, it is possible to overwrite user files in some output directory.

Developer tip: When used with a fileserver application, the fixed directory option provides a simple way to distribute outputs of pipelines within a developer group etc.
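The fixed-directory behaviour described in this section can be sketched with the standard library: unpack the archive into a subfolder called output below a chosen base directory, then optionally open a file from it in the default browser. This is a minimal sketch for current Python 3 and not the widget's implementation; deploy and open_in_browser are hypothetical helpers, and the demonstration uses a throwaway temporary directory as the base.

```python
import io
import os
import pathlib
import tempfile
import webbrowser
import zipfile


def deploy(zip_payload, base_dir):
    """Extract ZIP bytes into <base_dir>/output, overwriting existing files."""
    target = os.path.join(base_dir, "output")
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_payload)) as archive:
        archive.extractall(target)
    return target


def open_in_browser(target, relative_name=""):
    """Open a deployed file (or the folder itself) in the default browser."""
    path = pathlib.Path(target, relative_name).resolve()
    return webbrowser.open(path.as_uri())


# Demonstration: deploy a one-file archive into a temporary base directory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.html", "<html><body>Hello</body></html>")
base_dir = tempfile.mkdtemp()
target = deploy(buffer.getvalue(), base_dir)
# open_in_browser(target, "index.html")  # Uncomment to view the result
```

Deploying again with new data overwrites files of the same name in the output folder, which is why only trusted pipelines should be pointed at directories that hold user files.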

Discussion

The Wille Orange Widget library aims to include a minimal set of useful tools for creating data transformations and visualisations. In particular, several kinds of applications can be implemented using the Script widget.

It is anticipated that the widgets are also useful for building, e.g., Semantic Web applications. Again, with the help of the Script Widget, it is in principle easy to integrate with, say, Resource Description Framework (RDF) data.

For instance, assuming that the Redland (see http://librdf.org/) RDF library with Python bindings is installed, making semantic queries is simple. The following piece of code demonstrates creating an Orange ExampleTable from the results of a SPARQL (select) query (http://www.w3.org/TR/rdf-sparql-query/), as implemented by the Redland query processor:

import RDF
import orange

model = RDF.Model()
model.load('file:data.xml')  # Perhaps read from multiple Zipfiles, due to several URLReads, integrated with ZipblobMerge...
qtxt = "select $uri $title where { $a a <http://purl.org/rss/1.0/item>. $uri <http://purl.org/rss/1.0/title> $title} limit 5"
q = RDF.Query(qtxt, query_language="sparql")
  
results = q.execute(model) # Should really be prepared for catching exceptions ...

data = None
vars = []
i = 0  # Number of result tuples processed
if len(results)>0:
    # make orange string variables
    for result in results:
        for k in result.keys():
            if k.startswith("f_"): # make continuous (float) variable
                vars.append(orange.FloatVariable(k))                        
            else:
                vars.append(orange.StringVariable(k))
        break
    # make domain based on the above variables
    domain = orange.Domain(vars)
    # make data object
    data = orange.ExampleTable(domain)
    # compile and append individual data examples    
    for result in results:
        i = i + 1
        instance = []
        for k in result.keys():
            if k.startswith("f_"): # make continuous (float) variable
                try: # no value if conversion fails!
                    f = float(str(result[k]))
                except Exception:
                    f = ""
                instance.append(f)                        
            else:
                instance.append(str(result[k]))
        ex = orange.Example(domain, instance) 
        data.append(ex)

# Got i tuples, now held in an ExampleTable.
# Could be passed, e.g., to some Orange widget or method

Besides (semantic) data transformations, the Wille2 Orange widget library supports visualisation transformations. Perhaps the simplest way of creating user interfaces is generating graphical end-user interfaces with Scripts or XSL Transformations in HTML/AJAX/X3D/SVG/... technologies within the default Browser. More complex applications typically benefit from adopting some Dashboard programming schema.

This also simplifies the socio-technical challenge of persuading users to install required additional plugins etc. In many cases, suitable Browser extensions may already be available, or at least installation and add-on management is relatively simple, following the familiar Browser plugin management use case.
Topic revision: r12 - 24 Feb 2011 - 12:38:33 - OssiNykanen
 
