This file is part of the pdr/pdx project.
Copyright (C) 2010 Torsten Mueller, Bern, Switzerland
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
pdr/pdx 0.3.4 - User Manual
pdr ("personal data recorder")
and pdx ("personal data expert")
are
free
applications
for
tracking,
managing
and
evaluation
of
personal,
mostly
numeric
data.
Contents
1. Introduction and basics
2. Glossary
3. Working principles
4. Invocation
5. Configuration
1. Introduction and basics
People work on computers, often several hours a day. Additionally,
almost everyone carries a mobile phone, a palmtop or a similar mobile
device during the computerless time. Personal data that accrue over the
day could easily be evaluated if all these channels were made usable.
The user should be able to choose between the means at his disposal,
which may differ depending on his location, the day and the time of day:
on a PC he can enter his data directly with pdr or send himself an
e-mail, perhaps using a command line tool like sendmail or from inside
an office application. With his mobile phone he can also send e-mails or
SMS. Maybe he uses measurement devices that collect data in a private
memory - later he can transmit them over USB, Bluetooth, infrared or
something else onto a computer. And perhaps he already has software
producing data in a usable XML format. All these channels must be
equivalent and open.
The initial situation is based on the following assumptions:
- We have at least one convenient medium for getting personal data onto
a computer, and the effort to use it is acceptable.
- Data input and data evaluation do not happen at the same time, and
especially not in real time. We get data (possibly much) more frequently
than they have to be evaluated.
- That's why data input must be fast, easy and mobile. This is the most
important criterion for acceptance.
- For data evaluation the time needed is much less critical. There the
criteria are capability, effectiveness and configurability.
- Data evaluation means the creation of static reports and diagrams.
There's no need for interactive work on the data.
Background: The initial idea was to log individual medical data (blood
sugar, blood pressure, body temperature, heart rate, weight and also
medication). Especially diabetics taking insulin measure and collect a
lot of such data every day, and it's very interesting for them (and for
physicians and specialists) to track, evaluate and comment on them.
The applications are not specialized for medical use cases. You can also
use them for technical, sports, weather, environmental or financial
data, for example for jogging distances and times, or for the fuel
consumption of your car, the distances driven and the costs. All you
need is a continuous flow of numeric data.
2. Glossary
2.1. collections
The database is the connective link between pdr and pdx. Normally the
user has no need to bother about its internal structure. Using pdr and
pdx he works almost exclusively with so-called collections (series of
measurements). This is the concept: a collection saves all values of a
concrete series of measurements, each value together with a unique
timestamp:
[...]
2008-12-17 21:45:00 5.9
2008-12-18 05:00:00 6.1
2008-12-18 12:45:00 5.3
2008-12-18 18:45:00 5.3
2008-12-18 21:45:00 4.7
2008-12-19 05:00:00 5.2
2008-12-19 12:45:00 5.4
2008-12-19 18:45:00 4.7
2008-12-19 21:45:00 5.7
[...]
If five parameters are to be measured, five collections are needed. With
pdr the user can list, create and delete such collections at any time.
Every collection has a unique name. This name is a combination of the
following characters:
A...Z a...z _ * + ! ? ^ ° § $ / & [ ] { } = ~
The name is case-sensitive. The number of collections and the length of
their names are unlimited.
Note: Because collection names are used in expressions, which happens
quite often, they should be short. There's no argument against the use
of single characters (especially letters).
Two collections have fixed names: * and #. The first one is the
so-called default collection, which is always numeric. The second one is
the comment collection, which is text. These two collections don't have
to be created explicitly; they always exist. The reason for this is
their special (nameless) use in expressions. You should use both of
these collections for your most important use case.
Each collection has a concrete type for all of its data values. This
type has to be declared during the creation of the collection. Mixed
collections are not possible. There are three types of collections:
- numeric (floating point numbers with double precision)
- ratio (a pair of floating point numbers, intended for blood pressure)
- text (an unlimited character string for comments)
2.2. rejections
If a database input doesn't comply with the conditions, perhaps because
something is misspelled or a date is invalid, this input is rejected.
This means its data don't get into the valid data pool and not into
collections. However, they get into a table of their own and are not
lost.
The idea is that these data can later be corrected interactively. This
is especially important for input per e-mail, because e-mail messages
can't be corrected on the mail server, and it also doesn't make sense to
leave them on the server. At the moment only e-mail input will be
rejected if needed; the other data sources can be corrected as they are
and do not need to be handled in this way.
For handling rejections pdr has two special command line options.
2.3. expressions
During data input over e-mail mailboxes, the command line or text files,
pdr interprets so-called expressions. Every text line is an expression.
An expression can contain several values, so we have to declare which
value should go into which collection. For this we use a simple syntax -
the name of the collection is used as a suffix:
[date] [time] (value[collection])* [; comment]
This definition means:
- the whole line is an expression
- date and time are optional; if there's no or only a partial
specification, the current date or time is used to complete the input
- date and time are valid for the whole expression; all following values
and comments on this line get the same timestamp
- there are 0 or more value-collection pairs, i.e. a value and a
collection name without any space between them; the collection name is
optional, and if there's no collection name the default collection is
used
- optionally, the text after a semicolon until the end of the line is
interpreted as a comment
Note: if there are two values for the same collection in the same
expression, only the latter is used. There can be only one value per
timestamp in a collection because the timestamp is the unique key.
Date and time have a concrete, non-localized syntax:
[CCYY-]MM-DD and hh:mm[:ss]
Examples
Given that we have the following collections in the database: l, m, n
(all numeric) and anyway * and #. The following expressions would be
correct:
5.2                              (implicit use of the default collection)
5.2*                             (explicit use of the default collection)
5.2 8l 7n 1m
2009-08-16 12:34 5.3 9n ; this is my comment
23:45 15l
; comment only
We see that simple data input is a primary design goal, even if these
expressions seem a bit cryptic at first view. They (look at the first
three lines) can easily be entered even with the limited capabilities of
mobile phones. They only have to be read again by a machine. You can
also, if there's no other opportunity to transmit, put these data into a
text file or write them on a sheet of paper and enter them later.
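As an illustration of the grammar above, here is a small Python sketch of an expression parser. This is not the actual pdr implementation; for simplicity, collection names are limited here to letters plus * and #.

```python
import re
from datetime import datetime

# Illustrative parser for the expression syntax:
#   [date] [time] (value[collection])* [; comment]
# Missing date/time parts are completed from "now", as the manual describes.
def parse_expression(line, now=None):
    now = now or datetime.now()
    line, _, comment = line.partition(";")
    comment = comment.strip() or None
    date, time, values = now.date(), None, []
    for tok in line.split():
        if re.fullmatch(r"(\d{4}-)?\d{2}-\d{2}", tok):
            # [CCYY-]MM-DD, optionally completed with the current year
            d = tok if len(tok) == 10 else f"{now.year}-{tok}"
            date = datetime.strptime(d, "%Y-%m-%d").date()
        elif re.fullmatch(r"\d{1,2}:\d{2}(:\d{2})?", tok):
            fmt = "%H:%M:%S" if tok.count(":") == 2 else "%H:%M"
            time = datetime.strptime(tok, fmt).time()
        else:
            # value with optional collection-name suffix; no suffix = "*"
            m = re.fullmatch(r"(-?\d+(?:\.\d+)?)([A-Za-z_*#]*)", tok)
            if not m:
                raise ValueError(f"cannot parse token: {tok}")
            values.append((float(m.group(1)), m.group(2) or "*"))
    time = time or now.time()
    return datetime.combine(date, time), values, comment
```

For example, `parse_expression("2009-08-16 12:34 5.3 9n ; note")` yields the timestamp 2009-08-16 12:34, the pairs (5.3, "*") and (9.0, "n"), and the comment "note".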
2.4. selections
A selection is a part of a collection. This term is specific to pdx and
its evaluations and calculations. You get a selection by invoking the
function select or by a calculation returning a selection. The
limitation of the selection in relation to its collection is always
based on time, because a collection has only one dimension: time. The
values of a selection need not be continuous in time; a selection can
contain gaps. For example you can select the values of a collection for
every day of a month but only those between 8 and 9 o'clock.
3. Working principles
3.1. pdr
3.1.1. Functionality
pdr collects data from several data sources and puts them into collections of the database:
mobile phone ---> e-mail mailbox ---\
measuring device ---> ...            +--> pdr ---> database
XML file ---------------------------/
The database is the only link between pdr and pdx.
3.1.2. Data sources and transactions
At the moment the following types of data sources are available:
- command line, e-mail mailbox (POP3) and text files: these three data
sources work with expressions
- CSV files and XML files: these data sources work with specific data
formats in files
Most of the data sources (inputs) have to be configured. During the
invocation of pdr the data sources are then requested in the configured
order, in the assumption that they have unprocessed data.
pdr uses transactions to guarantee data integrity in the database as far
as possible. These transactions last from the invocation of the program
(i.e. the acceptance of the parameters) until the insertion of the
values into the database. We truly want to exclude the case that data of
a data source are partly inserted into the database (and partly not). If
a failure occurs during processing, the data should be corrected outside
the database (i.e. on the data source) and the processing can be started
again.
Note: E-mail data are different (see there).
Configured data sources are processed each in a transaction of their
own. Data sources specified on the command line each get their own
transaction only if they are files. Expressions specified on the command
line are summed up and processed all in one transaction.
3.1.3. Input per command line
The simplest (and least comfortable) way to get data into the system is
the pdr command line, i.e. the invocation of pdr. Nothing needs to be
configured for this.
pdr has the command line option -e (--expression) which allows you to
specify an expression. This option can be used multiple times. Moreover,
all characters after pdr that are not part or argument of a command line
option are summed up into one big expression and processed at once (see
there).
If an expression on the command line doesn't have a timestamp, the
current date and time are used.
If there's a failure during processing because of any incorrectness in
an expression, pdr produces a message. A data transfer into the
rejections doesn't take place.
3.1.4. Input per mail (POP3)
For the use of e-mail mailboxes we assume that data (mails) have arrived
in the mailbox and that they are not processed by any other application.
These mails must have the following properties:
- a unique subject
- an exploitable timestamp (normally the SMTP server adds one during
sending)
- plain ASCII text format (no HTML, RTF ...)
- text consisting completely of expressions
If there's an e-mail data source configured, the mail server is
requested during the next invocation. pdr looks whether there are mails
on the server, checks their subject and processes matching e-mails one
by one, line by line; each line is an expression. If a line has a
timestamp, this one has priority. Otherwise the timestamp of the e-mail
is valid implicitly. This is very handy because normally you will never
have to enter a timestamp manually in usual, single-line e-mails.
Here's a complete e-mail source:
From: Torsten Mueller <Mymail@gmx.net>
To: MyMail@gmx.net
Subject: Q
Date: Thu, 04 Feb 2010 17:56:11 +0100
Message-ID: <87pr4ley8k.fsf@castor.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

5.3 8i
Normally most of the values in the header lines are taken from lists.
Date and Message-ID are added by the server, MIME-Version and
Content-Type come from the e-mail client application. The only remaining
text parts that really have to be entered are the subject (that's why it
should be short - a single letter here) and the contents of the message,
the data line.
Processed e-mails are deleted from the server regardless of success, so
they never get processed a second time. This deletion can be suppressed
by configuration.
If there's a failure during processing because of any incorrectness in
an expression, pdr transfers these expressions into the rejections and
writes out a message.
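The processing just described can be sketched in a few lines of Python. This only illustrates the parsing step - pdr itself speaks POP3 to the server, and the transport is omitted here. The subject "Q" is taken from the example above.

```python
import email
from email.utils import parsedate_to_datetime

# Sketch: extract the pieces pdr needs from a raw RFC 2822 message.
# The POP3 transport is left out; only header/body parsing is shown.
def extract_expressions(raw_message, wanted_subject):
    msg = email.message_from_string(raw_message)
    if msg["Subject"] != wanted_subject:
        return None                       # not a pdr mail, leave it alone
    timestamp = parsedate_to_datetime(msg["Date"])  # implicit timestamp
    lines = [ln for ln in msg.get_payload().splitlines() if ln.strip()]
    return timestamp, lines               # each line is an expression

raw = """\
From: Torsten Mueller <Mymail@gmx.net>
To: MyMail@gmx.net
Subject: Q
Date: Thu, 04 Feb 2010 17:56:11 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

5.3 8i
"""
```

For the sample message this yields the mail's own timestamp and the single expression line "5.3 8i"; a timestamp inside the line would override the mail date, as the manual states.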
3.1.5. Input per text file
If we use a text file for data input, every line counts as an
expression. This method is practical if you get data in a period without
any opportunity to transmit them online. In that case you have to
collect them in a file manually, expression by expression.
Lines starting with # are not processed.
If there's a failure during processing because of any incorrectness in
an expression, pdr produces a message. A data transfer into the
rejections doesn't take place.
Text files that are processed successfully are deleted if they are
configured as data sources, so they are not processed a second time.
This deletion can be suppressed in the configuration.
3.1.6. Input per CSV file
The abbreviation CSV means "comma separated values". Instead of the
comma, pdr also accepts the semicolon and the tabulator as separator
between the values.
There are two different ways to tell pdr which comma separated data
value should go into which collection:
- a control line in the CSV file preceding the data lines
- a control line in the configuration file, valid for the entire CSV
file
In the first case a pdr CSV file would have the following structure:
control line
data line1
[...]
data lineN
control line
data line1
[...]
data lineN
[...]
This kind of use of control lines is unusual but gives us the wanted
flexibility and openness. Normally you can insert them easily by hand or
with a program like sed.
In the second case the CSV file would contain only data lines, as
expected.
A control line has the following structure:
[# pdr] datetime [separator collection]+
Example:
# pdr datetime, *, n, l; h; q»p, #
(» means a tabulator)
This is a control line for data lines with a timestamp and seven values
for the collections *, n, l, h, q, p and #.
Each control line in a CSV file is recognized by its prefix # pdr; a
control line in a configuration file doesn't need this prefix. The
following keyword datetime marks the position of the timestamp on the
data lines. It doesn't have to be at the beginning, but every line must
have one - there are no data values without a timestamp. In the example
we can see that we can have several separators on one data line. Data
lines according to this control line would look like this:
2008-10-11 12:31:38, 5.2, 7, 8; 42.3; 12»96, first measuring
2008-10-12 12:48:08, 6.1, , 8; 53.1; 16»93,
2008-10-13 12:43:57, 5.8, 7, 7; 34.2; 15»94, third measuring
The second line has no values for the collections n and #. In the case
of missing values simply no inserts are made.
If you have CSV files containing more values than you want to import
into collections, you can declare omissions in the control line:
# pdr datetime, a, b, , , , c, d, e
Here we read a timestamp and two collections, then we omit three values
on the data lines and then read again three values.
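The control-line mechanism can be sketched in Python as follows. This is an illustration, not pdr's code: only the comma is handled as a separator here, while pdr also accepts the semicolon and the tabulator.

```python
# Sketch: map each CSV field either to a collection name or to the
# timestamp. An empty name in the control line means "skip this field";
# an empty field in a data line means "no insert for this collection".
def read_csv(text):
    columns, rows = None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("# pdr"):
            # control line: strip the prefix, split into column names
            columns = [f.strip() for f in line[len("# pdr"):].split(",")]
            continue
        if line.startswith("#"):
            continue                      # comment line, not processed
        fields = [f.strip() for f in line.split(",")]
        ts, values = None, {}
        for name, field in zip(columns, fields):
            if name == "datetime":
                ts = field
            elif name and field:
                values[name] = field
        rows.append((ts, values))
    return rows
```

With the control line `# pdr datetime, a, b, , c` a data field in the fourth position is read but discarded, exactly as the omission rule above describes.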
Lines starting with # are not processed.
During the processing of a CSV file the whole file is handled in a
single transaction. If there's a failure, because for instance a data
value on a line doesn't match the type of the declared collection, the
whole file is dismissed. A data transfer into the rejections doesn't
take place.
CSV files that are processed successfully are deleted if they are
configured as data sources, so they are not processed a second time.
This deletion can be suppressed in the configuration.
3.1.7. Input per XML file
pdr can read XML files for data input. These files are well-formed,
readable and editable, and are the ideal thing for data exchange between
different software systems.
pdr defines its own, intentionally very simple format. But the
responsible part of the program is designed to be extended for further
XML formats.
3.1.7.1. The pdr XML format
The pdr XML format is completely documented in the file pdr.xsd:
<?xml version="1.0" encoding="iso-8859-1" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      pdr XML input file definition (C) T.M. Bremgarten 2010-01-31
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="pdr">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="collection" type="collection" minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="collection">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="datetime" type="xsd:string" />
          <xsd:attribute name="value" type="xsd:string" />
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" />
  </xsd:complexType>
</xsd:schema>
This definition allows files that look like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<pdr>
  <collection name="#">
    <item datetime="2001-07-09 18:27:11" value="first measuring"/>
    <item datetime="2001-07-10 07:52:01" value="second measuring"/>
    <item datetime="2001-07-10 10:07:00" value="third measuring"/>
    [...]
  </collection>
  <collection name="*">
    <item datetime="2001-07-12 13:57:01" value="9.3"/>
    <item datetime="2001-07-12 14:46:45" value="5.6"/>
    <item datetime="2001-07-12 18:25:36" value="5.7"/>
    [...]
  </collection>
  <collection name="l">
    <item datetime="2001-07-03 21:41:58" value="7"/>
    <item datetime="2001-07-04 21:48:43" value="8"/>
    <item datetime="2001-07-05 21:50:49" value="7"/>
    [...]
  </collection>
</pdr>
This format is self-explanatory. The data of the collections are
specified directly and are easily readable.
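Because the format is so simple, reading it from another program is trivial. Here is a Python sketch using the standard ElementTree module; it is an illustration only - the real pdr validates against pdr.xsd and writes everything to the database in a single transaction.

```python
import xml.etree.ElementTree as ET

# Sketch: read a pdr XML document into
#   {collection name: [(timestamp string, value string), ...]}
def read_pdr_xml(text):
    data = {}
    for coll in ET.fromstring(text).iter("collection"):
        data[coll.get("name")] = [
            (item.get("datetime"), item.get("value"))
            for item in coll.iter("item")
        ]
    return data
```

The inverse direction (writing a pdr file from your own software) is equally short, which is the point of the format.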
During the processing of an XML file the whole file is handled in a
single transaction. If there's a failure, because for instance a data
value doesn't match the type of a collection, the whole file is
dismissed. A data transfer into the rejections doesn't take place.
XML files that are processed successfully are deleted if they are
configured as data sources, so they are not processed a second time.
This deletion can be suppressed in the configuration.
3.1.7.2 (more XML formats)
...
3.2. pdx
3.2.1. Functionality
pdx evaluates data from collections and creates reports and diagrams
using statistical functions. Reports are created from report templates
containing placeholders which are later replaced by pdx. So it is
possible to create reports in almost any (text) format, for instance
ASCII, XML, HTML, RTF, CSV, SQL and so on. With a little more effort
even a backend for ODF would be possible. For diagrams, at the moment
only SVG is supported. pdx works like this:
report templates -----\           /---> reports (HTML, XML, TXT)
database --------------+--> pdx --+
diagram definitions --/           \---> diagrams (SVG)
The outputs must be configured. The relevant groundwork for operating
pdx is the development of the report templates and the diagram
definitions. We need a bit of theory for this ...
3.2.2. Built-in functions
pdx has an extensive set of built-in functions for selecting data from
the database and for their statistical evaluation. So the work becomes
programmable over a wide range. The syntax is very similar to the
functional programming language Lisp.
Note: It's not a real Lisp interpreter. In particular there is no
capability to define new functions. But the processing of the functions
is, as in Lisp, strictly functional. The reason for this Lisp-like
syntax lies in the ingeniously simple structure of the notation, which
is immediately understandable just by looking at it, without learning
complicated syntactical constructions.
Note: A list of all built-in functions can be seen in interactive mode
with the command ?. Most of these functions can also be tested in
interactive mode.
The following sections document all built-in functions with a unified
notation:
(function_name function_parameter_type*) -> result_type
This line means:
- every function prototype starts and ends with a round bracket
- every function has a name
- after the name follow 0 or more parameter types
- every function also has a result of a given type
The function name need not be unique; it can be overloaded. Uniqueness
must then be realized by the given parameters (number and types).
Some functions have an open list of parameters: ... (ellipsis). This
means that neither the number nor the types of the parameters are fixed
by definition. These functions can be called with any parameters.
All functions are strictly typed. There are not even type conversions:
if you read {int} it really means {int} and nothing else; a {double}
value leads to an error.
Types are written in curly brackets to distinguish them from values. We
have types in function definitions; in function calls they are replaced
by values. The following types are possible:
{int}, {double}   signed numbers                                  5, 3.14
{string}          unlimited character strings                     "Hugo"
{time}            mostly a time duration, seldom really a time    09:13
{timestamp}       a concrete point in history with date and time  2009-12-31 7:30:01
{selection}       a set of timestamp-value pairs
{color}           an RGB color in hexadecimal notation            #00FF00
{nothing}         only for function result types: the result of
                  the function is empty and cannot be evaluated
3.2.2.1. Date and time functions
Wherever a value of type {time} is needed, one of the following
functions can be used:
(year) -> {time}
(year {int}) -> {time}
(years {int}) -> {time}
(month) -> {time}
(month {int}) -> {time}
(months {int}) -> {time}
(week) -> {time}
(week {int}) -> {time}
(weeks {int}) -> {time}
(day) -> {time}
(day {int}) -> {time}
(days {int}) -> {time}
(hour) -> {time}
(hour {int}) -> {time}
(hours {int}) -> {time}
(minute) -> {time}
(minute {int}) -> {time}
(minutes {int}) -> {time}
(second) -> {time}
(second {int}) -> {time}
(seconds {int}) -> {time}
These functions compute a time duration with a well-known length. The
names are self-explanatory. The {int} argument is a factor, i.e. a
number of such time units. If no factor is specified, the factor is 1.
Note: A year and a month are problematic because they don't have fixed
lengths in reality. The pdx specifications (year) and (month) are by
definition identical to (days 365) and (days 30).
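The duration functions can be pictured as Python timedelta constructors. This sketch is only a mental model of the table above, with (year) and (month) fixed at 365 and 30 days exactly as the note defines:

```python
from datetime import timedelta

# One fixed-length span per unit name, matching the pdx definitions.
UNITS = {
    "year": timedelta(days=365), "month": timedelta(days=30),
    "week": timedelta(weeks=1), "day": timedelta(days=1),
    "hour": timedelta(hours=1), "minute": timedelta(minutes=1),
    "second": timedelta(seconds=1),
}

def duration(unit, factor=1):
    # e.g. duration("week", 2) corresponds to (weeks 2)
    return UNITS[unit] * factor
```

In this model (month) and (days 30) are literally the same value, which is the equivalence the note states.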
The function now returns the current date and the current time:
(now) -> {timestamp}
Note: if you use the command line option -f, the returned value is the
time of the start of the application, even if this now lies some seconds
back. So all calls to now return the same timestamp if you use -f.
Note: you can define the returned value with the command line option -n.
With this you can (if you have well designed report templates and
diagram definitions) produce reports for any time in history without any
complicated configuration.
For examples see the next section.
3.2.2.2. Functions for data selection
The following functions retrieve a part of exactly one collection and
return it as a {selection}. The {string} parameter names the collection;
the other parameters limit the result by time:
(select {string}) -> {selection}
(select {string} {timestamp}) -> {selection}
(select {string} {timestamp} {timestamp}) -> {selection}
(select {string} {time}) -> {selection}
(select {string} {time} {timestamp}) -> {selection}
The first implementation simply gets all data of the given collection.
This can take some time. The second one gets all data from the specified
timestamp on, the third one all data between the two timestamps. The
second timestamp, the end, doesn't belong to the result. The fourth
implementation gets all data in the specified time duration until now,
i.e. until what a call of now would return. The fifth gets all data in
the specified time duration before the specified timestamp.
Examples:
(select "*")
    get all data of the default collection
(select "*" 2009-12-01-12:34)
    get data of the default collection since Dec 01 2009 12:34, see note
    below!
(select "n" 2009-01-01-0:00 2010-01-01-0:00)
    get all data of the collection n for the year 2009
(select "l" (weeks 2))
    get all data of the collection l for the last two weeks
(select "l" (months 3) 2009-06-01-0:00)
    get all data of the collection l for the three months before
    June 01 2009
Note: In some cases we need timestamps as parameters. Normally
timestamps are written like CCYY-MM-DD hh:mm[:ss] - with a space in the
middle. But here the space is also the separator for function
parameters. So we need a - (minus) to tell pdx that this timestamp is
one parameter, not two.
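This one-token timestamp form can be read as sketched below in Python (the helper name is hypothetical; only the third '-' is special, standing in for the usual space):

```python
from datetime import datetime

# Sketch: parse pdx's one-token timestamp "CCYY-MM-DD-hh:mm[:ss]".
# rpartition splits at the LAST '-', i.e. between date and time.
def parse_pdx_timestamp(token):
    date_part, _, time_part = token.rpartition("-")
    fmt = "%H:%M:%S" if time_part.count(":") == 2 else "%H:%M"
    return datetime.combine(
        datetime.strptime(date_part, "%Y-%m-%d").date(),
        datetime.strptime(time_part, fmt).time())
```

So "2009-12-01-12:34" is read as the single parameter Dec 01 2009, 12:34.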
3.2.2.3. Statistical functions
All functions of this section do statistical calculations on a selection
and again return a selection, with a calculated value on each line. For
example we can calculate the daily average over a month.
The scheme of the parameters is identical in all statistical functions:
- Implementation (a) applies the statistical function to the entire
selection. The result is a single line.
- Implementation (b) limits the data of the selection to the daily time
between the two times. Then it applies the statistical function to the
remaining data. The result is also a single line.
- All further implementations use a keyword naming an aggregation
interval. This keyword can be year, month, day, hour, minute, second.
The result is a line per interval. For instance, if you use day you get
a line for each day that is contained in the initial selection.
- Implementation (c) applies the statistical function to all data in the
aggregation interval.
- Implementation (d) does the same but allows the specification of a
time for the day change (only if you use day). It is handy (especially
for medical use cases) to have the day change not at 0:00 but perhaps at
2:00. Values with a time of 1:37 then still belong to the previous day,
which is more correct.
- Implementation (e) again limits the application of the statistical
function to values between the specified times.
- The functions avg and sdv each have a sixth implementation (f) which
allows floating calculations. For this a time window is moved over the
selection, line by line, in which the statistical calculation is made.
You can use 5 and 5 to get a floating average over the five preceding
and the five following values of each line in the given selection.
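Implementation (f) can be sketched as follows in Python. Whether the line's own value belongs to the window is an assumption here: the sketch averages the value itself plus up to n preceding and m following lines.

```python
# Sketch of the floating average: for every line of the selection
# (modeled as a list of (timestamp, value) pairs), average a window of
# up to n preceding values, the value itself, and up to m following ones.
def floating_avg(selection, n, m):
    result = []
    for i, (ts, _) in enumerate(selection):
        window = selection[max(0, i - n):i + m + 1]
        avg = sum(v for _, v in window) / len(window)
        result.append((ts, avg))          # keep the original timestamp
    return result
```

Near the edges of the selection the window is simply shorter, so the result still has one line per input line.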
The following functions calculate the arithmetic average:
(avg {selection}) -> {selection}                        (a)
(avg {selection} {time} {time}) -> {selection}          (b)
(avg {selection} keyword) -> {selection}                (c)
(avg {selection} keyword {time}) -> {selection}         (d)
(avg {selection} keyword {time} {time}) -> {selection}  (e)
(avg {selection} {int} {int}) -> {selection}            (f)
Standard deviation:
(sdv {selection}) -> {selection}
(sdv {selection} {time} {time}) -> {selection}
(sdv {selection} keyword) -> {selection}
(sdv {selection} keyword {time}) -> {selection}
(sdv {selection} keyword {time} {time}) -> {selection}
(sdv {selection} {int} {int}) -> {selection}
Count:
(count {selection}) -> {selection}
(count {selection} {time} {time}) -> {selection}
(count {selection} keyword) -> {selection}
(count {selection} keyword {time}) -> {selection}
(count {selection} keyword {time} {time}) -> {selection}
The count function always returns lines with a {double} value,
regardless of the type of the selection.
Arithmetic maximum and minimum:
(max {selection}) -> {selection}
(max {selection} {time} {time}) -> {selection}
(max {selection} keyword) -> {selection}
(max {selection} keyword {time}) -> {selection}
(max {selection} keyword {time} {time}) -> {selection}
(min {selection}) -> {selection}
(min {selection} {time} {time}) -> {selection}
(min {selection} keyword) -> {selection}
(min {selection} keyword {time}) -> {selection}
(min {selection} keyword {time} {time}) -> {selection}
The functions max and min return the value which is the maximum or
minimum in the selection, together with its original timestamp.
Sum:
(sum {selection}) -> {selection}
(sum {selection} {time} {time}) -> {selection}
(sum {selection} keyword) -> {selection}
(sum {selection} keyword {time}) -> {selection}
(sum {selection} keyword {time} {time}) -> {selection}
The first or last, i.e. the oldest or youngest line of a selection:
(first {selection}) -> {selection}
(first {selection} {time} {time}) -> {selection}
(first {selection} keyword) -> {selection}
(first {selection} keyword {time}) -> {selection}
(first {selection} keyword {time} {time}) -> {selection}
(last {selection}) -> {selection}
(last {selection} {time} {time}) -> {selection}
(last {selection} keyword) -> {selection}
(last {selection} keyword {time}) -> {selection}
(last {selection} keyword {time} {time}) -> {selection}
The functions first and last return the value which is the oldest or the
youngest in the selection, together with its original timestamp.
Note: most of these functions are only allowed on selections with
numeric data. count, first and last always work. sum also works on
selections with text, for instance for comments; these strings are then
concatenated. avg also works on selections containing ratio values; in
this case the calculation of the average is done separately for
numerator and denominator.
Examples:
(avg (select "*"))
    compute the average over all values of the default collection
(max (select "*") day 2:00)
    get the daily maximum of the default collection, assuming a day
    change at 2:00
(sum (select "n" 3:30 9:00) day)
    get the daily sum of values of the collection n, summing only values
    between 3:30 and 9:00
(avg (select "l" (month)) 5 5)
    get the floating average over 11 values of the collection l for the
    last month
(first (select "*" (month)) day 2:00)
    get the first line of each day of the last month from the default
    collection
(last (select "*" (day)) hour)
    get the last line of each hour of the last day from the default
    collection
3.2.2.4. Arithmetic functions
pdx has a small set of arithmetic functions, namely the four basic
arithmetic operations. Each of them exists in three different
implementations:
(X {double} {double}) -> {selection}        with X = +, -, * or /
(X {selection} {double}) -> {selection}
(X {selection} {selection}) -> {selection}
The first one simply applies the operation to both numeric operands. The
result is a single line. The second implementation applies the operation
to each line of the {selection} and the {double} value. The result has
the same number of lines as the selection and the same timestamps. These
implementations are intended to be used especially for unit conversions.
The third implementation is a bit more complex. It allows the
line-by-line combination of two selections. For this the timestamps are
compared. The numbers of lines in the two selections need not be equal.
If the second selection doesn't have a line with the same timestamp as
in the first selection, the last older value is taken. The result has as
many lines as the first selection:
selection a           selection b           (* (select "a") (select "b"))
--------------------  --------------------  -----------------------------
                      2009-11-17 12:38 9.3
2009-12-01 13:00 5.2                        -> 5.2 * 9.3 = 2009-12-01 13:00 48.36
2009-12-02 13:00 5.7                        -> 5.7 * 9.3 = 2009-12-02 13:00 53.01
2009-12-03 13:00 3.2                        -> 3.2 * 9.3 = 2009-12-03 13:00 29.76
                      2009-12-03 19:17 8.4
2009-12-04 13:00 4.8                        -> 4.8 * 8.4 = 2009-12-04 13:00 40.32
2009-12-05 13:00 5.7  2009-12-05 13:00 4.7  -> 5.7 * 4.7 = 2009-12-05 13:00 26.79
2009-12-06 13:00 5.3                        -> 5.3 * 4.7 = 2009-12-06 13:00 24.91
The timestamps of the result are those of the first {selection} parameter. As you can see, we also need a valid line in the second selection for the first line of the first selection; pdx reports an error if this condition is not fulfilled. This implementation is very useful if you have two collections that form the numerator and denominator of a fraction, for instance specific fuel consumption per distance.
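The matching rule described above can be sketched in Python. This is an illustration of the documented semantics with made-up names, not the actual pdx implementation:

```python
def combine(a, b, op):
    # Combine two selections (lists of (timestamp, value) pairs sorted
    # by timestamp) line by line. For every line of `a` the last line
    # of `b` whose timestamp is not newer is used; if none exists an
    # error is raised, as described for pdx above.
    result = []
    j = -1
    for ts, va in a:
        while j + 1 < len(b) and b[j + 1][0] <= ts:
            j += 1
        if j < 0:
            raise ValueError("no valid line in the second selection")
        result.append((ts, op(va, b[j][1])))
    return result

a = [("2009-12-01 13:00", 5.2), ("2009-12-02 13:00", 5.7)]
b = [("2009-11-17 12:38", 9.3)]
print(combine(a, b, lambda x, y: round(x * y, 2)))
# [('2009-12-01 13:00', 48.36), ('2009-12-02 13:00', 53.01)]
```

ISO-formatted timestamps compare correctly as plain strings, which keeps the sketch short.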
3.2.2.5. Functions for reports
The functions of this section are necessary for the creation of reports. They return a string, often a large block of text. pdx reads the report template, finds a call of the format function, evaluates it immediately and replaces it with the function's result at the same position in the text. These functions can be tested in interactive mode, too. The format function is complex even though its prototype looks simple:
(format ...) -> {string}
The function expects just a list of parameters consisting of text, function results, format specifications and keywords. The result is a piece of text with one or more lines. The best way to understand this very central function is to look at examples.
Example 1:
(format (avg (select "*" (days 7))) <1.2>)
This call creates a single, formatted value. The first argument of the format function is the result of a statistical calculation (the avg function), which returns a single-line selection with a numeric value on it. It is followed by a format specification in angle brackets, which means: the result should have at least one digit before and always two digits after the decimal point.
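The effect of such a specification can be imitated with Python's string formatting. The helper fmt below is made up for the illustration and does not model the special case of showing decimals only when they are nonzero:

```python
def fmt(value, before, after):
    # <before.after>: at least `before` digits before the decimal
    # point (zero-padded) and exactly `after` digits after it.
    width = before + (after + 1 if after else 0)
    return f"{value:0{width}.{after}f}"

print(fmt(7.456, 1, 2))  # "7.46", like <1.2>
print(fmt(7.0, 1, 0))    # "7", like <1>
```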
Things get much more complex if there are multiple selections, perhaps with multiple lines and different numbers of lines. This is where we get to know the true strength of the format function.
Example 2:
(format
  "<tr>"
  "<td>" datetime "</td>"
  "<td>" (select "*" (days 7)) <1.1> "</td>"
  "<td>" (select "n" (days 7)) <1> "</td>"
  "<td>" (select "l" (days 7)) <1> "</td>"
  "<td>" (select "m" (days 7)) <1.0> "</td>"
  "<td>" (select "x" (days 7)) <1.1> "</td>"
  "<td>" (select "#" (days 7)) "</td>"
  "</tr>"
  newline
)
This call creates rows for an HTML table containing data from the collections *, n, l, m, x and # of the last seven days. (The table definition itself is not part of the function call.)
What you see at first sight is:
- strings and selections alternate as parameters
- all selections cover the same time range; this is important because they will be joined into an invisible, multi-column table using the timestamps
- after each selection follows a format specification in angle brackets: values of the default collection should get at least one digit before and after the decimal point, values of the collections n and l are shown as integers, and values of the collection m should only have digits after the decimal point if these are different from 0
- the position of the one remaining timestamp column in the table is defined by the keyword datetime; if we didn't use it here, the resulting table wouldn't have timestamps
- the keyword newline at the end means that the format function should insert a line break after each line into the text
At first the format function analyses the selections and joins the hidden table. Then the table is formatted line by line, value by value.
Note: for such constructs, which can look much more complicated, it is worth writing them neatly and using spaces to arrange things so that one can understand what is going on.
The result of such a function call is real HTML and looks like this:
[...]
<tr><td>2009-01-17 18:58:13</td><td></td><td>6</td><td></td><td></td><td></td><td></td></tr>
<tr><td>2009-01-17 21:42:49</td><td>5.6</td><td></td><td>16</td><td></td><td></td><td></td></tr>
<tr><td>2009-01-18 05:54:41</td><td>6.8</td><td>7</td><td>8</td><td>1</td><td></td><td></td></tr>
<tr><td>2009-01-18 12:17:22</td><td>5.4</td><td>6</td><td></td><td>1</td><td></td><td></td></tr>
[...]
The number of created lines depends on the selections alone. The values come from the selections; everything else comes from the string parameters of the format function.
If the hidden table has an empty value on a line, the format function does not create any output for it. This leads to an empty field in the table, <td></td>, which by the way is often not rendered very well by many browsers. It would be better to use <td><br></td> for empty fields.
So occasionally we have the problem of printing something visible for empty values. Using the empty function
(empty {string}) -> {string}
one can tell the format function what string to print instead of nothing.
Example:
(format
  (empty "nil")
  [...]
)
This will print nil for every empty value.
The following small, parameterless functions are very simple:
(build) -> {string}
(version) -> {string}
(database) -> {string}
build returns a string containing pdx build information, while version returns the current version of pdx. These values can be included in reports to show which version of pdx created the report. database shows the version of the current database.
3.2.2.6. Functions for diagrams
The functions of this section create diagrams. They don't return anything; they draw a diagram. That's why they cannot be tested in interactive mode. The diagram function is a container: all other diagram functions may only be called as parameters of the diagram function. As expected, diagram has an open list of parameters:
(diagram {int} {int} {color} ...) -> {nothing}
The two {int} parameters are the desired width and height of the diagram in pixels. Note: these numbers do not include the labels on the axes, only the inner area. The resulting diagram is indeed bigger because of the labels. The reason for this is the difficult computation of text sizes in SVG graphics.
The third parameter is the background color of the diagram. Example:
(diagram 400 300 #FDFDFD
  [...]
)
The axes function draws an entire coordinate system (1st quadrant):
(axes {timestamp} {timestamp} {double} {double} {double} {color}) -> {nothing}
(axes {time} {timestamp} {double} {double} {double} {color}) -> {nothing}
(axes {time} {double} {double} {double} {color}) -> {nothing}
The first implementation draws the x-axis from the first to the last timestamp, the second for the specified time duration before the timestamp, the third for the specified time duration before now. The step width of the labels on the x-axis is calculated internally. The following three {double} values are the lower limit, upper limit and step width of the y-axis. The {color} parameter names the color of the axes and the labels.
Example:
(axes 2009-08-01-0:00 2009-09-01-0:00 2.0 10.0 1.0 #000000)
(axes (months 3) 7.0 15.0 0.5 #101010)
Using the hline function one can insert very handy horizontal lines into the diagram:
(hline {double} {color}) -> {nothing}
(hline {double} {double} {color}) -> {nothing}
The first {double} parameter is the position of the line on the y-axis, the optional second one is the thickness of the line. The {color} parameter is the color of the line.
Example:
(hline 7.0 0.25 #101010)
The last and most important diagram function is the curve function. It draws a curve in different styles. A curve is always based on a selection and has a color:
(curve {selection} {color} ...) -> {nothing}
Without any further parameters the curve function draws a zigzag line in the specified color, simply connecting the data values of the selection. With the help of additional parameters this behaviour can be changed:
- The keyword bars creates vertical bars instead of a zigzag line.
- A string like "+", "|", "-", "x" or "°" doesn't create a line but single, unconnected points drawn with the specified marker.
- A {double} parameter changes the thickness of the (zigzag) line.
In bar graphs one can draw multiple bars in one aggregation interval using two {int} parameters. This sounds difficult but is easy to understand with an example: suppose we have values for four different, abstract times of day, say "in the morning", "at noon", "in the evening" and "late", and we want four bars per day representing the values at these times. In this case the bars have to be drawn so that they don't overlap:
(curve (sum (select "n" (month 1)) day 3:30 9:30) #FF1000 bars 1 4)
(curve (sum (select "n" (month 1)) day 11:00 14:30) #FF5000 bars 2 4)
(curve (sum (select "n" (month 1)) day 17:30 20:30) #FF9000 bars 3 4)
(curve (sum (select "n" (month 1)) day 21:00 2:00) #FFB000 bars 4 4)
These four lines differ in the selections, the colors of the bars and in the first {int} parameter. This one is the number of the bar; the second {int} parameter says how many bars there are. So the first line draws the first bar of four.
pdx computes how wide a single bar must be drawn. In the example every bar gets a quarter of the width of a day on the x-axis. You can play with this: you don't have to draw every bar, which means you can also create gaps between the bars.
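The width and position pdx computes for each bar can be sketched as follows; bar_geometry is a made-up name and the pixel numbers only illustrate the quartering described above:

```python
def bar_geometry(interval_width, bar_number, bar_count):
    # x-offset and width of bar `bar_number` (1-based) when
    # `bar_count` bars share one aggregation interval, as in
    # (curve ... bars 1 4).
    width = interval_width / bar_count
    offset = (bar_number - 1) * width
    return offset, width

# a day that is 40 pixels wide on the x-axis, four bars per day:
print(bar_geometry(40.0, 1, 4))  # (0.0, 10.0), the first bar of four
print(bar_geometry(40.0, 3, 4))  # (20.0, 10.0), the third bar
```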
3.2.2.7. Other functions
pdx has exactly one function that is specific to diabetics: the HbA1c function:
(HbA1c {string}) -> {selection}
(HbA1c {string} {timestamp}) -> {selection}
(HbA1c {string} {timestamp} {timestamp}) -> {selection}
(HbA1c {string} {time}) -> {selection}
(HbA1c {string} {time} {timestamp}) -> {selection}
The function computes an HbA1c value in percent using a very simple approximation. For this you need blood sugar values for at least three months; this means if you want to draw a curve over a month you need data from at least the last four months. The computed value is definitely not very exact, but you can see the fluctuations in the curve. The parameters are similar to the select function. The first implementation computes the value of today, the second the value at the specified timestamp.
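The manual does not state which approximation pdx uses. Purely as an illustration of how such an estimate works, the sketch below uses the well-known ADAG linear regression between mean glucose (in mg/dl) and HbA1c; pdx's internal formula may well be different:

```python
def hba1c_estimate(glucose_mg_dl):
    # Estimate HbA1c (percent) from the mean blood sugar of roughly
    # the last three months, using the ADAG regression
    # HbA1c = (mean glucose [mg/dl] + 46.7) / 28.7.
    mean = sum(glucose_mg_dl) / len(glucose_mg_dl)
    return (mean + 46.7) / 28.7

print(round(hba1c_estimate([126.0]), 1))  # 6.0
```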
3.2.3. Interactive mode
pdx has an interactive mode. This mode is very useful for testing function calls before you put them into report templates or diagram definitions. You can also execute short queries, for instance "How many values are there in collection x?" or "What was the all-time maximum?". The interactive mode is started by the command line option -i. pdx shows a prompt and waits for input:
$ pdx -i
pdx 0.3.1 (2010-01-03 16:43:29 on castor, GNU/Linux 2.6.32-ARCH x86_64)
>
At this prompt there are two instructions, ? and q; every other input is interpreted as a function call. The instruction ? lists the implementations of the built-in functions. Without any parameters, ? shows all known built-in functions with their parameter types and their return type. ? accepts a regular expression which can be used to narrow the result:
> ?min     show all implementations of the min function
> ?a.*     show all functions beginning with a
> ?m..     show all functions beginning with m and having two more characters
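The narrowing behaviour of ? can be sketched like this. The function name list_functions and the sample list of built-ins are made up, and the sketch assumes the expression must match the whole function name:

```python
import re

def list_functions(known, pattern=""):
    # Without a pattern show everything; otherwise keep only the
    # names fully matched by the regular expression.
    if not pattern:
        return known
    rx = re.compile(pattern)
    return [name for name in known if rx.fullmatch(name)]

builtins = ["avg", "axes", "min", "max", "month", "now", "sum"]
print(list_functions(builtins, "m.."))  # ['min', 'max']
print(list_functions(builtins, "a.*"))  # ['avg', 'axes']
```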
The instruction q terminates the interactive mode and with it pdx. The same can be achieved with Ctrl-D or Ctrl-C.
The call of a function shows the result immediately. A call like
> (select "*")
shows all current values of the default collection.
3.2.4. All together: the creation of reports and diagrams
Reports are generated from report templates. pdx searches these templates for sections with pdx function calls, mostly calls of the format function. Such sections are cut out of the template, evaluated and replaced by their result. A report template can have multiple sections with function calls.
Report templates are either plain ASCII text or text in a formatting language like HTML or XML or text in a programming language like C or SQL. We call this the host language. Which host language is used is not restricted in any way, but pdx must know how to find the sections with the function calls. That's why these sections are placed in comments of the host language, for example between <!-- and --> in HTML or XML, or between /* and */ in C. This way the template remains an incomplete but syntactically correct file of its type, which still allows the use of tools like HTML or XML editors. It's wise to "mark" such pdx comments a bit more to distinguish them from other comments in the file. You could use indications like <!--- and ---> or /** and **/. A complete, small HTML template file containing pdx instructions would look like this:
<!DOCTYPE
HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html lang="de-ch">
<head>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
<title>MyTitle</title>
</head>
<body style="direction: ltr;" lang="de-DE">
<!--- (now) --->, <b>pdx</b> <!--- (version) ---> (<!--- (build) --->)    *
</body>
</html>
In the line marked with * we see three small pdx sections, each with a call to one of the functions now, version and build. The output this line creates looks like this:
2009-12-27 15:14:51, pdx 0.3.0 (2009-12-27 10:28:14 on castor, GNU/Linux 2.6.31-ARCH x86_64)
One can see that all parts of the line that are not placed in pdx sections (and also the whole surrounding text) are transferred unchanged into the output. By the way, this line could also be written using the format function:
<!--- (format (now) ", <b>pdx</b> " (version) " (" (build) ")") --->
To fill a complete HTML table with values we could write a template file like this:
[...]
<table style="page-break-before: avoid; page-break-inside: avoid;
width: 800px;"
border="1" cellpadding="1" cellspacing="1">
<tbody>
<tr valign="top">
<td>Datum/Zeit</td>
<td>*</td>
<td>n</td>
<td>l</td>
<td>m</td>
<td>x</td>
<td>Kommentar</td>
</tr>
<!---
(format
  (empty "<br>")
  "<tr valign=top>"
  "<td>" datetime "</td>"
  "<td>" (select "*" (days 7)) <1.1> "</td>"
  "<td>" (select "n" (days 7)) <1> "</td>"
  "<td>" (select "l" (days 7)) <1> "</td>"
  "<td>" (select "m" (days 7)) <1.0> "</td>"
  "<td>" (select "x" (days 7)) <1.1> "</td>"
  "<td>" (select "#" (days 7)) "</td>"
  "</tr>"
  newline
)
--->
</tbody>
</table>
[...]
The table starts outside the pdx section and also ends outside it. But the rows of the table (except the header) are generated completely by pdx, and they all look alike.
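The cut-out, evaluate and replace cycle for report templates can be sketched with a regular expression. expand_template and the toy evaluator are made up for the illustration; in pdx the evaluator is of course its expression interpreter:

```python
import re

def expand_template(text, evaluate, begin="<!---", end="--->"):
    # Find every pdx section between the comment indications, hand
    # its content to the evaluator and splice the result back in at
    # the same position.
    pattern = re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)
    return pattern.sub(lambda m: evaluate(m.group(1).strip()), text)

# a toy evaluator standing in for the pdx expression interpreter:
calls = {"(version)": "pdx 0.3.4", "(now)": "2010-01-03 12:00:00"}
html = "<p><!--- (now) --->, <b><!--- (version) ---></b></p>"
print(expand_template(html, calls.get))
# <p>2010-01-03 12:00:00, <b>pdx 0.3.4</b></p>
```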
Diagrams are made from diagram definition files. These files always contain exactly one diagram definition, that is, exactly one call of the diagram function. In diagram definition files comment indications are not necessary because there is no host language. A complete diagram definition would look like this:
(diagram 400 300 #FFFDFD
  (axes (month 1) 3.0 9.0 1.0 #0)
  (hline 5.0 #C0C0C0)
  (hline 6.0 #C0C0C0)
  (hline 7.0 #C0C0C0)
  (curve (sum (select "*" (month 1)) day 3:30 9:30) #FF0000)
  (curve (sum (select "*" (month 1)) day 11:00 14:30) #00FF00)
  (curve (sum (select "*" (month 1)) day 17:30 20:30) #0000FF)
  (curve (sum (select "*" (month 1)) day 21:00 2:00) #FFFF00)
  (curve (avg (select "*" (month 1)) day 2:00) #0 1.0)
)
While developing new report templates or diagram definitions it is wise to take existing files as a base and modify them step by step.
4. Invocation
4.1. pdr
pdr accepts options and arguments. Note: options can have arguments themselves; don't confuse these with the arguments of pdr itself. Options start with a minus character; a second minus introduces long options with readable names.
4.1.1. Arguments
Everything that follows the program name pdr on the command line and does not begin with a minus counts as an argument of pdr. All arguments are summed up into one expression and evaluated as one:
$ pdr 5.2 5n 8l -v \; this is comment up to the end of the line
The resulting expression built from the arguments is:
5.2 5n 8l ; this is comment up to the end of the line
-v does not belong to the resulting expression; it's a known option of pdr. The backslash in front of the semicolon is special to Unix-like operating systems: their shell evaluates the semicolon itself, but we use it as a comment delimiter. To avoid this conflict we must put a backslash there. The backslash is removed by the shell and truly does not get into the input of pdr.
4.1.2. Options

-?              show a help screen
-V              show the pdr version
-v              show what is being done; without this option pdr shows only errors
-c filename     use filename as configuration file; this option supersedes the standard configuration file ~/.pdrx
-l              list all known collections in the database and some statistics
-a "name,type"  add a collection to the database; the argument is a string containing name and type of the new collection, types are n, r and t (for numeric, ratio or text)
-d name         delete a collection; the argument is the name of the collection
-D              delete all collections; the collections * and # are not deleted but are cleared completely
-r              list all known rejections
-R              delete all current rejections
-e "expr"       evaluate an expression; the argument should be a complete expression
-t filename     import a text file into the database
-C filename     import a CSV file into the database
-x filename     import an XML file into the database
-n              do not use any of the configured data sources; use the command line only
4.1.3. Examples
First the options for handling collections. -l or --list-collections shows all known collections and some statistical data:
$ pdr -l
name  type     table  recs  first                last
#     text     C1      160  2008-11-25 18:45:00  2010-01-02 21:55:37
*     numeric  C0     1636  2008-11-25 05:00:00  2010-01-03 12:10:00
h     numeric  C6        1  2009-05-19 16:00:00  2009-05-19 16:00:00
l     numeric  C3      707  2008-11-25 05:00:00  2010-01-03 06:26:01
m     numeric  C4      612  2009-03-04 05:00:00  2010-01-03 06:26:01
n     numeric  C2     1275  2008-11-25 05:00:00  2010-01-03 12:10:00
x     numeric  C5      119  2009-03-22 09:28:09  2010-01-03 10:31:01
This table shows name and type of every collection, the physical SQL table in the database, the number of records and the first and last timestamps.
You can add a collection using -a or --add-collection:
$ pdr -a "k,n"
$ pdr -l
name  type     table  recs  first                last
#     text     C1      160  2008-11-25 18:45:00  2010-01-02 21:55:37
*     numeric  C0     1636  2008-11-25 05:00:00  2010-01-03 12:10:00
h     numeric  C6        1  2009-05-19 16:00:00  2009-05-19 16:00:00
k     numeric  C7        0
l     numeric  C3      707  2008-11-25 05:00:00  2010-01-03 06:26:01
m     numeric  C4      612  2009-03-04 05:00:00  2010-01-03 06:26:01
n     numeric  C2     1275  2008-11-25 05:00:00  2010-01-03 12:10:00
x     numeric  C5      119  2009-03-22 09:28:09  2010-01-03 10:31:01
The argument contains the name of the new collection, a comma and then the type, written as n (for numeric), r (for ratio) or t (for text).
A collection that is not needed anymore can be deleted using -d or --delete-collection:
$ pdr -d k
You can delete all collections at once with -D or --delete-all-collections. After this you will still have two remaining, empty collections, * and #:
$ pdr -D
$ pdr -l
name  type     table  recs  first  last
#     text     C1     0
*     numeric  C0     0
There are two options for handling rejections. Using -r or --list-rejections rejected data can be listed:
$ pdr -r
timestamp            expression
2010-01-03 17:46:20  12.0k
(The error here: the collection k doesn't exist.)
If you have rejections you should check the rejected expressions. If you can "guess" the correct expression, or still remember it, you can enter a corrected expression on the command line. After correcting all rejections you can delete them all at once:
$ pdr -R
The option -e or --expression allows the specification of exactly one expression as input:
$ pdr -e "5.2" -e "2009-12-31 17:28:03 7.9"
This option can be used multiple times; the expressions remain independent of each other.
The option -n or --none disables all configured data sources. This is useful when pdr is invoked many times within a short period, for instance for executing expressions. Many mail servers don't want you to log in many times in a short time; often this is limited to a specific number of logins per minute. Establishing a POP3 connection also costs time. Using -n you work on your local database only.
4.2. pdx
pdx has no arguments, just options. Options start with a minus character; a second minus introduces long options with readable names.
4.2.1. Options

-?            show the help screen
-V            show the pdx version
-v            show what's going on; without this option pdx reports errors only
-c filename   use filename as configuration file; this option supersedes the standard configuration file ~/.pdrx
-n timestamp  define the value returned by the function now
-i            start pdx in interactive mode
-f            run in fast mode
4.2.2. Examples
The option -n or --now allows the specification of the value returned by the function now. Wherever now is called, explicitly or implicitly, the new timestamp is used instead of the default one. This is very handy for creating reports and diagrams for a specific time in the past. A precondition is that the report templates and/or diagram definitions don't use fixed timestamps themselves. Time specifications in the argument are optional; the specification is filled up with zeros, so the following invocations do the same:
$ pdx -n 2009-10-01
$ pdx -n 2009-10-01-00:00
$ pdx -n 2009-10-01-00:00:00
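The zero-filling can be sketched in a couple of lines; pad_timestamp is a made-up name illustrating the documented behaviour:

```python
def pad_timestamp(spec):
    # Fill a partial timestamp up with zeros to the full form
    # YYYY-MM-DD-hh:mm:ss, as -n/--now does.
    full = "0000-01-01-00:00:00"
    return spec + full[len(spec):]

print(pad_timestamp("2009-10-01"))        # 2009-10-01-00:00:00
print(pad_timestamp("2009-10-01-00:00"))  # 2009-10-01-00:00:00
```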
The option -f or --fast improves the speed of the program (probably a lot). It causes internal results to be cached and reused. This costs memory but accelerates the program noticeably. You should use -f whenever possible and even put it into the configuration file ~/.pdrx. It should not have any impact on the results.
5. Configuration
pdr and pdx need some configuration because their behaviour can be adjusted over a wide range. All these settings are placed in a local configuration file named ~/.pdrx. This file has four sections:
- general options
- database options
- input options (pdr specific)
- output options (pdx specific)
The order of the sections or of their rows within the file doesn't matter.
5.1. General options
In this section two things are configured:
- basic settings
- the inputs and outputs and the order of their processing
Basic settings are similar to command line options. It doesn't make sense to support every command line option here, but some are indeed supported:
verbose = true
This line causes both pdr and pdx to be verbose. Otherwise they report nothing but errors. Setting verbose = true is recommended.
fast = true
Let pdx always run in fast mode (also recommended).
If you don't use pdx to create reports and diagrams, but just as a database front end, you can let it always run in interactive mode:
interactive = true
The line
encoding = UTF-8
sets the default encoding, which is used if there is no more specific setting. This option is responsible for handling text correctly (for instance comments, especially text with German umlauts). On a modern system you will use UTF-8 or ISO-8859-1, depending on what your shell uses. pdr allows ASCII, UTF-8, UTF-16, ISO-8859-1, ISO-8859-15 and Windows-1252.
The configuration of inputs and outputs is important:
inputs = e-mail-mailbox, file1, file2, file3
outputs = report1, diagram1, diagram2, diagram3, diagram4
The first line defines inputs for pdr, here four data sources, namely e-mail-mailbox, file1, file2 and file3, which are processed one after the other in this order. What these names mean is configured later, see input options. The second line names five outputs for pdx in the same manner, namely a report and four diagrams. They are configured later, see output options.
5.2. Database options
Here we define everything related to the database.
5.2.1. SQLite
database.type = sqlite
database.connect = ~/local/share/my_data.db
The first line defines the database to be an SQLite database. The second line contains the complete connection string; in the case of SQLite this is just the name of the database file. Because the applications are personal applications, the database is intended to be placed somewhere in the user's home directory. The physical creation of the database is not a task of pdr or pdx; the user has to do this with tools of the database or the operating system. For SQLite this is simple:
$ cat > my_database.db    (terminate with Ctrl-D)
This command creates a 0-byte file which can be used as an empty database. pdr creates the schema on the first call.
5.2.2. MySQL
database.type = mysql
database.connect = user=my_db_user_name;password=my_db_user_password;db=my_db_name;compress=true;auto-reconnect=true
The first line defines the database to be a MySQL database. The second line contains the complete connection string, consisting of key-value pairs. There are two preconditions:
- The database must exist; it has to be created by a database administrator, who also gives it a name that is unique on the database server, for example pdrx. On servers used by several users it would be wise to create several user-specific databases, distinguishable by name.
- The user (a user of the database server, not of the operating system) must exist and must have the rights to create, delete, select and manipulate tables.
5.3. Input options
If you query the same data sources again and again it is useful to configure them in the configuration file.
5.3.1. Configuration of a POP3 mailbox
To configure a POP3 mailbox (named e-mail-mailbox here) you need the following settings:
e-mail-mailbox.type = pop3
e-mail-mailbox.server = pop.gmx.net
e-mail-mailbox.account = MyAccount@gmx.net
e-mail-mailbox.password = MyPassword
e-mail-mailbox.subject = Q
e-mail-mailbox.keep = yes
The first line defines e-mail-mailbox to be a POP3 mailbox. The next three lines are self-explanatory. The fifth line names the e-mail subject used by pdr to identify the relevant e-mails on the server. Only mails with this subject are processed; all others are simply ignored. This way you don't have to allocate a special new mailbox for pdr; you can use an existing one. Note: you must enter this subject in every mail. If you send a lot of data mails you have to enter it very often, so use a short subject, e.g. a single letter, but it has to be unique. The last line determines whether a processed e-mail is deleted or not. The option accepts true, false, yes and no. In the case of true or yes the e-mail is not deleted on the server. If you don't use this option, processed mails are deleted.
5.3.2. Configuration of a text file
If you want to use a text file for input you need the following configuration:
file1.type = txt
file1.filename = ~/my_file.txt
file1.encoding = ISO-8859-1
file1.keep = true
The first line defines file1 to be a text file input. The second line names the file, the third one the encoding of the file. If you do not name an encoding here, the default encoding from the general options is used. The last line determines whether a processed text file is deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option, processed files are deleted.
filename allows the use of wildcards (* and ?) to process a file with a not completely known or frequently changing name, or even an entire group of files. The path must be complete, but the file name can include something like *.txt to process all files of a directory at once.
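The wildcard expansion can be sketched with Python's glob module; input_files is a made-up helper showing the idea, not pdr code:

```python
import glob
import os

def input_files(pattern):
    # Expand ~ and the * / ? wildcards and return all matching files
    # in a stable order, so a setting like ~/incoming/*.txt processes
    # a whole directory at once.
    return sorted(glob.glob(os.path.expanduser(pattern)))

# e.g. input_files("~/incoming/*.txt")
```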
5.3.3. Configuration of a CSV file
If you want to use a CSV file for input you need the following configuration:
file2.type = csv
file2.filename = ~/my_file.csv
file2.encoding = ISO-8859-1
file2.ctrl_line = datetime, x, y, z
file2.keep = false
The first line defines file2 to be a CSV file input. The second line names the file, the third one the encoding of the file. If you do not name an encoding here, the default encoding from the general options is used. The option ctrl_line specifies, if needed, a control line for the entire CSV file; the CSV file itself then doesn't have to contain a control line. The last line determines whether a processed CSV file is deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option, processed files are deleted.
filename allows the use of wildcards (* and ?) to process a file with a not completely known or frequently changing name, or even an entire group of files. The path must be complete, but the file name can include something like *.csv to process all files of a directory at once.
5.3.4. Configuration of an XML file
If you want to use an XML file for input you need the following configuration:
file3.type = xml
file3.filename = ~/my_file.xml
file3.keep = no
The first line defines file3 to be an XML file input. The second line names the file. We don't need an encoding here because the XML file carries its own encoding specification. The last line determines whether a processed XML file is deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option, processed files are deleted.
filename allows the use of wildcards (* and ?) to process a file with a not completely known or frequently changing name, or even an entire group of files. The path must be complete, but the file name can include something like *.xml to process all files of a directory at once.
5.4. Output options
5.4.1. Configuration of a report
To configure a report you need the following settings:
report1.type = report
report1.comment_begin = "<!---"
report1.comment_end = "--->"
report1.input_file = input/report1.html
report1.output_file = output/report1.html
report1.encoding = ISO-8859-1
The first line defines report1 to be a generated report. The next two lines declare the comment indications used by pdx to identify code blocks with function calls in the report template. The fourth and fifth lines name the input and output files. The last line names the encoding of the created file; it is only needed for files that don't carry an encoding specification inside.
5.4.2. Configuration of a diagram
The configuration of a diagram is similar to the configuration of a report but simpler:
diagram1.type = diagram
diagram1.input_file = input/diagram1.tmpl
diagram1.output_file = output/diagram1.svg
diagram1.encoding = ISO-8859-1
We don't have to specify comment indications.