|
..jtsdb..
Jtsdb is a Java library for storing time series in relational
databases. It provides a somewhat utilitarian storage structure, but the
benefit is that it is management free. Tsdb will create the desired
structure if necessary, index it and create views on tables for faster
non-locking access. Your DBA will probably hate it, but the upside is
that you have eliminated that dependency, since you don't need them any more.
...download...
You can download it from the SourceForge site.
...disclaimer...
Use Jtsdb at your peril. I am using it in production, but that doesn't
mean you should. Refer to the unit tests for an indication of what is tested,
how well it is tested and whether that is good enough for you.
...status...
Jtsdb is incomplete, but core functionality is deemed to be working and
production ready. Download it, build it and run the tests. Look at the
code and see if it is good enough for you. Any suggestions are more than
welcome. Preferably contact me via the SourceForge account, but you may
also email me on .
...applications...
Jtsdb applies if you want to store multiple time series in a
relational database. When would you want to do this? Well, not always: it
is certainly expensive, with about a 10:1 ratio of storage space to data
stored. This is primarily due to the cost of storing the indices, so
arguably this can be traded off against performance somewhat.
Still, RDBMSs offer ubiquitous, easy access to data. Data can participate in transactions. Querying with SQL allows for powerful manipulation of data. Other quality-of-service features can easily be applied, such as high availability, transactionality, logs, backups etc. Files cannot be easily shared, and non-SQL proprietary engines are not as easy to access. If you want to store time series in an RDBMS, then Jtsdb might be worth considering.
...features...
Storage only. No attempt is made to support statistics or other
operations over time series. If you need a package for this, look at
CERN's Colt or Apache Commons Math. Jtsdb is focused on storing time
series into, and retrieving them from, an RDBMS. Even this is something you might
not want to do, but if you do, then Jtsdb should be of help.
Database design. Each time series is split into two parts: the descriptor (header), which lives in the ids table, and the values, which live in a values table. The header is arbitrary, but values consist of a tuple <id, timestamp, value>. The values table may be shared amongst time series or private to a time series. The sharing scheme is arbitrary and can be injected. The appropriate choice depends on your needs and your RDBMS's strengths. All headers live in a single table.
Flexibility. Many elements of the structure are configurable. Mainly, the number and names of columns in the header table are user defined, as are the names of tables and views. Several ts databases can coexist in a single database instance/namespace.
Asynchronous. Supports asynchronous operation with optional callbacks and synchronisation. All operations can be invoked in asynchronous mode; completion callbacks may be provided and will be called when the asynchronous operation is completed. Supports bulk insert and updating of values where the targeted RDBMS supports it.
Performance. It is very fast. Capable of performing ~6000 individual UPSERTs per second in standard mode, and around 60000 UPSERTs per second in bulk mode, on fairly modest hardware (Intel Core 2 Quad 2GHz, 8GB RAM) on both Postgres/Linux and MSSQL/Windows.
Scalability. Jtsdb leverages Spring JDBC Template for efficient access to large datasets, Apache JDBC connection pooling for resource management, and runs its own thread pool/service executor to parallelise access to data.
Platform support. Jtsdb is RDBMS independent, via injection of an SQL dialect class. Implementations for MS SQL and Postgres exist, but others are easily added.
Look ma, no hands. No DBA required. Leave them out of the picture. They don't want to be near this stuff and you don't want them near it. All structures are self managed. The tables are created if they don't exist. Indices are added when appropriate. Views are created when necessary. Both tables and views are deleted when necessary. A significant amount of DDL is done in an RDBMS-specific manner and is hence efficient.
Matlab support. The library can be used from Matlab to manage this database. Note that this class is fairly incomplete.
Size. The problem is small and so is Jtsdb. It consists of ~10 classes and about 2000 lines of code. You could even embed it directly into your app rather than worry about packaging the jar file.
...conceptual overview...
....rdbms components....
All time series definitions are held in the headers table. A definition
implies there may be values and these are then held in the desired
partition table. The header consists of five parts:
Primary key. A single integer in the column named 'id'. This column is generated as an auto-incremented integer primary key and indexed.
Business key. A set of columns used to uniquely describe the time series within the universe. These columns are non-null and uniquely constrained. Any names may be used as long as they don't start with `class` or `descr`. At least one column is required.
Classification. Optional. Consists of columns which may be used to classify time series in some application-specific manner. These columns must start with the word 'class'. There are no uniqueness constraints, but these columns may not be null.
Description. Optional description fields. No constraints apply.
Partition. Mandatory. Contains the name of the table which holds the time series values.
Apart from the id, all columns in the header table are short strings.
Partition tables are simple in comparison. Each row consists of an integer id, a timestamp and
a double containing the value.
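To make the layout above concrete, here is a minimal sketch that assembles generic DDL for a headers table and a partition table. It is not Jtsdb's code: the class and method names, the `VARCHAR(64)` type for short strings, the `partition_name`, `ts` and `val` column names, and the reading of "uniquely constrained" as one joint UNIQUE constraint over the business key are all assumptions made for illustration. Auto-increment syntax varies per RDBMS and is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jtsdb: it only builds DDL strings.
final class HeaderDdlSketch {

    // Assumed type for the "short string" header columns.
    private static final String SHORT = " VARCHAR(64)";

    /** DDL for a headers table; auto-increment syntax is dialect specific and omitted. */
    static String headersTable(String table, List<String> keys,
                               List<String> classes, List<String> descrs) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("at least one business key column is required");
        }
        List<String> cols = new ArrayList<>();
        cols.add("id INTEGER PRIMARY KEY");
        for (String k : keys) {
            // Business key names must not collide with the reserved prefixes.
            if (k.startsWith("class") || k.startsWith("descr")) {
                throw new IllegalArgumentException("reserved prefix in business key: " + k);
            }
            cols.add(k + SHORT + " NOT NULL");
        }
        for (String c : classes) {
            if (!c.startsWith("class")) {
                throw new IllegalArgumentException("classification column must start with 'class': " + c);
            }
            cols.add(c + SHORT + " NOT NULL");
        }
        for (String d : descrs) {
            cols.add(d + SHORT); // description columns carry no constraints
        }
        cols.add("partition_name" + SHORT + " NOT NULL");
        // Assumption: the business key is unique as a whole, not column by column.
        cols.add("UNIQUE (" + String.join(", ", keys) + ")");
        return "CREATE TABLE " + table + " (" + String.join(", ", cols) + ")";
    }

    /** DDL for a partition table holding <id, timestamp, value> rows. */
    static String partitionTable(String table) {
        return "CREATE TABLE " + table
                + " (id INTEGER NOT NULL, ts TIMESTAMP NOT NULL, val DOUBLE PRECISION)";
    }

    public static void main(String[] args) {
        System.out.println(headersTable("ts_headers",
                Arrays.asList("ticker", "field"),
                Arrays.asList("class_sector"),
                Arrays.asList("descr_long")));
        System.out.println(partitionTable("ts_values"));
    }
}
```

The type names used here follow Postgres; another engine would need different ones, which is the kind of difference the dialect injection is meant to absorb.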
....java api....
The key concept on the API side is the encapsulation of the header(s) using criteria. Criteria are simply
key/value pairs representing columns in the headers table. Such criteria may identify none, one or several
time series. In practice, a WHERE clause is constructed from the criteria and the headers table
is queried with it.
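The translation from criteria to a WHERE clause could look roughly like the sketch below. The class name `CriteriaSketch`, its `select` method and the `ts_headers` table name are hypothetical and not taken from Jtsdb. Values are bound as `?` parameters, and column names, which cannot be bound, are checked against a simple identifier pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of turning header criteria into a parameterised query.
final class CriteriaSketch {
    final String sql;          // SELECT statement with '?' placeholders
    final List<String> params; // values to bind, in placeholder order

    private CriteriaSketch(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Builds a query over the headers table; empty criteria select every header. */
    static CriteriaSketch select(String headersTable, Map<String, String> criteria) {
        StringBuilder sql = new StringBuilder("SELECT id FROM ").append(headersTable);
        List<String> params = new ArrayList<>();
        String sep = " WHERE ";
        for (Map.Entry<String, String> e : criteria.entrySet()) {
            // Column names end up in the SQL text, so only plain identifiers are accepted.
            if (!e.getKey().matches("[A-Za-z_][A-Za-z0-9_]*")) {
                throw new IllegalArgumentException("not a column name: " + e.getKey());
            }
            sql.append(sep).append(e.getKey()).append(" = ?");
            params.add(e.getValue());
            sep = " AND ";
        }
        return new CriteriaSketch(sql.toString(), params);
    }

    public static void main(String[] args) {
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("ticker", "IBM");
        criteria.put("class_sector", "tech");
        CriteriaSketch q = select("ts_headers", criteria);
        System.out.println(q.sql);
        System.out.println(q.params);
    }
}
```

The resulting SQL string and parameter list could then be passed to whatever JDBC access layer is in use. A query built this way may match no rows, one row or many, which mirrors the "none, one or several" behaviour of criteria.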
To allow adaptation to new RDBMS engines, with their inevitable differences in DDL and SQL standard conformance, an SQLDialect
interface is provided. An implementation needs to be injected into the tsdb class to provide the relevant DDL and SQL.
The partitioning scheme is also provided via an injected interface, with the default implementation being a single partition for all ts values.
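The two injection points might be pictured as in the sketch below. None of these type or method names come from Jtsdb; `DialectSketch`, `PartitionSchemeSketch`, `TsdbSketch` and the table names are stand-ins invented for illustration, and the real SQLDialect interface will have a different, larger surface. The dialect owns engine-specific SQL text, and the partition scheme decides which values table a given header maps to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tsdb class: both collaborators are injected.
final class TsdbSketch {
    private final DialectSketch dialect;
    private final PartitionSchemeSketch scheme;

    TsdbSketch(DialectSketch dialect, PartitionSchemeSketch scheme) {
        this.dialect = dialect;
        this.scheme = scheme;
    }

    /** DDL for the partition table that would hold values for the given header. */
    String partitionDdl(Map<String, String> header) {
        return dialect.createPartitionTable(scheme.partitionFor(header));
    }

    public static void main(String[] args) {
        Map<String, String> header = new HashMap<>();
        header.put("ticker", "IBM");
        TsdbSketch shared = new TsdbSketch(
                new PostgresDialectSketch(), new SinglePartitionSketch("ts_values"));
        // A per-series scheme, written as a lambda over the header columns.
        TsdbSketch isolated = new TsdbSketch(
                new MsSqlDialectSketch(), h -> "ts_values_" + h.get("ticker"));
        System.out.println(shared.partitionDdl(header));
        System.out.println(isolated.partitionDdl(header));
    }
}

// Assumed shape of a dialect: one method per engine-specific statement.
interface DialectSketch {
    String createPartitionTable(String table);
}

final class PostgresDialectSketch implements DialectSketch {
    public String createPartitionTable(String table) {
        return "CREATE TABLE " + table
                + " (id INTEGER NOT NULL, ts TIMESTAMP NOT NULL, val DOUBLE PRECISION)";
    }
}

final class MsSqlDialectSketch implements DialectSketch {
    public String createPartitionTable(String table) {
        // In MS SQL, TIMESTAMP is not a date-time type, so DATETIME2 is used here.
        return "CREATE TABLE " + table
                + " (id INT NOT NULL, ts DATETIME2 NOT NULL, val FLOAT)";
    }
}

// Assumed shape of a partitioning scheme: header columns in, table name out.
interface PartitionSchemeSketch {
    String partitionFor(Map<String, String> header);
}

// Mirrors the default described above: every series shares one values table.
final class SinglePartitionSketch implements PartitionSchemeSketch {
    private final String table;

    SinglePartitionSketch(String table) {
        this.table = table;
    }

    public String partitionFor(Map<String, String> header) {
        return table;
    }
}
```

Keeping both concerns behind small interfaces means a new engine or a new sharing scheme can be added without touching the class that uses them.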
...to be implemented...
Not all values are doubles. Perhaps cater for floats and ints to optimise storage requirements.
In isolated partition schemes, ids aren't required. Cater for this, as it trims down storage requirements.
...tutorial...
...will be added here soon if there is interest, but seriously, the unit
test is worth giving a shot.
|