Friday, September 4, 2015

Pentaho-biserver 5.3 features

Pentaho

Pentaho is a suite of open source Business Intelligence (BI) products which provide data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities. The Pentaho suite consists of two offerings, an enterprise and community edition. Pentaho's core offering is frequently enhanced by add-on products, usually in the form of plug-ins, from the company itself and also the broader community of users and enthusiasts.
You can learn more about Pentaho at http://www.pentaho.com/

Pragmatic Pentaho Business Analytics Platform (BA Platform 5.3)

Pragmatic Pentaho BI is a pre-configured, secured, optimized, ready-to-run image for running Pentaho BI on Amazon EC2 in production mode. Pentaho BI is a very popular Business Intelligence platform in the Pentaho suite of applications. Commonly referred to as the BI Platform, and recently renamed the Business Analytics Platform (BA Platform), it is the core software that hosts content, whether created in the server itself through plug-ins or published to the server from the desktop applications. Out of the box, it includes features for managing security, running reports, displaying dashboards, report bursting, scripted business rules, OLAP analysis and scheduling.
Commercial plug-ins from Pentaho expand the out-of-the-box features, and a few open-source plug-in projects also expand the capabilities of the server. The Pentaho BA Platform runs in the Apache Tomcat Java application server and can be embedded into other Java application servers.

Pentaho Business Analytics 5.3 - The Feature List

Pentaho 5.3 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data. Highlights include new Analyzer APIs and documentation, Redshift and Impala improvements, Hadoop clusters and Hadoop distribution support, better support for high load Carte environments, and some minor functionality improvements around Pentaho Interactive Reports and Hadoop steps and entries.
Pentaho BA 5.3 improvements will help you work with Analyzer APIs, explore the Streamlined Data Refinery, and set up multi-tenancy with Pentaho Business Analytics.

BA Server / Plugin Improvements - Overview

  • Data Access - 0 bad builds during ALL 5.3.0 RC builds (CE)
  • Analyzer JS API and Documentation (EE)
  • PIR Improvements (EE)
    ○ Design & Runtime row-limits and ability to schedule when hit
    ○ Toolbar button when embedded
    ○ Performance Improvements

New Analyzer APIs & Documentation Updates

With 5.3 comes a new set of APIs to provide more control over Analyzer when working in an embedded fashion. These APIs allow for more fine-grained interaction with the Analyzer reports and data. The Analyzer extensibility APIs will live in a single place, and include introductory material, as well as samples.

Multi-Tenancy in Pentaho 5.3

Pentaho 5.3 supports three categories of multi-tenancy:
  1. Data multi-tenancy allows developers and integrators to apply custom security and business rules to control access to data.
  2. Content multi-tenancy separates content, such as reports and folders, among tenants.
  3. UI multi-tenancy presents different styles of the user interface for each tenant.

There are two required components to make multi-tenancy work. First, users need to be associated with tenants via roles, tenant IDs, or other identifiers that indicate what content and data each user may see. Second, there must be something in the data that can be used to restrict access. The combination of user information and data makes the multi-tenancy approaches described here possible. Since these approaches are driven by the data model and the data itself, they are very flexible.
The most common category of multi-tenancy is data multi-tenancy. Data multi-tenancy allows developers to apply their own custom data access rules at runtime. For example, each tenant might only be allowed to see data which is associated with their tenant ID. Here are the most common methods for data multi-tenancy in Pentaho Business Analytics.
  1. Sharding: Each tenant has its own database or schema. This approach has the advantage of controlling access per database and ensuring that each tenant's data is kept separate. Note that with this approach, multiple databases and servers will need to be managed.
  2. Striping: Tenants share a database, but the tables have a tenant ID column to indicate which tenant can see the specified data. This approach has the advantage of managing only a single database. Note that with this approach, databases can become very large.
  3. Data Models: Tenancy is controlled at the data level where different tenants (or sub-tenants) are only able to see certain data. This approach is very flexible, but the data to restrict on must usually be known in advance.
  4. Hybrid: Combinations of sharding, striping, and data model. Each of the approaches above can be combined into a single, flexible solution to data multi-tenancy.
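
To make the difference between striping and sharding concrete, here is a minimal Python sketch. It is not Pentaho code and does not use any Pentaho API: the `sales` table, the tenant names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real data source. It only illustrates the idea of filtering on a tenant ID column (striping) versus routing each tenant to its own database (sharding).

```python
import sqlite3
from typing import Dict, List, Tuple


def make_striped_db() -> sqlite3.Connection:
    """Build a shared (striped) database: every row carries a tenant_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tenant_id TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("acme", "EU", 100.0),    # hypothetical tenant "acme"
            ("acme", "US", 250.0),
            ("globex", "EU", 75.0),   # hypothetical tenant "globex"
        ],
    )
    return conn


def striped_rows(conn: sqlite3.Connection, tenant_id: str) -> List[Tuple[str, float]]:
    """Striping: one database, rows restricted by the tenant_id column."""
    cur = conn.execute(
        "SELECT region, amount FROM sales WHERE tenant_id = ? ORDER BY region",
        (tenant_id,),  # bound parameter, never string concatenation
    )
    return cur.fetchall()


def shard_for(tenant_id: str, shards: Dict[str, sqlite3.Connection]) -> sqlite3.Connection:
    """Sharding: each tenant maps to its own database connection."""
    try:
        return shards[tenant_id]
    except KeyError:
        # Refuse rather than fall back to some default database.
        raise PermissionError(f"no shard configured for tenant {tenant_id!r}")


if __name__ == "__main__":
    db = make_striped_db()
    print(striped_rows(db, "acme"))
    print(striped_rows(db, "globex"))
```

A hybrid setup would combine the two: pick the database with something like `shard_for`, then apply a tenant or sub-tenant filter like `striped_rows` inside it.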

Updated Streamlined Data Refinery

Pentaho 5.3 has updated the Streamlined Data Refinery to improve the modeling process, added security and data source improvements, and added support for Amazon Redshift and Cloudera Impala.
Features of interest
  • Working with the Streamlined Data Refinery
  • Build Model step
  • Annotate Stream step
  • Publish Model step

Other updates with Pentaho 5.3

Here we cover changes to the software that might impact your upgrade or migration experience. If you are migrating from a version earlier than 5.2, these pointers should be of great help!
Manually Migrating Big Data Cluster Configurations Stored in Hadoop Steps and Entries
If you are migrating or upgrading to PDI 5.3 or greater, and you have transformations or jobs that use the following Big Data steps or entries, you might need to convert the existing cluster configuration information to use the Hadoop Clusters feature.
  • HBase Input
  • HBase Output
  • Pentaho Map Reduce
  • Oozie Job Exec
  • Hadoop Job Exec
  • Pig Script Exec
You only need to perform the conversion process if you edit one of the above steps or entries in Spoon. Otherwise, you do not need to complete the conversion process. Note that you can continue to run scheduled transformations and jobs without the conversion, as long as you do not manually edit one of the above steps or entries.

Interactive Reports Performance Improvements
Pentaho 5.3 implements a number of performance improvements for Pentaho Interactive Reports (PIR): system administrators can set a system-wide maximum row limit, PIR can be extended to show toolbar buttons, and the query-metadata collection capabilities of Pentaho Report Designer (PRD) are incorporated into PIR.

System-wide Row Limit
Pentaho 5.3 adds the capability to set a system-wide row limit for Pentaho Interactive Reports (PIR). Users cannot override this row limit once you have set it, although they can set their own, smaller row limit through the query settings. This improves performance when the returned data set is fairly large, and also adds the ability to run the full report in the background.
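
The rule that a user's limit can only tighten the system-wide limit can be sketched in a few lines of Python. This is an illustration only, not PIR's actual configuration or API: the function names are hypothetical, and treating `None` or a non-positive number as "no limit set" is an assumption made for the example.

```python
from typing import Optional


def effective_row_limit(system_limit: Optional[int],
                        user_limit: Optional[int]) -> Optional[int]:
    """Return the row limit that applies to a query, or None if unlimited.

    Assumption: None or a non-positive value means "no limit set".
    """
    limits = [n for n in (system_limit, user_limit) if n is not None and n > 0]
    # The smaller limit wins, so a user can lower the limit but never raise it.
    return min(limits) if limits else None


def is_truncated(total_rows: int, limit: Optional[int]) -> bool:
    """True when the limit was hit, i.e. the full report would be a
    candidate for running in the background."""
    return limit is not None and total_rows > limit


if __name__ == "__main__":
    limit = effective_row_limit(system_limit=1000, user_limit=5000)
    print(limit, is_truncated(total_rows=12000, limit=limit))
```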

Show Repository Buttons Feature
PIR has been extended so that you can show the buttons that interact with the repository in the PIR toolbar. This allows you to embed PIR without having to use a third-party tool to hook the callbacks onto. The feature is triggered by passing a parameter on the URL of the PIR plugin.

Query-Metadata Collection
Pentaho 5.3 changed PIR to take advantage of the query-metadata collection improvements made for Report Designer in Pentaho 5.2. This query-metadata feature improves design-time performance for Interactive Reports.
