After IBM researchers introduced the data warehouse concept in the late 1980s, businesses looked forward to finally storing critical data in centralized, easy-to-find locations. Employees at all levels would be able to tap that rich data and base decisions on concrete, analytical facts, instead of gathering scattered information from different sources or relying on intuition alone.
Like many sweeping technology promises, the vision sounded grand but, sadly, did not become reality for many companies during the 1990s. The problem was never a lack of technical capability. Rather, big commercial data warehouses were so expensive that they largely remained a luxury of very large organizations with the budgets to buy the systems and the staff to implement and maintain them. Beyond the steep cost, critics charged that some of these systems created major IT headaches while delivering little return on investment.
Data warehousing, however, is changing quickly to meet the demands of companies with large volumes of data that need fast answers to complex, unpredictable questions. What's supplying those answers today, more affordably and simply, is the two-word IT revolution called open source, which provides the building blocks for a whole new kind of data warehouse.
There are many benefits to an open source data warehouse. It costs less to support and maintain, because the products themselves are more affordable than commercially licensed ones. It's also relatively easy to hire people with the skills needed to deploy one, so you won't have to scour the industry for staff with specialized IT expertise. And rather than going through a lengthy and expensive trial process, you can evaluate open source software immediately and free of charge through a simple download.
Best of all, your company won't be locked into a costly proprietary software upgrade path.
The Road to Open Source
Open source isn't new, of course. When the Internet took flight in the mid-1990s, Linux helped fuel a free and open source software movement that today supports everything from operating systems to application servers, middleware and databases.
Now, companies that have traditionally relied on commercial databases are turning to open source. Walk into many Fortune 500 firms and you will increasingly find open source installed alongside traditional commercial databases. Indeed, one study of 226 members of the Independent Oracle Users Group (IOUG) found that 35 percent of these commercial users had also installed an open source database such as MySQL.
Use of open source DBMS engines has also spiked in recent years, according to market researcher Gartner Group. The analyst firm found that 47 percent of the companies it surveyed have already adopted an open source database, and another 19 percent are considering adopting one within the next 12 months.
The Warehouse Problem Solved
So why is open source a particularly smart strategy for the data warehouse? Given enough time and money, corporate IT departments can build a system designed to answer any question quickly, as long as they know the question in advance. The problem is that business people cannot predict every question they will need answered in the future.
Moreover, many companies rely on traditional, proprietary databases that aren't designed to handle complex analytic queries against billions of rows of data. Answering even a simple new question typically requires time-consuming retooling: creating indexes, partitioning the data and re-indexing the database.
Against this backdrop, it was only natural that the flexibility of open source would make its way into the data warehouse market.
The movement started with vendors building proprietary data warehouse products on top of open source databases such as MySQL, PostgreSQL and Ingres. From there, development progressed to full-fledged open source data warehouse solutions, with communities built around them. Our open source community, www.infobright.org, is one such resource, alongside a host of other open source business intelligence developer and user communities, including those of Talend, Jaspersoft and Pentaho.
Today, even the extract, transform and load (ETL) tools that support database management systems, offered by vendors such as Pentaho, Talend and Octopus, are going open source. About 11 percent of the companies Gartner Group recently surveyed are using open source ETL tools, and another 16 percent are considering them over the coming months.
Despite the success of open source, companies still debate its merits. But in a market where IT budgets are shrinking and the demand for information is increasing, the case for open source in the data warehouse is straightforward, and the open source community keeps making it stronger.
Here's our case:
- Open source makes sense economically. Open source means no more restrictive licensing fees and a low- to no-cost software model. There's no need to assign a big crew to install and maintain the software, and because entry costs are so low, even the smallest companies can sign on. Open source licenses also allow companies to expand to an unlimited number of users, eliminating the per-user or per-processor charges typical of proprietary software packages.
- The community sets the standard. A public community of developers works together to improve existing products. This community model has helped create software that meets the highest IT standards, and as new code and features are contributed, end users gain new options. The approach particularly suits the database environment, where many systems and data types must be integrated and it's difficult for a single vendor to solve every integration problem. Companies can turn to the community for help with everything from fixing bugs to addressing security flaws, instead of waiting weeks or months for a vendor's next security patch or service pack.
- Full-featured software without the bloat. Traditional commercial software packages are stuffed with features, bells and whistles that users don't want or really need. These packages can't help but be complicated, because they are designed for big companies. Open source solutions, by contrast, provide the rich functionality, quality and scalability that users need, because the community has direct input into the product roadmap. Your company downloads and installs only what it needs, and new features can always be added later. Ultimately, it's the users who are in the driver's seat, not the vendor.
Making the Leap
Still, there's a lot to consider before deciding on open source. For many large companies, an open source database won't replace the proprietary enterprise data warehouse. Instead, an open source data warehouse can serve more tactical purposes within your business, complementing the enterprise system or filling needs that system cannot meet quickly, efficiently or cost-effectively.
Once your company decides to move ahead with the project, it's time to consider what you want in a vendor. For any candidate, make sure a viable support model is available, typically from a company backed by a committed community that stands by the product and can point to success at other users. The vendor and the community will provide the knowledge and experience to guide your company through installation and future upgrades.
Additionally, make sure you are investing in a company whose products truly support open source. Some vendors develop proprietary products on top of open source databases such as PostgreSQL, but those products lack the benefits of a true open source solution and community.
Also, make sure the front ends of the applications you choose are simple enough for everyday end users. Data from the warehouse should flow seamlessly, in real time, into front-end applications, requiring little effort from workers who lack the technical background to handle IT complexity. A successful installation is simple for these workers to use while still supporting the needs of your "power users," who will build the company's more complex queries.
Be sure to investigate newer technologies designed specifically for analytics that offer powerful data compression and a reduced hardware footprint, which translate into lower server and storage costs, less floor space, and reduced power and cooling costs. Otherwise, maintenance costs may balloon until they eventually match those of a proprietary data warehouse.
Features that we have made a priority with our data warehouse products include a column-based architecture with a high-speed data loader, very high rates of data compression, and technology that automatically manages and tunes the database so DBAs and administrators don't need to.
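Why a column-based architecture pairs so well with compression can be shown with a small sketch. This is a hedged illustration, not Infobright's actual implementation: it assumes a simple run-length encoding (RLE) scheme and a hypothetical sample table, and the helper names (`to_columns`, `rle_encode`, `rle_decode`) are invented for this example. Storing each column contiguously puts similar values next to each other, which lets even a simple encoding shrink the data, and an analytic query only has to read the columns it actually uses.

```python
from itertools import groupby

# Hypothetical sample table: each row is (region, product, units_sold).
rows = [
    ("EAST", "widget", 10),
    ("EAST", "widget", 12),
    ("EAST", "gadget", 7),
    ("WEST", "gadget", 7),
    ("WEST", "gadget", 9),
    ("WEST", "widget", 3),
]

def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column (column-based layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...] (illustrative scheme)."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand [(value, run_length), ...] back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(rows, ["region", "product", "units_sold"])
encoded = {name: rle_encode(col) for name, col in columns.items()}

# An analytic query such as "total units per region" reads only two
# columns; the product column is never touched.
totals = {}
for region, units in zip(columns["region"], columns["units_sold"]):
    totals[region] = totals.get(region, 0) + units

if __name__ == "__main__":
    for name, pairs in encoded.items():
        print(f"{name}: {len(columns[name])} values -> {len(pairs)} runs")
    # Check that every column survives an encode/decode round trip.
    print(all(rle_decode(encoded[n]) == columns[n] for n in columns))
    print(totals)
```

Running it prints how many runs each column collapses into, a round-trip check, and the per-region totals. Production column stores typically combine several encodings (such as dictionary and delta encoding) and choose among them per column; RLE is used here only because it makes the effect easy to see.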
With the arrival of open source in the data warehouse space and the current economic challenges, companies face many choices. Fortunately, the case for bringing open source technology into your company's data warehouse strategy is hard to deny: it's more affordable, standards-based and upgrade-friendly, and it benefits from an effective community-based approach.
With so many benefits, it might be time to make the leap.
About the Author
Miriam Tuerk became CEO of Infobright in January 2006. Before joining Infobright, Miriam spent 11 years with BCE Emergis and its predecessors, Newstar Technologies and Bryker Data Systems. Starting as a sales executive, she finished her tenure at Emergis as President of the eBusiness division. Her 20 years of experience include work in both the consulting and telecommunications sectors, covering the Canadian, U.S., European and Asian markets.