A Business Plan Draft in Tourism Industry-File/Server and Data Lake

1.1 File/Server Architecture

We are thinking about Apache Tomcat and Apache HTTP.

The Apache HTTP

– is a powerful, flexible, HTTP/1.1 compliant web server(now with Apache HTTP 2.4)

– implements the latest protocols, including HTTP/1.1 (RFC2616)

– is highly configurable and extensible with third-party modules

– can be customized by writing ‘modules’ using the Apache module API

– provides full source code and comes with an unrestrictive license

– runs on Windows 2000, Netware 5.x and above, OS/2, and most versions of Unix, as well as several other operating systems

– is actively being developed

– encourages user feedback through new ideas, bug reports and patches

– implements many frequently requested features

Apache Tomcat

Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies. The Java Servlet and JavaServer Pages specifications are developed under the Java Community Process.

Tomcat is a servlet container, you can have many servlets in a Tomcat instance. All JSP pages are compiled into a servlet. When we use Tomcat and the user request a resource in the server, the servlet container process the request, then it chooses what to do with the request. If the request has a valid URI, Tomcat get the resource and send the response to the client.

Comparison:

Apache HTTP is faster than Tomcat when serving static pages; Apache has more configuration options than Tomcat; Supports CGI scripts, Server API modules, Perl, PHP, etc.

Tomcat provides the Java Servlet and JSP support for dynamically served pages; it works as a light-weight testing servers; it can be run in different modes to promote better performance

1.2 Recovery/Continuity of Business

Content Distribution Network. A content delivery network or content distribution network (CDN) is a globally distributed network of proxy servers deployed in multiple data centers. The goal of a CDN is to serve content to end-users with high availability and high performance.

During the progress of development of our business, considering the data from end-users would contain large volume of photos, it will be necessary to consolidate CDN into our File/Server system.

2.1 Data Lake Architecture

2.1 Data Lake Architecture

We are thinking about Apache Hadoop and Microsoft Azure.

“The Apache Hadoop is a very powerful tool that can be used in almost any environment where huge scale processing of data across clusters is required. It provides multiple modules such as HDFS and MapReduce that will make managing and analyzing said data reliable and efficient. Hadoop is a new and constantly evolving tool, and hence it needs users to be on top of it all the time”, according to Tom Thomas, the student Lab Instructor from Rochester Institute of Technology.

The Pros of Hadoop include:

1. The storage system is robust, fast and flexible for multiple types of data.

2. Hadoop provides the feature of self-sustained and maintained nodes.

3. It works well with common hardware and systems.

The Cons of Hadoop include:

1. Multiple issues about stability, especially when it refers to function of open-source

2. Limited Support for users

3. Difficult to conduct interactive analytics

Microsoft Azure Data Lake

Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages.

Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications.

The Pros of Azure include:

1. Relatively to be scalable, especially with support from Microsoft.

2. Strong dashboard function – Preview Portal is a one stop dashboard for the infrastructure, application and financial elements.

The Cons of Azure include:

If we want to apply Azure Data Lake we have to apply the storage service within it. The price of this service is higher than Hadoop of course.

Screen Shot 2017-03-17 at 5.50.31 PM.png

2.2 Recovery/Continuity of Business

Content Distribution Network. A content delivery network or content distribution network (CDN) is a globally distributed network of proxy servers deployed in multiple data centers. The goal of a CDN is to serve content to end-users with high availability and high performance.

During the progress of development of our business, considering the data from end-users would contain large volume of photos, it will be necessary to consolidate CDN into our Data Lake.

Considering the flexibility we should pick Apache Hadoop. However, if we want to build CDN based on our database design and out existing options of data lake, we still need to choose Azure for higher stability and stronger technical support.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s