venky: July 2011

Web Mail Filter
Abstract

Contents

Introduction

Purpose

Scope

Overview

System Analysis

Existing System

Proposed System

Advantages over Existing System

Feasibility Study

Economic Feasibility

Operational Feasibility

Technical Feasibility

Functional Feasibility

Software Requirement Specification

Software Requirements

Hardware Requirements

System Design

Architecture Diagram

Authentication

Functional Description

Functions

Maintenance

Functional Description

Functions

E - R Diagrams

UML Diagrams

Activity Diagram

Class Diagram

Use-case Diagram

Sequence Diagram

Component Diagram

Deployment Diagram

Coding & Implementation

Technologies Used

HTML & JavaScript, XML

Java / .NET Technology

Database Tool / SQL

Webserver / Application Server

Snapshots

Reports

User Groups

Testing

Methodologies

Test Cases

Limitations / Scope for Future Enhancements

Conclusion / Project Summary

Bibliography / References / Appendices

Abstract

WEB MAIL FILTERS Filter for mails is an efficient and popular communication medium. Like every powerful medium it is prone to be misused. One such case of misuse is the posting of unsolicited emails known as spam.

Spam messages are typically sent using bulk mailers and address lists obtained from web pages and newsgroup archives. They vary significantly in content, from vacation deals to real estate to pornography. Since these mails are unsolicited they are the least important to the reader and most of the people would prefer not to have their “inbox” cluttered with junk mail. In some cases Spam may cause harm as pornographic content could be seen by underage users or worse yet it might contain malicious viruses. Spam is not only annoying but also time consuming as it could take a while to sort through the messages and delete the ones that are unsolicited and worthless.

It is a customizable application so that the user can enter the Spam mail filter specifications like email id from which he doesn’t want any mails, the subject of the mail with which no mails should be received, and any keywords in the mail which should not be present.

Introduction

Purpose

At present most of the people are using Internet all over the world in their daily life. One of the main reasons is it provides fastest communication between the people through Emails. We can receive a no. of mails from our friends, relatives, customers etc. When we receive a large no. of mails from others, there will be some problems may occur.

Scope

WEB MAIL FILTERS for mails is one way to reduce the impact of the problem on the individual user. In its simplest form, this filter is a mechanism for classifying message as either spam or non-spam. This project is for the employees working in an organization.

The major tasks of WEB MAIL FILTERS for mails are as follows:

Ø Avoiding Self-Mails.

Ø Checking the mails against UBL (Universal Block List) IP addresses.

Ø Checking the mails against Viruses and keywords specified by the Administrator.

Ø Checking the mails against Subject, From Email ID, and IP address specified by the user.

Ø Checking the mails against the Virus definitions and keywords specified by the User.

Ø Checking the mails against the size limit of the mail specified by the User.

Ø Deleting the mails, which are detected as spam.

Ø Showing day-wise report to the users on how many mails came, how many are filtered by the Email address, and how many are filtered by the subject, etc.

Ø Showing non-spam mails to the user.

Overview

WEB MAIL FILTERS for mails is a fast spam filter that works by learning to recognize the words that are more likely to appear as spam than non-spam. It uses a set of tokens, which are stored in a database, that were taken from the users. When any mail arrives it checks the mail against the tokens specified by the user to whom it comes. If they match then it mark the mail as “spam”, later this mail will be sent to a location where all spam mails are stored. This process segregates spam and non-spam mails and passes them to appropriate locations.

The Mail Filter project consists of three modules:

1) Admin

2) User

3) Filtering

Admin:

This module deals with the management of the application by a person called Administrator who is responsible for taking all the mails coming to the people working in the organization, and invoking the filtering mechanism. He can also specify UBL IP addresses and Universal Virus definitions in order to protect the systems from attack.

User:

This module deals with user processing. In this module user can specify what he doesn’t want, such as the mails coming with a specific subject like “free offer”, the mails coming from the specific Email Ids, and the mails coming from the systems with specific IP address etc. The user can also specify virus definitions or keywords to prevent the mails containing those viruses and keywords. If the user wants only those mails whose size is below some limit, then the user can specify the size limit for the mails. Moreover, the user can also edit or delete the filters that are previously specified.

In this module the user can also see the report on the mails came to him every day. This report includes how many mails came to him, out of which how many mails are filtered by the Email addresses, how many are filtered by the subjects, and how many are filtered by the virus definitions etc. In addition to these the user can also see how many non-spam mails are there and he can read those mails.

Filtering:

This module deals with the major task of Mail Filter. It checks the incoming mails for each user against the filters specified by the administrator and by the users. If any match found, then it treats that mail as “spam” and moves that mail to another location. It also stores the details such as Email address, Subject, Date and size of all the mails, both spam and non-spam, in the database for future reference.

System Analysis

Existing System

1. Too noisy Inbox

Many Organizations and companies are receiving large no. of mails everyday. So, it is difficult to view and process every mail.

2. Misleading information distribution:

In addition to the mails, which are useful to us, we may also receive some unwanted mails like advertisements. They may contain misleading information such as special offers provided by the companies. Such mails are called spam mails and they may kill most of our valuable time.

3. Virus spreading programs and Mails:

Sometimes mails that we have received may contain harmful Virus coding which may affect our system partially or as a whole.

4. Wastage of allocated memory space for Inbox.

As every mail occupies some memory space, the allocated memory for our Inbox will be wasted when spam mails are being increased.

In order to avoid all these problems we can use filter software, which is known as Mail Filter.

Proposed System

Ø Logging in to the Mail Filter.

Ø Adding new filter specifications for Email ID (From address of the mail), size, Subject, Virus definitions etc.

Ø Editing previously specified filters.

Ø Deleting the existing filter specifications.

Ø Viewing reports on the filtering of the mails.

Ø Viewing non-spam mails.

Ø Changing password.

Ø Logging out from the Mail Filter.

Advantages over Existing System

Administrator Features:

Ø Logging in to the Mail Filter.

Ø Adding new UBL IP address Filters.

Ø Adding new Universal Virus definitions.

Ø Listing all the users of the Mail Filter.

Ø Creating new users for the Mail Filter.

Ø Deleting the existing users of the Mail Filter.

Ø Initiating the Filtering process.

Ø Changing the Administrator password.

Ø Changing the User passwords.

Ø Logging out from the Mail Filter.

Feasibility Study

Economic Feasibility

Economic feasibility attempts 2 weigh the costs of developing and implementing a new system, against the benefits that would accrue from having the new system in place. This feasibility study gives the top management the economic justification for the new system.

A simple economic analysis which gives the actual comparison of costs and benefits are much more meaningful in this case. In addition, this proves to be a useful point of reference to compare actual costs as the project progresses. There could be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality better decision making timeliness of information, expediting activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, better employee morale.

Operational Feasibility

Proposed project is beneficial only if it can be turned into information systems that will meet the organizations operating requirements. Simply stated, this test of feasibility asks if the system will work when it is developed and installed. Are there major barriers to Implementation? Here are questions that will help test the operational feasibility of a project:

Is there sufficient support for the project from management from users? If the current system is well liked and used to the extent that persons will not be able to see reasons for change, there may be resistance.

Are the current business methods acceptable to the user? If they are not, Users may welcome a change that will bring about a more operational and useful systems.

Have the user been involved in the planning and development of the project?

Early involvement reduces the chances of resistance to the system and in general and increases the likelihood of successful project.

Since the proposed system was to help reduce the hardships encountered. In the existing manual system, the new system was considered to be operational feasible.

Technical Feasibility

Evaluating the technical feasibility is the trickiest part of a feasibility study. This is because, .at this point in time, not too many detailed design of the system, making it difficult to access issues like performance, costs on (on account of the kind of technology to be deployed) etc. A number of issues have to be considered while doing a technical analysis.

Understand the different technologies involved in the proposed system before commencing the project we have to be very clear about what are the technologies that are to be required for the development of the new system. Find out whether the organization currently possesses the required technologies. Is the required technology available with the organization?

Software Requirement Specification

Software Requirements

Operating System : Windows XP/2003 or Linux/Solaris

User Interface : HTML, CSS

Client-side Scripting : JavaScript

Programming Language : Java

Web Applications : JDBC, JNDI, Servlets, JSP

IDE/Workbench : My Eclipse

Database : Oracle/Access

Server Deployment : Tomcat 5.0

Hardware Requirements

Processor : Pentium IV

Hard Disk : 40GB

RAM : 256MB

System Design

Architecture Diagram

Authentication

Functional Description

Functions

Maintenance

Functional Description

1. Problem recognition: Initially we studied the gathered requirements and understood the key requirements in the system context.

2. Evaluation and sysnthesis: Here all the externally observable data objects are defined, flow ad content of the information are evaluated, all software contents are defined and elaborated, the software behavior is understood, system interface characteristics are established and additional design constraints are uncovered.

3. Modeling: The software model serves as the foundation for software design and as the basis for the creation of specification for the software.

4. Specification: It is viewed as a representation process. Requirements are represented in a manner that ultimately leads to successful software implementation.

5. Review: The review is first conducted at the macroscopic level i.e. it is ensured that the representation is complete, consistent ad accurate and the overall information, functional and behavioral domains are considered. Extreme care is taken in conducting the review.

In requirements analysis phase we check whether the

1. Requirements are consistent overall objective of the system.

2. All the requirements have been specified at the proper level of abstraction.

3. Each requirement is bounded and unambiguous.

4. Any of the requirements conflict with the other requirements.

5. Each requirement is testable.

Functions

ü Problem recognition

ü Evaluation and sysnthesis

ü Modeling

ü Specification

ü Review

E - R Diagrams

Data Dictionary

Login:

Field name	Data type	Key Type
Username	Varchar2(20)	Primary key
Password	Varchar2(20)	Not null
Email_ID	Varchar2(20)	Unique, Not null

UBL:

Field name	Data type	Key Type
IPaddress	Varchar2(20)	Primary key
Reg_Date	Date	Not null

Universal_Virus:

Field name	Data type	Key Type
Virus_name	Varchar2(20)	Primary key
Virus_def	Varchar2(20)	Not null
Reg_Date	Date	Not null

Mailbox:

Field name	Data type	Key Type
Mail_File	Varchar2(20)	Primary key
Email_ID	Varchar2(40)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
Subject	Varchar2(20)	Not null
From_IP_Address	Varchar2(15)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null

Filter_From:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
IPaddress	Varchar2(40)	Primary key
Reg_Date	Date	Not null

Filter_Subject:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
Subject	Varchar2(20)	Not null
Reg_Date	Date	Not null

Filter_IP:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_IP_Address	Varchar2(15)	Not null
Reg_Date	Date	Not null

Filter_Virus:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
Virus_name	Varchar2(20)	Primary key
Virus_def	Varchar2(20)	Not null
Reg_Date	Date	Not null

Filter_Size:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
Mail_Size	Number	Not null
Reg_Date	Date	Not null

Reports:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
Reg_Date	Date	Not null
Total_Mails	Number(3)	Not null
Unfiltered	Number(3)	Not null
Filtered_By_Self	Number(3)	Not null
Filtered_By_From	Number(3)	Not null
Filtered_By_Subject	Number(3)	Not null
Filtered_By_Virus	Number(3)	Not null
Filtered_By_IP	Number(3)	Not null
Filtered_By_Size	Number(3)	Not null

Filtered_By_Virus:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null
Virus_name	Varchar2(20)	Not null

Filtered_By_IP:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

Filtered_By_Size:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

Unfiltered:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

Filtered_By_From:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

Filtered_By_Self:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

Filtered_By_Subject:

Field name	Data type	Key Type
Username	Varchar2(20)	Foreign key(Login)
From_ID	Varchar2(40)	Not null
From_IP_Address	Varchar2(15)	Not null
Subject	Varchar2(20)	Not null
Mail_Date	Date	Not null
Mail_Size	Number	Not null
Mail_File	Varchar2(20)	Not null

UML Diagrams

Class Diagram

Use-case Diagram

Describes administrator features

Describes about Mail filtering

Describes about User operations.

Admin issues filtering

Admin initiates filtering.

Displaying filtered information

Sequence Diagram

New UBL IP addresses adding.

Creating new user

Deleting User

Displays list of users

Modifying User Password

Adding New Universal Virus Specification

Component Diagram

Deployment Diagram

Coding

Implementation

Technologies Used

HTML

HTML, an initialism of Hypertext Markup Language, is the predominant markup language for web pages. It provides a means to describe the structure of text-based information in a document — by denoting certain text as headings, paragraphs, lists, and so on — and to supplement that text with interactive forms, embedded images, and other objects. HTML is written in the form of labels (known as tags), surrounded by angle brackets. HTML can also describe, to some degree, the appearance and semantics of a document, and can include embedded scripting language code which can affect the behavior of web browsers and other HTML processors.

HTML is also often used to refer to content of the MIME type text/html or even more broadly as a generic term for HTML whether in its XML-descended form (such as XHTML 1.0 and later) or its form descended directly from SGML

Hyper Text Markup Language

Hypertext Markup Language (HTML), the languages of the World Wide Web (WWW), allows users to produces Web pages that include text, graphics and pointer to other Web pages (Hyperlinks).

HTML is not a programming language but it is an application of ISO Standard 8879, SGML (Standard Generalized Markup Language), but specialized to hypertext and adapted to the Web. The idea behind Hypertext is that instead of reading text in rigid linear structure, we can easily jump from one point to another point. We can navigate through the information based on our interest and preference. A markup language is simply a series of elements, each delimited with special characters that define how text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized works that load to other documents or some portions of the same document.

HTML can be used to display any type of document on the host computer, which can be geographically at a different location. It is a versatile language and can be used on any platform or desktop.

HTML provides tags (special codes) to make the document look attractive. HTML tags are not case-sensitive. Using graphics, fonts, different sizes, color, etc., can enhance the presentation of the document. Anything that is not a tag is part of the document itself.

Basic HTML Tags:

<! -- --> specifies comments

<A>……….</A> Creates hypertext links

<B>……….</B> Formats text as bold

<BIG>……….</BIG> Formats text in large font.

<BODY>…</BODY> Contains all tags and text in the HTML document

<CENTER>...</CENTER> Creates text

<DD>…</DD> Definition of a term

<DL>...</DL> Creates definition list

<FONT>…</FONT> Formats text with a particular font

<FORM>...</FORM> Encloses a fill-out form

<FRAME>...</FRAME> Defines a particular frame in a set of frames

<H#>…</H#> Creates headings of different levels( 1 – 6 )

<HEAD>...</HEAD> Contains tags that specify information about a document

<HR>...</HR> Creates a horizontal rule

<HTML>…</HTML> Contains all other HTML tags

<META>...</META> Provides meta-information about a document

<SCRIPT>…</SCRIPT> Contains client-side or server-side script

<TABLE>…</TABLE> Creates a table

<TD>…</TD> Indicates table data in a table

<TR>…</TR> Designates a table row

<TH>…</TH> Creates a heading in a table

Attributes

The attributes of an element are name-value pairs, separated by "=", and written within the start label of an element, after the element's name. The value should be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML).Leaving attribute values unquoted is considered unsafe.

Most elements take any of several common attributes: id, class, style and title. Most also take language-related attributes: lang and dir.

The id attribute provides a document-wide unique identifier for an element. This can be used by stylesheets to provide presentational properties, by browsers to focus attention on the specific element or by scripts to alter the contents or presentation of an element. The class attribute provides a way of classifying similar elements for presentation purposes. For example, an HTML document (or a set of documents) may use the designation class="notation" to indicate that all elements with this class value are all subordinate to the main text of the document (or documents). Such notation classes of elements might be gathered together and presented as footnotes on a page, rather than appearing in the place where they appear in the source HTML.

An author may use the style non-attributal codes presentational properties to a particular element. It is considered better practice to use an element’s son- id page and select the element with a stylesheet, though sometimes this can be too cumbersome for a simple ad hoc application of styled properties. The title is used to attach subtextual explanation to an element. In most browsers this title attribute is displayed as what is often referred to as a tooltip. The generic inline span element can be used to demonstrate these various non-attributes.

The preceding displays as HTML (pointing the cursor at the abbreviation should display the title text in most browsers).

Advantages

Ø A HTML document is small and hence easy to send over the net. It is small because it does not include formatted information.

Ø HTML is platform independent.

Ø HTML tags are not case-sensitive.

JavaScript

JavaScript is a script-based programming language that was developed by Netscape Communication Corporation. JavaScript was originally called Live Script and renamed as JavaScript to indicate its relationship with Java. JavaScript supports the development of both client and server components of Web-based applications. On the client side, it can be used to write programs that are executed by a Web browser within the context of a Web page. On the server side, it can be used to write Web server programs that can process information submitted by a Web browser and then update the browser’s display accordingly

Even though JavaScript supports both client and server Web programming, we prefer JavaScript at Client side programming since most of the browsers supports it. JavaScript is almost as easy to learn as HTML, and JavaScript statements can be included in HTML documents by enclosing the statements between a pair of scripting tags

<SCRIPTS>.. </SCRIPT>.

JavaScript statements

</SCRIPT>

Here are a few things we can do with JavaScript:

Ø Validate the contents of a form and make calculations.

Ø Add scrolling or changing messages to the Browser’s status line.

Ø Animate images or rotate images that change when we move the mouse over them.

Ø Detect the browser in use and display different content for different browsers.

Ø Detect installed plug-ins and notify the user if a plug-in is required.

We can do much more with JavaScript, including creating entire application.

JavaScript Vs Java

JavaScript and Java are entirely different languages. A few of the most glaring differences are:

Java applets are generally displayed in a box within the web document; JavaScript can affect any part of the Web document itself.
While JavaScript is best suited to simple applications and adding interactive features to Web pages; Java can be used for incredibly complex applications.

There are many other differences but the important thing to remember is that JavaScript and Java are separate languages. They are both useful for different things; in fact they can be used together to combine their advantages.

Advantages

Ø JavaScript can be used for Sever-side and Client-side scripting.

It is more flexible than VBScript.

Ø JavaScript is the default scripting languages at Client-side since all the browsers supports it.

Java Technology

Initially the language was called as “oak” but it was renamed as “Java” in 1995. The primary motivation of this language was the need for a platform-independent (i.e., architecture neutral) language that could be used to create software to be embedded in various consumer electronic devices.

· Java is a programmer’s language.

· Java is cohesive and consistent.

· Except for those constraints imposed by the Internet environment, Java gives the programmer, full control.

· Finally, Java is to Internet programming where C was to system programming.

Importance of Java to the Internet

Java has had a profound effect on the Internet. This is because; Java expands the Universe of objects that can move about freely in Cyberspace. In a network, two categories of objects are transmitted between the Server and the Personal computer. They are: Passive information and Dynamic active programs. The Dynamic, Self-executing programs cause serious problems in the areas of Security and probability. But, Java addresses those concerns and by doing so, has opened the door to an exciting new form of program called the Applet.

Java can be used to create two types of programs

Applications and Applets: An application is a program that runs on our Computer under the operating system of that computer. It is more or less like one creating using C or C++. Java’s ability to create Applets makes it important. An Applet is an application designed to be transmitted over the Internet and executed by a Java –compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is, it is an intelligent program, not just a media file. It can react to the user input and dynamically change.

Features of Java Security

Every time you that you download a “normal” program, you are risking a viral infection. Prior to Java, most users did not download executable programs frequently, and those who did scan them for viruses prior to execution. Most users still worried about the possibility of infecting their systems with a virus. In addition, another type of malicious program exists that must be guarded against. This type of program can gather private information, such as credit card numbers, bank account balances, and passwords. Java answers both these concerns by providing a “firewall” between a network application and your computer.

When you use a Java-compatible Web browser, you can safely download Java applets without fear of virus infection or malicious intent.

Portability

For programs to be dynamically downloaded to all the various types of platforms connected to the Internet, some means of generating portable executable code is needed .As you will see, the same mechanism that helps ensure security also helps create portability. Indeed, Java’s solution to these two problems is both elegant and efficient.

The Byte code

The key that allows the Java to solve the security and portability problems is that the output of Java compiler is Byte code. Byte code is a highly optimized set of instructions designed to be executed by the Java run-time system, which is called the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code.

Translating a Java program into byte code helps makes it much easier to run a program in a wide variety of environments. The reason is, once the run-time package exists for a given system, any Java program can run on it.

Although Java was designed for interpretation, there is technically nothing about Java that prevents on-the-fly compilation of byte code into native code. Sun has just completed its Just In Time (JIT) compiler for byte code. When the JIT compiler is a part of JVM, it compiles byte code into executable code in real time, on a piece-by-piece, demand basis. It is not possible to compile an entire Java program into executable code all at once, because Java performs various run-time checks that can be done only at run time. The JIT compiles code, as it is needed, during execution.

Java Virtual Machine (JVM)

Beyond the language, there is the Java virtual machine. The Java virtual machine is an important element of the Java technology. The virtual machine can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a class loader is invoked and does byte code verification makes sure that the code that’s has been generated by the compiler will not corrupt the machine that it’s loaded on. Byte code verification takes place at the end of the compilation process to make sure that is all accurate and correct. So byte code verification is integral to the compiling and executing of Java code.

Overall Description

Picture showing the development process of JAVA Program

Java programming uses to produce byte codes and executes them. The first box indicates that the Java source code is located in a. Java file that is processed with a Java compiler called javac. The Java compiler produces a file called a. class file, which contains the byte code. The .Class file is then loaded across the network or loaded locally on your machine into the execution environment is the Java virtual machine, which interprets and executes the byte code.

Java Architecture

Java architecture provides a portable, robust, high performing environment for development. Java provides portability by compiling the byte codes for the Java Virtual Machine, which is then interpreted on each platform by the run-time environment. Java is a dynamic system, able to load code when needed from a machine in the same room or across the planet.

Compilation of code

When you compile the code, the Java compiler creates machine code (called byte code) for a hypothetical machine called Java Virtual Machine (JVM). The JVM is supposed to execute the byte code. The JVM is created for overcoming the issue of portability. The code is written and compiled for one machine and interpreted on all machines. This machine is called Java Virtual Machine.

Compiling and interpreting Java Source Code

During run-time the Java interpreter tricks the byte code file into thinking that it is running on a Java Virtual Machine. In reality this could be a Intel Pentium Windows 95 or SunSARC station running Solaris or Apple Macintosh running system and all could receive code from any computer through Internet and run the Applets.

Simple

Java was designed to be easy for the Professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java will be even easier. Because Java inherits the C/C++ syntax and many of the object oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.

Object-Oriented

Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.

Robust

The multi-platform environment of the Web places extraordinary demands on a program, because the program must execute reliably in a variety of systems. The ability to create robust programs was given a high priority in the design of Java. Java is strictly typed language; it checks your code at compile time and run time.

Java virtually eliminates the problems of memory management and de-allocation, which is completely automatic. In a well-written Java program, all run time errors can –and should –be managed by your program.

Java Database Connectivity

What Is JDBC?

JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for Java Database Connectivity. It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.

Using JDBC, it is easy to send SQL statements to virtually any relational database. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. The combinations of Java and JDBC lets a programmer write it once and run it anywhere.

What Does JDBC Do?

Simply put, JDBC makes it possible to do three things:

Ø Establish a connection with a database

Ø Send SQL statements

Ø Process the results.

JDBC versus ODBC and other APIs

At this point, Microsoft's ODBC (Open Database Connectivity) API is that probably the most widely used programming interface for accessing relational databases. It offers the ability to connect to almost all databases on almost all platforms.

So why not just use ODBC from Java? The answer is that you can use ODBC from Java, but this is best done with the help of JDBC in the form of the JDBC-ODBC Bridge, which we will cover shortly. The question now becomes "Why do you need JDBC?" There are several answers to this question:

ODBC is not appropriate for direct use from Java because it uses a C interface. Calls from Java to native C code have a number of drawbacks in the security, implementation, robustness, and automatic portability of applications.
A literal translation of the ODBC C API into a Java API would not be desirable. For example, Java has no pointers, and ODBC makes copious use of them, including the notoriously error-prone generic pointer "void *". You can think of JDBC as ODBC translated into an object-oriented interface that is natural for Java programmers.
ODBC is hard to learn. It mixes simple and advanced features together, and it has complex options even for simple queries. JDBC, on the other hand, was designed to keep simple things simple while allowing more advanced capabilities where required.
A Java API like JDBC is needed in order to enable a "pure Java" solution. When ODBC is used, the ODBC driver manager and drivers must be manually installed on every client machine. When the JDBC driver is written completely in Java, however, JDBC code is automatically installable, portable, and secure on all Java platforms from network computers to mainframes.

Two-tier and Three-tier Models

The JDBC API supports both two-tier and three-tier models for database access.

In the two-tier model, a Java applet or application talks directly to the database. This requires a JDBC driver that can communicate with the particular database management system being accessed. A user's SQL statements are delivered to the database, and the results of those statements are sent back to the user. The database may be located on another machine to which the user is connected via a network. This is referred to as a client/server configuration, with the user's machine as the client, and the machine housing the database as the server. The network can be an Intranet, which, for example, connects employees within a corporation, or it can be the Internet.

In the three-tier model, commands are sent to a "middle tier" of services, which then send SQL statements to the database. The database processes the SQL statements and sends the results back to the middle tier, which then sends them to the user. MIS directors find the three-tier model very attractive because the middle tier makes it possible to maintain control over access and the kinds of updates that can be made to corporate data. Another advantage is that when there is a middle tier, the user can employ an easy-to-use higher-level API which is translated by the middle tier into the appropriate low-level calls. Finally, in many cases the three-tier architecture can provide performance advantages.

Until now the middle tier has typically been written in languages such as C or C++, which offer fast performance. However, with the introduction of optimizing compilers that translate Java byte code into efficient machine-specific code, it is becoming practical to implement the middle tier in Java. This is a big plus, making it possible to take advantage of Java's robustness, multithreading, and security features. JDBC is important to allow database access from a Java middle tier.

JDBC Driver Types

The JDBC drivers that we are aware of at this time fit into one of four categories:

Ø JDBC-ODBC bridge plus ODBC driver

Ø Native-API partly-Java driver

Ø JDBC-Net pure Java driver

Ø Native-protocol pure Java driver

JDBC-ODBC Bridge

If possible, use a Pure Java JDBC driver instead of the Bridge and an ODBC driver. This completely eliminates the client configuration required by ODBC. It also eliminates the potential that the Java VM could be corrupted by an error in the native code brought in by the Bridge (that is, the Bridge native library, the ODBC driver manager library, the ODBC driver library, and the database client library).

What Is the JDBC- ODBC Bridge?

The JDBC-ODBC Bridge is a JDBC driver, which implements JDBC operations by translating them into ODBC operations. To ODBC it appears as a normal application program. The Bridge implements JDBC for any database for which an ODBC driver is available. The Bridge is implemented as the

Sun.jdbc.odbc Java package and contains a native library used to access ODBC. The Bridge is a joint development of Innersole and Java Soft.

JDBC connectivity

The JDBC provides database-independent connectivity between the J2EE platform and a wide range of tabular data sources. JDBC technology allows an Application Component Provider to:

Ø Perform connection and authentication to a database server

Ø Manager transactions

Ø Move SQL statements to a database engine for preprocessing and execution

Ø Execute stored procedures

Ø Inspect and modify the results from Select statements

Database:

A database management system (DBMS) is computer software designed for the purpose of managing databases, a large set of structured data, and run operations on the data requested by numerous users. Typical examples of DBMSs include Oracle, DB2, Microsoft Access, Microsoft SQL Server, Firebird, PostgreSQL, MySQL, SQLite, FileMaker and Sybase Adaptive Server Enterprise. DBMSs are typically used by Database administrators in the creation of Database systems. Typical examples of DBMS use include accounting, human resources and customer support systems.

Originally found only in large companies with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.

Description

A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database. A DBMS includes:

ü A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model.

· The four most common types of organizations are the hierarchical, network, relational and object models. Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).

· The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity API that supports a standard way for programmers to access the DBMS.

ü Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).

ü A database query language and report writer to allow users to interactively interrogate the database, analyze its data and update it according to the users privileges on data.

· It also controls the security of the database.

· Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.

· If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs are customized for each data entry and updating function.

ü A transaction mechanism, that ideally would guarantee the ACID properties, in order to ensure data integrity, despite concurrent user accesses (concurrency control), and faults (fault tolerance).

· It also maintains the integrity of the data in the database.

· The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer numbers (key fields) can be entered into the database. See ACID properties for more information (Redundancy avoidance).

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.

Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware database accelerators are also used in large volume transaction processing environments.

DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support although nowadays these functions are left to the operating system.
SQL

Structured Query Language (SQL) is the language used to manipulate relational databases. SQL is tied very closely with the relational model.

In the relational model, data is stored in structures called relations or tables.

SQL statements are issued for the purpose of:

Data definition: Defining tables and structures in the database (DDL used to create, alter and drop schema objects such as tables and indexes).

Data manipulation: Used to manipulate the data within those schema objects (DML Inserting, Updating, Deleting the data, and Querying the Database).

A schema is a collection of database objects that can include: tables, views, indexes and sequences

List of SQL statements that can be issued against an Oracle database schema are:

· ALTER - Change an existing table, view or index definition (DDL)

· AUDIT - Track the changes made to a table (DDL)

· COMMENT - Add a comment to a table or column in a table (DDL)

· COMMIT - Make all recent changes permanent (DML - transactional)

· CREATE - Create new database objects such as tables or views (DDL)

· DELETE - Delete rows from a database table (DML)

· DROP - Drop a database object such as a table, view or index (DDL)

· GRANT - Allow another user to access database objects such as tables or views (DDL)

· INSERT - Insert new data into a database table (DML)

· No AUDIT - Turn off the auditing function (DDL)

· REVOKE - Disallow a user access to database objects such as tables and views (DDL)

· ROLLBACK - Undo any recent changes to the database (DML - Transactional)

· SELECT - Retrieve data from a database table (DML)

· TRUNCATE - Delete all rows from a database table (can not be rolled back) (DML)

· UPDATE - Change the values of some data items in a database table (DML)

SERVLETS

Introduction

The Java web server is JavaSoft's own web Server. The Java web server is just a part of a larger framework, intended to provide you not just with a web server, but also with tools. To build customized network servers for any Internet or Intranet client/server system. Servlets are to a web server, how applets are to the browser.

About Servlets

Servlets provide a Java-based solution used to address the problems currently associated with doing server-side programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces.

Servlets are objects that conform to a specific interface that can be plugged into a Java-based server. Servlets are to the server-side what applets are to the client-side - object byte codes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform independent, dynamically loadable, pluggable helper byte code objects on the server side that can be used to dynamically extend server-side functionality.

For example, an HTTP Servlets can be used to generate dynamic HTML content. When you use Servlets to do dynamic content you get the following advantages:

Ø They’re faster and cleaner than CGI scripts

Ø They use a standard API (the Servlets API)

Ø They provide all the advantages of Java (run on a variety of servers without needing to be rewritten).

Attractiveness of Servlets

There are many features of Servlets that make them easy and attractive to use. These include:

Ø Easily configured using the GUI-based Admin tool

Ø Can be loaded and invoked from a local disk or remotely across the network.

Ø Can be linked together, or chained, so that one Servlets can call another Servlets, or several Servlets in sequence.

Ø Can be called dynamically from within HTML pages, using server-side include tags.

Ø Are secure - even when downloading across the network, the Servlets security model and Servlets sandbox protect your system from unfriendly behavior.

Advantages of the Servlet API

One of the great advantages of the Servlet API is protocol independence. It assumes nothing about:

· The protocol being used to transmit on the net

· How it is loaded

· The server environment it will be running in

These qualities are important, because it allows the Servlet API to be embedded in many different kinds of servers. There are other advantages to the Servlet API as well. These include:

· It’s extensible - you can inherit all your functionality from the base classes made available to you.

· It’s simple, small, and easy to use.

Features of Servlets:

· Servlets are persistent. Servlet are loaded only by the web server and can maintain services between requests.

· Servlets are fast. Since Servlets only need to be loaded once, they offer much better performance over their CGI counterparts.

· Servlets are platform independent.

· Servlets are extensible. Java is a robust, object-oriented programming language, which easily can be extended to suit your needs

· Servlets are secure.

· Servlets can be used with a variety of clients.

Loading Servlets:

Servlets can be loaded from three places

From a directory that is on the CLASSPATH. The CLASSPATH of the JavaWebServer includes service root/classes/ which is where the system classes reside.

From the <SERVICE_ROOT /Servlets/ directory. This is *not* in the server’s class path. A class loader is used to create Servlets from this directory. New Servlets can be added - existing Servlets can be recompiled and the server will notice these changes.

From a remote location, for this a code base like http: // nine.eng / classes / foo / is required in addition to the Servlets class name. Refer to the admin GUI docs on Servlet section to see how to set this up.

Loading Remote Servlets

Remote Servlets can be loaded by:

Configuring the Admin Tool to setup automatic loading of remote Servlets
Setting up server side include tags in. html files
Defining a filter chain configuration

Invoking Servlets

A Servlet invoker is a Servlet that invokes the "service" method on a named Servlet. If the Servlet is not loaded in the server, then the invoker first loads the Servlet (either from local disk or from the network) and the then invokes the "service" method. Also like applets, local Servlets in the server can be identified by just the class name. In other words, if a Servlet name is not absolute, it is treated as local.

A client can invoke Servlets in the following ways:

The client can ask for a document that is served by the Servlet.
The client (browser) can invoke the Servlet directly using a URL, once it has been mapped using the Servlet Aliases section of the admin GUI.
The Servlet can be invoked through server side include tags.
The Servlet can be invoked by placing it in the Servlets/ directory.
The Servlet can be invoked by using it in a filter chain.

Java Server Pages (JSP)

Java server Pages is a simple, yet powerful technology for creating and maintaining dynamic-content web pages. Based on the Java programming language, Java Server Pages offers proven portability, open standards, and a mature re-usable component model .The Java Server Pages architecture enables the separation of content generation from content presentation. This separation not eases maintenance headaches; it also allows web team members to focus on their areas of expertise. Now, web page designer can concentrate on layout, and web application designers on programming, with minimal concern about impacting each other’s work.

Features of JSP

Portability:

Java Server Pages files can be run on any web server or web-enabled application server that provides support for them. Dubbed the JSP engine, this support involves recognition, translation, and management of the Java Server Page lifecycle and its interaction components.

Components

It was mentioned earlier that the Java Server Pages architecture can include reusable Java components. The architecture also allows for the embedding of a scripting language directly into the Java Server Pages file. The components current supported include Java Beans, and Servlets.

Processing

A Java Server Pages file is essentially an HTML document with JSP scripting or tags. The Java Server Pages file has a JSP extension to the server as a Java Server Pages file. Before the page is served, the Java Server Pages syntax is parsed and processed into a Servlet on the server side. The Servlet that is generated outputs real content in straight HTML for responding to the client.

Access Models:

A Java Server Pages file may be accessed in at least two different ways. A client’s request comes directly into a Java Server Page. In this scenario, suppose the page accesses reusable Java Bean components that perform particular well-defined computations like accessing a database. The result of the Beans computations, called result sets is stored within the Bean as properties. The page uses such Beans to generate dynamic content and present it back to the client.

In both of the above cases, the page could also contain any valid Java code. Java Server Pages architecture encourages separation of content from presentation.

Steps in the execution of a JSP Application:

1. The client sends a request to the web server for a JSP file by giving the name of the JSP file within the form tag of a HTML page.

This request is transferred to the JavaWebServer. At the server side JavaWebServer receives the request and if it is a request for a jsp file server gives this request to the JSP engine.

3. JSP engine is program which can under stands the tags of the jsp and then it converts those tags into a Servlet program and it is stored at the server side. This Servlet is loaded in the memory and then it is executed and the result is given back to the JavaWebServer and then it is transferred back to the result is given back to the JavaWebServer and then it is transferred back to the client.

Eclipse IDE

Eclipse is an open-source software framework written primarily in Java. In its default form it is an Integrated Development Environment (IDE) for Java developers, consisting of the Java Development Tools (JDT) and the Eclipse Compiler for Java (ECJ). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs are available for over a dozen languages.

Architecture

The basis for Eclipse is the Rich Client Platform (RCP). The following components constitute the rich client platform:

ü OSGi - a standard bundling framework

ü Core platform - boot Eclipse, run plug-ins

ü the Standard Widget Toolkit (SWT) - a portable widget toolkit

ü JFace - viewer classes to bring model view controller programming to SWT, file buffers, text handling, text editors

ü the Eclipse Workbench - views, editors, perspectives, wizards

Eclipse's widgets are implemented by a widget toolkit for Java called SWT, unlike most Java applications, which use the Java standard Abstract Window Toolkit (AWT) or Swing. Eclipse's user interface also leverages an intermediate GUI layer called JFace, which simplifies the construction of applications based on SWT.

Eclipse employs plug-ins in order to provide all of its functionality on top of (and including) the rich client platform, in contrast to some other applications where functionality is typically hard coded. This plug-in mechanism is a lightweight software componentry framework. In addition to allowing Eclipse to be extended using other programming languages such as C and Python, the plug-in framework allows Eclipse to work with typesetting languages like LaTeX, networking applications such as telnet, and database management systems. The plug-in architecture supports writing any desired extension to the environment, such as for configuration management. Java and CVS support is provided in the Eclipse SDK.

The key to the seamless integration of tools with Eclipse is the plugin. With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means that a plug-in you develop integrates with Eclipse in exactly the same way as other plug-ins; in this respect, all features are created equal.

The Eclipse SDK includes the Eclipse Java Development Tools, offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in this case a set of metadata over a flat filespace allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards. The Visual Editor project allows interfaces to be created interactively, hence allowing Eclipse to be used as a RAD tool.

The following is a list of notable projects and plugins for the Eclipse IDE.

These projects are maintained by the Eclipse community and hosted by the Eclipse Foundation.

1. Core projects

Rich Client Platform (Platform) is the core framework that all other Eclipse projects are built on.

Java Development Tools (JDT) provides support for core Java SE. This includes a standalone fast incremental compiler.

Tools projects

C/C++ Development Tools (CDT) adds support for C/C++ syntax highlighting, code formatting, debugger integration and project structures. Unlike the JDT project, the CDT project does not add a compiler and relies on an external tool chain.

Graphical Editing Framework (GEF) allows developers to build standalone graphical tools. Example use include circuit diagram design tools, activity diagram editors and WYSIWYG document editors.

Web projects

J2EE Standard Tools (JST) extends the core JDT to include support for Java EE projects. This includes EJBs, JSPs and Servlets.

PHP Development Tools (PDT)

Web Standard Tools (WST) adds standards compliant web development tools. These tools include editors for XML, HTML and CSS.

Modelling projects

Eclipse Modeling Framework (EMF) a modeling framework and code generation facility for building tools and other applications based on a structured data model, from a model specification described in XMI.

Graphical Modeling Framework (GMF) is a generative component and runtime infrastructure for developing graphical editors based on EMF and GEF.

Other projects

Test and Performance Tools Platform (TPTP) which provides a platform that allows software developers to build test and performance tools, such as debuggers, profilers and benchmarking applications.

Business Intelligence and Reporting Tools Project (BIRT), an Eclipse-based open source reporting system for web applications, especially those based on Java EE.

Applications Server

An application server is a software engine that delivers applications to client computers or devices, typically through the Internet and using the Hypertext Transfer Protocol. Application servers are distinguished from web servers by the extensive use of server-side dynamic content and frequent integration with database engines.

Common features

Application server products typically bundle middleware to enable applications to intercommunicate with dependent applications, like web servers, database management systems, and chart programs. Some application servers also provide an API, making them operating system independent. Portals are a common application server mechanism by which a single point of entry is provided to multiple devices.

Java application servers

Java EE Servers

Following the success of the Java platform, the term application server sometimes refers to a Java Platform--Enterprise Edition (J2EE) or Java EE 5 application server. Among the better known Java Enterprise Edition application servers are WebLogic Server (BEA), JBoss (Red Hat), WebSphere (IBM), JRun (Adobe), Apache Geronimo (Apache Foundation, based on IBM WebSphere), Oracle OC4J (Oracle Corporation), Sun Java System Application Server (Sun Microsystems) and Glassfish Application Server (based on Sun Java System Application Server).

JOnAS application server was the first open source application server to have achieved official compliance with the Java Enterprise Specification. BEA delivered the first Java EE 5 certified application server followed by Sun Microsystems' reference implementation GlassFish.

The Web modules are servlets and JavaServer Pages, and business logic is built into Enterprise JavaBeans (EJB-3 and later). The Hibernate project offers an EJB-3 container implementation for the JBoss Application server. Tomcat from Apache and JOnAS from ObjectWeb are typical of containers into which these modules can be put.

A Java Server Page (JSP) is a servlet from Java that executes in a Web container—the Java equivalent of CGI scripts. JSPs are a way to create HTML pages by embedding references to the server logic within the page. HTML coders and Java programmers can work side by side by referencing each other's code from within their own. JavaBeans are the independent class components of the Java architecture from Sun Microsystems.

The application servers mentioned above mainly serve Web applications. Some application servers target networks other than the Web: Session Initiation Protocol servers, for instance, target telephony networks.

JBOSS

JBoss Application Server (or JBoss AS) is a free software / open source Java EE-based application server. Because it is Java-based, JBoss AS is cross-platform, usable on any operating system that Java supports.

Environment

JBoss AS 4.0 is a J2EE 1.4 application server, with embedded Tomcat 5.5. Any JVM between 1.4 and 1.5 is supported. JBoss can run on numerous operating systems including Windows, Mac OS X, many POSIX platforms, and others, as long as a suitable JVM is present.

JBoss AS 4.2 is also a J2EE 1.4 application server, but EJB 3 is deployed by default. It requires JDK 5. Tomcat 6 is bundled with it.

Next JBoss AS 5 will be Java EE 5 application server.

Product features

ü Clustering

ü Failover (including sessions)

ü Load balancing

ü Distributed caching (using JBoss Cache, a standalone product)

ü Distributed deployment (farming)

ü Enterprise JavaBeans version 3

Snapshots

Testing

Software Testing is the process used to help identify the correctness, completeness, security, and quality of developed computer software. Testing is a process of technical investigation, performed on behalf of stakeholders, that is intended to reveal quality-related information about the product with respect to the context in which it is intended to operate. This includes, but is not limited to, the process of executing a program or application with the intent of finding errors. Quality is not an absolute; it is value to some person. With that in mind, testing can never completely establish the correctness of arbitrary computer software; testing furnishes a criticism or comparison that compares the state and behaviour of the product against a specification. An important point is that software testing should be distinguished from the separate discipline of Software Quality Assurance (SQA), which encompasses all business process areas, not just testing.

There are many approaches to software testing, but effective testing of complex products is essentially a process of investigation, not merely a matter of creating and following routine procedure. One definition of testing is "the process of questioning a product in order to evaluate it", where the "questions" are operations the tester attempts to execute with the product, and the product answers with its behavior in reaction to the probing of the tester[citation needed]. Although most of the intellectual processes of testing are nearly identical to that of review or inspection, the word testing is connoted to mean the dynamic analysis of the product—putting the product through its paces. Some of the common quality attributes include capability, reliability, efficiency, portability, maintainability, compatibility and usability. A good test is sometimes described as one which reveals an error; however, more recent thinking suggests that a good test is one which reveals information of interest to someone who matters within the project community.

Introduction

In general, software engineers distinguish software faults from software failures. In case of a failure, the software does not do what the user expects. A fault is a programming error that may or may not actually manifest as a failure. A fault can also be described as an error in the correctness of the semantic of a computer program. A fault will become a failure if the exact computation conditions are met, one of them being that the faulty portion of computer software executes on the CPU. A fault can also turn into a failure when the software is ported to a different hardware platform or a different compiler, or when the software gets extended. Software testing is the technical investigation of the product under test to provide stakeholders with quality related information.

Software testing may be viewed as a sub-field of Software Quality Assurance but typically exists independently (and there may be no SQA areas in some companies). In SQA, software process specialists and auditors take a broader view on software and its development. They examine and change the software engineering process itself to reduce the amount of faults that end up in the code or deliver faster.

Regardless of the methods used or level of formality involved the desired result of testing is a level of confidence in the software so that the organization is confident that the software has an acceptable defect rate. What constitutes an acceptable defect rate depends on the nature of the software. An arcade video game designed to simulate flying an airplane would presumably have a much higher tolerance for defects than software used to control an actual airliner.

A problem with software testing is that the number of defects in a software product can be very large, and the number of configurations of the product larger still. Bugs that occur infrequently are difficult to find in testing. A rule of thumb is that a system that is expected to function without faults for a certain length of time must have already been tested for at least that length of time. This has severe consequences for projects to write long-lived reliable software.

A common practice of software testing is that it is performed by an independent group of testers after the functionality is developed but before it is shipped to the customer. This practice often results in the testing phase being used as project buffer to compensate for project delays. Another practice is to start software testing at the same moment the project starts and it is a continuous process until the project finishes.

Another common practice is for test suites to be developed during technical support escalation procedures. Such tests are then maintained in regression testing suites to ensure that future updates to the software don't repeat any of the known mistakes.

It is commonly believed that the earlier a defect is found the cheaper it is to fix it.

	Time Detected
Time Introduced	Requirements	Architecture	Construction	System Test	Post-Release
Requirements	1	3	5-10	10	10-100
Architecture	-	1	10	15	25-100
Construction	-	-	1	10	10-25

In counterpoint, some emerging software disciplines such as extreme programming and the agile software development movement, adhere to a "test-driven software development" model. In this process unit tests are written first, by the programmers (often with pair programming in the extreme programming methodology). Of course these tests fail initially; as they are expected to. Then as code is written it passes incrementally larger portions of the test suites. The test suites are continuously updated as new failure conditions and corner cases are discovered, and they are integrated with any regression tests that are developed.

Unit tests are maintained along with the rest of the software source code and generally integrated into the build process (with inherently interactive tests being relegated to a partially manual build acceptance process).

The software, tools, samples of data input and output, and configurations are all referred to collectively as a test harness.

History

The separation of debugging from testing was initially introduced by Glenford J. Myers in his 1978 book the "Art of Software Testing". Although his attention was on breakage testing it illustrated the desire of the software engineering community to separate fundamental development activities, such as debugging, from that of verification. Drs. Dave Gelperin and William C. Hetzel classified in 1988 the phases and goals in software testing as follows: until 1956 it was the debugging oriented period, where testing was often associated to debugging: there was no clear difference between testing and debugging. From 1957-1978 there was the demonstration oriented period where debugging and testing was distinguished now - in this period it was shown, that software satisfies the requirements. The time between 1979-1982 is announced as the destruction oriented period, where the goal was to find errors. 1983-1987 is classified as the evaluation oriented period: intention here is that during the software lifecycle a product evaluation is provided and measuring quality. From 1988 on it was seen as prevention oriented period where tests were to demonstrate that software satisfies its specification, to detect faults and to prevent faults. Dr. Gelperin chaired the IEEE 829-1988 (Test Documentation Standard) with Dr. Hetzel writing the book "The Complete Guide of Software Testing". Both works were pivotal in to today's testing culture and remain a consistent source of reference. Dr. Gelperin and Jerry E. Durant also went on to develop High Impact Inspection Technology that builds upon traditional Inspections but utilizes a test driven additive.

White-box and black-box testing

To meet Wikipedia's quality standards, this section may require cleanup.
Please discuss this issue on the talk page, and/or replace this tag with a more specific message.

White box and black box testing are terms used to describe the point of view a test engineer takes when designing test cases. Black box being an external view of the test object and white box being an internal view. Software testing is partly intuitive, but largely systematic. Good testing involves much more than just running the program a few times to see whether it works. Thorough analysis of the program under test, backed by a broad knowledge of testing techniques and tools are prerequisites to systematic testing. Software Testing is the process of executing software in a controlled manner; in order to answer the question “Does this software behave as specified?” Software testing is used in association with Verification and Validation. Verification is the checking of or testing of items, including software, for conformance and consistency with an associated specification. Software testing is just one kind of verification, which also uses techniques as reviews, inspections, walk-through. Validation is the process of checking what has been specified is what the user actually wanted.

Validation: Are we doing the right job?
Verification: Are we doing the job right?

In order to achieve consistency in the Testing style, it is imperative to have and follow a set of testing principles. This enhances the efficiency of testing within SQA team members and thus contributes to increased productivity. The purpose of this document is to provide overview of the testing, plus the techniques.

At SDEI, 3 levels of software testing is done at various SDLC phases

Unit Testing: in which each unit (basic component) of the software is tested to verify that the detailed design for the unit has been correctly implemented
Integration testing: in which progressively larger groups of tested software components corresponding to elements of the architectural design are integrated and tested until the software works as a whole.
System testing: in which the software is integrated to the overall product and tested to show that all requirements are met

A further level of testing is also done, in accordance with requirements:

Acceptance testing: upon which the acceptance of the complete software is based. The clients often do this.
Regression testing: is used to refer the repetition of the earlier successful tests to ensure that changes made in the software have not introduced new bugs/side effects.

In recent years the term grey box testing has come into common usage. The typical grey box tester is permitted to set up or manipulate the testing environment, like seeding a database, and can view the state of the product after his actions, like performing a SQL query on the database to be certain of the values of columns. It is used almost exclusively of client-server testers or others who use a database as a repository of information, but can also apply to a tester who has to manipulate XML files (DTD or an actual XML file) or configuration files directly. It can also be used of testers who know the internal workings or algorithm of the software under test and can write tests specifically for the anticipated results. For example, testing a data warehouse implementation involves loading the target database with information, and verifying the correctness of data population and loading of data into the correct tables.

Test levels

Unit testing tests the minimal software component and sub-component or modules by the programmers.
Integration testing exposes defects in the interfaces and interaction between integrated components (modules).
Functional testing tests the product according to programmable work.
System testing tests an integrated system to verify/validate that it meets its requirements.
Acceptance testing testing can be conducted by the client. It allows the end-user or customer or client to decide whether or not to accept the product. Acceptance testing may be performed after the testing and before the implementation phase. See also Development stage

Alpha testing is simulated or actual operational testing by potential users/customers or an independent test team at the developers' site. Alpha testing is often employed for off-the-shelf software as a form of internal acceptance testing, before the software goes to beta testing.
Beta testing comes after alpha testing. Versions of the software, known as beta versions, are released to a limited audience outside of the company. The software is released to groups of people so that further testing can ensure the product has few faults or bugs. Sometimes, beta versions are made available to the open public to increase the feedback field to a maximal number of future users.

It should be noted that although both Alpha and Beta are referred to as testing it is in fact use emersion. The rigors that are applied are often unsystematic and many of the basic tenets of testing process are not used. The Alpha and Beta period provides insight into environmental and utilization conditions that can impact the software.

After modifying software, either for a change in functionality or to fix defects, a regression test re-runs previously passing tests on the modified software to ensure that the modifications haven't unintentionally caused a regression of previous functionality. Regression testing can be performed at any or all of the above test levels. These regression tests are often automated.

Test cases, suites, scripts and scenarios

A test case is a software testing document, which consists of event, action, input, output, expected result and actual result. Clinically defined (IEEE 829-1998) a test case is an input and an expected result. This can be as pragmatic as 'for condition x your derived result is y', whereas other test cases described in more detail the input scenario and what results might be expected. It can occasionally be a series of steps (but often steps are contained in a separate test procedure that can be exercised against multiple test cases, as a matter of economy) but with one expected result or expected outcome. The optional fields are a test case ID, test step or order of execution number, related requirement(s), depth, test category, author, and check boxes for whether the test is automatable and has been automated. Larger test cases may also contain prerequisite states or steps, and descriptions. A test case should also contain a place for the actual result. These steps can be stored in a word processor document, spreadsheet, database or other common repository. In a database system, you may also be able to see past test results and who generated the results and the system configuration used to generate those results. These past results would usually be stored in a separate table.

The term test script is the combination of a test case, test procedure and test data. Initially the term was derived from the byproduct of work created by automated regression test tools. Today, test scripts can be manual, automated or a combination of both.

The most common term for a collection of test cases is a test suite. The test suite often also contains more detailed instructions or goals for each collection of test cases. It definitely contains a section where the tester identifies the system configuration used during testing. A group of test cases may also contain prerequisite states or steps, and descriptions of the following tests.

Collections of test cases are sometimes incorrectly termed a test plan. They might correctly be called a test specification. If sequence is specified, it can be called a test script, scenario or procedure.

A sample testing cycle

Although testing varies between organizations, there is a cycle to testing:

Requirements Analysis: Testing should begin in the requirements phase of the software development life cycle.

During the design phase, testers work with developers in determining what aspects of a design are testable and under what parameter those tests work.

Test Planning: Test Strategy, Test Plan(s), Test Bed creation.
Test Development: Test Procedures, Test Scenarios, Test Cases, Test Scripts to use in testing software.
Test Execution: Testers execute the software based on the plans and tests and report any errors found to the development team.
Test Reporting: Once testing is completed, testers generate metrics and make final reports on their test effort and whether or not the software tested is ready for release.
Retesting the Defects

Not all errors or defects reported must be fixed by a software development team. Some may be caused by errors in configuring the test software to match the development or production environment. Some defects can be handled by a workaround in the production environment. Others might be deferred to future releases of the software, or the deficiency might be accepted by the business user. There are yet other defects that may be rejected by the development team (of course, with due reason) if they deem it inappropriate to be called a defect.

Limitations and Scope for Future Enhancement

Limitations of the system:

· Only the permanent employees can access the system.

· System works in all platforms and its compatible environments.

· Advanced techniques are not used to check the authorization.

Future Enhancements:

It is not possible to develop a system that makes all the requirements of the user. User requirements keep changing as the system is being used. Some of the future enhancements that can be done to this system are:

As the technology emerges, it is possible to upgrade the system and can be adaptable to desired environment.
Because it is based on object-oriented design, any further changes can be easily adaptable.
Based on the future security issues, security can be improved using emerging technologies.
Attendance module can be added
sub admin module can be added

Project Summary

The WEB MAIL FILTERS Filter for mails is a web-based application for primarily providing training to the employees who provide customized solutions to meet organizational needs.

This application software has been computed successfully and was also tested successfully by taking “test cases”. It is user friendly, and has required options, which can be utilized by the user to perform the desired operations.

The software is developed using Java as front end and Oracle as back end in Windows environment. The goals that are achieved by the software are:

ü Instant access.

ü Improved productivity.

ü Optimum utilization of resources.

ü Efficient management of records.

ü Simplification of the operations.

ü Less processing time and getting required information.

ü User friendly.

ü Portable and flexible for further enhancement.

References

Core Java™ 2 Volume I – Fundamentals 7th Edition	- Cay S. Hortsman
Pearson Education – Sun Microsystems	Gary Cornell

Core Java™ 2 Volume II – Advanced	- Cay S. Hortsman
Pearson Education – Sun Microsystems	Gary Cornell

Head First Servlets & JSP	- Eric Freeman
O’Reilly – SPD	Elisabeth Freeman

The Book of JavaScript 2nd Edition	- thau
SPD

Effective Java – Programming Language Guide	- Joshua Bloch
Pearson Education – Sun Microsystems

Java Database Best Practices	- George Reese
O’Reilly – SPD

JBoss – A Developers Notebook	- Norman Richards
O’Reilly – SPD	Sam Griffith

venky

Sunday, 24 July 2011

Tuesday, 12 July 2011