1.0  INTRODUCTION.

For a long time, computer scientists have struggled with two questions: can
computers truly learn to perform a task from examples or previously solved
tasks, and can they improve themselves significantly on the basis of past
mistakes? Machine learning research began in order to answer these questions,
and it is now working towards making this a possibility in computers.

In order for a computer or computer-controlled robot to perform a task,
traditional programming demands that a programmer write a correct algorithm
for the task and then implement that algorithm on the computer using a
programming language. This process is usually tedious and time-consuming, and
is best done by trained personnel [7]. Machine learning promises to reduce the
burden of hand programming.

According to Tom Mitchell, machine learning “is concerned with the question of
how to construct computer programs that automatically improve with
experience” [3].

This paper therefore looks briefly at what machine learning is and how it can
improve software testing, particularly the testing method known as fuzzing,
which consists of repeatedly testing an application with modified, or fuzzed,
inputs with the goal of finding security vulnerabilities in input-parsing
code [5].

 

2.0  RELATED LITERATURE/WORKS

•  Thomas J. Cheatham wrote a paper on the use of machine learning techniques
to identify attributes that are important in predicting software testing costs
and software testing time in a particular company.

•  Du Zhang and Jeffrey J.P. Tsai examined the possibility of applying machine
learning in software engineering. Their paper describes the characteristics
and applicability of some frequently used machine learning algorithms and
offers guidelines on applying machine learning methods to software engineering
tasks.

•  In 2017, William Blum, Rishabh Singh, and Mohit Rajpal, all Microsoft
researchers, began a research project looking at ways to improve fuzzing
techniques using machine learning and deep neural networks. They wanted to see
what a machine learning model could learn if a deep neural network were
inserted into the feedback loop of a greybox fuzzer.

•  Patrice Godefroid, Hila Peleg, and Rishabh Singh, in their paper
“Learn&Fuzz: Machine Learning for Input Fuzzing”, show how to automate the
generation of an input grammar suitable for input fuzzing using sample inputs
and neural-network-based statistical machine-learning techniques. They then
present a detailed case study with a complex input format, namely PDF, and a
large, complex, security-critical parser for this format, namely the PDF
parser embedded in Microsoft’s new Edge browser. They also present a new
algorithm for this learn&fuzz challenge which uses a learnt input probability
distribution to intelligently guide where to fuzz inputs.

3.0  SUMMARY OF FINDINGS FROM LITERATURE

Tom Mitchell states in his book “Machine Learning” [3] that:

“A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.”

For example, a computer program that learns to play chess might improve its
performance, as measured by its ability to win at the class of tasks involving
playing chess, through experience obtained by playing games against itself.
In general, a well-defined learning problem involves these three features: the
class of tasks, the measure of performance to be improved, and the source of
experience.
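
The three features can be made concrete with a small sketch. Everything in it
is an illustrative assumption rather than something taken from Mitchell's
book: the task T is deciding whether an integer from 0 to 99 lies at or above
a hidden threshold, the experience E is a growing set of labelled examples,
and the performance measure P is accuracy over all 100 integers. The names
`THRESHOLD`, `learn_threshold`, `accuracy` and `learning_curve` are
hypothetical.

```python
import random

THRESHOLD = 62  # hidden rule the learner has to discover (illustrative value)


def true_label(x: int) -> bool:
    """Ground truth for task T: is x at or above the hidden threshold?"""
    return x >= THRESHOLD


def learn_threshold(examples):
    """Estimate the threshold from experience E, a list of (x, label) pairs.

    The estimate is placed halfway between the largest negative example and
    the smallest positive example seen so far.
    """
    lo = max((x for x, label in examples if not label), default=-1)
    hi = min((x for x, label in examples if label), default=100)
    return (lo + hi + 1) // 2


def accuracy(estimate: int) -> float:
    """Performance measure P: fraction of 0..99 classified correctly."""
    correct = sum((x >= estimate) == true_label(x) for x in range(100))
    return correct / 100


def learning_curve(sizes, seed: int = 0):
    """Return P after the learner has seen the first n examples, for each n."""
    rng = random.Random(seed)
    xs = list(range(100))
    rng.shuffle(xs)
    experience = [(x, true_label(x)) for x in xs]
    return [accuracy(learn_threshold(experience[:n])) for n in sizes]


if __name__ == "__main__":
    sizes = [0, 5, 20, 100]
    for n, acc in zip(sizes, learning_curve(sizes)):
        print(f"examples seen: {n:3d}  accuracy: {acc:.2f}")
```

With no examples the learner simply guesses the midpoint of the range; once it
has seen every integer, the estimate equals the hidden threshold and the
accuracy reaches 1.0. In between, the accuracy depends on which examples
happen to arrive first.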

The emergence of machine learning was the result of two significant
developments.

The first was Arthur Samuel's realization in 1959 that, rather than teaching
computers everything they need to know about a task and how to carry it out,
it might be possible to teach them to learn for themselves. The second was the
emergence of the internet and the explosive increase in the amount of digital
information available for analysis.

John McCarthy and Ed Feigenbaum give a short biography of Samuel [8]:

Arthur Samuel (1901-1990) was a pioneer of artificial intelligence research.
From 1949 through the late 1960s, he did the best work in making computers
learn from their experience. His vehicle for this was the game of checkers.
Samuel’s learning program used Lee’s Guide to Checkers to adjust its criteria
for choosing moves so that the program would choose those thought good by
checker experts as often as possible.

To better understand machine learning, it helps to consider its role within
the following three niches in the software world, as stated by Tom Mitchell
[2] as well as Du Zhang and Jeffrey J.P. Tsai [4]:

a.  Data mining: domains where there are large databases containing valuable
implicit regularities to be discovered [4].

b.  Difficult-to-program applications: poorly understood problem domains where
little knowledge exists for humans to develop effective algorithms [4].

c.  Customized software applications: domains where programs must adapt to
changing conditions [4].

3.1 Artificial Intelligence, Machine Learning and Deep Learning

Artificial Intelligence, Machine Learning and Deep Learning are three terms
often used interchangeably, which makes the differences between them somewhat
unclear. The simplest way to understand their relationship is to imagine three
concentric circles. AI is the outermost circle: it deals with machines that
can perform tasks characteristic of human intelligence, such as understanding
language and recognizing objects and sounds. Machine learning, a subset of AI,
fits inside it, and deep learning, which is one approach within machine
learning, fits inside both [10].

 

3.2 Real Life Applications

Some real-life examples of the use of machine learning [3]:

i.  Learning to recognize spoken words: The SPHINX system (e.g., Lee 1989)
learns speaker-specific strategies for recognizing the primitive sounds
(phonemes) and words from the observed speech signal.

ii.  Learning to drive an autonomous vehicle: The ALVINN system (Pomerleau
1989) has used its learned strategies to drive unassisted at 70 miles per hour
for 90 miles on public highways among other cars.

iii.  Learning to classify new astronomical structures: Decision tree learning
algorithms have been used by NASA to learn how to classify celestial objects
from the second Palomar Observatory Sky Survey (Fayyad et al. 1995).

iv.  Learning to play world-class backgammon: The world’s top computer program
for backgammon, TD-GAMMON (Tesauro 1992, 1995), learned its strategy by
playing over one million practice games against itself. It now plays at a
level competitive with the human world champion.

v.  In software testing, Microsoft has released a tool called Microsoft
Security Risk Detection, which makes use of fuzz testing, or fuzzing. It
significantly simplifies security testing and does not require the user to be
a security expert in order to root out software bugs [9].

3.3 Classification of machine learning systems

As stated by Jaime G. Carbonell, Ryszard S. Michalski and Tom M. Mitchell,
machine learning systems may be classified in terms of [7]:

(a.) The underlying learning strategy: learning strategies are distinguished
by the amount of inference the learner performs on the information provided.

(b.) The representation of knowledge or skill acquired by the learner: a
learner could acquire knowledge such as descriptions of physical objects,
rules of behavior and so on.

(c.) The application domain of the performance system for which knowledge is
acquired: this depends on the area of application, such as natural language
processing, robotics or image recognition.

3.4 Fuzzing it with Machine Learning

Software testing has always been a tedious yet important part of the software
development cycle, and fuzz testing is one of the most widely used automated
software testing techniques. Fuzzing is done by presenting a target program
with crafted malicious input designed to trigger unexpected behaviors such as
crashes, buffer overflows, memory errors, and exceptions.
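
A minimal sketch of this idea follows. It is only an illustration under
stated assumptions: `parse_record` is a hypothetical toy target with a
deliberately planted bug, and `mutate` and `fuzz` are hypothetical helper
names, not part of any real fuzzing tool. The fuzzer overwrites one random
byte of a valid seed input per iteration and records every input that makes
the target fail with an unexpected exception.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target: the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")  # graceful, expected rejection
    declared = data[0]
    payload = data[1:]
    # Planted bug: the declared length is trusted without a bounds check,
    # so an oversized value raises IndexError.
    return bytes(payload[i] for i in range(declared))


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of seed with one randomly chosen byte overwritten."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, iterations: int = 500, rng_seed: int = 0):
    """Run target on mutated inputs and collect those that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # the target rejected the input cleanly
        except Exception:
            crashes.append(candidate)  # unexpected failure: a finding
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x05hello")
    print(f"{len(found)} crashing inputs, for example {found[:3]}")
```

Every crashing input reported here has a first byte larger than the real
payload length, which is exactly the unchecked case in the toy parser.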

According to William Blum [6], fuzzing techniques fall into three main
categories:

i) Blackbox fuzzers, which rely solely on the sample input files to generate
new inputs.

ii) Whitebox fuzzers, which analyze the target program either statically or
dynamically to guide the search for new inputs aimed at exploring as many code
paths as possible.

iii) Greybox fuzzers, which use a feedback loop to guide their search based on
observed behavior from previous executions of the program [6].
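
The greybox feedback loop can be sketched as follows. This is a simplified
illustration, not how AFL or any specific fuzzer is implemented: the toy
`target` reports the branches it executed by hand, whereas real greybox
fuzzers obtain coverage through instrumentation, and the names `target`,
`mutate` and `greybox_fuzz` are hypothetical. An input is kept in the corpus
only if it reached a branch that no earlier input reached, so later mutations
build on earlier progress.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test; returns the set of branch labels it executed."""
    covered = {"entry"}
    if len(data) >= 4 and data[0] == ord("F"):
        covered.add("b1")
        if data[1] == ord("U"):
            covered.add("b2")
            if data[2] == ord("Z"):
                covered.add("b3")
                if data[3] == ord("Z"):
                    raise RuntimeError("planted bug reached")
    return covered


def mutate(parent: bytes, rng: random.Random) -> bytes:
    """Return a copy of parent with one randomly chosen byte overwritten."""
    data = bytearray(parent)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def greybox_fuzz(seed: bytes, iterations: int = 20000, rng_seed: int = 0):
    """Coverage-guided loop: keep inputs that reach previously unseen branches."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(target(seed))  # assumes the seed itself does not crash
    crashes = []
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        try:
            coverage = target(child)
        except RuntimeError:
            crashes.append(child)
            continue
        if not coverage <= seen:  # feedback: the child hit a new branch
            seen |= coverage
            corpus.append(child)
    return corpus, seen, crashes


if __name__ == "__main__":
    corpus, seen, crashes = greybox_fuzz(b"AAAA")
    print("corpus:", corpus)
    print("branches seen:", sorted(seen))
    print("crashes found:", len(crashes))
```

A blackbox fuzzer in the same setting would mutate only the original seed and
would have to guess all four magic bytes without any intermediate signal.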

Neural networks can then be made to
learn patterns in the input files from previous fuzzing explorations to guide
the future fuzzing explorations.

Using a greybox fuzzer called American fuzzy lop (AFL) and inserting a deep
neural network into its feedback loop, the Microsoft researchers obtained
encouraging results in 2017, which suggest that machine learning can indeed
improve fuzzing. The neural fuzzing method offers an approach to greybox
fuzzing that is:

(a.) Simple: the system learns a strategy from an existing fuzzer.

(b.) Efficient: in the AFL experiment, the neural fuzzer explored
significantly more unique code paths in the first 24 hours than traditional
AFL.

(c.) Generic: although tested only on AFL, the approach could be applied to
any fuzzer, including blackbox and random fuzzers [9].
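
To illustrate what it means to learn where to mutate from previous fuzzing
runs, the sketch below swaps the deep neural network for a much simpler
stand-in: a smoothed per-position success rate. This substitution is an
assumption made for brevity and is not the Microsoft researchers' model. The
names `learn_position_scores` and `guided_mutate`, and the shape of the
`history` records, are hypothetical.

```python
import random
from collections import Counter


def learn_position_scores(history, length: int, smoothing: float = 1.0):
    """Score each byte position by how often mutating it paid off.

    history is an iterable of (position, found_new_coverage) pairs recorded
    during earlier fuzzing runs. Smoothing keeps untried positions at a
    neutral score instead of zero.
    """
    useful = Counter(pos for pos, gain in history if gain)
    tried = Counter(pos for pos, _ in history)
    return [
        (useful[p] + smoothing) / (tried[p] + 2 * smoothing)
        for p in range(length)
    ]


def guided_mutate(seed: bytes, scores, rng: random.Random):
    """Overwrite one byte, choosing the position in proportion to its score."""
    pos = rng.choices(range(len(seed)), weights=scores, k=1)[0]
    data = bytearray(seed)
    data[pos] = rng.randrange(256)
    return bytes(data), pos


if __name__ == "__main__":
    # Hypothetical log: position 0 paid off twice, position 1 never did.
    history = [(0, True), (0, True), (1, False), (1, False)]
    scores = learn_position_scores(history, length=3)
    print("scores:", scores)
    print(guided_mutate(b"abc", scores, random.Random(0)))
```

A learned model in the feedback loop plays the same role as `scores` here: it
turns past observations into a preference over input locations, which the
fuzzer then uses when choosing the next mutation.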

4.0  FUTURE RESEARCH/DEVELOPMENT PROPOSITIONS

The neural fuzzing research project carried out at Microsoft only scratches
the surface of what can be achieved using deep neural networks for fuzzing.
For now, the model only learns fuzzing locations, but it could also be used to
learn other fuzzing parameters such as the type of mutation or the strategy to
apply.

The possibility of developing computer programs that are capable of improving
with experience can lead to software that is developed with greater ease yet
is able to optimize itself over time.

5.0  CONCLUDING REMARKS

The emergence of the internet, and the explosion in available data that
followed, has greatly helped the development of machine learning. With new
data being generated daily, machine learning still has a long way to go in its
development, and as it matures it can be better incorporated into the field of
software development. Since machine learning is a subset of Artificial
Intelligence, its growth will also contribute to the problem AI aims to solve:
making computer programs that can be considered smart, able to perform tasks
that are characteristic of human intelligence.

The creation of the tool “Microsoft Security Risk Detection” also shows the
promise of machine learning for other forms of software testing.
