How do you create and organize music data structures for music software?
Music can be represented in several ways, depending on what your music software must accomplish.
This article presents several approaches that can guide how you organize music data structures efficiently in music software.
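There are four main levels at which you can describe the content of a piece of music: audio (layer 1), MIDI note sequences (layer 2), music notation (layer 3) and more abstract composition structures (layer 4).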
Each level can be considered a layer that contains information about the piece of music. Some of the information is common to several layers, but some is specific to one layer, so the layers are not mere duplicates of each other.
It is very important to determine precisely what your software must be able to do with the music, as this will help you design the music data structures that best fit your needs.
Using the wrong type of data will cause a lot of trouble when you program the processing of the music, so you should take some time to think about it right at the beginning of your design cycle. Changing the data structures when your program is already half finished is a lot of work, so do not skip this step!
Let us examine each layer and see how they are related to the other layers and what you can expect your program to process with them.
Since music is a sound phenomenon, the most obvious way to represent it is to sample the sound pressure variations and create a data structure that simply lists all the samples.
This is the format found on a CD (44100 samples per second, 16-bit resolution, stereo) as well as in various audio file formats such as WAVE, MP3 and AIFF.
Each file format has some specific features and some compress the data to be more efficient for storing and/or transmitting the music, but basically they all contain a list of audio samples.
What is needed to play the music?
To reproduce the music, the software must simply take the samples and feed them in due time to the sound card driver (through the audio API of Windows and Mac for instance) and that's it.
If you do not need any other type of processing, the APIs often give you a basic player tool to which you must simply supply the file name of the music you want to play.
What kind of processing can you do with it?
The audio format of the music is quite limited for processing, as it does not include any information about the notes played, the tempo, which instruments are playing, even less information about chords or melodic lines and no information related to the structure of the composition.
Whether the audio file contains a Beethoven symphony or a pack of barking dogs is not easily discernible by software processing.
Generally, if your software does not care about the real content of the audio data, you can use this format. For instance, the audio format is appropriate if your software must accomplish the following tasks:
How is this format generated?
The most obvious way to fill an audio data structure is simply to record the sound with a microphone, using simple audio recording software. The Windows and Mac APIs also let you do this directly from your program, using the audio interface of the sound card.
You can also generate an audio file by processing one of the other layers of music data representations. The processing is then called synthesis, as the program generates each sample according to the structure of another layer, like for instance the MIDI representation, which tells what notes to play and when to play them.
In fact, all three other layers (as further explained) need some type of synthesis so that you can hear the sounds resulting from performing the content of the music data structures.
How can you store this data in memory and on files?
Basically all you need for the audio music data structure is:
If your program handles several audio formats, you may need to add variables which will specify the format used:
In the case of 16-bit signed integers, the value zero represents the sound pressure at rest, when no sound is present. So a sound data structure representing silence is a buffer filled entirely with zeros.
The data of a stereo sound (2 channels) is often stored in a single interleaved buffer: the first left sample is followed by the first right sample, then the second left sample, and so on. This is because, when handling a stereo sound, one often needs to handle both channels at once. The WAVE file format works this way.
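The interleaved layout described above can be illustrated with a small sketch. The function names `interleave` and `deinterleave` are hypothetical helpers chosen for this example, and plain Python lists stand in for the audio buffers:

```python
def interleave(left, right):
    """Merge two mono channels into one buffer: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    buffer = []
    for left_sample, right_sample in zip(left, right):
        buffer.append(left_sample)
        buffer.append(right_sample)
    return buffer


def deinterleave(buffer):
    """Split an interleaved stereo buffer back into (left, right)."""
    # Even indices hold the left channel, odd indices the right channel.
    return buffer[0::2], buffer[1::2]


stereo = interleave([1, 2, 3], [10, 20, 30])
```

With this layout, sample number n of the left channel sits at index 2n and the matching right sample at index 2n + 1.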
A sine wave will be represented by a series of numbers that build the shape of a sine wave in the audio buffer:
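As a minimal sketch, here is how such buffers could be filled for 16-bit mono audio at the CD sampling rate. The names `make_silence` and `make_sine` and the default amplitude are assumptions made for this example:

```python
import math

SAMPLE_RATE = 44100     # samples per second, as on a CD
MAX_AMPLITUDE = 32767   # largest value of a 16-bit signed integer


def make_silence(num_samples):
    """Return a mono buffer of silence: every sample is zero."""
    return [0] * num_samples


def make_sine(frequency_hz, duration_s, amplitude=0.5):
    """Return a mono buffer of 16-bit samples that trace a sine wave.

    amplitude is a fraction of full scale (0.0 to 1.0).
    """
    num_samples = round(SAMPLE_RATE * duration_s)
    return [
        round(amplitude * MAX_AMPLITUDE
              * math.sin(2.0 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(num_samples)
    ]


# A 441 Hz tone has a period of exactly 100 samples at 44100 Hz.
tone = make_sine(441.0, 0.01)
```

Each value in `tone` is the sound pressure at one sampling instant, so plotting the list would show the familiar sine shape.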
You can find the specification of most audio file formats on the internet, for instance by searching for the expression "Audio WAVE file format specification".
More about processing audio music data structures
One important point arises when you combine two or more signals into one, for instance when mixing two waves together. As 16-bit signed integers are limited to the range [-32768, +32767], you must take care to avoid any overflow in the calculations, as it will result in a loud and disturbing "clank" or horrible distortion when the data is played back.
You can mix several files simply by adding the sample values of the signals together and storing the results in a new sound data structure.
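A minimal sketch of such a mix follows, assuming 16-bit mono buffers held in Python lists. Here the sum is clamped to the legal range, which is only one possible way to prevent overflow (scaling the inputs down before adding is another); the function name `mix` is hypothetical:

```python
INT16_MIN = -32768
INT16_MAX = 32767


def mix(a, b):
    """Add two sample buffers and clamp each result to the 16-bit range."""
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        # A buffer that is shorter than the other is treated as silence
        # past its end.
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


result = mix([1000, 30000, -30000], [2000, 10000, -10000])
```

Clamping avoids the wrap-around that produces the loud artifacts, but heavily clamped passages still sound distorted, so reducing the level of the inputs is often preferable.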
You can copy, paste and move blocks of samples and assemble them in various orders. However, be sure that the transitions of the signal between blocks are always smooth, as any sudden transition, for instance when the values of two consecutive samples differ too much, will cause a "click" that will be heard in the playback.
You can reverse the sound in time for special effects by reading the samples backwards.
There are thousands of possible sound processing effects, collectively referred to as DSP (Digital Signal Processing). Some are simple (a fade in or fade out is simply a progressive scaling of the samples by multiplication) and some are very complex (for instance a reverberation effect that simulates the acoustic behaviour of a specific room).
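The simple effects mentioned above can be sketched in a few lines. This example assumes mono integer samples in a Python list and a linear gain ramp; the function names are hypothetical:

```python
def fade_in(samples, fade_length):
    """Scale the first fade_length samples by a gain rising from 0 towards 1."""
    fade_length = min(fade_length, len(samples))
    out = list(samples)
    for i in range(fade_length):
        out[i] = int(samples[i] * i / fade_length)
    return out


def fade_out(samples, fade_length):
    """Scale the last fade_length samples by a gain falling towards 0."""
    fade_length = min(fade_length, len(samples))
    out = list(samples)
    for i in range(fade_length):
        # i counts backwards from the last sample of the buffer.
        out[-1 - i] = int(samples[-1 - i] * i / fade_length)
    return out


def reverse(samples):
    """Return the samples in reverse time order."""
    return samples[::-1]
```

A linear ramp is the simplest choice. Other gain curves can be substituted by changing the expression `i / fade_length`.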
You can slow down or speed up the sound, but this requires resampling: computing new samples by interpolating between the samples that are in the buffer. Without sufficient interpolation, artifacts will occur and spoil the quality of the results. This is a more advanced technique that needs some mathematics to implement.
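As an illustration only, here is a sketch of resampling with linear interpolation, the most basic form of interpolation. It is an assumption of this example rather than a recommended method: good quality resampling generally needs better interpolation and filtering. The function name and the `speed` parameter are hypothetical:

```python
def resample_linear(samples, speed):
    """Read the buffer at a different rate using linear interpolation.

    speed > 1.0 shortens the sound, speed < 1.0 lengthens it.
    The result is a list of floating point samples.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out = []
    n = 0
    while n * speed <= last:
        position = n * speed           # fractional read position
        index = int(position)          # sample just before the position
        fraction = position - index    # distance past that sample, 0..1
        following = samples[min(index + 1, last)]
        out.append(samples[index] + (following - samples[index]) * fraction)
        n += 1
    return out


slower = resample_linear([0, 10, 20, 30], 0.5)
```

Note that reading the samples at a different rate in this way also changes the pitch of the sound, like a tape played at a different speed.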
If you need to make lots of calculations with the sound samples, it is often better to use floating point buffers for the data, as this will avoid rounding errors that may degrade the sound quality.
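A possible way to move between the two representations is sketched below. The convention of mapping 16-bit samples to the range -1.0 to 1.0 by dividing by 32768 is an assumption of this example (it is a common choice, not the only one), as are the function names:

```python
def to_float(samples):
    """Convert 16-bit integer samples to floats in the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def to_int16(samples):
    """Convert float samples back to 16-bit integers, clamping to the range."""
    return [max(-32768, min(32767, round(s * 32768.0))) for s in samples]


as_float = to_float([0, 16384, -32768])
```

All intermediate processing can then be done on the float buffer, with a single rounding step when the result is converted back at the end.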
You can do an awful lot of interesting processing with the audio music data structures, as long as you do not need your program to display and/or edit the music in a more human-readable format like a score or a sequence of notes to play on a piano keyboard or a guitar fretboard.
If your program needs more advanced access to the content of the music, you will need to use one or more of the three other layers of music data structures.
This layer of music representation includes any music data structure that focuses on the playback and editing of music as represented by one or more sequences of notes.
The most common form of this is the MIDI file or any structure that contains such information in memory.
Compared to the audio format, we have here a more detailed understanding of the piece of music, as we have all the information about the notes, their starting and ending times, the dynamics, the tempo, volume variations, pitch effects and several other pieces of performance information, such as the type of instrument that must be used to play the notes.
However, we lose control of the final quality of the sound, as this level of information does not include samples of real instruments or any information that could be used to build the final samples that the sound card will play.
The MIDI language was designed as a universal language to transmit the real time information needed to control an existing synthesizer or musical instrument. It is mostly used so that a computer can control the hardware of an external synthesizer and make it perform a piece of music. You can learn more about the MIDI language here :
At this level, the pitch of a note is represented by an integer that counts semitones on the musical scale. The most common convention is the one used in the MIDI standard. The value "0" corresponds to C, the value "1" to C# or Db, and so on up to the value "11", which corresponds to B. We then continue with "12", which is again C, and so on for the next octaves. By convention, middle C has the value "60" (a multiple of 12) and corresponds to the lower C in the G clef and the higher C in the F clef:
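The numbering described above can be turned into readable note names with a short sketch. The octave label is an assumption: this example calls middle C "C4", but some software labels the same note "C3". Sharps are used for all black keys, since the number alone cannot tell C# from Db:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(midi_pitch):
    """Return 0 for any C, 1 for any C#/Db, ... 11 for any B."""
    return midi_pitch % 12


def note_name(midi_pitch):
    """Return a name such as 'C4' for a MIDI pitch number."""
    octave = midi_pitch // 12 - 1   # assumed convention: pitch 60 is C4
    return f"{NOTE_NAMES[pitch_class(midi_pitch)]}{octave}"
```

For example, `note_name(60)` gives "C4" and `note_name(61)` gives "C#4" under these assumptions.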
What is needed to play the music?
This music data structure needs a synthesizer to be performed, as it does not include any sample of audio data that the hardware of the sound card could recognize.
The simplest way to listen to the music is to send through the MIDI API (Application Programming Interface) of Windows/Mac the correct sequence of MIDI events as represented in the music data structure.
Both Windows and Mac have a built-in synthesizer that can perform the translation of these MIDI commands into an audio stream that is sent to the audio card.
In this case, of course, you do not have control over the final quality of the music, as these basic synthesizers are far from being fully realistic for all instruments.
The next step to increase the sound quality is to send the same MIDI commands to an external synthesizer of good quality, through a MIDI interface (a cheap device that converts USB to MIDI and vice versa).
You can also use a virtual synthesizer, connected through an internal virtual MIDI connection between programs. There are many very good sound libraries that include realistic samples and that can render a piece of music with very high quality.
The last possible step, if you want full control over the audio output, is to build a synthesizer yourself and create the audio stream directly from the note sequences contained in the music data structures.
One method to do that is to use a library of samples of real instruments. The beginning of a note starts the reading of an audio buffer that represents the correct note. However, there are several things to manage if you want a good, realistic rendering:
You can also experimentally generate simple wave forms and/or add synthesis algorithms that will provide original sounds. This kind of application can be very time consuming and become quite complex, but of course can also be quite exciting to program.
What kind of processing can you do with this music data structure?
Your program has access to each note of the music, so you can easily analyse the harmony of the music, you can edit each note individually and in general you have a full editing capability for all notes played by all instruments. Possible applications are:
As long as you do not need precise music notation handling, you can use the MIDI music data structures explained in this section. You can extract and display raw music notation from the MIDI data structure, but as MIDI was not developed for music notation, there are several drawbacks that will prevent your program from handling music notation correctly. The most obvious one is that C# and Db are represented in MIDI as the same note, which is not the case in sheet music.
How is this music data structure generated?
You can record directly in MIDI the performance of a musician on a MIDI keyboard, using the MIDI programming interface of Windows and Mac and store it into your data structures.
You can also create a graphic interface like the standard "piano roll" editor, that represents the notes as horizontal bars that the user can add, resize, move, delete, copy/paste, transpose,...
Combined with notation, it looks like this:
Some software can also analyse the notes that are present in an audio file, so that you can go from the audio layer to the MIDI layer. But this processing is quite difficult and complex to handle. For a single melody, it can work pretty well, but as the polyphony increases, results get less and less precise. And when you reach the level of a full orchestra playing, no program at this time can extract the full detailed MIDI information from it.
How can you store these music data structures in memory and on files?
The smallest unit of information is a note, which must at least have the following information:
The notes are then organized in time sequences. A sequence should at least include the following information:
A full piece of music could then consist of one or more such sequences played together.
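One possible shape for these structures is sketched below. The exact fields are assumptions made for this example (pitch, start time, duration and velocity for a note; a name, MIDI channel and instrument number for a sequence), with times counted in ticks:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    pitch: int           # MIDI pitch number, 60 = middle C
    start: int           # start time in ticks from the start of the sequence
    duration: int        # length in ticks
    velocity: int = 64   # how hard the note is played, 1..127

    @property
    def end(self) -> int:
        return self.start + self.duration


@dataclass
class Sequence:
    name: str
    channel: int = 0     # MIDI channel, 0..15
    program: int = 0     # instrument number, 0..127
    notes: List[Note] = field(default_factory=list)

    def add(self, note: Note) -> None:
        """Insert a note and keep the list ordered by start time."""
        self.notes.append(note)
        self.notes.sort(key=lambda n: n.start)

    def transpose(self, semitones: int) -> None:
        """Shift every note by the given number of semitones."""
        for note in self.notes:
            note.pitch += semitones

    def length(self) -> int:
        """Return the tick at which the last note ends (0 if empty)."""
        return max((n.end for n in self.notes), default=0)


melody = Sequence("melody")
melody.add(Note(pitch=64, start=480, duration=480))
melody.add(Note(pitch=60, start=0, duration=480))
```

A piece of music would then simply be a list of such `Sequence` objects played together.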
To store the music data structures on files, you can use the MIDI file format, with the advantage that it can be exchanged between most existing musical applications.
However, if you need to store more information related to the type of processing your application is doing, you will need to develop your own custom file format, as many music programs do. But do not forget to implement an import/export function from/to the standard MIDI file format, as it opens the door to file exchange with all musical programs.
One important caveat concerns the way you structure the MIDI data inside your program. Try to represent each note as a single object that contains its duration and that can be placed within a time sequence.
Remember that to play a note, you must send two MIDI messages: a NOTE ON message at the starting point and a NOTE OFF message when the note terminates. A note can be quite long and start in one measure and stop 3 measures later for instance.
In a purely MIDI sequence, such a note would be split into two messages. If each measure is a list of events, then the information about a note may be scattered across several measures. This works nicely for playing back a sequence. But if you want to process notes (delete, move, edit, transpose, copy/paste, ...), each time you find the start of a note you must run a loop to find its corresponding stop, which makes the processing much more complex and prone to bugs.
The best way is to have a separate object for each note and before starting to play, prepare a list of MIDI messages to be sent to the MIDI interface.
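The preparation step could look like the following sketch. The `Note` tuple and the function `to_midi_events` are hypothetical names for this example. The status bytes 0x90 (NOTE ON) and 0x80 (NOTE OFF), combined with the channel number, come from the MIDI standard:

```python
from collections import namedtuple

# One object per note, holding its own duration (times in ticks).
Note = namedtuple("Note", ["pitch", "start", "duration", "velocity"])

NOTE_ON = 0x90
NOTE_OFF = 0x80


def to_midi_events(notes, channel=0):
    """Expand note objects into a time-ordered list of MIDI messages.

    Each event is a tuple (time, status byte, pitch, velocity).
    """
    events = []
    for note in notes:
        events.append((note.start, NOTE_ON | channel,
                       note.pitch, note.velocity))
        events.append((note.start + note.duration, NOTE_OFF | channel,
                       note.pitch, 0))
    # Sort by time. At equal times NOTE OFF (0x80) sorts before NOTE ON
    # (0x90), so a note ends before a new one starts on the same tick.
    events.sort(key=lambda event: (event[0], event[1]))
    return events


events = to_midi_events([Note(60, 0, 480, 100), Note(64, 480, 480, 100)])
```

All editing operations work on the note objects, and the event list is rebuilt whenever playback starts.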
More about processing MIDI music data structures
One important point to keep in mind is the real time aspect of MIDI playback and recording. When your program starts playing or recording a MIDI sequence, it must have a timing architecture that will send the individual MIDI messages exactly on time.
Using a loop and waiting until the next event must be sent is of course a very limited method and is only valid for small basic applications for playing a simple example when the users do not need to interact with the program during the playback.
But for anything more than that, you will need to set up a timer routine that will be called regularly and check which events must be sent to the MIDI driver. Both Windows and Mac OS X have such a mechanism, but be sure to use the timer that runs at the interrupt level of the processor, which will be called with higher priority than the normal operation of the user interface (do not use the timer event that is simply dropped into the main event loop of the program, as this has a lower priority).
Using an interrupt timer that is called every millisecond seems to be enough to reach a good timing resolution. Here is an example of how you can implement the playing of a MIDI sequence:
In this way, the user can continue to use the software while the task of playing a sequence is running. Several major music notation programs have neglected this point: when the user starts playing the score, all editing features are disabled. It can be very useful to play a few measures in a continuous loop and be able to edit the music and directly hear the change (for instance to create a drum pattern intuitively). Of course, this requires correctly handling the possible interference between the processing that prepares or updates the buffer and the routine that reads it, so as to avoid any crash or deadlock.
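The timer-driven approach described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in any particular program: the class `SequencePlayer`, its `on_tick` method and the `send` callback are hypothetical names, the real timer and MIDI driver calls of the operating system are left out, and the protection of the event buffer against concurrent editing is not shown:

```python
class SequencePlayer:
    """Sends prepared events when their time has come, one tick at a time."""

    def __init__(self, events, send):
        # events: list of (time_ms, message) pairs.
        # send: function called with each message when it is due.
        self._events = sorted(events, key=lambda event: event[0])
        self._send = send
        self._next = 0   # index of the next event that has not been sent

    def on_tick(self, elapsed_ms):
        """Called by the periodic timer; returns quickly if nothing is due."""
        while (self._next < len(self._events)
               and self._events[self._next][0] <= elapsed_ms):
            self._send(self._events[self._next][1])
            self._next += 1

    @property
    def finished(self):
        return self._next >= len(self._events)


# Simulation: the list 'sent' stands in for the MIDI driver and the loop
# stands in for a timer that fires once per millisecond.
sent = []
player = SequencePlayer(
    [(0, "note on 60"), (500, "note off 60"), (500, "note on 64")],
    sent.append,
)
for elapsed_ms in range(0, 500):
    player.on_tick(elapsed_ms)
sent_before_500 = list(sent)
player.on_tick(500)
```

Because each call to `on_tick` only sends what is due and then returns, the rest of the program stays responsive between ticks.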
If you need to process the music notes themselves (but not based on a precise music notation handling) and if the user must be able to edit the notes, change the instruments, tempo, volumes,... you need at least to have music data structures to represent notes and sequences of notes as described above.
At this level of information, we add the graphic aspects of music notation to the music data structures. The traditional music notation standard has evolved through several centuries to arrive at its present state of development. It is still evolving to adapt itself to the needs of graphic representation of new compositional practices. But the most important part of it is quite conservative and has very precise rules.
The purpose of music notation is to make a written representation of the content of the music, so that musicians can read it and play it back. It is mainly a communication tool so that others can play the music you write and vice versa.
Clarity of the content of the score is of course priority number one, so that the musician will be able to read it as easily as possible. However, as it is a graphic representation, we come closer to the graphic arts, and there is some aesthetic value to a beautiful, well designed page and content layout of a score.
Before the computer era, music was engraved by professionals working with metal shapes to place every note, dynamic, staff line, ... into a metal frame that was then used to print the final score. Correction was practically impossible, and becoming a professional took 10 years of training in this art.
The MIDI language was invented to communicate from a computer to a synthesizer or between synthesizers. Music notation was not at all the purpose of MIDI, so MIDI is completely inadequate to represent music notation to its fullest extent.
Of course, you can create sheet music from a MIDI file, but the algorithm to do this is not straightforward, and several decisions need to be made that are not always easy for a computer to make and that may contradict the standard rules of music engraving.
So the main point is that to represent the music as music notation, you need a series of music data structures that model the main graphic objects found in a score. You can divide these objects into three main classes:
As we cannot write here a full tutorial about music notation, if you have no knowledge about it, here is a tutorial that explains most of the principles of the music notation language, with examples:
Let us take an example of the above three classes of objects. Here is a system that contains 2 staves of 3 measures:
As it is, there is no reference to time, except that this container is organized as a sequence of three measures. There is also no reference to the pitch. We need to place a time signature and a clef with an optional key signature, so that the pitch and time references are defined:
We can now add a series of notes that form a melody:
By placing the notes on the staff, each pitch is determined based on the clef that is before it (the upper clef is the G clef, which puts a G note on the second line of the staff, counting from the lower line; the lower clef is the F clef, defining the F note on the fourth line of the staff, counting from the lower line).
The 3 sharps present at the beginning of the staff indicate that the F, C and G notes have a sharp by default (one semitone higher than the natural note). The "3/4" indication tells you that there are 3 quarter notes in a measure.
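The way a clef and a key signature determine the pitch can be sketched in code. This example makes several assumptions: staff positions are counted in steps from the bottom line (0 = bottom line, 1 = first space, 2 = second line, and so on), only key signatures with sharps are handled, accidentals written inside the measure are ignored, and the function name `staff_pitch` is hypothetical:

```python
LETTER_SEMITONES = [0, 2, 4, 5, 7, 9, 11]   # C D E F G A B above C
SHARP_ORDER = [3, 0, 4, 1, 5, 2, 6]          # F C G D A E B as letter indices

# For each clef: (staff position of the reference line, its diatonic step).
# A diatonic step is letter index + 7 * octave, with octave 4 at middle C.
CLEFS = {
    "G": (2, 4 + 7 * 4),   # second line is the G above middle C
    "F": (6, 3 + 7 * 3),   # fourth line is the F below middle C
}


def staff_pitch(clef, position, sharps=0):
    """Return the MIDI pitch of a note head at a given staff position."""
    reference_position, reference_step = CLEFS[clef]
    step = reference_step + (position - reference_position)
    octave, letter = divmod(step, 7)
    pitch = 12 * (octave + 1) + LETTER_SEMITONES[letter]
    if letter in SHARP_ORDER[:sharps]:
        pitch += 1   # sharpened by the key signature
    return pitch
```

With 3 sharps, a note on the second line of the G clef gives `staff_pitch("G", 2, sharps=3)`, which is 68 (G sharp), while the bottom line stays at 64 (E natural).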
What is needed to play the music?
As music notation is designed to communicate what to play to the musician, its music data structure necessarily contains all the information needed to play the music. It requires some basic processing of the music data structures to extract the exact MIDI sequence of messages that need to be sent to a synthesizer to play back the music.
The main steps are:
During playback, the application may also color the notes to show which one is played and draw a moving cursor indicating the current playback position.
What kind of processing can you do with this music data structure?
You can do most of the processing done with the layer 2 (MIDI) music data structures. However, you must make sure that the layer 2 and layer 3 information stays synchronized.
For instance, if you need to transpose a note, you can still add a number to the original pitch, but you also need to adapt the note position on the staff, according to the clef. You must check the accidentals of the notes before it as well as the key signature. If a slur or articulation symbol is attached to the note, its position must also be adjusted graphically, so the processing is much more complex than handling MIDI data.
When the user adds new notes to the score, an important part is the correct spacing and alignment of the notes inside the measure and between instruments playing together.
Of course, the alignment of 4 quarter notes is quite easy for a solo instrument. But when you have an orchestral score with 30 instruments playing different rhythms, it can become quite complex. Spacing is mainly influenced by the following elements and their combinations:
The algorithm can become quite complex to finally reach a good compromise for readability and the general aesthetics of the score.
How is this music data structure generated?
The program can generate most of this automatically from a MIDI file, but some corrections may be needed to reach a nice, natural and readable score.
The algorithm that transforms the raw MIDI data to a natural and smooth music notation is not as simple as it may seem. The main difficulties are related to the following points:
If you write music notation software, you need to provide graphic tools to add, remove, move, copy, paste and transpose notes and selections of notes. This kind of interface is mainly a graphic processing algorithm and may in itself take quite some time to develop.
You can also import a MusicXML file, which has become the standard file format for music notation file interchange. However, reading a MusicXML file, transforming it into your own music data structures and displaying it on screen or sending it to the printer is not an easy, straightforward task and may take considerable time to develop. It all depends on what your software must be able to do.
How can you store these music data structures in memory and on files?
One way of organizing the music data structures is to consider the score as a hierarchical object, as follows:
Many processing tasks fit this hierarchical music data structure nicely. However, several other processing tasks become more complex because, depending on the type of music, there can be many links and relations between two or more successive measures of the same instrument, and also some interactions between two consecutive staves.
The most common measure interactions are:
These interactions destroy the nicely organized vertical music data structure explained above and create dependencies between two or more objects that are parts of several different music data structures.
When this happens in a data structure model, it often means that we did not catch the most natural way of handling the data entities involved. If we come back to the above example:
we can see that the melody is in itself an entity. But it starts in the middle of the first measure and ends in the middle of the third measure. There are cases where a melodic phrase stops in a measure and a new melodic phrase starts right after it, in the same measure.
So the question is: what is the most natural entity, the measure or the melodic phrase?
As we can easily observe in most cases, the dependencies of two or more measures are used in the context of one melodic phrase (slurs, tied notes, crescendo, cross staff beaming).
So we could slightly modify our hierarchical model as follows:
Of course we have moved the dependency from the level of the container to the level of the music data flows, but they are less arbitrary as they are part of the concept of a natural music entity (the melody for instance). So we may expect to smooth out a part of the complexities that occur while processing the music data structures.
Using this model, we could have a temporary data structure that helps to display the music, by placing each note, rest and symbols in its destination measure with the correct spacing, according to the clef, time and key signature present.
For the playback aspect, a single music data flow can easily be converted into a MIDI sequence, as each note has its absolute pitch value available. We would probably need to add a header to the music data flow so as to include the type of instrument playing it as well as controller values (volume, pan, reverb,...).
Once this main algorithm has been developed properly, the following operations would then become very simple and straightforward, probably only using a few lines of code:
as well as many others, since the music data structures used are closer to a natural concept of how a piece of music is organized.
How you can add music notation features in your applications
As we can see, the addition of the graphic aspects of music notation adds several complexities to the development efforts involved, compared to the MIDI data level and the music data structures are more sophisticated.
But the advantage is that we can display and edit the music in the most common and universal language developed by man: music notation. Moreover, the user can manipulate the music quite naturally, like writing the notes on a sheet of paper.
Since 1992, I have been developing the interface and functions of the Pizzicato music software (which is a general purpose music notation editor), so I know how much time can be spent writing music notation algorithms. It is of course a great and exciting adventure.
I certainly encourage you to develop your own music software, as it is a great joy to create and develop one's own ideas, see them become a reality on the screen and share them with users.
Depending on what you want to achieve, layer 1 (audio) and layer 2 (MIDI) are quite affordable in terms of development effort.
If your program needs full featured music notation display and editing capabilities, it may take you months or years of effort, depending on the functions you need to implement.
I have myself spent about 20 years on these development tasks (in addition to other tasks like intuitive composition tools and some other projects).
Originally I started the Pizzicato project mainly because I wanted to develop more advanced intuitive composition tools. It progressively became a full featured score editor and arranging program and today has more than 12,000 users worldwide on Windows and Mac OS X.
I recently decided to build a software development kit (SDK) that would help any programmer to add music notation in their applications, so that they could directly focus their development efforts to the specificities of their software and avoid spending months or years rewriting music notation algorithms.
The music notation SDK contains an API that helps you read and write standard music files (it reads MIDI files, MusicXML files and Pizzicato files, and writes MIDI files, MusicXML files, Pizzicato files, audio files and PDF files).
The content of the music documents can be accessed and modified through a set of API (Application Programming Interface) functions that give editing capabilities for practically every aspect of the score. You can create a new document and build its content directly by programming.
You can find more information about this music notation SDK and its music data structures here:
In any case, feel free to contact me, I would be happy to discuss these music development subjects with you. You can write to me at:
The last layer of music data structures is more abstract and is above the level of single notes and MIDI messages.
The purpose here is to develop algorithms that can help the user to compose his own music more intuitively.
I do NOT mean that the computer will compose for the user, which would be the wrong way to look at it, as a computer cannot "compose" music.
Sure the computer can generate random patterns that may look nice, but composing is much more than that. It is the process by which the composer will express his own feelings, emotions and ideas into a series of notes and sounds that will be able to communicate and be shared by the people who listen to his music.
These music data structures may be associated with methods that generate music that sounds nice and that is still the creation of the user.
It is like a piano player who plays on the piano and lets his imagination run freely with his hands. In this way, the piano helps him to compose, by reacting to his handling of the piano.
But nobody would accuse the piano of having made the composition, right? So it is the same while composing with a computer: the computer can stimulate the imagination of the user and let him become more creative and compose his own music.
What is needed to play the music?
It all depends on the algorithm and methods used. Most of the time, you can directly and easily generate a MIDI sequence that will be the expression intended in the music data structure of your algorithm.
From the MIDI sequence, you can also display the corresponding music notation in a more human and readable form than a list of MIDI messages with numbers.
Let us take a very simple example, an arpeggiator.
The user can define several types of arpeggios, including some random factors, specify the instrument that must play them, the speed of the arpeggios (quarter notes, eighth notes, triplets, ...) and the range of the notes to be used.
You can define a music data structure that will contain all these parameters and store them in memory or in a file as one object called an arpeggiator. You can let the user assemble many different arpeggiators in sequence and/or in parallel.
You could also define a music data structure called a chord progression and let the user associate it with a sequence of arpeggiators.
Your program must then be able, starting from this array of music data structures called Arpeggiators, to generate the sequence of notes according to the chord progression so as to send the correct messages to the MIDI interface and hear the resulting music.
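As a minimal sketch of this idea, here is one possible arpeggiator structure and the generation of its notes from a chord progression. Everything in it is an assumption made for illustration: the field names, the representation of a chord as a list of MIDI pitches, and the rule that a pattern index beyond the last chord tone wraps around one octave higher. Random factors, note ranges and instruments are left out:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Arpeggiator:
    pattern: List[int]   # indices into the chord tones, e.g. [0, 1, 2, 1]
    step_ticks: int      # duration of each generated note, in ticks
    steps: int           # number of notes generated for each chord
    velocity: int = 80

    def generate(self, chord: List[int],
                 start_tick: int) -> List[Tuple[int, int, int, int]]:
        """Return (start, pitch, duration, velocity) tuples for one chord."""
        notes = []
        for i in range(self.steps):
            index = self.pattern[i % len(self.pattern)]
            # An index past the last chord tone wraps to the next octave.
            octave_shift, tone = divmod(index, len(chord))
            pitch = chord[tone] + 12 * octave_shift
            notes.append((start_tick + i * self.step_ticks, pitch,
                          self.step_ticks, self.velocity))
        return notes


def render(progression: List[List[int]], arpeggiator: Arpeggiator):
    """Apply the arpeggiator to each chord of the progression in turn."""
    notes = []
    tick = 0
    for chord in progression:
        notes.extend(arpeggiator.generate(chord, tick))
        tick += arpeggiator.steps * arpeggiator.step_ticks
    return notes


arpeggiator = Arpeggiator(pattern=[0, 1, 2, 3], step_ticks=240, steps=4)
progression = [[60, 64, 67], [65, 69, 72]]   # C major, then F major
notes = render(progression, arpeggiator)
```

The resulting list of notes can then be converted to MIDI messages and played like any other sequence.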
The Pizzicato music software offers several examples of this kind of tools, each time around a specific customized music data structure, like for instance:
The four layers of music data structures explained here offer a wide range of possible applications.
According to your software specification, you may need to use only one of them or use several or all of them.
I wish you a nice time developing music software.
If you have any question about this article, feel free to contact me at :
Music Software Designer
since 1992 at Arpege Music