blog.humaneguitarist.org

AudioRegent

[Tue, 01 Jun 2010 14:13:04 +0000]
AudioRegent 1.3.1 (Beta)

___________________________________________________________________________

Table of Contents

Introduction
Installation
Running AudioRegent
Changing Settings: AudioRegent.xml
SimpleADL
How It Works
Examples
FAQ
Links

___________________________________________________________________________

Introduction

AudioRegent seeks to provide a simple yet effective way to automate the non-destructive creation of derivative audio files from master WAV files by means of an easy-to-use audio decision list.

AudioRegent utilizes:

1. SoX [http://sox.sourceforge.net/], i.e. Sound Exchange. SoX is a command-line audio editing tool.
2. SimpleADL, an XML [http://www.w3schools.com/xml/xml_whatis.asp]-based audio decision list developed in conjunction with AudioRegent.

AudioRegent and SimpleADL are intended for advanced users who are conversant in XML and digital audio technologies and terminology.

AudioRegent is licensed under the MIT software license [http://creativecommons.org/licenses/MIT/].

___________________________________________________________________________

Installation

AudioRegent has been tested on 32-bit versions of Windows XP (SP2, SP3), Windows 7, and Xubuntu 9.10. It has been tested using Python versions 2.5 and 2.6.

To install the program, download AudioRegent-1.3.1.zip [http://blog.humaneguitarist.org/uploads/AudioRegent/AudioRegent-1.3.1.zip]. Unzip the file and place the root AudioRegent folder wherever you like on your system.

Install SoX [http://sourceforge.net/projects/sox/files/] on your system if you don't already have it.

* You must make sure that the sox command is executable from within the AudioRegent folder.

Lastly, download and install Python [http://www.python.org/download/] version 2.5 or 2.6 if you don't already have it.

* To date, AudioRegent has not been tested with Python versions 2.7 and 3.0.
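If you want to confirm the two installation requirements before running AudioRegent, the short script below is one way to do it. It is a sketch, not part of the AudioRegent download: the function names `find_executable` and `python_version_is_tested` are made up for this example, it only looks in the folders on your PATH (plus the current folder on Windows, which Windows also searches), and it sticks to syntax that should run under Python 2.5/2.6 as well as Python 3.

```python
import os
import sys


def find_executable(name, path=None):
    """Return the full path to `name` if it can be found, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    folders = path.split(os.pathsep)
    extensions = [""]
    if sys.platform.startswith("win"):
        # Windows also looks in the current folder and tries extensions like .exe.
        folders.insert(0, os.curdir)
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for folder in folders:
        for ext in extensions:
            candidate = os.path.join(folder, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def python_version_is_tested(version_info=None):
    """True if the interpreter is one of the versions AudioRegent 1.3.1 was tested with."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in ((2, 5), (2, 6))


if __name__ == "__main__":
    sox = find_executable("sox")
    if sox:
        print("Found SoX: " + sox)
    else:
        print("The sox command was not found; check your SoX installation and PATH.")
    if not python_version_is_tested():
        print("Note: AudioRegent 1.3.1 was only tested with Python 2.5 and 2.6.")
```

Run it from inside the AudioRegent folder with the same `python` command you will use for AudioRegent.py, so that both checks reflect the environment AudioRegent will actually see.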
___________________________________________________________________________

Running AudioRegent

To run the default interface for AudioRegent, do:

$ python AudioRegent.py

To see the available command-line options, do:

$ python AudioRegent.py --help

___________________________________________________________________________

Changing Settings: AudioRegent.xml

Using a simple text editor, you can change some of the things AudioRegent does by changing the element values in the AudioRegent.xml file. AudioRegent ignores the attribute values in AudioRegent.xml, but you might want to leave them intact as a record of the default values. Here are the default values:

<AudioRegentSetup>
    <outputType default="ogg">ogg</outputType>
    <SoxOptions default="gain -n -3">gain -n -3</SoxOptions>
    <comment default=""/>
    <delete_outWavs default="true">true</delete_outWavs>
    <timestampLogFiles default="false">false</timestampLogFiles>
</AudioRegentSetup>

1. For <outputType>, choose from the following lowercase values: wav, aif, flac, or ogg. This is the format of the final audio files found in the outOggs folder.
   + If you are wondering why mp3 isn't an option, read the SoX format documentation [http://sox.sourceforge.net/soxformat.html] (see "mp3") for information about rendering MP3 files. Basically, you can't use the default build of SoX to make MP3 files due to licensing concerns. If you need MP3 output, you'll have to build SoX from source with MP3 encoding support, or you can simply output lossless files and later convert them to MP3 with a third-party application.
2. For <SoxOptions>, just use your preferred and *valid* SoX effects for derivative audio files. These effects will be applied to the final audio files in the outOggs folder. AudioRegent will not fail if an invalid string is used, but SoX will, and your audio files will not be made properly.
   + Read the SoX documentation [http://sox.sourceforge.net/sox.html] for more information about effects.
3. For <comment>, just enter your preferred comment "tag" or leave this element empty. If you specify a comment string, it will likely show up as embedded metadata in OGG, FLAC, and AIF files if one of these formats is chosen as the <outputType>.
4. For <delete_outWavs>, use "true" if you want the *entire* outWavs folder emptied automatically. Use "false" if you want to leave the files in this folder intact. Be warned: using "false" means that *every* WAV file in the outWavs folder will have a derivative placed in the outOggs folder, i.e., if there are pre-existing WAV files in the outWavs folder, AudioRegent will make derivatives of them too.
5. For <timestampLogFiles>, use "false" if you don't want to timestamp the log file names; the previous log files will then be overwritten every time you run AudioRegent. Use "true" if you do want the log file names to be timestamped.

___________________________________________________________________________

SimpleADL

SimpleADL stands for Simple Audio Decision List. SimpleADL is a homegrown XML [http://www.w3schools.com/xml/]-based way to:

* optionally capture some basic statistics about a master WAV audio file,
* define audio regions within the file,
* and optionally notate comments and textual components within each region. These components may include information about the region, interview transcription text, song lyrics, theatrical dialog, etc.

The XML schema for SimpleADL version 1.0 is located here: http://blog.humaneguitarist.org/uploads/AudioRegent/SimpleADL-1.0.xsd
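Because AudioRegent.xml is ordinary XML, just like a SimpleADL file, its settings are easy to inspect from a script. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; `read_settings` is a hypothetical helper, not a function shipped with AudioRegent. Like AudioRegent, it reads the element values and ignores the `default` attributes. Falling back to the documented default when an element is missing, and turning "true"/"false" into booleans, are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Default element values, as listed in "Changing Settings: AudioRegent.xml".
DEFAULTS = {
    "outputType": "ogg",
    "SoxOptions": "gain -n -3",
    "comment": "",
    "delete_outWavs": "true",
    "timestampLogFiles": "false",
}

VALID_OUTPUT_TYPES = ("wav", "aif", "flac", "ogg")


def read_settings(xml_text):
    """Return a dict of AudioRegent.xml element values (attributes are ignored)."""
    root = ET.fromstring(xml_text)
    settings = dict(DEFAULTS)
    for name in DEFAULTS:
        element = root.find(name)
        if element is not None:
            # An empty element such as <comment default=""/> has text None.
            settings[name] = (element.text or "").strip()
    if settings["outputType"] not in VALID_OUTPUT_TYPES:
        raise ValueError("outputType must be one of: " + ", ".join(VALID_OUTPUT_TYPES))
    # Assumption: the two switches are stored as the strings "true"/"false".
    for name in ("delete_outWavs", "timestampLogFiles"):
        settings[name] = settings[name].lower() == "true"
    return settings
```

Calling `read_settings(open("AudioRegent.xml").read())` on an unmodified file should give back the five defaults shown above, with the two switches as `True` and `False`.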
The basic tree structure of an example SimpleADL file defining two regions is as follows:

<audioDecisionList filename="">
    <region id="">
        <in unit="seconds"></in>
        <duration unit="seconds"></duration>
    </region>
    <region id="">
        <in unit="seconds"></in>
        <duration unit="seconds"></duration>
    </region>
    <outputAsTracks></outputAsTracks>
</audioDecisionList>

SimpleADL's <region> element provides an easily retained record of desired regions within an audio file. The <in> element specifies where the region starts, while the <duration> element specifies the length of the region. The <outputAsTracks> element instructs AudioRegent on how to output derivative files: a value of "true" tells AudioRegent to output one derivative audio file per region, while a value of "false" tells AudioRegent to output a single derivative audio file consisting of all regions spliced together. For more, see the section below entitled How It Works.

Because SimpleADL has such a basic tree structure, it's easily extensible. For example, here's a SimpleADL file with added technical metadata about the WAV file example2.wav. A <text> block is now also present.
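To show how little is needed to read the regions back out of a SimpleADL file, here is a sketch (again with a hypothetical helper, `read_adl`) that returns the `filename` attribute, each region's id, start, and duration, and the `<outputAsTracks>` flag. It assumes every `<in>` and `<duration>` uses `unit="seconds"`, as in all the examples on this page, and it strips any XML namespace so that it works both with the bare structure above and with the namespaced example2.wav file.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{http://...}region' -> 'region'."""
    return tag.rsplit("}", 1)[-1]


def read_adl(xml_text):
    """Return (filename, regions, output_as_tracks) from a SimpleADL document.

    regions is a list of (region_id, start_seconds, duration_seconds) tuples,
    in document order. Assumes all times are in seconds.
    """
    root = ET.fromstring(xml_text)
    regions = []
    output_as_tracks = None
    for child in root:
        name = local_name(child.tag)
        if name == "region":
            # Collect <in>, <duration> (and any <text>) by local name.
            values = {}
            for part in child:
                values[local_name(part.tag)] = part.text
            regions.append((child.get("id"),
                            float(values["in"]),
                            float(values["duration"])))
        elif name == "outputAsTracks":
            output_as_tracks = (child.text or "").strip().lower() == "true"
        # Other elements, such as <statistics>, are skipped.
    return root.get("filename"), regions, output_as_tracks
```

Each region ends at `start + duration`, so adding up the durations gives the length of the spliced output when `<outputAsTracks>` is "false".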
<?xml version="1.0" encoding="utf-8"?>
<audioDecisionList filename="example2.wav"
    xmlns="http://blog.humaneguitarist.org/uploads/AudioRegent"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://blog.humaneguitarist.org/uploads/AudioRegent http://blog.humaneguitarist.org/uploads/AudioRegent/SimpleADL-1.0.xsd">
    <statistics>
        <channel position="mono">
            <minimumSamplePosition unit="seconds">145.854921</minimumSamplePosition>
            <minimumSampleValue unit="dbfs">-0.440</minimumSampleValue>
            <maximumSamplePosition unit="seconds">168.396961</maximumSamplePosition>
            <maximumSampleValue unit="dbfs">-1.644</maximumSampleValue>
            <RMS_level unit="dbfs">-26.969</RMS_level>
        </channel>
        <length unit="seconds">15</length>
    </statistics>
    <region id="_01">
        <in unit="seconds">1</in>
        <duration unit="seconds">9</duration>
        <text>
            <xhtml:div>
                <p>Hello World!</p>
            </xhtml:div>
        </text>
    </region>
    <outputAsTracks>true</outputAsTracks>
</audioDecisionList>

Note that the optional statistical information used here was based on the statistics available in Sony's Sound Forge [http://www.sonycreativesoftware.com/soundforge] 9.0. AudioRegent doesn't use the <statistics> element, so you could use the statistical measurements of your choice, based on personal preference and what your software is capable of analyzing. You could also omit this element entirely.

Also note that it's safest to use UTF-8 encoding for SimpleADL files. Not doing so could cause AudioRegent to crash if your SimpleADL file contains certain diacritics but is encoded with Windows-1252, etc.

___________________________________________________________________________

How It Works

The diagram below shows how AudioRegent would work for bar.wav and its accompanying SimpleADL file bar.adl.xml. It shows what would happen if the SimpleADL <outputAsTracks> element was set to "true" (left side of image) or "false" (right side of image).
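The two branches in the diagram can be approximated with ordinary SoX commands. The sketch below only builds the command lines (it does not run them) and is not AudioRegent's actual code: the function `build_sox_commands`, the exact file names, and the choice of writing per-region files straight to outWavs when `<outputAsTracks>` is "true" are all assumptions, with names modeled on Example 2's AllForAPailOfWater_part1.ogg. It relies on two documented SoX behaviors: the `trim` effect takes a start position followed by a length, and several input files given without a `--combine` option are concatenated.

```python
import os


def build_sox_commands(master_wav, regions, output_as_tracks,
                       output_type="ogg", sox_options="gain -n -3"):
    """Return SoX argument lists for one SimpleADL file, without running them.

    regions is a list of (region_id, start_seconds, duration_seconds) tuples.
    """
    base = os.path.splitext(os.path.basename(master_wav))[0]
    commands = []
    # Assumption: per-region files go straight to outWavs when outputAsTracks
    # is true, and to tempWavs (to be spliced together) when it is false.
    cut_folder = "outWavs" if output_as_tracks else "tempWavs"
    cuts = []
    for region_id, start, duration in regions:
        cut = os.path.join(cut_folder, base + region_id + ".wav")
        # 'trim START LENGTH' keeps LENGTH seconds beginning at START.
        commands.append(["sox", master_wav, cut, "trim", str(start), str(duration)])
        cuts.append(cut)
    if output_as_tracks:
        out_wavs = cuts
    else:
        spliced = os.path.join("outWavs", base + ".wav")
        # Given several input files and no --combine option, SoX concatenates them.
        commands.append(["sox"] + cuts + [spliced])
        out_wavs = [spliced]
    # Apply the <SoxOptions> effects while converting to <outputType>.
    # Note: a plain split() does not handle quoted effect arguments.
    for wav in out_wavs:
        name = os.path.splitext(os.path.basename(wav))[0] + "." + output_type
        commands.append(["sox", wav, os.path.join("outOggs", name)] + sox_options.split())
    return commands


if __name__ == "__main__":
    # Regions from Example 1, spliced into a single file.
    for cmd in build_sox_commands("lecture.wav",
                                  [("_01", 0.0, 300.0), ("_02", 305.0, 25.0)],
                                  False):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.call` to actually run SoX. Keeping the effects for the last step means `gain -n -3` normalizes the finished file as a whole rather than each region separately.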
By default, all WAV files created by AudioRegent in the outWavs and tempWavs folders are deleted automatically. If you want to retain the WAV files in the outWavs folder, see Changing Settings: AudioRegent.xml.

IMAGE: [http://blog.humaneguitarist.org/uploads/AudioRegent/howItWorks.png]

___________________________________________________________________________

Examples

Here are a few SimpleADL examples, all assuming the default settings in AudioRegent.xml. Hopefully they will give you an idea of what AudioRegent can do with SimpleADL files.

Example 1:

Let's assume we have a file called lecture.wav. It's five and a half minutes (330 seconds) long. There's a nasty, undesirable sound between the 300-second mark and the 305-second mark. We want to output a file that omits that sound. We use the following SimpleADL file:

<audioDecisionList filename="lecture.wav">
    <region id="_01">
        <in unit="seconds">0</in>
        <duration unit="seconds">300</duration>
    </region>
    <region id="_02">
        <in unit="seconds">305</in>
        <duration unit="seconds">25</duration>
    </region>
    <outputAsTracks>false</outputAsTracks>
</audioDecisionList>

AudioRegent would produce a 5-minute, 25-second (325-second) file called lecture.ogg that doesn't contain our unwanted sound. This file will be normalized to -3 dBFS per the default <SoxOptions> value of "gain -n -3".

Example 2:

Now consider the following SimpleADL file for a 45-second WAV file called AllForAPailOfWater.wav.
<?xml version="1.0" encoding="utf-8"?>
<audioDecisionList filename="AllForAPailOfWater.wav"
    xmlns="http://blog.humaneguitarist.org/uploads/AudioRegent"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://blog.humaneguitarist.org/uploads/AudioRegent http://blog.humaneguitarist.org/uploads/AudioRegent/SimpleADL-1.0.xsd">
    <region id="_part1">
        <in unit="seconds">0</in>
        <duration unit="seconds">20</duration>
        <text>
            <xhtml:div>
                <p class="soundEffect">Sound of phone ringing.</p>
                <p class="transcript">Jack: Hello?</p>
                <p class="transcript">Jill: Hi, Jack. It's me Jill.</p>
                <p class="comment">Jack pauses for nearly 10 seconds.</p>
            </xhtml:div>
        </text>
    </region>
    <region id="_part2">
        <in unit="seconds">30</in>
        <duration unit="seconds">15</duration>
        <text>
            <xhtml:div>
                <p class="transcript">Jill: Jack, are you there?</p>
                <p class="transcript">Jack: What do you want?</p>
                <p class="transcript">Jill: I just want to know how your crown is? Are you OK?</p>
                <p class="transcript">Jack: Jill, you can't come tumbling after me anymore. I mean it. Goodbye.</p>
                <p class="soundEffect">Sound of phone hanging up.</p>
            </xhtml:div>
        </text>
    </region>
    <outputAsTracks>true</outputAsTracks>
</audioDecisionList>

AudioRegent would produce two OGG files: AllForAPailOfWater_part1.ogg and AllForAPailOfWater_part2.ogg. Listening to both files back-to-back would let you hear the conversation without having to sit through Jack's 10-second pause before he says anything. I realize that Jack's pause is part of the "story" of this conversation and, from a certain perspective, it should be left in, but this is just an example.

Now you have the sound files, but what else can be done? Well, you also have a transcription of the conversation embedded in the SimpleADL file, so by using XSL/XSLT [http://www.w3schools.com/xsl/] (or even copy/paste!) you could extract the text in the <p> tags where the "class" attribute value equals "transcript", wrap the OGG files inside the HTML5