#31 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
ffttest under emulation
Running the unmodified code under the Aboriginal Linux image posted in DIY-KeK, using the DIY-KeK qemu-1.1.1 build, limited to a single CPU core.
Build notes: Code:
(armv6l:1) fft-arm-0.01 # export CFLAGS="-O2 -march=armv6 -mfpu=vfp -mfloat-abi=softfp -fomit-frame-pointer"
(armv6l:1) fft-arm-0.01 # gcc $CFLAGS -o ffttest-arm radix4fft.c testfft.c testmain.c -lm
(armv6l:1) fft-arm-0.01 # strip --strip-unneeded ffttest-arm
(armv6l:1) fft-arm-0.01 # ls -l fft*
-rw------- 1 guest guest 1959 Apr 17 2004 fft.h
-rwxr-xr-x 1 root root 14068 Jul 17 12:08 ffttest-arm
Code:
(armv6l:1) fft-arm-0.01 # ./ffttest-arm
Testing a 64-point FFT.
1:1 mapping; no reorder required.
SNR: 14167370.493333 (71.512893 dB)
Mean energy in: 0.989857 out_test: 0.989554 out_ref: 0.989579
Timing FFT speed...
5.87 us per 64-point FFT, or 21.80 Mbps with QPSK, or 20.29 insns per point on a 220MHz SA-1100.
It seems we could run one of those on the most recent set of samples from the audio stream buffer just before each e-ink update, even at a blazing e-ink update rate of 15 fps.
I.e., adding another 6 us in front of each e-ink update to drive a real-time graphic equalizer with a spectral display isn't going to be noticeable, not even to the processor.
Let us see what the resource usage is when it is run as a stand-alone program (rather than hard coded into the equalizer): Code:
(armv6l:1) fft-arm-0.01 # time ./ffttest-arm
Testing a 64-point FFT.
1:1 mapping; no reorder required.
SNR: 14167370.493333 (71.512893 dB)
Mean energy in: 0.989857 out_test: 0.989554 out_ref: 0.989579
Timing FFT speed...
5.90 us per 64-point FFT, or 21.71 Mbps with QPSK, or 20.38 insns per point on a 220MHz SA-1100.
real 0m 0.76s
user 0m 0.75s
sys 0m 0.01s
At roughly one run per second, the stand-alone program isn't going to keep up with a 15 (or 7) fps display. Maybe hard coding the look-up tables ... |
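For anyone who wants to check the arithmetic behind those two remarks, here is a minimal sketch in C. It only restates the numbers from the runs above (5.87 us per FFT, 0.76 s for a whole stand-alone run, 15 or 7 fps); the helper names `frame_budget_us`, `budget_fraction` and `report` are made up for illustration and are not part of the fft-arm package.

```c
#include <stdio.h>

/* Time available per display frame, in microseconds, at a given frame rate. */
static double frame_budget_us(double fps)
{
    return 1.0e6 / fps;
}

/* Fraction of one frame's time budget used by a task costing cost_us. */
static double budget_fraction(double cost_us, double fps)
{
    return cost_us / frame_budget_us(fps);
}

/* Print the budget share at the two frame rates mentioned in this thread. */
static void report(const char *label, double cost_us)
{
    printf("%-26s %10.4f%% of a 15 fps frame, %10.4f%% of a 7 fps frame\n",
           label,
           100.0 * budget_fraction(cost_us, 15.0),
           100.0 * budget_fraction(cost_us, 7.0));
}
```

Calling `report("in-process 64-point FFT", 5.87)` and `report("stand-alone program run", 0.76e6)` shows the contrast: the FFT by itself is well under a hundredth of a percent of a 15 fps frame, while one full program run (table set-up included) spans more than eleven such frames.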
#32 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
A first step in that direction: get a look at what the compiler is seeing.
Code:
(armv6l:1) fft-arm-0.01 # export CFLAGS="-E"
(armv6l:1) fft-arm-0.01 # gcc $CFLAGS -o ffttest-arm.i radix4fft.c testfft.c testmain.c
Now it can be seen (and read) just how those thousands of trig calls are used to init the lookup tables. The ffttest-arm.i compiler input file is gzip'd and attached here.
Oops. The -E option doesn't combine multiple *.c files into a single listing. Still, the reader should get the idea, even if gcc has to be called three times, once per source file.
Last edited by knc1; 07-17-2012 at 10:00 AM. |
#33 | |
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773668
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
|
Quote:
The K4/K5 can run up to 45 FPS with VERY carefully chosen animation, but for general unfiltered animation, 7.7 FPS (average) is about as good as it gets. I can push out a little more speed without significant artifacts by applying spatiotemporal smoothing (averaging across multiple frames) to the video, which I did in a couple of tests, but that does not speed things up on the K3, so I do not make a habit of encoding with that smoothing. Depending on how you display the FFT data, you may be able to do 15 FPS on a K5 (and on a K4 in diags mode), but the K4 in main mode emulates K3 eink (mostly), which limits which Kindle models you can use. There is not much CPU left over on a K3 either while playing video with sound. |
|
#34 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
The one (working) FFT example from yesterday was written when "ARM doesn't have hardware floating point".
Our Kindles do have hardware floating point, almost, sort of... The K3's VFP unit **almost** supports IEEE 754, except for a few details that don't matter in the simple math used here.
While revamping that decade-old code (whose big claim to fame at the moment is that it "works out of the box"), a person would want to keep in mind at least the basics: http://infocenter.arm.com/help/index.../Chdidbba.html (and the topic just above it on the left).
In particular, the VFP does best with short vectors of 8 single or 4 double precision values.
Also note that, because of the **almost** IEEE 754 support, you want to include in the gcc options: -mfpu=vfp -mfloat-abi=softfp -funsafe-math-optimizations -std=c99
Also note that libm is a double precision library (the -std=c99 is for some libm functions).
Last edited by knc1; 07-17-2012 at 02:28 PM. |
#35 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
This has been fun, but O.T
Now that we have a bit of a feel for how the emulated armv6l environment posted in DIY-KeK performs (6 us per 64-point FFT vs 9 us per 64-point FFT on the K3)...
Time to drop this GPL-licensed fun show and go back to the thread topic: the BSD-licensed one that needs to be ported from 16 bit to 32 bit.
The 16-bit example uses a DIY, 8Q7, real number storage format to fit the data into 16 bits. With our 32-bit integer and floating point processors, we can use "int" and "float" (both 32-bit register sizes) in place of the 8Q7.
Since libm does double precision (64-bit) operations, we need to be a bit careful to keep local "float" copies of the libm results so that the C code doesn't have to "promote / demote" the data types a zillion times per math statement. Although the VFP does have the float <-> double conversions in hardware, it is still better to avoid any work that can be avoided.
The VFP has 4 banks of (different views, same registers) either 8 single or 4 double precision vectors. So we want to organize our work as inner loops of either 4 or 8 data values (depending on the precision at that place in the code).
And just so everyone does not have to go searching for the O.P. files, I have attached copies here: |
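A rough sketch of what replacing the fixed-point storage with "float" could look like, walking the data in blocks of 8. Loud assumption: "8Q7" is read here as a signed 16-bit value with 7 fractional bits (scale factor 1/128); the attached sources should be checked before relying on that. The names `Q_FRAC_BITS`, `BLOCK`, `q7_to_float` and `q7_block_to_float` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define Q_FRAC_BITS 7  /* ASSUMPTION: "8Q7" means 7 fractional bits */
#define BLOCK 8        /* eight single-precision values per pass */

/* Convert one fixed-point sample to float: value = raw / 2^Q_FRAC_BITS. */
static inline float q7_to_float(int16_t raw)
{
    return (float)raw * (1.0f / (float)(1 << Q_FRAC_BITS));
}

/*
 * Convert n samples, walking the input in blocks of BLOCK values so the
 * inner loop has a fixed trip count. A tail shorter than BLOCK is
 * handled by the second loop.
 */
static void q7_block_to_float(const int16_t *in, float *out, size_t n)
{
    size_t i = 0;

    for (; i + BLOCK <= n; i += BLOCK) {
        for (size_t j = 0; j < BLOCK; j++) {
            out[i + j] = q7_to_float(in[i + j]);
        }
    }
    for (; i < n; i++) {
        out[i] = q7_to_float(in[i]);
    }
}
```

The fixed block size only gives the compiler a loop shape that matches the 8-value vectors discussed above. Whether gcc actually emits VFP short-vector instructions for it is a separate question that this sketch does not answer.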
#36 |
( ͡° ͜ʖ ͡°){ʇlnɐɟ ƃǝs}Týr
Posts: 6,586
Karma: 6299991
Join Date: Jun 2012
Location: uti gratia usura (Yao ying da ying; Mo ying da yieng)
Device: PW-WIFI|K5-3G+WIFI| K4|K3-3G|DXG|K2| Rooted Nook Touch
|
This will clearly be awesome
|
#37 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
A 64 band equalizer might well be considered the "minimum" for a sound studio.
When I had a nice listening room, I had a 16 band equalizer (with spectrum display, all electronic, not a computer anywhere). Maybe setting our sights on an 8 band equalizer for the Kindles would be reasonable (after all, they are a "book", not a "studio"). People might expect to read about FFTs on a Kindle, not run one.
Plus, our hardware does math on 8 data point vectors per instruction. Seems like that would fit with the overall plan. It will probably take some clever coding of the overall algorithm to coax gcc into generating the code we want. That will be a learning experience, I am sure.
Let me see now: we have 3 vectors of 8 data points each, plus 8 float or integer registers for common constants. So the question arises in my mind: "Can we do an 8 point FFT entirely in the VFP?" Another interesting question to learn the answer to.
Hmm... I wonder if ARM has already published one as an "application note". That one was too easy to answer: http://infocenter.arm.com/help/index.../Cacjgfad.html
Now it just remains to find a copy that we can use, outside of the for-purchase development system. Or at least one to study for reference.
Last edited by knc1; 07-18-2012 at 03:04 AM. |
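For reference while thinking about that question, here is what an 8 point FFT looks like in plain scalar C: a textbook radix-2, decimation-in-time version. It is not the ARM application note code and not VFP code, just a baseline sketch. The names `fft8`, `bitrev8`, `w_re` and `w_im` are made up for this example.

```c
#include <stddef.h>

#define N8 8

/* Bit-reversed order of the indices 0..7 (3-bit reversal). */
static const unsigned char bitrev8[N8] = { 0, 4, 2, 6, 1, 5, 3, 7 };

/* Twiddle factors W8^k = exp(-2*pi*i*k/8) for k = 0..3. */
static const float w_re[4] = { 1.0f,  0.70710678f,  0.0f, -0.70710678f };
static const float w_im[4] = { 0.0f, -0.70710678f, -1.0f, -0.70710678f };

/*
 * Forward 8-point FFT: radix-2, decimation in time.
 * in_re/in_im hold the input in natural order; out_re/out_im receive
 * the spectrum in natural order. Input and output must not overlap.
 */
static void fft8(const float in_re[N8], const float in_im[N8],
                 float out_re[N8], float out_im[N8])
{
    /* Reorder the input so the butterflies below can work in place. */
    for (size_t i = 0; i < N8; i++) {
        out_re[i] = in_re[bitrev8[i]];
        out_im[i] = in_im[bitrev8[i]];
    }

    /* Three stages, with butterfly half-spans of 1, 2 and 4 elements. */
    for (size_t half = 1; half < N8; half *= 2) {
        size_t step = N8 / (2 * half);  /* stride through the twiddle table */

        for (size_t start = 0; start < N8; start += 2 * half) {
            for (size_t k = 0; k < half; k++) {
                size_t a = start + k;
                size_t b = a + half;
                float wr = w_re[k * step];
                float wi = w_im[k * step];
                /* t = W * x[b] (complex multiply) */
                float tr = wr * out_re[b] - wi * out_im[b];
                float ti = wr * out_im[b] + wi * out_re[b];

                out_re[b] = out_re[a] - tr;
                out_im[b] = out_im[a] - ti;
                out_re[a] += tr;
                out_im[a] += ti;
            }
        }
    }
}
```

The working set is 16 floats of data plus 8 floats of twiddle constants. Whether that can be arranged to fit the vector banks described above is exactly the open question in this post.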
#38 | |
( ͡° ͜ʖ ͡°){ʇlnɐɟ ƃǝs}Týr
Posts: 6,586
Karma: 6299991
Join Date: Jun 2012
Location: uti gratia usura (Yao ying da ying; Mo ying da yieng)
Device: PW-WIFI|K5-3G+WIFI| K4|K3-3G|DXG|K2| Rooted Nook Touch
|
I have been digging around on this one.
http://hax4.blogspot.co.uk/2010_02_01_archive.html Advises: Quote:
Thanks |
|
#39 | ||
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773668
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
|
Quote:
Last edited by geekmaster; 07-18-2012 at 08:34 AM. |
||
#40 | |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
Quote:
applied to it to turn it into readable "C" after all the set-up nonsense is done. |
|
#41 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
Bit reversed indexing
Here is a simple bit of math, good for someone (like me) who has never written VFP code.
We know that an array can be accessed by its member index.
We know that an array name in C behaves as a pointer to the first (0) member.
We know that other FFT implementations use either a lookup table or hard code something to "bit reverse" the member index numbers.
We know that C allows us to do "pointer math".
We know the VFP does not do "indexed load/store". But it certainly does scalar and vector math.
We know the address of a member is: Maddr = base + (index * sizeof(member))
So how do we do "pointer math" for an 8 member array as a vector?
For an 8 element array, the bit reverse relationship looks like (fixed now): http://drpbox.knetconnect.com/fft/bits.html
Hmm (the intended algorithm, not the actual code) ...
* fill multiple with (sizeof(member)) - bank 0
* load multiple with "magic index" - bank 1
* mul bank-0 bank-1 bank-0
* fill multiple with (array base) - bank 1
* add bank-0 bank-1 bank-0
And like magic we now have the pointer addresses of the reordered input array in S0 .. S7. C "pointer math" on steroids.
Last edited by knc1; 07-18-2012 at 02:54 PM. |
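The same relationship written out as ordinary scalar C, as a reference for what the vector version is meant to produce. This is only a sketch of the address calculation, not VFP code; `bit_reverse` and `reordered_addresses` are names invented for this example, and the member type is assumed to be float.

```c
#include <stddef.h>

/* Reverse the lowest `bits` bits of index i (bits = 3 for 8 elements). */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;

    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/*
 * Fill addr[k] with the address of the member that belongs at position k
 * of the reordered array: base + bit_reverse(k) * sizeof(member).
 * C pointer arithmetic applies the sizeof(member) scaling implicitly.
 */
static void reordered_addresses(const float *base, const float *addr[8])
{
    for (unsigned k = 0; k < 8; k++) {
        addr[k] = base + bit_reverse(k, 3);
    }
}
```

For 8 members the "magic index" sequence comes out as 0, 4, 2, 6, 1, 5, 3, 7, which is symmetric: reversing the bits twice gives back the original index.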
#42 |
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773668
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
|
That "broken image" in the post above is shown correctly in this embedded quote. Click this image to see the original HTML version.
Last edited by geekmaster; 07-18-2012 at 10:31 AM. |
#43 | |
( ͡° ͜ʖ ͡°){ʇlnɐɟ ƃǝs}Týr
Posts: 6,586
Karma: 6299991
Join Date: Jun 2012
Location: uti gratia usura (Yao ying da ying; Mo ying da yieng)
Device: PW-WIFI|K5-3G+WIFI| K4|K3-3G|DXG|K2| Rooted Nook Touch
|
Quote:
It would seem my Wine installation is dead beyond recovery after several hours of digging around in registries. I have reported the bug (and found 4 other users who shared my experience; what a joy it is to belong). To that end, I shall now be wasting 4 hours doing this all again under Linux ;) |
|
#44 | |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
Quote:
The original is now corrected. At least as of revision #3 it is fixed, and it now displays the symmetry I knew was in there. Last edited by knc1; 07-18-2012 at 10:46 AM. |
|
#45 |
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773668
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
|
I noticed that. The new text does not resize like the previous version that I "screenshotted". An HTML page does not work very well inside the IMG tags you have in your post.
|