Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

UFO ET IT

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

ufoet 2021. 1. 8. 20:57

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

약 250,000 개의 항목을 초과하는 매우 큰 목록을 조작하는 기능이 있습니다. 이러한 항목의 대부분은 x 위치에있는 항목을 대체합니다. 그러나 약 5 %의 경우 목록에서 제거해야합니다.

LinkedList를 사용하는 것이 값 비싼 제거를 피하는 가장 확실한 솔루션 인 것 같습니다. 그러나 자연스럽게 인덱스로 LinkedList에 액세스하는 것은 시간이 지남에 따라 점점 느려집니다. 여기의 비용은 몇 분입니다 (그리고 그 중 상당수).

LinkedList를 통해 Iterator를 사용하는 것도 비용이 많이 듭니다. 해당 목록을 편집하는 동안 Iterator 동시성 문제를 방지하기 위해 별도의 복사본이 필요하기 때문입니다. 여기의 비용은 몇 분입니다.

그러나 여기가 내 마음이 조금 날아가는 곳입니다. ArrayList로 변경하면 거의 즉시 실행됩니다.

297515 요소가있는 목록의 경우 11958 요소를 제거하고 다른 모든 요소를 수정하는 데 909ms가 걸립니다. 결과 목록의 크기가 예상대로 실제로 285557이고 필요한 업데이트 된 정보가 포함되어 있는지 확인했습니다.

이게 왜 그렇게 빠른가요? JDK6에서 ArrayList의 소스를 살펴본 결과 예상대로 arraycopy 함수를 사용하는 것으로 보입니다. 상식적으로이 작업에 대한 배열이 수십만 개의 항목을 이동해야하는 끔찍한 아이디어임을 나타내는 것처럼 보일 때 ArrayList가 여기에서 잘 작동하는 이유를 이해하고 싶습니다.

목록 요소를 필터링하기 위해 다음 전략을 각각 시도하면서 벤치 마크를 실행했습니다.

원하는 요소를 새 목록에 복사
~~Iterator.remove()원치 않는 요소를 제거하는 데 사용ArrayList~~
Iterator.remove()원치 않는 요소를 제거하는 데 사용LinkedList
제자리에서 목록 압축 (원하는 요소를 더 낮은 위치로 이동)
의 색인 ( List.remove(int))으로 제거ArrayList
의 색인 ( List.remove(int))으로 제거LinkedList

100000 개의 무작위 인스턴스로 목록을 채우고 Point95 %의 요소를 허용하고 나머지 5 %를 거부하는 필터 조건 (해시 코드 기반)을 사용할 때마다 (질문에 명시된 것과 동일한 비율이지만 더 작은 목록 250000 개의 요소에 대한 테스트를 실행할 시간이 없었기 때문입니다.)

평균 시간 (이전 MacBook Pro : Core 2 Duo, 2.2GHz, 3Gb RAM)은 다음과 같습니다.

CopyIntoNewListWithIterator   :      4.24ms
CopyIntoNewListWithoutIterator:      3.57ms
FilterLinkedListInPlace       :      4.21ms
RandomRemoveByIndex           :    312.50ms
SequentialRemoveByIndex       :  33632.28ms
ShiftDown                     :      3.75ms

따라서 a LinkedList에서 인덱스로 요소를 제거하는 것은에서 제거하는 것보다 300 배 이상 비싸고 ArrayList다른 방법 (선형 검색 및 arraycopy) 보다 6000-10000 배 더 비쌉니다.

여기서는 네 가지 더 빠른 방법 사이에 큰 차이가없는 것 같지만 다음 결과와 함께 500000 요소 목록으로이 네 가지를 다시 실행했습니다.

CopyIntoNewListWithIterator   :     92.49ms
CopyIntoNewListWithoutIterator:     71.77ms
FilterLinkedListInPlace       :     15.73ms
ShiftDown                     :     11.86ms

더 큰 크기의 캐시 메모리가 제한 요소가되므로 목록의 두 번째 복사본을 만드는 데 드는 비용이 중요하다고 생각합니다.

코드는 다음과 같습니다.

import java.awt.Point;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ListBenchmark {

    public static void main(String[] args) {
        Random rnd = new SecureRandom();
        Map<String, Long> timings = new TreeMap<String, Long>();
        for (int outerPass = 0; outerPass < 10; ++ outerPass) {
            List<FilterStrategy> strategies =
                Arrays.asList(new CopyIntoNewListWithIterator(),
                              new CopyIntoNewListWithoutIterator(),
                              new FilterLinkedListInPlace(),
                              new RandomRemoveByIndex(),
                              new SequentialRemoveByIndex(),
                              new ShiftDown());
            for (FilterStrategy strategy: strategies) {
                String strategyName = strategy.getClass().getSimpleName();
                for (int innerPass = 0; innerPass < 10; ++ innerPass) {
                    strategy.populate(rnd);
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        if (totalTime == null) totalTime = 0L;
                        timings.put(strategyName, totalTime - System.currentTimeMillis());
                    }
                    Collection<Point> filtered = strategy.filter();
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        timings.put(strategy.getClass().getSimpleName(), totalTime + System.currentTimeMillis());
                    }
                    CHECKSUM += filtered.hashCode();
                    System.err.printf("%-30s %d %d %d%n", strategy.getClass().getSimpleName(), outerPass, innerPass, filtered.size());
                    strategy.clear();
                }
            }
        }
        for (Map.Entry<String, Long> e: timings.entrySet()) {
            System.err.printf("%-30s: %9.2fms%n", e.getKey(), e.getValue() * (1.0/25.0));
        }
    }

    public static volatile int CHECKSUM = 0;

    static void populate(Collection<Point> dst, Random rnd) {
        for (int i = 0; i < INITIAL_SIZE; ++ i) {
            dst.add(new Point(rnd.nextInt(), rnd.nextInt()));
        }
    }

    static boolean wanted(Point p) {
        return p.hashCode() % 20 != 0;
    }

    static abstract class FilterStrategy {
        abstract void clear();
        abstract Collection<Point> filter();
        abstract void populate(Random rnd);
    }

    static final int INITIAL_SIZE = 100000;

    private static class CopyIntoNewListWithIterator extends FilterStrategy {
        public CopyIntoNewListWithIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            ArrayList<Point> dst = new ArrayList<Point>(list.size());
            for (Point p: list) {
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class CopyIntoNewListWithoutIterator extends FilterStrategy {
        public CopyIntoNewListWithoutIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            ArrayList<Point> dst = new ArrayList<Point>(inputSize);
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class FilterLinkedListInPlace extends FilterStrategy {
        public String toString() {
            return getClass().getSimpleName();
        }
        FilterLinkedListInPlace() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (Iterator<Point> it = list.iterator();
                 it.hasNext();
                 ) {
                Point p = it.next();
                if (! wanted(p)) it.remove();
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class RandomRemoveByIndex extends FilterStrategy {
        public RandomRemoveByIndex() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class SequentialRemoveByIndex extends FilterStrategy {
        public SequentialRemoveByIndex() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class ShiftDown extends FilterStrategy {
        public ShiftDown() {
            list = new ArrayList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            int outputSize = 0;
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) {
                    list.set(outputSize++, p);
                }
            }
            list.subList(outputSize, inputSize).clear();
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

}

어레이 복사는 다소 비싸지 않은 작업입니다. 이것은 매우 기본적인 수준 (Java 네이티브 정적 메서드)에서 수행되며 아직 성능이 정말 중요 해지는 범위에 있지 않습니다.

귀하의 예에서 크기 150000 (평균)의 배열을 약 12000 번 복사합니다. 이것은 많은 시간이 걸리지 않습니다. 여기 내 노트북에서 테스트했는데 500ms도 채 걸리지 않았습니다.

업데이트 다음 코드를 사용하여 랩톱 (Intel P8400)에서 측정했습니다.

import java.util.Random;

public class PerformanceArrayCopy {

    public static void main(String[] args) {

        int[] lengths = new int[] { 10000, 50000, 125000, 250000 };
        int[] loops = new int[] { 1000, 5000, 10000, 20000 };

        for (int length : lengths) {
            for (int loop : loops) {

                Object[] list1 = new Object[length];
                Object[] list2 = new Object[length];

                for (int k = 0; k < 100; k++) {
                    System.arraycopy(list1, 0, list2, 0, list1.length);
                }

                int[] len = new int[loop];
                int[] ofs = new int[loop];

                Random rnd = new Random();
                for (int k = 0; k < loop; k++) {
                    len[k] = rnd.nextInt(length);
                    ofs[k] = rnd.nextInt(length - len[k]);
                }

                long n = System.nanoTime();
                for (int k = 0; k < loop; k++) {
                    System.arraycopy(list1, ofs[k], list2, ofs[k], len[k]);
                }
                n = System.nanoTime() - n;
                System.out.print("length: " + length);
                System.out.print("\tloop: " + loop);
                System.out.print("\truntime [ms]: " + n / 1000000);
                System.out.println();
            }
        }
    }
}

몇 가지 결과 :

length: 10000   loop: 10000 runtime [ms]: 47
length: 50000   loop: 10000 runtime [ms]: 228
length: 125000  loop: 10000 runtime [ms]: 575
length: 250000  loop: 10000 runtime [ms]: 1198

I think the difference in performance is likely coming down to the difference that ArrayList supports random access where LinkedList does not.

If I want to get(1000) of an ArrayList I am specifying a specific index to access this, however LinkedList doesn't support this as it is organized through Node references.

If I call get(1000) of LinkedList, it will iterate the entire list until if finds index 1000 and this can be exorbitantly expensive if you have a large number of items in the LinkedList.

Interesting and unexpected results. This is just a hypothesis, but...

On average one of your array element removals will require moving half of your list (everything after it) back one element. If each item is a 64-bit pointer to an object (8 bytes), then this means copying 125000 items x 8 Bytes per pointer = 1 MB.

A modern CPU can copy a contiguous block of 1 MB of RAM to RAM pretty quickly.

Compared to looping over a linked list for every access, which requires comparisons and branching and other CPU unfriendly activities, the RAM copy is fast.

You should really try benchmarking the various operations independently and see how efficient they are with various list implementations. Share your results here if you do!

I'm skipping over some implementation details on purpose here, just to explain the fundamental difference.

To remove the N-th element of a list of M elements, the LinkedList implementation will navigate up to this element, then simply remove it and update the pointers of the N-1 and N+1 elements accordingly. This second operation is very simple, but it's getting up to this element that costs you time.

For an ArrayList however, the access time is instantaneous as it is backed by an array, meaning contiguous memory spaces. You can jump directly to the right memory address to perform, broadly speaking, the following:

reallocate a new array of M - 1 elements
put everything from 0 to N - 1 at index 0 in the new arraylist's array
put everything N + 1 to M at index N in the arraylist's array.

Thinking of it, you'll notice you can even reuse the same array as Java can use ArrayList with pre-allocated sizes, so if you remove elements you might as well skip steps 1 and 2 and directly do step 3 and update your size.

Memory accesses are fast, and copying a chunk of memory is probably sufficiently fast on modern hardware that moving to the N-position is too time consuming.

However, should you use your LinkedList in such a way that it allows you to remove multiple elements that follow each other and keep track of your position, you would see a gain.

But clearly, on a long list, doing a simple remove(i) will be costly.

To add a bilt of salt and spice to this:

See the note on Efficiency on the Array Data Structure and the note on Performance on the Dynamic Array Wikipedia entries, which describe your concern.
Keep in mind that using a memory structure that requires contiguous memory requires, well, contiguous memory. Which means your virtual memory will need to be able to allocate contiguous chunks. Or even with Java, you'll see your JVM happily going down with an obscure OutOfMemoryException taking its cause in a low-level crash.

ReferenceURL : https://stackoverflow.com/questions/6101789/why-does-javas-arraylists-remove-function-seem-to-cost-so-little

'UFO ET IT' 카테고리의 다른 글

std :: vector 대신 QVector (Qt)를 사용하는 이유 (0)	2021.01.08
SOAP 웹 서비스에 대한 WSSE Usernametoken 통신 올바른 방법 (0)	2021.01.08
manpath가 아닌 .man 파일을 읽는 방법은 무엇입니까? (0)	2021.01.08
파이썬 스레딩. (0)	2021.01.08
NaN의 비트 패턴이 실제로 하드웨어에 의존합니까? (0)	2021.01.08

현재글Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

ufoet

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

'UFO ET IT' 카테고리의 다른 글

'UFO ET IT'의 다른글

티스토리툴바

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

Java의 ArrayList의 제거 기능이 비용이 적게 드는 이유는 무엇입니까?

'UFO ET IT' 카테고리의 다른 글

'UFO ET IT'의 다른글

관련글

티스토리툴바