2016-08-17

Spark jobs on YARN fail with Java 8

I have a cluster with 1 master and 6 slaves that uses the pre-built Hadoop 2.6.0 and Spark 1.6.2. I was running MapReduce and Spark jobs without any problem with OpenJDK 7 installed on all nodes. However, after I upgraded OpenJDK 7 to OpenJDK 8 on all nodes, spark-submit and spark-shell on YARN fail with the error below.

16/08/17 14:06:22 ERROR client.TransportClient: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcV$sp(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) 
     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
     at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
     at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
     ... 1 more 
Caused by: java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 ERROR spark.SparkContext: Error initializing SparkContext. 
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "/home/hd_spark/spark2/python/pyspark/shell.py", line 49, in <module> 
    spark = SparkSession.builder.getOrCreate() 
    File "/home/hd_spark/spark2/python/pyspark/sql/session.py", line 169, in getOrCreate 
    sc = SparkContext.getOrCreate(sparkConf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 294, in getOrCreate 
    SparkContext(conf=conf or SparkConf()) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 115, in __init__ 
    conf, jsc, profiler_cls) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 168, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 233, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__ 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 

I exported JAVA_HOME in .bashrc and set OpenJDK 8 as the default Java using the following commands:

sudo update-alternatives --config java 
sudo update-alternatives --config javac 

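As a sanity check in a situation like this, one way to confirm which major Java version is actually active on a node is to parse the first line of `java -version` output. The helper below is a hypothetical sketch (not part of the original question); feed it the line produced by `java -version 2>&1 | head -1` on each node to catch a machine where `update-alternatives` was not applied:

```python
import re

def java_major_version(version_line: str) -> int:
    """Return the major Java version from the first line of `java -version`.

    Handles both the legacy "1.x" scheme (Java 8 and earlier) and the
    modern scheme (Java 9+), e.g.:
      'openjdk version "1.8.0_91"' -> 8
      'java version "1.7.0_80"'    -> 7
      'openjdk version "11.0.2"'   -> 11
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError("unrecognized `java -version` output: %r" % version_line)
    major = int(m.group(1))
    if major == 1 and m.group(2) is not None:
        return int(m.group(2))  # legacy "1.8.0_91" style -> 8
    return major

print(java_major_version('openjdk version "1.8.0_91"'))  # prints 8
```

Running this against every node makes a mixed-JDK cluster (a common cause of confusing YARN failures) easy to rule out before digging into logs.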
I also tried Oracle Java 8 and got the same error. The container logs on the slave nodes show the same error, as below.

SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/filecache/17/__spark_libs__8247267244939901627.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
16/08/17 14:05:11 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: [email protected] 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for TERM 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for HUP 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for INT 
16/08/17 14:05:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 78 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 1 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/appcache/application_1471352972661_0005/blockmgr-d9f23a56-1420-4cd4-abfd-ae9e128c688c 
16/08/17 14:05:12 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 
16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 
16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM 
16/08/17 14:05:13 INFO storage.DiskBlockManager: Shutdown hook called 
16/08/17 14:05:13 INFO util.ShutdownHookManager: Shutdown hook called 

I tried the Spark 1.6.2 pre-built version, the Spark 2.0 pre-built version, and also Spark 2.0 built from source myself.

Hadoop jobs run perfectly even after the upgrade to Java 8. When I go back to Java 7, Spark works fine again.

My Scala version is 2.11 and the OS is Ubuntu 14.04.4 LTS.

It would be great if someone could give me an idea how to solve this problem.

Thanks!

P.S. I masked my IP addresses as xxx.xxx.xxx.xx in the logs.


It looks like the workers are trying to connect to the driver but fail: '16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM'. What does the driver log say? –


Where can I find the driver log? I found the worker node logs in the hadoop/logs/userlog directory, but I can't find any driver-related logs on the master node. The spark/logs directory only contains empty history server logs, plus hadoop/logs/userlog on the master node. Thanks! – jmoa


http://spark.apache.org/docs/latest/running-on-yarn.html –

Answer


As of September 12, 2016 this is a blocker issue: https://issues.apache.org/jira/browse/YARN-4714

You can work around it by setting the following properties in yarn-site.xml. (Java 8 typically reserves considerably more virtual memory than Java 7, so YARN's container memory checks kill the executors.)

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>false</value> 
</property> 

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property> 
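If the same override has to be applied on many NodeManagers, the two properties above can be appended to a yarn-site.xml document programmatically. This is just an illustrative sketch using Python's standard library, not part of the original answer:

```python
import xml.etree.ElementTree as ET

def disable_yarn_memory_checks(yarn_site_xml: str) -> str:
    """Append the pmem/vmem check overrides to a yarn-site.xml
    document given as a string, and return the modified document."""
    root = ET.fromstring(yarn_site_xml)  # root is <configuration>
    for name in ("yarn.nodemanager.pmem-check-enabled",
                 "yarn.nodemanager.vmem-check-enabled"):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = "false"
    return ET.tostring(root, encoding="unicode")

print(disable_yarn_memory_checks("<configuration></configuration>"))
```

Note that the NodeManagers have to be restarted for the change to take effect. A less blunt alternative, if you would rather keep the checks on, is raising yarn.nodemanager.vmem-pmem-ratio instead of disabling them entirely.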

Thanks for the answer! I recently went back to Java 7, but I will try this and comment on whether it works. – jmoa


@jmoa Any luck? – simpleJack


This works perfectly for me. –